Evolution of Large-Scale Website Architecture
Quality attributes that large websites focus on
- High availability
- Performance
- Extensibility
- Scalability
- Security
Features of large websites
- High concurrency, heavy traffic
- High availability
- Massive data
- A wide range of users and complex network conditions
- A hostile security environment
- Rapidly changing requirements, frequent releases
- Progressive development
Evolution and development of large-scale website architecture
- Initial phase: usually built on the LAMP stack, with all resources on a single server
- Separation of application services and data services, with a separate database server
- Use caching to improve website performance, based on the 80/20 rule: 80% of business accesses concentrate on 20% of the data
- Here you need to consider what data is appropriate for caching
- The cache can be a local cache, or it can be a remote distributed cache
- Improve website concurrency with application server clusters
- If load pressure can be relieved by adding a single server, then servers can keep being added in the same way to keep improving system performance, achieving scalability
- Here you need to consider which load-balancing strategy to use
- Database read/write separation
- If data in the cache is updated too frequently, the cache is constantly invalidated and refreshed, which degrades performance
- Master-slave hot standby can be used: by configuring a replication relationship between two database servers, data written to one is synchronized to the other
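As a sketch of how read/write separation can look in application code (the connection names and the SQL-prefix check here are illustrative, not a real driver API): writes go to the primary so replication can propagate them, and reads rotate across replicas.

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and reads round-robin across replicas.

    A minimal sketch of read/write separation; 'primary' and 'replicas'
    are hypothetical connection identifiers, not a real database API.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Statements that modify data must go to the primary so the
        # replicas can replay them through replication.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in {"INSERT", "UPDATE", "DELETE"}:
            return self.primary
        return next(self._replica_cycle)

router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM users"))      # a replica
print(router.route("UPDATE users SET name='x'"))  # the primary
```

A real deployment would do this inside a data-access layer or a proxy such as a database middleware, but the routing decision is the same.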
- Accelerating network response with reverse proxies and CDNs
- The fundamentals of CDN and reverse proxy are both cache
- A CDN is deployed in the network providers' data centers, so users obtain data from the provider's data center closest to them
- The reverse proxy is deployed in the website's central data center; when a user's request arrives, the first server it reaches is the reverse proxy, and if the reverse proxy has cached the requested resource, it returns it to the user directly
- Using Distributed file systems and distributed database systems
- A common database-splitting technique is partitioning by business: databases for different businesses are deployed on different physical servers
- Using NoSQL and search engines
- Business splitting: divide the entire site into different product lines, a divide-and-conquer approach
- Distributed services
The value of the evolution of large web site architectures
The value of a website lies in what it can provide to its users: what the site can do, not how it does it. For a small website, therefore, the most important thing is to serve users well, create value, and win their recognition, so that the site can survive and grow.
- The core value of large Web site architecture technology is the flexibility to respond to the needs of the site, which is an evolutionary process
- The main force driving the development of large website technology is the business development of the website: business drives technology, not the other way around. So avoid pursuing technology for its own sake.
Large Web site architecture mode
- Layering, which slices the system horizontally
- The challenge of layering is to rationally plan the hierarchy boundaries and interfaces
- Hierarchies include both physical and logical hierarchies
- Segmentation, which slices the system vertically
- Separate different functions and services, packaging them into module units with high cohesion and low coupling
- Distributed
- The purpose of layering and partitioning is to facilitate distributed deployment of small modules
- Problems: 1) distributed means that service calls must pass through the network, and the impact of bandwidth needs to be considered; 2) the more servers, the greater the probability of downtime
- Common distributed scenarios: 1) distributed Applications and Services, 2) distributed static resources, 3) distributed data and storage, 4) distributed computing, 5) distributed configuration, distributed locks, Distributed file systems ...
- Cluster, where multiple servers deploy the same application, forming a cluster that serves users through a load-balancing device
- Even low-traffic distributed applications and services are deployed on at least two servers, forming a small cluster that improves system availability
- Cache, where data is placed closest to the computation to speed up processing
- CDN
- Reverse Proxy
- Local cache
- Distributed cache
- Asynchronous, where messaging between business operations is not a synchronous call; instead, a business operation is divided into multiple stages, and each stage collaborates asynchronously by sharing data
- Message queues are typically used
- Benefits: 1) improve system availability, 2) speed up website response, 3) smooth out concurrent access spikes
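The asynchronous pattern above can be sketched with an in-process queue as a stand-in for a real message queue such as RabbitMQ or Kafka: the request path only enqueues work and returns immediately, while a background consumer drains the queue at its own pace, which is what absorbs traffic spikes.

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def consumer():
    # Background worker: drains tasks at its own pace.
    while True:
        task = task_queue.get()
        if task is None:        # sentinel value shuts the worker down
            break
        results.append(f"processed:{task}")

worker = threading.Thread(target=consumer)
worker.start()

# The "web request" path: enqueue and return immediately.
for i in range(3):
    task_queue.put(i)

task_queue.put(None)
worker.join()
print(results)  # ['processed:0', 'processed:1', 'processed:2']
```

The producer never waits for processing to finish, which is exactly how asynchrony improves response time and decouples the two sides.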
- Redundancy
- An inevitable consequence of clustering
- An inevitable consequence of security requirements
- Automation: DevOps thinking, minimizing human intervention
- Automated release
- Automated code management
- Automated testing
- Automated security monitoring
- Automated deployment
- Automated monitoring
- Automated alerting
- Automated failover and recovery
- Automated resource allocation
- ......
- Security
Large site Core architecture elements
- Performance
- Performance issues can cause a serious loss of site users
- Metrics to measure performance: Response time, TPS, performance counters, etc.
- Availability
- No website can guarantee perfect 24x7 operation
- The premise of a high-availability site is that server downtime is inevitable; the goal of high-availability design is that services or applications remain available when a server goes down
- The essential means is clustering, that is, redundancy
- Scalability, that is, coping with growing concurrent user access and growing data storage needs by continuously adding servers to the cluster
- Metrics: Whether clusters can be built, and whether new servers can be easily added to the cluster
- Extensibility, which concerns the site's functionality, ensuring rapid response to changing requirements
- Metric: whether new business products can be added transparently, with no impact on existing business
- Security
- Metric: whether the site can respond effectively to existing and potential attacks and data theft
Instant Response: High-Performance Architecture
Site performance from different perspectives
- User perspective
- Mainly the end-to-end experience
- Improve the perceived experience primarily through front-end optimization
- Developer Perspective
- Focuses on the performance of the application itself and related subsystems, including response latency, system throughput, concurrency processing, system stability, and more
- Key optimizations: Use caching to speed up data reads, use clusters to increase throughput, use asynchronous messages to speed up request response, use code optimization to boost program performance
- Operations staff perspective
- Focuses on infrastructure performance and resource utilization
- Main optimization means: Optimize backbone network, use cost-effective custom server, use virtualization technology to optimize resource utilization
Performance Test Metrics
- Response time, the time it takes the application to perform an operation, from sending the request to receiving the last byte of response data
- Concurrency, the number of requests the system can handle simultaneously, which also reflects the system's load characteristics
- Throughput, the number of requests the system processes per unit of time, reflecting its overall processing capacity
- Performance counters, which describe some data metrics for server or operating system performance
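The first three metrics are linked by Little's law: throughput is roughly concurrency divided by response time. A quick sanity check with illustrative numbers (the figures are made up for the example):

```python
# Little's law applied to the metrics above:
#   throughput ≈ concurrency / response time
concurrency = 100      # requests in flight at once
response_time = 0.5    # seconds per request (illustrative)
throughput = concurrency / response_time
print(throughput)      # 200.0 requests per second
```

This relationship is why the performance tests described next raise concurrency step by step and watch what happens to throughput and response time.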
Performance test Methods
- Performance testing: taking the performance targets planned at system design time as the expected goal, keep applying pressure to verify that the system reaches those targets while staying within an acceptable range of resource usage
- Load testing: keep increasing concurrent requests on the system until one or more performance metrics reach a safety threshold
- Stress testing: continue applying pressure beyond the safe load until the system crashes or can no longer handle requests
- Stability testing: under specific hardware, software, and network conditions, apply a certain load for a long period and observe whether the system remains stable
Web front-end optimization
- Browser Access Optimization
- Reduce HTTP Requests
- Using browser caching
- Enable compression
- CSS is placed on top of the page, JavaScript is placed at the bottom of the page
- Reduce cookie Transmission
- CDN Acceleration
- Reverse Proxy
Application Server Performance Optimization
- Distributed cache
- The cache is essentially a memory hash table
- Cache data that is read far more often than it is written and that rarely changes; in general, caching makes sense when the read/write ratio is above 2:1
- When the application reads data, it reads from the cache first; if the data is not in the cache or has expired, it reads from the database and puts the fresh data into the cache
- Caching also requires attention to hotspot data
- Cache warm-up: when a cache system is newly launched, load hotspot data at startup so the cache can be used immediately
- Cache penetration: if an application keeps requesting large amounts of data that does not exist, every such request misses the cache and hits the database, degrading performance
- Distributed caches currently fall into two categories: 1) caches whose servers communicate with each other, such as JBoss Cache; 2) caches whose servers do not communicate with each other, such as memcached
- Asynchronous operation
- Message queues are typically used, with the additional benefit of smoothing out traffic spikes
- Using clusters
Code optimization
- Multithreading
- Be aware of thread-safety issues. Methods: 1) design objects to be stateless, 2) use local objects, 3) use locks when accessing shared resources concurrently
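Method 3 can be sketched with a lock-protected shared counter: without the lock, concurrent `+=` operations can interleave and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def work():
    global counter
    for _ in range(10_000):
        with lock:          # only one thread mutates the counter at a time
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000, with no lost updates
```

Designing the object as stateless (method 1) avoids the problem entirely, which is why it is listed first.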
- Resource Reuse
- Primarily Singleton and resource pool (object pool)
- Data structures: choose the right data structures and algorithms
- Garbage collection
- Set a reasonable garbage collection policy
Storage performance Optimization
- Mechanical hard drives vs solid-state drives
- B+ trees vs LSM trees
- RAID vs HDFS
Foolproof: High-Availability Architecture
The availability of a website describes its ability to be accessed and used normally, which is distinct from usability (ease of use)
Site Availability Metrics
- Website unavailable time = point in time the fault is repaired minus point in time the fault is discovered
- Annual website availability = (1 - site unavailable time / total time in the year) * 100%
- Availability is usually expressed as a number of nines: 2 nines means basically available, unavailable for less than 88 hours a year; 3 nines means higher availability, less than 9 hours a year; 4 nines means high availability with automatic recovery, less than 53 minutes a year; 5 nines means extremely high availability, less than 5 minutes a year
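The downtime figures above can be recomputed directly from the definition; a quick sketch:

```python
# With availability of n nines, the unavailable fraction of the year
# is 10**-n; multiply by the minutes in a year to get allowed downtime.
minutes_per_year = 365 * 24 * 60

for nines in (2, 3, 4, 5):
    downtime_min = minutes_per_year * 10 ** -nines
    print(f"{nines} nines: {downtime_min:.1f} minutes of downtime per year")
```

This reproduces the book's numbers: 2 nines allows about 5256 minutes (roughly 88 hours), 4 nines about 53 minutes, and 5 nines about 5 minutes per year.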
A high-availability architecture aims to ensure that services remain available, and that data remains stored and accessible, even when server hardware fails
The main means of a high-availability architecture: redundant backups of data and services, plus failover, switching traffic to other available servers once a server goes down
High-availability applications
Stateless application: the application server keeps no business context; it processes each request purely from the data submitted with it. All service instances are fully equivalent, so a request submitted to any server yields the same result
- Failover of stateless services through load balancing
- Load balancing: mainly used when business and data volumes are high and a single server cannot bear the whole load; traffic and data are distributed across a cluster of servers to improve overall processing capacity
- Session Management of Application server cluster
- Session Copy
- Session binding
- Use cookies to record session
- Session Server
High-availability Services
- Tiered management
- Timeout settings
- Asynchronous invocation
- Service degradation
- Idempotent design
High-availability data
- Primary means: data backup and failover
- CAP principle: a storage system providing data services cannot simultaneously satisfy consistency, availability, and partition tolerance
- Data consistency classification: 1) strong consistency; 2) user-perceived consistency; 3) eventual consistency
- Data backup
- The advantage of cold backup is simplicity: low cost and low technical difficulty; its drawback is that it cannot guarantee data consistency
- Hot standby comes in two types: 1) asynchronous hot standby; 2) synchronous hot standby
- Failover
- Failure confirmation: 1) heartbeat detection; 2) failure reports from application access
- Access transfer
- Data recovery
Software quality assurance for highly available websites
- Website releases: in its effect on system availability, the release process is similar to server downtime
- Releases generally take the form of batch updates; never shut down all servers in the cluster at once
- Automated testing
- Selenium is commonly used for automated testing
- Pre-release validation
- The pre-release server is a special-purpose server; the only difference from the production servers is that it is not registered with the load balancer, so external users cannot reach it
- Code control
- Trunk development, branch release
- Branch development, trunk release, which is currently the mainstream approach
- Automated Publishing
- Train model: the release process of each application is treated as a train journey; the train runs on a fixed schedule and stops at several stations, and at each station every project is routinely checked, with failing projects getting off the train and passing projects staying aboard until the terminus
- In practice, if most projects have gotten off along the way, the train returns to the starting point and waits until the problems are resolved
- If one of the most important projects on the train fails, the whole train must return
- The less human intervention and the higher the degree of automation, the less likely a release is to introduce a fault
- Grayscale Publishing
- Large sites use grayscale release: the cluster is divided into several parts, and only one part is released each day and observed for stability; if no failure appears, another part is released the next day, until after a few days the whole cluster has been released. If a problem is found along the way, only the released part needs to be rolled back
Website Operation Monitoring
- Monitoring data acquisition
- User Behavior Log Collection
- Server performance Monitoring
- Runtime data reports
- Monitoring management
- System Alarms
- Failover
- Automatic graceful degradation
Never-Ending Growth: Scalability Architecture
Website scalability: the ability to expand or shrink the site's service processing capacity simply by changing the number of deployed servers, without changing the site's hardware or software design
Scalable design of website architecture
- Physically separate different functions so each can scale independently
- Scale a single function by adjusting cluster size
Scalability design of Application server cluster
- HTTP redirect load balancing
- DNS domain name resolution load balancing
- Reverse proxy load balancing
- IP load balancing
- Data link layer load balancing
- Load balancing algorithms
- Round robin
- Weighted round robin
- Random
- Least connections
- Source address hashing
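Three of these algorithms can be sketched in a few lines (the server addresses and weights are illustrative):

```python
import hashlib
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round robin: hand out servers in rotation.
rr = itertools.cycle(servers)

# Weighted round robin: repeat each server according to its weight.
weights = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 1}
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])

# Source address hashing: the same client IP always maps to the same
# server, which preserves session affinity.
def source_hash(client_ip: str) -> str:
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print(next(rr), next(rr))                                        # first two servers
print(source_hash("203.0.113.7") == source_hash("203.0.113.7"))  # True
```

Least connections, by contrast, needs live state (the current connection count per server), which is why it is usually implemented inside the load balancer itself.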
Scalable design of distributed cache cluster
- memcached access model of distributed cache cluster
- Applications access the memcached server cluster through a memcached client, which consists mainly of an API set, a routing algorithm for the server cluster, a server cluster list, and a communication module
- The routing algorithm is responsible for computing, from the key of the cached data, which server a write should go to (write cache) or which server a read should come from (read cache)
- Memcached scalability challenges for distributed cache clusters
- The challenge mainly concerns the routing algorithm: when the cluster expands, how can the routing algorithm incorporate the newly added servers without invalidating most of the existing cache?
- Workaround: scale up when site traffic is at its lowest, then gradually warm the cache by replaying simulated requests so the data is redistributed across the cache servers
- Consistent hashing algorithm for distributed caches
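A minimal sketch of consistent hashing with virtual nodes, which addresses the expansion problem above by remapping only a small fraction of keys when a server is added (the server names and replica count are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing with virtual nodes.

    Each physical server is hashed onto the ring many times ("virtual
    nodes") so that adding or removing one server remaps only a small,
    evenly spread fraction of keys.
    """

    def __init__(self, servers, replicas=150):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server: str):
        for i in range(self.replicas):
            self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in (f"user:{i}" for i in range(1000))}
ring.add("cache-d")
moved = sum(1 for k, v in before.items() if ring.get(k) != v)
print(f"{moved / 1000:.0%} of keys moved")  # roughly a quarter, not nearly all
```

With naive `hash(key) % n` routing, going from 3 to 4 servers would remap about three quarters of all keys; here only the keys that now belong to the new server move.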
Scalability design of data Storage server cluster
- Data storage servers must ensure reliable storage: the availability and correctness of the data must be guaranteed under all circumstances
- Design of scalability of relational database cluster
- Use a master-slave structure to implement read/write separation
- Put different business data into different database clusters, i.e., partition the database by business
- For extremely large tables, use sharding
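Sharding can be sketched as routing each row by a modulo of its shard key (the database names, shard count, and `user_id` key are illustrative):

```python
NUM_SHARDS = 4

def shard_for(user_id: int) -> str:
    # The same shard key always routes to the same physical database.
    return f"user_db_{user_id % NUM_SHARDS}"

print(shard_for(42))  # user_db_2
print(shard_for(42))  # same row always lands on the same shard
```

Modulo routing is simple but makes changing the shard count expensive, since most rows would need to move; that trade-off is one reason consistent hashing (shown earlier for caches) or range-based sharding is often preferred for storage that must be rescaled.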
- Scalability design of NoSQL databases
Responding to Change: Extensible Architecture
Extensibility: the ability of the system to be continuously expanded and enhanced with minimal impact on the existing system
Means of achieving extensibility: low coupling and high cohesion
Reduce system coupling with distributed message queues
Build a reusable business platform with distributed services
- Large, complex systems need to be split into modules that can be deployed independently to reduce coupling
- WEB Service and Enterprise distributed services
- Web Services are heavyweight; REST can be considered as an alternative
- Or use an open source solution, such as Dubbo
- Extensible data structures
Impregnable: Security Architecture
Typical attack methods
- XSS attacks (cross-site scripting attacks)
- Hackers tamper with web pages by injecting malicious HTML scripts, which control the user's browser to perform malicious operations when the user views the page
- Categories: 1) reflected; 2) persistent
- Solutions: 1) sanitization (escaping); 2) HttpOnly
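The sanitization defence amounts to escaping HTML metacharacters before user input is rendered, so an injected script tag displays as text instead of executing; a sketch using Python's standard library:

```python
import html

payload = '<script>alert("xss")</script>'   # hostile user input
safe = html.escape(payload)                  # escape <, >, &, and quotes
print(safe)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Escaping must happen at output time, in the context where the value is inserted; most template engines do this automatically.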
- Injection attack
- Categories: 1) SQL injection; 2) OS injection
- Solutions: 1) sanitization; 2) parameter binding
- CSRF Attack (cross-site request forgery)
- An attacker performs illegal operations in the guise of a legitimate user through a cross-site request
- Solution: verify the requester's identity: 1) form token; 2) CAPTCHA; 3) Referer check
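The form-token defence can be sketched with an HMAC derived from a server-side secret: the server embeds a per-session token in each form and rejects requests whose token does not verify. The secret and session id here are illustrative values.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"server-side-secret"  # illustrative; keep real keys out of source

def issue_token(session_id: str) -> str:
    # Derive the token from the session so an attacker's site,
    # which cannot read the victim's session, cannot forge it.
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

def verify_token(session_id: str, token: str) -> bool:
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(issue_token(session_id), token)

token = issue_token("session-123")
print(verify_token("session-123", token))                  # True
print(verify_token("session-123", secrets.token_hex(32)))  # False: forged token
```

Many frameworks instead store a random token in the session and compare it against the submitted form value; both variants rely on the attacker being unable to read the token.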
- Other attack methods
- Error codes: error pages may show the exception stack and expose sensitive information. Solution: use a unified 500 page
- HTML comments: comments may expose sensitive information. Solution: code review or automated scanning
- File upload: a virus file may be uploaded. Solution: use a whitelist that only allows uploads of specified file types
- Path traversal: using relative paths in URLs to traverse directories and files the system has not exposed. Solution: deploy resource files on separate servers, with separate domain names
Information encryption technology and key management
- One-way hashing, including MD5, SHA, etc.
- Symmetric encryption, including DES algorithm, RC algorithm, etc.
- Asymmetric encryption, including RSA algorithms
- Key security Management
- Put the keys and algorithms on a separate server, or even a dedicated hardware device, providing encryption and decryption services externally
- Alternatively, put the encryption and decryption algorithms in the application and the keys on a separate server; for storage, split each key into several pieces kept on different media
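For stored passwords, a salted, iterated one-way hash is the usual application of the hashing mentioned above; a sketch using PBKDF2, which stands in here for the bare MD5/SHA named earlier, since those alone are too fast for password storage:

```python
import hashlib
import os

def hash_password(password: str, salt=None):
    # A random salt means identical passwords do not hash identically;
    # many iterations make brute-force guessing slow.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

salt, digest = hash_password("s3cret")
print(hash_password("s3cret", salt)[1] == digest)  # True: same salt reproduces the hash
print(hash_password("s3cret")[1] == digest)        # False: a fresh salt gives a new hash
```

To verify a login, the server stores the salt alongside the digest and recomputes the hash with the stored salt; the plaintext password is never stored.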
Large web site technology architecture reading notes