In the previous article we talked about how the overall architecture evolves; this time we'll look at the directions along which to think about the design.
To let a site cope with problems such as high concurrent access, massive data processing, and high reliability, we can approach the design from two directions: horizontal and vertical.
Basic ideas
First, the whole architecture can be layered. It is commonly divided into an application layer, a service layer, and a data layer. In practice each of these broad layers can be subdivided further: the application layer can be split into a view layer and a business logic layer, and the service layer into a data interface layer and a logic processing layer.
Through layering we divide a huge system into separate parts, which makes division of labor in development and maintenance easier. Each layer is relatively independent and can be adjusted on its own as the needs of the site change.
After the logical layering, physical deployment can follow whatever strategy the situation requires: at first all layers can run on the same physical machine, but as the business grows the different modules need to be deployed separately.
A layered architecture is not only about planning the logical structure of the software to ease development and maintenance; as the site grows, layering becomes especially important for a high-concurrency distributed architecture.
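To make the layering concrete, here is a minimal sketch, not taken from the original article, of a request passing through a view layer, a business logic layer, and a data interface layer; all class and function names are hypothetical.

```python
# Minimal sketch of a three-layer call chain: view -> business logic -> data access.
# UserDataInterface, UserService, and user_profile_view are hypothetical names.

class UserDataInterface:
    """Data layer: the only place that knows how data is stored."""
    def find_user(self, user_id: int) -> dict:
        # A real system would query a database or a remote service here.
        return {"id": user_id, "name": f"user-{user_id}"}

class UserService:
    """Service layer: combines raw data into business results."""
    def __init__(self, data: UserDataInterface):
        self.data = data

    def get_profile(self, user_id: int) -> dict:
        user = self.data.find_user(user_id)
        return {"title": f"Profile of {user['name']}", "user": user}

def user_profile_view(service: UserService, user_id: int) -> str:
    """Application / view layer: turns business results into a response."""
    profile = service.get_profile(user_id)
    return f"<h1>{profile['title']}</h1>"

if __name__ == "__main__":
    print(user_profile_view(UserService(UserDataInterface()), 42))
```

Because each layer only talks to the layer directly below it, a layer can later be moved onto its own servers without changing the layers above.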
After layering, the next step is to split the business vertically.
The project is divided into modules along business boundaries, each handed to a separate team to develop and deploy. Once complete, the modules are deployed on different servers and linked to one another over the network.
Different nodes are made redundant as each situation requires, to keep the site highly available.
After that come optimizations such as caching, CDNs, and reverse proxies, which will be elaborated on later.
Okay, now let's cut to the chase.
Architecture elements
First, what do we need to consider for a high-traffic, big-data site?
Performance
The first is performance. Performance is a key indicator of a website: unless they have no other choice but to use this one site, users will not put up with an extremely slow one.
Because performance problems can appear anywhere, the ways of solving them are just as varied. Starting from the moment a user requests a URL, every link in the chain can be optimized. Following the layering above, the work can be roughly grouped into three areas: application layer optimization, service layer optimization, and data layer optimization.
The knowledge involved covers web front-end optimization, application server-side optimization, data storage, indexing, caching, and so on, each of which will be discussed separately later.
Good performance is a necessary condition for a website. Beyond that, because a site cannot predict spikes in load or attacks, we also want it to maintain stable performance under a variety of conditions (high concurrency, high load, uneven sustained pressure, and so on).
Availability
For a large site, downtime is disastrous: with tens of millions of users, even a few minutes of downtime can damage the site's reputation, and for an e-commerce site it may also mean loss of property or even lawsuits, in which case the loss goes well beyond money and users.
So we want to provide service 24 hours a day, yet no real server is guaranteed to run smoothly around the clock: there may be hardware failures, there may be software failures; in short, something will always go wrong.
The goal of high-availability design is therefore to ensure that services and applications keep running properly even when some servers are down.
The main means of achieving high availability is redundancy: the application is deployed on multiple servers that serve requests at the same time, and data is hot-backed up across multiple data servers, so that the outage of any single server neither brings down the overall service nor causes data loss.
For the application servers, multiple servers form a cluster behind a load-balancing device and serve requests together; when one server goes down, traffic is switched to the remaining servers and execution continues, which keeps the site available. This requires that the application servers do not store user session state locally, otherwise the session is lost when a user's requests are forwarded to another server and the service cannot continue (a rough sketch of the failover idea follows).
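As a rough illustration, here is a minimal round-robin load balancer that skips servers marked as down; this is my own sketch rather than anything from the original article, and the server addresses and health flags are made up (in practice they would come from health checks).

```python
import itertools

# Hypothetical cluster of application servers; "up" would come from health checks.
servers = [
    {"addr": "10.0.0.1:8080", "up": True},
    {"addr": "10.0.0.2:8080", "up": False},  # this one has gone down
    {"addr": "10.0.0.3:8080", "up": True},
]

round_robin = itertools.cycle(servers)

def pick_server() -> str:
    """Return the next healthy server, skipping any that are down."""
    for _ in range(len(servers)):
        server = next(round_robin)
        if server["up"]:
            return server["addr"]
    raise RuntimeError("no healthy application servers available")

# Requests keep flowing even though 10.0.0.2 is down.
for _ in range(4):
    print("forwarding request to", pick_server())
```

Because no session state lives on the application servers themselves, it does not matter which healthy server a given user's next request lands on.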
For the data storage servers, data is replicated between servers in real time, so that when one server goes down, data access switches to the others and the data on the failed server can be restored from the replicas.
The yardstick for whether an architecture meets the high-availability goal is this: when one or more servers go down, or some other unforeseen problem occurs, is the system as a whole still available?
Scalability
With a large number of users generating high concurrent access and massive amounts of data, a single server cannot possibly handle all the requests or store all the data.
A cluster is a group of servers that together act as a whole to provide a service. Scalability means being able to keep adding servers to that cluster to cope with rising user load and growing data storage requirements.
For an application server cluster, as long as no state is stored on the servers themselves, all servers are equivalent, and new servers can be added to the cluster simply through an appropriate load-balancing device.
For a cache server cluster, adding a new server can invalidate the existing cache routing and make most of the cached data unreachable, so the cache routing algorithm needs to be improved to keep cached data accessible.
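The original text does not name a specific routing algorithm; consistent hashing is one common improvement, sketched below as my own illustration. With it, adding a cache server only remaps the keys that fall between the new node and its neighbour on the hash ring, instead of invalidating most of the cache.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Toy consistent hash ring (no virtual nodes, for brevity)."""
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the first node on the ring.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring3 = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
ring4 = ConsistentHashRing(["cache-1", "cache-2", "cache-3", "cache-4"])  # one server added

keys = [f"user:{i}" for i in range(20)]
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys remapped after adding cache-4")
```

A naive `hash(key) % number_of_servers` routing, by contrast, remaps almost every key as soon as the number of servers changes, which is exactly the problem described above.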
Although relational databases support data replication and master-slave hot backup, it is still hard to scale them out into large clusters.
Extensibility
A website's extensibility is directly tied to how its functional modules evolve: the site develops rapidly and its features keep growing.
The main purpose of an extensible architecture is to let the site respond quickly to changes in requirements.
Concretely, it means new business can be added with as little impact on existing products as possible: new products can be launched with no changes, or only very small ones, to existing business; coupling between different product lines stays low; and changes to one product or business do not affect the others.
A large site will also inevitably attract third-party developers who call its web services, build peripheral products, and extend the site's business, which requires the site to expose an open platform API.
Security
The last element is security. The Internet is an open platform that anyone can reach from anywhere, so the security architecture is there to protect the site from malicious access and attacks and to keep its data from being stolen.
Performance, availability, scalability, extensibility, and security are the core elements of a site's architecture; the point of the architecture is largely to address these concerns, and each will be covered separately below.
High Performance architecture
When it comes to high performance, different roles define performance differently.
- User perspective: what users perceive as performance is the time from submitting a request to seeing the page. Differences in computer performance, in how fast different browsers parse HTML, and in the speed of different network providers all mean that the time users actually experience is longer than the time the server spends processing the request.
In practice we can apply a range of front-end optimizations, such as optimizing HTML and styles, exploiting the browser's asynchronous and concurrent loading, adjusting the caching strategy, and using CDN services and reverse proxies, so that users see content as early as possible. Even without touching the application services, this alone can noticeably improve the user experience.
- Developer perspective: developers care more about the performance of the application itself, including response latency, system throughput, concurrent processing capacity, system stability, and so on.
The main optimization techniques are using caches to speed up data reads, clusters to raise throughput, asynchronous processing to speed up responses, code-level optimization, and so on (a small caching sketch follows this list).
- Ops perspective: operations staff care more about the performance and utilization of the underlying infrastructure; I won't go into more detail here.
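As a small illustration of the point above about using caches to speed up data reads, here is a simple cache-aside read with a time-to-live. This is my own sketch; `load_user_from_db` is a hypothetical stand-in for a slow database query.

```python
import time

_cache = {}                  # key -> (expiry timestamp, value)
CACHE_TTL_SECONDS = 60

def load_user_from_db(user_id: int) -> dict:
    """Hypothetical slow operation standing in for a real database query."""
    time.sleep(0.05)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: int) -> dict:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]                                    # cache hit
    value = load_user_from_db(user_id)                     # cache miss: read from source
    _cache[key] = (time.time() + CACHE_TTL_SECONDS, value)
    return value

get_user(7)   # first call pays the "database" cost
get_user(7)   # repeat call within 60 s is served from the in-process cache
```

A real deployment would typically put the cache in a shared service such as Memcached or Redis rather than an in-process dictionary, so that all application servers see the same cached data.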
Performance Test Metrics
The main performance test metrics are response time, concurrency, throughput, and performance counters, as follows.
- Response Time
The time from sending a request until the response data is received. A single request is usually so short that it falls within the measurement error, so in practice we repeat the request, for example sending 100,000 identical requests, recording the total elapsed time, and dividing to get the time of a single request (see the measurement sketch after this list).
- Concurrency
The number of requests the system can handle at the same time; for a website this usually means the number of concurrent users.
A few related terms are easy to confuse, so here is the relationship:
total system users > online users > concurrent users
- Throughput
The number of requests processed per unit of time, reflecting the overall processing capacity of the system.
It can be measured in many ways: requests per second, pages per second, visitors per day, business transactions per hour, and so on.
Commonly used quantitative metrics include TPS (transactions per second), HPS (HTTP requests per second), and QPS (queries per second).
- Performance Counters
Metrics describing the state of the server or operating system, such as system load, thread count, memory usage, and disk and network I/O. When these values exceed their warning values (safety thresholds), alerts should be sent to developers and operations staff so the anomaly can be handled in time (see the monitoring sketch below).
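To make the response-time and throughput definitions concrete, here is a small measurement sketch. It is my own example; `handle_request` is a hypothetical stand-in for the real request, which in an actual test would be an HTTP call against the site, and the 100,000-request example above works the same way with a larger N.

```python
import time

def handle_request() -> None:
    """Hypothetical request; replace with a real HTTP call in an actual test."""
    time.sleep(0.001)

N = 1_000                              # repeat the request many times, as described above
start = time.perf_counter()
for _ in range(N):
    handle_request()
elapsed = time.perf_counter() - start

print(f"average response time: {elapsed / N * 1000:.3f} ms")
print(f"throughput: {N / elapsed:.0f} requests/second (single-threaded QPS)")
```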
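And here is a minimal sketch of the threshold-alerting idea for performance counters. The warning value is made up, `os.getloadavg` is only available on Unix-like systems, and a real monitoring setup would watch many more counters and page on-call staff instead of printing.

```python
import os

LOAD_AVG_WARNING = 4.0   # made-up threshold; real values depend on CPU count and the service

def check_system_load() -> None:
    """Read the 1-minute load average and alert when it crosses the warning threshold."""
    load_1min, _, _ = os.getloadavg()          # Unix-only performance counter
    if load_1min > LOAD_AVG_WARNING:
        print(f"ALERT: load average {load_1min:.2f} exceeds {LOAD_AVG_WARNING}")
    else:
        print(f"load average {load_1min:.2f} is within the safe range")

check_system_load()
```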
Performance test Methods
Performance testing is a general term; more specifically it can be divided into performance testing, load testing, stress testing, and stability testing.
Performance Testing
Using the initial design targets as the benchmark, pressure is applied to the system to verify that the expected performance is achieved within the planned resource range.
Load Test
Concurrent requests are increased continuously to raise the pressure on the system until one or more of its metrics reach the safety threshold; if pressure keeps being applied beyond that point, the system's processing capacity starts to fall.
Stress Test
Pressure is applied beyond the safe load until the system crashes or can no longer handle any requests, which establishes the maximum pressure the system can tolerate.
Stability Testing
Under a certain level of pressure, applied unevenly as it would be in production, the system is run for a long period to verify that it stays stable.
As shown in the figure, the A-B interval is the site's daily operating range, where it spends most of its time; point C is the system's maximum load point; and the B-C segment is where traffic exceeds the daily pressure for some reason. Past point C, as pressure keeps increasing, performance begins to decline while resource consumption keeps rising, until point D, the crash point; beyond that, the system can no longer process any requests.
Performance testing reflects the system's processing capacity, which corresponds to the user's waiting time (response time), as shown in the following figure:
Each point corresponds to a point on the performance test curve above, up to the moment the system crashes and users stop getting responses.
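To show what ramping up the pressure looks like in practice, here is a rough load-test sketch; it is my own illustration, `send_request` is a hypothetical stand-in for a real HTTP request, and the step sizes are arbitrary. The concurrency level at which throughput stops rising corresponds roughly to point C on the curve above, and the level at which errors start appearing corresponds to point D.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request() -> None:
    """Hypothetical request; replace with a real HTTP call when testing a site."""
    time.sleep(0.01)

def measure_throughput(concurrency: int, requests: int = 200) -> float:
    """Send `requests` requests using `concurrency` worker threads; return requests/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_request) for _ in range(requests)]
        for f in futures:
            f.result()      # re-raises errors; under overload, errors mark the crash point
    return requests / (time.perf_counter() - start)

for concurrency in (1, 5, 10, 20, 50):            # keep raising the pressure step by step
    print(f"concurrency {concurrency:>3}: {measure_throughput(concurrency):7.1f} requests/s")
```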
Performance optimization Strategy
First locate the cause of the problem: check the logs at each link in the chain to see which step's response time does not match expectations, then analyze what is hurting performance, whether it is the code, an unreasonable architecture design, or insufficient system resources.
Then comes the optimization itself. Following the site's layered architecture, it can be broadly divided into three categories: web front-end performance optimization, application server performance optimization, and storage server performance optimization.
We'll cover the specific optimization methods next time; it's a little late today, so I'm off to sleep first.