A large site grows step by step out of a small one, and so does the challenge: huge user bases, a hostile security environment, high concurrent access, and massive data. Even the simplest business logic becomes tricky once it has to process petabyte-scale data and serve hundreds of millions of users.
Now let's walk through the evolution:
Initial phase
Large web sites grow out of small ones, and so do their architectures.
A small site has few visitors at first, so a single server is more than enough, like this:
The application, database, files, and all other resources live on that one server. A LAMP stack (Linux, Apache, MySQL, PHP) is usually enough to deploy the whole project; buy a domain name, rent a cheap server, and the site is live.
Separating application services from data services
As the business grows, a single server can no longer keep up, so we separate the application from its data.
After the split we use three servers: an application server, a file server, and a database server, as follows:
The requirements for these three servers are different:
The application server handles most of the business logic, so it needs a faster, more powerful CPU.
The database server needs fast disk retrieval and data caching, so it needs faster disks and more memory.
The file server stores the files uploaded by users, so it needs more disk capacity.
With application and data separated, each server's responsibilities become more focused and the site's performance improves further. But as users keep increasing, the architecture needs further optimization.
Improving performance with caching
Web site access follows the 80/20 rule: 80% of visits concentrate on 20% of the data.
So we cache that small, hot portion of the data to reduce pressure on the database, speed up data access across the site, and improve the database's read/write performance.
Web site caching comes in two flavors: a local cache on the application server, and a remote cache on dedicated distributed cache servers.
The local cache is faster to access, but it is limited by the application server's memory and competes with the application for it.
A remote distributed cache can be clustered: deploy machines with large memory as dedicated cache servers, and the cache service is, in theory, no longer limited by any single machine's memory capacity.
As shown below:
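Both flavors share the same cache-aside logic: check the cache first and fall back to the database on a miss. A minimal sketch in Python; the names (get_user, db_query) and the TTL value are illustrative, not from any particular framework:

```python
import time

cache = {}          # key -> (value, expires_at); stands in for a real cache
CACHE_TTL = 60      # seconds before a cached entry expires

def db_query(user_id):
    # Stand-in for a real database lookup.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    entry = cache.get(user_id)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                  # cache hit: no database access
    value = db_query(user_id)             # cache miss or expired: go to the DB
    cache[user_id] = (value, time.time() + CACHE_TTL)
    return value
```

With a remote distributed cache, the `cache` dictionary would be replaced by a client talking to the cache cluster, but the hit/miss flow is the same.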
With caching in place, data access pressure is effectively relieved, but a single application server can handle only a limited number of requests, and at peak times it becomes the site's performance bottleneck.
Improving the site's concurrent processing capacity with an application server cluster
Clustering is the standard way for a web site to cope with high concurrency and massive data: once vertical scaling reaches its limit, start scaling horizontally.
When one server's capacity runs out, rather than replacing it with a more powerful machine, add another server to share the load. For a large site, no single server, however powerful, can keep up with ever-growing business demand; adding servers is the more efficient path.
If adding a server relieves the load once, the same approach can be repeated as traffic grows, which is what gives the system its scalability.
A load-balancing scheduler distributes user requests across the application server cluster; as users grow, just add more application servers, and the application tier's load stops being the site's performance bottleneck.
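The dispatch step can be as simple as cycling through the servers in turn (round-robin, the simplest load-balancing policy). A toy sketch; the server names are placeholders:

```python
import itertools

class RoundRobinBalancer:
    """Hands out servers from the cluster in rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Each call returns the next server; after the last, wrap to the first.
        return next(self._cycle)

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
```

Real load balancers add health checks, weighting, and session affinity on top of a policy like this.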
Database read-write separation
With caching, most operations avoid the database, but some reads (cache misses, cache expiry) and all writes still hit it. Once the user base reaches a certain scale, the database load becomes the problem.
Most databases today support master-slave hot backup: configure a master-slave relationship between two servers, and data updates on one are synchronized to the other. The site can exploit this to separate reads from writes and further relieve database load.
The application server sends writes to the master database; the master propagates the updates to the slave through the replication mechanism, and the application's reads can then be served from the slave.
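At the data-access layer this routing can be a one-line decision. A hypothetical sketch, with master and replica standing in for real connection objects:

```python
class ReadWriteRouter:
    """Routes SQL statements to the master or the read replica."""

    def __init__(self, master, replica):
        self.master = master      # receives all writes
        self.replica = replica    # fed by master-slave replication

    def connection_for(self, sql):
        # Plain SELECTs can go to the replica; writes (and anything
        # ambiguous) must go to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return self.replica
        return self.master
```

A production router would also handle transactions (pinning them to the master) and replication lag, which this sketch ignores.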
Using a reverse proxy and a CDN to speed up the site's response
Both the CDN and the reverse proxy are, at bottom, caches.
A CDN is deployed in the network providers' machine rooms, so a user's request fetches data from the nearest provider's room.
A reverse proxy sits in the site's central machine room: a user's request reaches the reverse proxy first, and if the proxy has the requested resource cached, it is returned to the user directly.
Both aim to return data to the user as early as possible: they speed up user access and, at the same time, take load off the backend servers.
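A reverse proxy's cache logic, reduced to a sketch; fetch_from_origin is a stand-in for forwarding the request to the backend servers:

```python
proxy_cache = {}    # path -> cached response body

def fetch_from_origin(path):
    # Stand-in for forwarding the request to a backend application server.
    return f"<body of {path}>"

def handle_request(method, path):
    if method != "GET":
        # Only cache safe, idempotent GETs; everything else passes through.
        return fetch_from_origin(path)
    if path not in proxy_cache:
        proxy_cache[path] = fetch_from_origin(path)   # miss: fill the cache
    return proxy_cache[path]                          # hit: origin untouched
```

Real proxies such as nginx also honor Cache-Control headers and expire entries; this sketch keeps only the hit/miss skeleton.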
Using distributed file systems and distributed databases
As the business keeps growing, the database and the file system can be managed in a distributed fashion, just like the application servers.
A distributed database is the last resort for splitting a site's database. The usual approach is to split by business first, deploying different businesses' databases on different database servers.
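Splitting by business boils down to a routing table from business module to database server. A sketch with hypothetical connection strings:

```python
# Each business area gets its own database server (hypothetical DSNs).
BUSINESS_DATABASES = {
    "users":  "db://users-db:3306",
    "orders": "db://orders-db:3306",
    "items":  "db://items-db:3306",
}

def database_for(business):
    # Route each business module to its dedicated database server.
    return BUSINESS_DATABASES[business]
```

Only when a single business's data outgrows one server does finer-grained distributed sharding become necessary.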
Using NoSQL and search engines
Both are technologies born of Internet-scale practice. The application server accesses all these data sources through a unified data access module, sparing applications the trouble of managing multiple data sources directly.
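One way such a unified data access module can look: applications ask for data by source name, and the module decides which backend (RDBMS, NoSQL store, search engine) serves it. All backends here are in-memory stand-ins, not real clients:

```python
class UnifiedDataAccess:
    """Single entry point hiding multiple data sources from applications."""

    def __init__(self):
        self.backends = {}          # source name -> lookup function

    def register(self, source, lookup):
        self.backends[source] = lookup

    def get(self, source, key):
        # The application names what it wants; the module picks the backend.
        return self.backends[source](key)

uda = UnifiedDataAccess()
uda.register("rdbms", lambda k: {"row": k})     # stand-in for a SQL client
uda.register("nosql", lambda k: {"doc": k})     # stand-in for a NoSQL client
```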
Business splitting
For a large site, divide and conquer: split the whole site's business into different modules. A large shopping site, for example, can be split into home page, shops, orders, and buyers, each owned by a different business team.
Along these module lines, the site is split into multiple applications, each deployed and maintained independently. Applications relate to each other through hyperlinks (pointing to the different applications' addresses) and share the same data storage system, forming one interconnected, complete system.
Distributed services
As the business is split up, the whole system keeps growing: the applications' overall complexity rises exponentially, and deployment and maintenance get harder and harder. On top of that, every application server connects to the database; at a scale of tens of thousands of servers, the number of these connections grows with the server count, and resources run short.
At this point, the reusable pieces of business, along with their database connections, are extracted and deployed independently as common services. Application systems then complete their business operations simply by calling these common services through a distributed services framework.
Here, most of the technical problems are solved; what remains, such as real-time synchronization and other business-specific issues, can be handled with existing techniques.
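An in-process sketch of the idea: reusable business logic is registered once as a common service and invoked by name. A real distributed-services framework (an RPC framework, say) would add networking, serialization, connection pooling, and service discovery; all names below are hypothetical:

```python
services = {}   # service name -> callable; stands in for a service registry

def register_service(name, func):
    services[name] = func

def call_service(name, *args, **kwargs):
    # Applications invoke common services by name instead of linking the
    # business logic (and its database connections) into every application.
    return services[name](*args, **kwargs)

def create_order(user_id, item_id):
    # Common service; in a real system it would hold the pooled DB connection.
    return {"user": user_id, "item": item_id, "status": "created"}

register_service("order.create", create_order)
```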