1. Initial phase of the site architecture
In the beginning the site has few visitors, so a single server is more than enough. As shown in Figure 1, the application, database, files, and all other resources live on one server, typically using the Linux + Apache + MySQL + PHP (LAMP) architecture.
2. Application and data service separation
As the website's business grows, a single server gradually becomes unable to meet demand: more and more user traffic degrades performance, and the growing volume of data exhausts storage space. At this point the application and the data need to be separated. After the split, the site uses three servers: an application server, a file server, and a database server, as shown in Figure 2.
After the separation of application and data, servers with different hardware characteristics take on different service roles. The site's concurrent processing capacity and data storage space are greatly improved, supporting further growth of the business. But as the user base keeps growing, the site faces a new challenge: the database comes under so much pressure that access is delayed, which hurts the performance of the whole site and the user experience, so further optimization is needed.
3. Improve site performance with caching
Websites use two types of cache: a local cache on the application server and a remote cache on dedicated distributed cache servers. The local cache is faster to access, but it is constrained by the application server's memory, can hold only a limited amount of data, and competes with the application itself for memory. A remote distributed cache can be deployed as a cluster of servers with large amounts of memory dedicated to caching, so in theory it provides a cache service that is not limited by memory capacity, as shown in Figure 3.
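As a minimal sketch of this idea (not taken from the original text), the cache-aside read path below checks the local in-process cache first, then the remote distributed cache, and only falls back to the database on a miss; `remote_cache` and `query_database` are hypothetical stand-ins for a real cache client (such as Memcached or Redis) and a real database query.

```python
# Cache-aside read path: local cache -> remote distributed cache -> database.
# The remote cache and the database query are placeholder stubs.

local_cache = {}    # small in-process cache, limited by app-server memory
remote_cache = {}   # stand-in for a distributed cache cluster (e.g. Memcached)

def query_database(key):
    # Placeholder for the real (slow) database access.
    return f"value-of-{key}"

def get(key):
    # 1. Fastest: local cache on the application server.
    if key in local_cache:
        return local_cache[key]
    # 2. Next: the remote distributed cache, not limited by one server's memory.
    if key in remote_cache:
        value = remote_cache[key]
        local_cache[key] = value
        return value
    # 3. Miss everywhere: read from the database and populate both caches.
    value = query_database(key)
    remote_cache[key] = value
    local_cache[key] = value
    return value

print(get("product:42"))
```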
After the cache is in place, the pressure of data access is effectively relieved, but a single application server can handle only a limited number of request connections; during peak hours the application server becomes the bottleneck of the whole website.
4. Improve the concurrency of your website with Application server clusters
Clustering is a common means of dealing with high concurrency and massive data. When one server's processing power or storage space is insufficient, more servers are added to share the original server's access and storage load. As far as the website architecture is concerned, as long as the load can be relieved by adding servers, system performance can be improved continuously in the same way, which makes the system scalable.
A load-balancing scheduler distributes access requests from users' browsers across the servers in the application server cluster. When there are more users, more application servers are added to the cluster, so the load on the application servers no longer becomes the bottleneck of the whole website.
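A minimal sketch of the scheduling idea, assuming a simple round-robin policy (real load balancers also support weighted, least-connections, and other strategies); the server names are illustrative.

```python
import itertools

# Application servers in the cluster; adding capacity just extends this list.
app_servers = ["app-server-1", "app-server-2", "app-server-3"]

# Round-robin scheduler: hand each incoming request to the next server in turn.
_rr = itertools.cycle(app_servers)

def dispatch(request):
    server = next(_rr)
    return f"forward {request} to {server}"

for req in ["GET /", "GET /item/7", "POST /cart"]:
    print(dispatch(req))
```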
5. Database read/write separation
After the cache is added, most read operations can be served without touching the database, but some reads (cache misses, expired entries) and all writes still have to access the database. Once the site's user base reaches a certain scale, the database becomes the site's bottleneck because of its high load.
Most mainstream databases provide master-slave hot standby: by configuring a master-slave relationship between two database servers, updates on one server are synchronized to the other. The website uses this capability to separate database reads from writes and thereby relieve the database load, as shown in Figure 5.
When the application server writes data, it accesses the master database; the master then propagates the updates to the slave through the master-slave replication mechanism, so that when the application reads data, it can obtain it from the slave. To make this convenient for the application, a dedicated data access module is usually used on the application server side, so the read/write separation is transparent to application code.
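A minimal sketch of such a data access module, assuming one master and one slave connection (the connection objects and the routing rule are placeholders): writes go to the master, reads go to the slave, and application code only calls `execute`.

```python
class ReadWriteSplitter:
    """Routes writes to the master and reads to the slave, hiding the
    split from application code. master/slave stand in for real DB connections."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def execute(self, sql, params=()):
        # Very rough routing rule: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = self.slave if is_read else self.master
        return target.run(sql, params)

class FakeConn:
    """Placeholder 'connection' that just reports where the statement would run."""
    def __init__(self, name):
        self.name = name
    def run(self, sql, params):
        return f"{self.name}: {sql} {params}"

db = ReadWriteSplitter(FakeConn("master"), FakeConn("slave"))
print(db.execute("INSERT INTO users(name) VALUES (?)", ("alice",)))   # -> master
print(db.execute("SELECT * FROM users WHERE name = ?", ("alice",)))   # -> slave
```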
6. Accelerating site response with reverse proxy and CDN
As the website's business keeps developing and its user base keeps growing, and because of China's complex network environment, access speeds differ greatly for users in different regions. Studies have shown that access latency is positively correlated with user churn: the slower a site responds, the more likely users are to lose patience and leave. To provide a better user experience and retain users, the site needs to speed up access. The main means are a CDN and a reverse proxy, as shown in Figure 6.
Both the CDN and the reverse proxy are based on caching. The difference is that the CDN is deployed in the network providers' data centers, so a user's request can be served with data from the provider's data center closest to the user, while the reverse proxy is deployed in the website's central data center: when a user's request reaches that data center, the first server it hits is the reverse proxy, and if the requested resource is cached there, it is returned to the user directly.
The purpose of using a CDN and a reverse proxy is to return data to users as early as possible, which on the one hand speeds up user access and on the other hand reduces the load on the back-end servers.
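A minimal sketch of the reverse-proxy caching idea described above (the backend fetch is a placeholder): if the requested resource is already cached, it is returned immediately; otherwise the request is forwarded to the back-end application servers and the response is cached for later users.

```python
proxy_cache = {}  # resource path -> cached response body

def fetch_from_backend(path):
    # Placeholder for forwarding the request to the application servers.
    return f"<html>content of {path}</html>"

def handle_request(path):
    # Cache hit: serve directly from the reverse proxy, never touching the backend.
    if path in proxy_cache:
        return proxy_cache[path]
    # Cache miss: forward to the backend, then cache the response.
    response = fetch_from_backend(path)
    proxy_cache[path] = response
    return response

print(handle_request("/index.html"))  # miss, goes to the backend
print(handle_request("/index.html"))  # hit, served by the proxy
```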
7. Using distributed file systems and distributed database systems
No single server, however powerful, can meet the continuously growing business needs of a large site. After read/write separation, the database has been split from one server into two, but as the business keeps growing this is still not enough, so a distributed database is needed. The same applies to the file system, which needs to become a distributed file system, as shown in Figure 7.
A distributed database is the last resort for splitting a database, used only when the data in a single table grows extremely large. Unless it is absolutely necessary, the more common way to split a database is by business: databases for different business lines are deployed on different physical servers.
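A minimal sketch of splitting by business (the business names and connection strings are illustrative): each business line gets its own database deployed on its own physical server, and a small router picks the right one for each query.

```python
# Each business line has its own database on its own physical server.
business_databases = {
    "user":  "dsn://user-db-host/userdb",
    "item":  "dsn://item-db-host/itemdb",
    "order": "dsn://order-db-host/orderdb",
}

def database_for(business):
    # Route an operation to the database that owns this business line.
    return business_databases[business]

print(database_for("order"))   # the order application talks only to order-db-host
```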
8. Using NoSQL and search engines
As the website's business becomes more complex, its needs for data storage and retrieval also become more complex, and the site has to adopt non-relational database technologies such as NoSQL and non-database query technologies such as search engines, as shown in Figure 8.
Both NoSQL databases and search engines are technologies that originated on the Internet and offer better support for scalability and distribution. The application server accesses all kinds of data through a unified data access module, which spares the application the trouble of managing many different data sources itself.
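A minimal sketch of such a unified data access module, assuming three hypothetical backends (a relational database, a NoSQL store, and a search engine) behind one interface so that application code never picks a data source itself.

```python
class UnifiedDataAccess:
    """Single entry point that hides which backend stores which kind of data.
    The backends here are placeholder objects standing in for real clients."""

    def __init__(self, rdbms, nosql, search_engine):
        self.backends = {"relational": rdbms, "nosql": nosql, "search": search_engine}

    def query(self, kind, request):
        # The application names the kind of access; the module routes it.
        return self.backends[kind].query(request)

class FakeBackend:
    def __init__(self, name):
        self.name = name
    def query(self, request):
        return f"{self.name} handled: {request}"

dal = UnifiedDataAccess(FakeBackend("MySQL"), FakeBackend("Redis"), FakeBackend("Elasticsearch"))
print(dal.query("search", "keyword: laptop"))
print(dal.query("nosql", "session:abc123"))
```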
9. Business splitting
To cope with increasingly complex business scenarios, large websites use divide and conquer to split the entire site's business into different product lines. A large shopping site, for example, splits its home page, shops, orders, buyers, sellers, and so on into different product lines, each owned by a separate business team.
Technically, the site is split along product lines into many different applications, each deployed and maintained independently. Applications can relate to each other through hyperlinks, distribute data through message queues, or, most commonly, form one associated, complete system by accessing the same data storage system, as shown in Figure 9.
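A minimal sketch of data distribution through a message queue, using Python's in-process `queue.Queue` as a stand-in for a real message broker: the order application publishes an event, and the shop application consumes it, without the two independently deployed applications calling each other directly.

```python
import queue

# Stand-in for a real message broker shared by the independently deployed apps.
order_events = queue.Queue()

def order_app_place_order(order_id):
    # The order application publishes an event instead of calling other apps.
    order_events.put({"type": "order_created", "order_id": order_id})

def shop_app_consume():
    # The shop application consumes events at its own pace.
    event = order_events.get()
    return f"shop app updates sales stats for order {event['order_id']}"

order_app_place_order(1001)
print(shop_app_consume())
```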
10. Distributed Services
As business splitting makes each application smaller and the storage systems larger, the overall complexity of the system grows exponentially and deployment and maintenance become harder. Because every application has to connect to every database system, on a site with tens of thousands of servers the number of such connections is on the order of the square of the server count, which exhausts database connection resources and leads to denial of service.
Since every application needs to perform many of the same business operations, such as user management and product management, these common services can be extracted and deployed independently. These reusable services connect to the databases and provide shared business services, while each application only needs to manage its own user interface and completes concrete business operations by calling the shared services through distributed service invocation, as shown in Figure 10.
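A minimal sketch of the distributed-service idea, with a hypothetical shared user service exposed over HTTP/JSON (the URL and payload shape are assumptions; a real system might use an RPC framework instead): the front-end application keeps only its interface logic and asks the common service for data instead of connecting to the user database itself.

```python
import json
import urllib.request

class UserServiceClient:
    """Thin client for a shared, independently deployed user service.
    The endpoint below is hypothetical."""

    def __init__(self, base_url="http://user-service.internal:8080"):
        self.base_url = base_url

    def get_user(self, user_id):
        # The application no longer opens its own connection to the user
        # database; it asks the common service for the data it needs.
        with urllib.request.urlopen(f"{self.base_url}/users/{user_id}") as resp:
            return json.load(resp)

# Usage in a front-end application (requires the user service to be running):
# profile = UserServiceClient().get_user(42)
```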
By the time a large website's architecture has evolved to this point, solutions exist for most technical problems: remaining issues such as real-time data synchronization across data centers, and problems specific to the site's own business, can be addressed by combining and improving the existing technical architecture.