Large sites grow out of small sites step by step, and the challenges come mainly from the huge user base: a hostile security environment, high concurrent access, and massive data. Any simple business operation becomes tricky once it has to process petabytes of data and serve hundreds of millions of users
Let's talk about the evolution of this process:
Initial stage
Large websites evolve from small ones, and they start with the same architecture
A small site has few visitors at first, and a single server is more than enough:
The application, database, files, and all other resources sit on one server, usually Linux, with
PHP
MySQL
Apache
installed to complete the deployment; then buy a domain name, rent a cheap server, and our website journey can begin
Separation of application services from data services
As the business grows, a single server gradually can no longer meet demand; at this point we can separate the application from the data
After the separation we use three servers: an application server, a file server, and a database server
The requirements for these three servers are different:
Application server
Handles lots of business logic, so it needs faster, more powerful CPUs
Database server
Needs fast disk retrieval and data caching, so it requires faster hard drives and more memory
File server
Stores user-uploaded files, so it needs larger disk capacity
After separating applications and data, each server's responsibilities are clearer and the site's performance improves further, but as users keep growing we need to optimize the architecture again
Using caching to improve performance
Access to a website follows the 80/20 rule: 80% of business accesses concentrate on 20% of the data
Therefore, caching this small amount of hot data reduces the database's access pressure, speeds up data access for the whole site, and improves the database's effective read/write performance.
The caching of Web sites can be divided into two types: local caches cached on the application server and remote caches on dedicated distributed cache servers
Local cache
Faster to access, but limited by application server memory: the amount of cached data is constrained, and the cache contends with the application for memory
Remote distributed cache
Can be deployed as a cluster of large-memory servers dedicated to caching, so in principle the cache service is not limited by a single machine's memory capacity
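The read path both kinds of cache enable is the cache-aside pattern. A minimal Python sketch, assuming a hypothetical `db_query` callback and an in-process store (a remote distributed cache such as Memcached exposes the same get/set idea over the network):

```python
import time

# A minimal in-process cache with expiry; a remote distributed cache
# exposes the same get/set interface over the network.
class SimpleCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:      # expired entries count as misses
            del self.store[key]
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, time.time() + self.ttl)

def get_product(cache, product_id, db_query):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is None:                     # cache miss -> hit the database
        value = db_query(product_id)
        cache.set(key, value)             # populate for later readers
    return value
```

With this pattern, repeated reads of the same hot product within the TTL never touch the database.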
With caching in place, data access pressure is effectively relieved, but a single application server can handle only a limited number of request connections; at peak times the application server becomes the site's performance bottleneck
Improve Web site concurrency with Application server clusters
Using a cluster is a common means of solving high-concurrency, massive-data problems: once vertical scaling has been pushed to a certain level, it is time to scale horizontally.
When one server's processing power is insufficient, rather than replacing it with a more powerful machine, it is better to add another server to share the load. For a large website, no server, however powerful, can keep up with continuously growing business demand; adding servers to share the load is the more efficient approach
For the site architecture, if adding one server relieves load pressure, then the same step can be repeated to keep up with growing business demand, which gives the system scalability
A load-balancing dispatch server distributes user requests across the application server cluster; if there are more users, more application servers can be added, so that application server load no longer becomes the site's performance bottleneck
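A minimal sketch of what the dispatch server does, using round-robin scheduling over hypothetical server names (real balancers such as Nginx or LVS additionally track server health):

```python
import itertools

class RoundRobinBalancer:
    """Distributes incoming requests evenly across an application server cluster."""
    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def dispatch(self, request):
        server = next(self._cycle)        # pick the next server in rotation
        return server, request

# Adding capacity is just adding names to this list.
balancer = RoundRobinBalancer(["app1", "app2", "app3"])
```

Scaling out then means constructing the balancer with a longer server list; no application code changes.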
Database read/write separation
After adding the cache, most operations can complete without touching the database, but some reads (cache misses, cache expiration) and all writes still must hit the database; once the site's user base grows large enough, database load becomes the problem
Most databases today support master-slave hot backup: by configuring a master-slave relationship between two database servers, data updates on one server can be synchronized to the other. The website uses this capability to implement database read/write separation, further relieving database load
When the application server writes, it accesses the master database; the master replicates the update to the slave through the master-slave replication mechanism, so that when the application server reads, it can access the slave database.
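The routing logic on the application side can be sketched as follows; `master` and `slaves` are hypothetical connection objects with an `execute` method, standing in for real database driver connections:

```python
class ReadWriteRouter:
    """Routes writes to the master and reads to a slave (read/write separation)."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next = 0

    def execute(self, sql, *params):
        # Simple classification: SELECT/SHOW are reads, everything else writes.
        if sql.lstrip().upper().startswith(("SELECT", "SHOW")):
            slave = self.slaves[self._next % len(self.slaves)]
            self._next += 1               # round-robin across slaves
            return slave.execute(sql, *params)
        return self.master.execute(sql, *params)   # writes go to the master
```

A production router would also handle replication lag, e.g. by reading recently written rows from the master.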
Accelerate site response with reverse proxy and CDN
The basic principle of both CDN and reverse proxy is caching:
CDN
Deployed in network providers' machine rooms, so that a user's request fetches data from the machine room of the nearest network provider
Reverse proxy
Deployed in the site's central machine room; when a user's request reaches the central machine room, it first hits the reverse proxy server, and if the proxy has cached the requested resource, it is returned to the user directly
Both CDN and reverse proxy aim to return data to the user as soon as possible, which speeds up user access and also reduces the load on back-end servers
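The reverse proxy's cache check can be sketched like this; `fetch_from_backend` is a hypothetical stand-in for forwarding the request to the origin servers:

```python
def handle_request(path, proxy_cache, fetch_from_backend):
    """Reverse-proxy caching: serve from the proxy's cache when possible,
    otherwise forward the request to the back-end and cache the response."""
    cached = proxy_cache.get(path)
    if cached is not None:
        return cached, "HIT"          # served directly, back-end never touched
    response = fetch_from_backend(path)
    proxy_cache[path] = response      # cache for subsequent users
    return response, "MISS"
```

A CDN edge node runs essentially the same logic, just geographically closer to the user.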
Using Distributed file systems and distributed database systems
As the website's business continues to grow, the database system and the file system can be managed in a distributed fashion, just as the application servers were
Distributed database
Splitting the database is the last resort for a website's data layer; usually we split by business first, deploying the databases of different business domains on different database servers
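Splitting by business might look like the following routing table; the connection strings and table names are hypothetical:

```python
# Splitting databases by business domain: each business domain
# gets its own database server.
BUSINESS_DATABASES = {
    "users":  "mysql://db-users.internal/users",
    "orders": "mysql://db-orders.internal/orders",
    "items":  "mysql://db-items.internal/items",
}

# Which business domain owns each table.
TABLE_OWNER = {
    "user_profile": "users",
    "order": "orders",
    "order_item": "orders",
    "item": "items",
}

def database_for(table):
    """Map a table to the business database server that owns it."""
    return BUSINESS_DATABASES[TABLE_OWNER[table]]
```

The catch is that queries joining tables across business domains (e.g. `order_item` with `user_profile`) can no longer be done in one SQL statement; they must be assembled in application code.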
Using NoSQL and search engines
Both NoSQL databases and search engines are technologies that grew out of the Internet; the application server accesses these various data sources through a unified data access module, which spares the application the trouble of managing multiple data sources itself
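Such a unified data access module is essentially a facade. A sketch, where the three back-end objects are hypothetical and only need to expose the methods used here:

```python
class UnifiedDataAccess:
    """A unified data access module: the application asks for data by intent,
    and this facade routes each call to a relational database, a NoSQL
    key-value store, or a search engine."""
    def __init__(self, rdbms, nosql, search):
        self.rdbms = rdbms
        self.nosql = nosql
        self.search = search

    def get_order(self, order_id):
        # Transactional, relational data stays in the RDBMS.
        return self.rdbms.query("SELECT * FROM orders WHERE id = %s", order_id)

    def get_session(self, session_id):
        # High-volume, schema-free data goes to the key-value store.
        return self.nosql.get(f"session:{session_id}")

    def search_products(self, keywords):
        # Full-text queries go to the search engine.
        return self.search.query(keywords)
```

Application code calls `get_order` or `search_products` without knowing which storage system answers.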
Business Split
For a large site we can divide and conquer, splitting the site's entire business into different modules: a large shopping site, for example, can be split into home page, shops, orders, and buyers, each owned by a different business team
At the same time, we split the website into multiple applications along these module boundaries; each application is deployed and maintained independently, applications link to one another through hyperlinks (pointing at the other applications' addresses), and they share the same data storage system, together forming a complete interconnected system
Distributed services
As the business is split further, the overall system keeps growing, the application's overall complexity increases exponentially, and deployment and maintenance become harder and harder. Moreover, every application server needs to connect to the database servers; at a scale of tens of thousands of servers, the number of these connections grows with the square of the server count, and database connection resources run out
At this point, functionality common to several businesses is extracted and deployed independently: these reusable services, together with the databases they connect to, become public services, and the application systems call these public services through a distributed service framework to complete their business operations
By this stage, most technical problems have workable solutions; the remaining business-specific problems, such as real-time synchronization, can also be solved with existing technology
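To make the connection-count argument concrete, a quick back-of-the-envelope comparison (the server counts are purely illustrative):

```python
def direct_connections(app_servers, db_servers):
    # Without a service layer, every application server holds
    # connections to every database server.
    return app_servers * db_servers

def via_service_layer(app_servers, service_instances, db_servers):
    # With a service layer, applications connect only to the service
    # instances, and only the service instances connect to the databases.
    return app_servers * service_instances + service_instances * db_servers

print(direct_connections(10_000, 100))     # 1,000,000 connections
print(via_service_layer(10_000, 20, 100))  # 202,000 connections
```

Concentrating database access in a small service tier cuts the connection count by roughly the ratio of application servers to service instances.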
Large Web Site Technology Architecture (1)