Over the holiday I finally finished reading "Large Website Technology Architecture: Core Principles and Case Analysis". The book gave me a new understanding of website architecture. It is simple and easy to follow, which makes it accessible to beginners, and it introduces many technical concepts such as reverse proxies and CDNs. I am writing two articles to summarize what I learned.
The first chapter covers the evolution of large website architecture, tracing step by step how a small website grows into a large one.
Characteristics of large websites: high concurrency, heavy traffic, high availability, massive data, widely distributed users, complex network conditions, rapidly changing requirements, frequent releases, incremental development, and a hostile security environment.
1. Site architecture in the initial stage
This is the initial small site. There may be only one server, or even just the owner's own computer. It is typically developed on LAMP (Linux + Apache + MySQL + PHP), all free open-source software. At this stage the application, the database, and the files all sit on a single server, as shown in the following figure:
2. Separating application services from data services
As the website grows, one server is no longer enough: more users make performance worse, and storage space runs short. The application and the data should therefore be separated onto different servers. The application server needs a more powerful CPU, the database server needs fast disk access and plenty of memory for data caching, and the file server needs larger hard disks. See the following figure:
3. Using caching to improve performance
As the number of users grows, pressure on the database keeps increasing, and every database operation can affect the performance of the whole site. Website access also follows the 80/20 rule: 80% of business accesses are concentrated on 20% of the data. So frequently used data can be cached in memory to reduce the load on the database. There are two kinds of caching: a local cache on the application server, and a distributed cache on dedicated cache servers. A distributed cache is deployed as a cluster of large-memory servers dedicated to caching, so in theory its memory capacity can grow without limit as servers are added. See the following figure:
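The read path described above is usually called cache-aside: look in the cache first, and only on a miss read the database and fill the cache. Below is a minimal sketch of that idea, assuming an in-process LRU cache stands in for the local or dedicated cache server and a plain dict stands in for the database; the class names and the `db_reads` counter are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-process LRU cache standing in for a local or dedicated cache server."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class ProductRepository:
    """Cache-aside reads: try the cache first, fall back to the database on a miss."""

    def __init__(self, database: dict, cache: LRUCache) -> None:
        self.database = database  # hypothetical stand-in for a real database
        self.cache = cache
        self.db_reads = 0  # how many reads actually reached the database

    def get_product(self, product_id):
        value = self.cache.get(product_id)
        if value is None:
            self.db_reads += 1
            value = self.database.get(product_id)
            if value is not None:
                self.cache.put(product_id, value)  # fill the cache for later reads
        return value


if __name__ == "__main__":
    db = {"p1": "phone", "p2": "laptop", "p3": "tablet"}
    repo = ProductRepository(db, LRUCache(capacity=2))
    for pid in ["p1", "p1", "p1", "p2", "p1"]:
        repo.get_product(pid)
    print(repo.db_reads)  # 2: only the first read of p1 and of p2 reach the database
```

Because hot data is requested repeatedly, most reads are served from memory; the LRU policy is one common choice for deciding what to evict when the cache is full, not the only one.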
4. Using application server clusters
At peak times, the application server becomes the bottleneck of the whole website. Clustering is the common way to handle high concurrency and massive data: when one server's processing capacity is insufficient, add more servers to share the access and storage load of the original one. Improving performance by continuously adding servers is what makes the system scalable. A load balancer distributes each user request to one of the servers in the cluster, as shown in the following figure:
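To make the load-balancing step concrete, here is a minimal round-robin sketch. Round robin is only one of several balancing strategies (random and least-connections are others), and the server names and `RoundRobinBalancer` class are hypothetical; a real load balancer works at the network level rather than inside the application.

```python
class RoundRobinBalancer:
    """Hands each incoming request to the next application server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # position of the next server in the rotation

    def add_server(self, server) -> None:
        # Scaling out: a newly added server simply joins the rotation.
        self.servers.append(server)

    def pick(self):
        if not self.servers:
            raise RuntimeError("no application servers available")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2"])
    print([lb.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-1', 'app-2']
    lb.add_server("app-3")
    print([lb.pick() for _ in range(3)])
```

Adding `app-3` spreads later requests over three servers instead of two, which is the "add servers to share the load" idea from the text.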
5. Database read/write separation
With caching, the vast majority of read accesses can be completed without touching the database. However, some reads (for example cache misses and expired entries) and all writes still have to access the database, and once the website reaches a certain scale, the database becomes the bottleneck under heavy load. Most mainstream databases provide master-slave hot backup: by configuring two databases in a master-slave relationship, data on one database server is synchronized to the other. Writes then go to the master while reads can be served by the slave, which separates reads from writes as follows:
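A minimal sketch of how an application-side data access layer might route statements once master-slave replication is in place, assuming writes are recognized by their leading SQL keyword. The `ReadWriteRouter` class and the server names are hypothetical, and real middleware handles more cases than this.

```python
class ReadWriteRouter:
    """Sends writes to the master database and spreads reads across slaves."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, slaves) -> None:
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # rotates reads over the slaves

    def route(self, sql: str) -> str:
        statement = sql.lstrip().upper()
        # Writes (and all traffic, if no slave exists) must go to the master.
        # A real router would also keep reads inside a transaction on the master.
        if statement.startswith(self.WRITE_KEYWORDS) or not self.slaves:
            return self.master
        slave = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return slave


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
    print(router.route("INSERT INTO users VALUES (1, 'alice')"))  # db-master
    print(router.route("SELECT * FROM users WHERE id = 1"))  # db-slave-1
```

One design consequence worth noting: replication to the slave is not instantaneous, so a read issued right after a write may see stale data if it is routed to a slave.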
6. Using reverse proxies and CDNs to speed up website responses
To provide a better user experience, websites need to speed up access, mainly by using CDNs and reverse proxies. Both are based on caching; the difference is where they are deployed. A CDN is deployed in the network providers' data centers, so users fetch data from the data center closest to them on their own network. A reverse proxy is deployed in the website's central data center: when a user request reaches the central data center, it first hits the reverse proxy server, and if the requested resource is cached there, it is returned directly to the user. As shown in the figure:
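The caching logic shared by a reverse proxy and a CDN node can be sketched as follows, assuming responses are cached per path for a fixed time-to-live. The `ReverseProxyCache` class, the `fetch_from_origin` callback, and the 60-second TTL are hypothetical; real proxies also honor HTTP cache headers, which this sketch ignores.

```python
import time


class ReverseProxyCache:
    """Serves fresh cached responses directly; otherwise forwards to the origin server."""

    def __init__(self, fetch_from_origin, ttl_seconds: float, clock=time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin  # hypothetical call to the web servers
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # path -> (expires_at, body)
        self.origin_hits = 0

    def handle(self, path: str) -> str:
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin server is not contacted
        self.origin_hits += 1
        body = self.fetch_from_origin(path)
        self._entries[path] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    proxy = ReverseProxyCache(lambda p: "<html>" + p + "</html>", ttl_seconds=60)
    proxy.handle("/index.html")
    proxy.handle("/index.html")
    print(proxy.origin_hits)  # 1: the second request is answered from the cache
```

A CDN node runs essentially the same logic, only placed in the provider's data center near the user, with the website's central data center acting as its origin.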
7. Using distributed file systems and distributed databases
No single server, however powerful, can keep up with the growing business needs of a large website. Even after read/write separation, the database eventually cannot meet demand, so a distributed database is needed, and the same is true for file systems. A distributed database is the last resort for splitting a website's database; splitting can be done by business or by splitting a single table. See the following figure:
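Splitting a single table can be illustrated with a minimal sketch, assuming rows of a hypothetical `user` table are placed on one of N database servers by `user_id % N`; each dict below stands in for one server. Modulo placement is just one simple rule, and it has a known drawback: changing N moves most rows to a different shard.

```python
class ShardedUserTable:
    """One logical `user` table split horizontally across several database servers."""

    def __init__(self, shard_count: int) -> None:
        # Each dict is a stand-in for the `user` table on one database server.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, user_id: int) -> dict:
        # Hypothetical routing rule: the user id modulo the number of shards.
        return self.shards[user_id % len(self.shards)]

    def insert(self, user_id: int, row: dict) -> None:
        self._shard_for(user_id)[user_id] = row

    def get(self, user_id: int):
        return self._shard_for(user_id).get(user_id)


if __name__ == "__main__":
    table = ShardedUserTable(shard_count=4)
    for uid in range(8):
        table.insert(uid, {"id": uid, "name": "user" + str(uid)})
    print([len(shard) for shard in table.shards])  # [2, 2, 2, 2]
    print(table.get(5))  # {'id': 5, 'name': 'user5'}
```

Splitting by business, the other option mentioned above, is coarser: whole tables for different businesses live on different servers instead of one table being divided row by row.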
8. Using NoSQL and search engines
As the business becomes more complex, the requirements for data storage and retrieval grow more complex as well, so the site needs to adopt non-relational (NoSQL) databases and non-database query technologies such as search engines. As shown in the figure:
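The core data structure behind search engines is the inverted index, which maps each term to the documents containing it so that keyword queries avoid scanning every row. Here is a minimal sketch, assuming lowercase alphanumeric tokenization and AND semantics for multi-word queries; the `InvertedIndex` class is hypothetical and omits ranking entirely.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    @staticmethod
    def _tokenize(text: str):
        # Assumption: terms are lowercase runs of letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Red running shoes")
    index.add(2, "Blue running jacket")
    index.add(3, "Red jacket")
    print(sorted(index.search("running")))  # [1, 2]
    print(sorted(index.search("red jacket")))  # [3]
```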
9. Business splitting
To cope with increasingly complex business scenarios, large websites divide and conquer: the site's business is split into different product lines, each handled by a different team. As shown in the figure:
10. Distributed services
As the business is split into smaller and smaller pieces, the storage system grows larger, overall complexity increases, and deployment and maintenance become more and more difficult. Moreover, many application systems perform the same operations, such as user management. These common functions can be extracted and deployed independently as reusable services that connect to the database and provide common business services to the applications, as shown in the following figure:
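Once shared functions such as user management become independent services, applications need a way to find them. A common approach in distributed service frameworks is a service registry; below is a minimal in-memory sketch of that idea. The `ServiceRegistry` class, the `user-service` name, and the addresses are all hypothetical, and a production registry would also track provider health and run as its own replicated service.

```python
class ServiceRegistry:
    """Providers register by service name; consumers look up a live provider."""

    def __init__(self) -> None:
        self._providers = {}  # service name -> list of provider addresses
        self._cursor = {}  # service name -> lookup counter for rotation

    def register(self, service: str, address: str) -> None:
        providers = self._providers.setdefault(service, [])
        if address not in providers:
            providers.append(address)

    def unregister(self, service: str, address: str) -> None:
        providers = self._providers.get(service, [])
        if address in providers:
            providers.remove(address)

    def lookup(self, service: str) -> str:
        providers = self._providers.get(service)
        if not providers:
            raise LookupError("no provider for " + repr(service))
        i = self._cursor.get(service, 0)
        self._cursor[service] = i + 1
        return providers[i % len(providers)]  # rotate across providers


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("user-service", "10.0.0.1:9000")  # hypothetical addresses
    registry.register("user-service", "10.0.0.2:9000")
    # Different product lines reuse the same user service instead of each
    # keeping its own user-management code and tables.
    print(registry.lookup("user-service"))  # 10.0.0.1:9000
    print(registry.lookup("user-service"))  # 10.0.0.2:9000
```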
Through the ten evolutionary steps above, most of the technical problems are solved. Walking through these ten steps helped me understand how website architecture evolves and gave me an intuitive picture of it, even though I have never worked inside a data center. Website technology advances together with business development. The chapter touches on many concepts, such as layering, caching, CDNs, and distribution; in the next article I will discuss these techniques in more detail.