The Evolution of Large-Scale Website Architecture and the Knowledge It Requires

Source: Internet
Author: User

There are already some articles on how large websites evolved, for example those on LiveJournal and eBay, and they are well worth reading. Most of them, however, describe the result of each change rather than explain why the change was needed. I have also noticed recently that many newcomers find it hard to understand why a website needs such complex technology. That is why I wrote this article. It walks through a fairly typical architecture evolution as an ordinary website grows into a large one, along with the knowledge each step calls for, in the hope of giving people entering the Internet industry a first overview :). Please point out anything I got wrong, so that this article can serve as a real starting point for discussion.

Architecture Evolution Step 1: Physically separate the web server and the database

At the start you have an idea and build a website on the Internet. The host may even be rented at this point, but since this article focuses only on how the architecture evolves, assume that you already have one hosted machine with a certain amount of bandwidth. The site has some distinctive features and attracts visitors, and gradually you notice that the load on the system keeps rising and responses keep getting slower. The most obvious problem at this stage is that the database and the application affect each other: when the application has trouble the database tends to have trouble too, and vice versa. So the first stage of evolution begins: the application and the database are physically separated onto two machines. This places no new demands on technology, yet it works: response times return to their earlier level, the system supports more traffic, and the database and the application no longer interfere with each other.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

This step places almost no demands on technical knowledge.

Architecture Evolution Step 2: Add page caching

Before long, as more and more people visit, responses start to slow down again. Looking for the cause, you find too many operations hitting the database, which leads to fierce competition for database connections and therefore slow responses. You cannot simply open more connections, because the load on the database machine would become very high. So you consider a caching mechanism to reduce both the competition for connections and the read load on the database. A common choice at this point is Squid or a similar tool, used to cache the relatively static pages in the system (for example, pages that are updated only every day or two); generating static HTML pages is an alternative. This way the program does not have to be modified, yet the load on the web server drops and the competition for database connections eases. So you start using Squid to cache the relatively static pages.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Front-end page caching technology such as Squid. To use it well you also need a solid grasp of how Squid is implemented and of its cache expiration algorithms.
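To make the idea of page caching with expiration concrete, here is a minimal sketch in Python. It is not how Squid works internally; it only illustrates the principle that a page is served from the cache until its time-to-live runs out. The names `PageCache` and `render_page` are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Keeps rendered pages for a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so that expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, body)

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backend is not touched
        body = render(url)   # miss or expired: ask the backend
        self._store[url] = (now + self.ttl, body)
        return body


# Hypothetical backend that records how often it is really invoked.
backend_calls = []


def render_page(url: str) -> str:
    backend_calls.append(url)
    return f"<html>content of {url}</html>"
```

With a TTL of one or two days, a page that changes that rarely reaches the web server and the database only once per period, no matter how many visitors request it.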

Architecture Evolution Step 3: Add page fragment caching

With Squid caching in place, the overall system is indeed faster and the load on the web server begins to fall. But as traffic grows, the system starts to slow down a little again. Having tasted the benefits of caching tools such as Squid, you start to wonder whether the relatively static parts of dynamic pages could be cached as well. This leads to a page fragment caching strategy such as ESI, and you start using ESI to cache the relatively static fragments of dynamic pages.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Page fragment caching technology such as ESI. To use it well you also need to understand how ESI is implemented.
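The following Python sketch shows the principle behind fragment caching: a dynamic page contains `<esi:include src="..."/>` markers, and each referenced fragment is fetched once and then reused. It handles only this one tag form and ignores expiry; a real ESI processor does far more. The function `assemble` and the fragment URL used in the example are hypothetical.

```python
import re
from typing import Callable, Dict

# Matches the self-closing include tag, e.g. <esi:include src="/frag/top10"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str,
             fetch_fragment: Callable[[str], str],
             cache: Dict[str, str]) -> str:
    """Replace every include tag with its fragment, fetching each one once."""

    def replace(match):
        src = match.group(1)
        if src not in cache:
            cache[src] = fetch_fragment(src)  # only on the first request
        return cache[src]

    return ESI_INCLUDE.sub(replace, template)
```

For example, a page such as `Hello <esi:include src="/frag/top10"/>` keeps its personalized greeting dynamic, while the top-ten list is rendered once and shared by all visitors.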

Architecture Evolution Step 4: Data caching

Adopting ESI and similar techniques improves caching once more, and the load on the system really does drop further. But again, as traffic grows, the system starts to slow down. After some investigation you may find that the system fetches the same information repeatedly, for example user information. You then consider whether this data can be cached as well, and cache it in local memory. Once the change is complete it fully meets expectations: response times recover and the load on the database drops considerably.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Caching techniques, including the map data structure, caching algorithms, and the implementation of whichever caching framework you choose.
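As an illustration of a map-based cache with an eviction algorithm, here is a small least-recently-used (LRU) cache in Python. LRU is just one common caching algorithm, chosen here as an example; the article does not say which one was used. The class `LRUCache` and the `load` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """In-memory cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, load: Callable[[Hashable], Any]) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = load(key)                # e.g. a database query
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call this after the underlying record changes."""
        self._data.pop(key, None)
```

A caller would wrap the user lookup as `cache.get(user_id, load_user_from_db)`, where `load_user_from_db` stands for whatever query the application already runs.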

Architecture Evolution Step 5: Add a web server

Before long, as traffic grows, the load on the web server machine climbs quite high at peak times, and you begin to consider adding a second web server. This also addresses availability: with a single web server, the whole site is unusable if that machine goes down. After weighing this, you decide to add a web server, which raises some typical problems:
1. How to distribute requests across the two machines. The usual options are Apache's own load balancing or a software load balancer such as LVS.
2. How to keep state information such as user sessions in sync. Options include writing sessions to the database, writing them to shared storage, keeping them in cookies, or replicating session information between servers.
3. How to keep cached data, such as the previously cached user data, in sync. The usual options are cache synchronization or a distributed cache.
4. How to keep features such as file upload working. The usual option is a shared file system or shared storage.
After these problems are solved, the web tier finally grows to two machines and the system is back to its earlier speed.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Load balancing (including but not limited to hardware load balancing, software load balancing, load balancing algorithms, Linux forwarding protocols, and the implementation details of the chosen technology); primary/standby failover (including but not limited to ARP spoofing and Linux Heartbeat); state and cache synchronization (including but not limited to cookies, the UDP protocol, broadcasting of state information, and the implementation details of the chosen cache synchronization technology); shared file systems (including but not limited to NFS); and storage (including but not limited to storage devices).
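One of the simplest load balancing algorithms is round robin with a health check. The Python sketch below shows only the scheduling decision; it is not how Apache or LVS implement it, and the class name and server names are hypothetical.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hands out servers in turn and skips those marked as down."""

    def __init__(self, servers: Iterable[str]):
        self.servers: List[str] = list(servers)
        if not self.servers:
            raise ValueError("at least one server is required")
        self.healthy: Set[str] = set(self.servers)
        self._next = 0

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy web server available")
```

Because consecutive requests from one user may land on different servers, this scheduling choice is exactly what forces the session and cache synchronization questions listed above.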

Architecture Evolution Step 6: Database splitting

After enjoying a period of rapid traffic growth, you find the system slowing down again. What is going on this time? Investigation shows fierce competition for database connections among write and update operations, which slows the whole system down. What can be done? The options at this point are a database cluster or splitting the database. Cluster support is not very good in some databases, so splitting the data across several databases becomes the more common strategy. Splitting also means modifying the existing program. Once the change is made and the split is in place, the goal is reached, and the system is even faster than before.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

This step is mainly about dividing the data sensibly along business lines in order to split the database; it places no other specific technical demands.

At the same time, as data volume grows and the database is split, database design, tuning and maintenance need to be done better, so this step places high demands on skills in those areas.
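A split along business lines can be pictured as a lookup from table to business domain to database. The Python sketch below is only an illustration; every domain, table and database name in it is made up.

```python
# Hypothetical mapping: which database owns which business domain.
DATABASE_FOR_DOMAIN = {
    "user": "db-user",
    "trade": "db-trade",
    "forum": "db-forum",
}

# Hypothetical mapping: which business domain each table belongs to.
DOMAIN_FOR_TABLE = {
    "users": "user",
    "profiles": "user",
    "orders": "trade",
    "payments": "trade",
    "posts": "forum",
}


def database_for(table: str) -> str:
    """Return the database that holds the given table after the split."""
    try:
        domain = DOMAIN_FOR_TABLE[table]
    except KeyError:
        raise LookupError(
            f"table {table!r} has not been assigned to a business domain"
        ) from None
    return DATABASE_FOR_DOMAIN[domain]
```

The hard part is not this lookup but deciding the boundaries: tables that are frequently joined or updated in one transaction should end up in the same database.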

Architecture Evolution Step 7: Table splitting, DAL, and distributed cache

As the system keeps running, data volume grows substantially, and you find that queries are still somewhat slow even after the database split. So, following the same idea as the database split, you start splitting tables. This inevitably requires changes to the program. At this point you may find that having the application itself handle the rules for split databases and split tables is rather complex, which prompts the idea of adding a common framework that handles data access across the split databases and tables. This corresponds to the DAL in eBay's architecture. This evolution takes a relatively long time, and the common framework may well not be started until table splitting is finished. At this stage you may also run into problems with the earlier cache synchronization scheme: the data volume is now so large that caching it locally and then synchronizing is no longer practical, and a distributed cache is needed. After another round of investigation and pain, the bulk of the cached data is finally moved to the distributed cache.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Table splitting is again mostly a matter of dividing data along business lines; the techniques involved include dynamic hashing, consistent hashing, and so on.
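Consistent hashing is worth a small example, because its key property is easy to state but easy to get wrong: when a node is added, only the keys that move to the new node change owner. The Python sketch below uses virtual nodes on a hash ring; the class name, the node names and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    """Maps keys to nodes so that adding a node moves only a few keys."""

    def __init__(self, nodes: Iterable[str] = (), replicas: int = 100):
        self.replicas = replicas           # virtual nodes per real node
        self._ring: List[int] = []         # sorted positions on the ring
        self._owner: Dict[int, str] = {}   # position -> node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:         # extremely unlikely collision
                continue
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                del self._ring[bisect.bisect_left(self._ring, pos)]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("the ring has no nodes")
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, self._hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

With plain `hash(key) % n`, going from three to four nodes would move most keys; with the ring, roughly a quarter of them move, and all of those go to the new node. The same structure is commonly used to pick a server in a distributed cache.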

The DAL involves more complex techniques, such as managing database connections (timeouts, exceptions), controlling database operations (timeouts, exceptions), and encapsulating the rules for split databases and split tables.
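A data access layer hides the split-table rule from callers. The sketch below uses Python's built-in `sqlite3` module and an in-memory database purely so that it can run anywhere; the `OrderDAL` class, the `orders_N` table names and the modulo rule are hypothetical, and a production DAL would also manage connection pools and timeouts.

```python
import sqlite3
from typing import List


class OrderDAL:
    """Callers pass a user id; the physical table name never leaks out."""

    def __init__(self, conn: sqlite3.Connection, table_count: int = 4):
        self.conn = conn
        self.table_count = table_count
        for i in range(table_count):
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS orders_{i} "
                "(user_id INTEGER, item TEXT)"
            )

    def _table(self, user_id: int) -> str:
        # The split-table rule lives in exactly one place.
        return f"orders_{user_id % self.table_count}"

    def add_order(self, user_id: int, item: str) -> None:
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    f"INSERT INTO {self._table(user_id)} (user_id, item) "
                    "VALUES (?, ?)",
                    (user_id, item),
                )
        except sqlite3.Error as exc:
            # Translate driver errors into one application-level error.
            raise RuntimeError(f"order write failed for user {user_id}") from exc

    def orders_of(self, user_id: int) -> List[str]:
        rows = self.conn.execute(
            f"SELECT item FROM {self._table(user_id)} "
            "WHERE user_id = ? ORDER BY rowid",
            (user_id,),
        )
        return [row[0] for row in rows]
```

If the rule later changes, for example from modulo to consistent hashing, only `_table` has to change, which is the main argument for introducing such a layer.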

Architecture Evolution Step 8: Add more web servers

With the databases and tables split, the load on the database has dropped to a fairly low level, and you go back to happily watching traffic surge day after day. Then one day the site starts to slow down again. You check the database first, and its load is normal. You then check the web servers and find that Apache is blocking a large number of requests, while the application server handles each individual request fairly quickly. It appears that the sheer number of requests is making them queue, which slows responses. This is manageable: generally there is some money available by now, so you add more web servers. Adding them may bring several challenges:
1. Software load balancing with Apache or LVS can no longer handle the scheduling of such heavy web traffic (number of connections, network throughput, and so on). If the budget allows, the solution is to buy a hardware load balancer such as F5, NetScaler or Alteon. If it does not, the solution is to divide the applications logically and spread them across separate software load balancing clusters.
2. Some of the existing schemes for state synchronization, file sharing and so on may become bottlenecks and need improving. You might, depending on circumstances, write a distributed file system that fits the site's own needs.
After this work is done, you enter a seemingly perfect era of unlimited scaling: whenever website traffic grows, the answer is simply to add more web servers.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

By this point the number of machines keeps growing, data volume keeps growing, and the availability requirements keep rising. This calls for a deeper understanding of the technologies in use, and for building more customized products based on the site's own needs.

Architecture Evolution Step 9: Read/write separation and inexpensive storage

Then one day you find that this perfect era is also coming to an end, and the database nightmare is back. Because so many web servers have been added, database connections are running short, even though databases and tables have already been split. Analyzing the load on the database, you may find that the ratio of reads to writes is very high, and at this point the usual idea is to separate reads from writes. Implementing this is not easy. You may also find that keeping certain data in the database is wasteful, or takes up too many database resources. So the architecture evolution at this stage may be to separate reads from writes and, at the same time, to build some cheaper storage solutions, such as something along the lines of Bigtable.

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

Read/write separation requires a deep understanding of database replication, standby and similar strategies, and will also require some technology that you implement yourself.
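The routing decision at the heart of read/write separation can be sketched in a few lines of Python. This is a deliberately naive illustration: it classifies statements by their first keyword, and the class name and server names are hypothetical. Real implementations must also deal with replication lag and with statements such as `SELECT ... FOR UPDATE`.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Sends reads to replicas in turn and everything else to the primary."""

    READ_PREFIXES = ("select", "show")

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # With no replicas configured, every statement goes to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, in_transaction: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith(self.READ_PREFIXES)
        # Reads inside a transaction stay on the primary so that they
        # see the transaction's own uncommitted writes.
        if is_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Since each replica brings its own pool of connections, spreading reads this way also relieves the connection shortage described above.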

An inexpensive storage solution requires a deep understanding of how the OS stores files, and of how file handling is implemented in the language you use.

Architecture Evolution Step 10: The era of large-scale distributed applications and the dream era of inexpensive server clusters

After the long and painful process above, the perfect era finally arrives again: adding web servers supports higher and higher traffic. For a large site popularity is undoubtedly important, and as popularity rises, feature requests start to grow explosively. You then suddenly realize that the web application deployed on the web servers has become enormous. When several teams start changing it at once, it is quite inconvenient and reuse is poor: almost every team ends up duplicating work to some degree. Deployment and maintenance are also troublesome, because copying the huge application package to N machines and starting it takes a long time, and problems are hard to track down. Worse, a bug in one application may make the whole site unavailable. There are other factors as well, such as the difficulty of tuning (the application on each machine does everything, so targeted tuning is impossible). Based on this analysis you decide to split the system by responsibility, and a large distributed application is born. This step usually takes a long time, because it brings many challenges:
1. Once the system is split into distributed parts, you need a high-performance, stable communication framework that supports a variety of communication styles and remote call modes.
2. Splitting a huge application takes a long time and requires sorting out the business and controlling dependencies between systems.
3. How to operate this huge distributed application well (dependency management, health management, error tracking, tuning, monitoring and alerting, and so on).
After this step the system architecture is more or less in a relatively stable phase. You can also start using large numbers of inexpensive machines to support huge volumes of traffic and data, and combine this architecture with the experience gained from all the earlier evolution to adopt other approaches as traffic keeps growing.
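One small piece of what a communication framework has to provide is failover between instances of a service. The Python sketch below stands in for that idea using in-process objects instead of real network calls; `FailoverClient`, `UserService` and `ServiceUnavailable` are hypothetical names, and a real framework would add timeouts, serialization, service discovery and monitoring.

```python
from typing import Any, List


class ServiceUnavailable(Exception):
    """Raised when every attempted instance has failed."""


class FailoverClient:
    """Calls a method on one instance and moves to the next on failure."""

    def __init__(self, instances: List[Any], max_attempts: int = 3):
        if not instances:
            raise ValueError("at least one instance is required")
        self.instances = list(instances)
        self.max_attempts = max_attempts
        self._next = 0

    def call(self, method: str, *args: Any) -> Any:
        last_error = None
        for _ in range(self.max_attempts):
            instance = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            try:
                return getattr(instance, method)(*args)
            except ConnectionError as exc:  # only retry transport failures
                last_error = exc
        raise ServiceUnavailable(
            f"{method} failed after {self.max_attempts} attempts"
        ) from last_error


class UserService:
    """In-process stand-in for one remote instance of a user service."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up

    def get_name(self, user_id: int) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return f"user-{user_id}@{self.name}"
```

Retrying only on transport errors is a deliberate choice: retrying after an application error, or retrying a non-idempotent write, can make a failure worse rather than hide it.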

Look at the diagram of the system after the completion of this step:


This step involves these knowledge systems:

This step involves a great deal of knowledge. It requires a deep understanding and mastery of communication, remote calls, messaging mechanisms and so on, with a clear understanding at every level: theory, hardware, operating system, and the implementation in the language used.

Operations also involves a great deal of knowledge. In most cases you need to master distributed and parallel computing, reporting, monitoring technology, rule and policy design, and so on.

Put into words it does not sound like much effort. The classic evolution of a whole site's architecture is roughly as described above, although the solution chosen at each step and the order of the steps may differ. Because each site's business is different, there will also be different specialized technical needs. This post explains the evolution mainly from the architecture perspective, and many technologies are not mentioned here, such as database clusters, data mining and search. In a real evolution you would also rely on measures such as upgrading hardware, improving the network environment, upgrading the operating system and using CDN mirrors to support more traffic, so real-world paths will differ in many ways. A large website also has to deal with far more than this: security, operations, business operations, services, storage and so on. Building a large website is really not easy. I wrote this article mainly in the hope that it will prompt more accounts of how large-scale website architectures evolved :).
PS: Finally, here are a few articles on the evolution of the LiveJournal architecture:
Performance optimization methods for large websites, seen through LiveJournal's backend development
http://blog.zhangjianfeng.com/article/743
More about the current LiveJournal site architecture can be found here: http://www.danga.com/words/
