Exploring the High-Performance, Scalable Architecture of a Large B2C Website (2010-07-21 08:51, Wild unruly, JavaEye)
This article introduces the technologies behind the high-performance architecture of large business-to-consumer (B2C) websites, including caching, application and database splitting, asynchronous communication, and unstructured data storage.
The article "Exploring the Back-End Technology of Facebook, the World's Largest PHP Site" described the technology stack of a large SNS site. Today we continue exploring large websites, this time looking at the architecture of a large B2C site. As the largest B2C website, it must constantly absorb the pressure of rapidly growing data while still handling load well and giving users a smooth experience, so a scalable, high-performance architecture is essential.
I. Stateless applications
The scalability of a system depends on how application state is managed. Suppose we keep a lot of client state in the session: what happens when the server holding that state goes down? The usual answer is clustering, and a cluster means not only load balancing but, more importantly, failover. Tomcat, for example, replicates sessions by broadcasting them to all cluster nodes, while JBoss uses pairwise ("buddy") replication. Replicating state across a cluster has a serious drawback, however: it limits scalability. Adding machines no longer scales the system out cleanly, because session replication between nodes adds network traffic that grows with the cluster. To make the application itself scalable, we need to keep it stateless, so that every node in the cluster is identical and the system scales out easily.
Having seen why statelessness matters, how do we make an application stateless? This is where a session framework comes in. Session state can be kept on the client in cookies, or it can be managed centrally: a number of stateless application nodes connect to a session server, the session server keeps sessions in a cache, and behind it sits a persistent data source such as a database or file system.
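The centralized approach can be sketched as follows. This is a minimal illustration, not the framework the author used: the `SessionServer` and `AppNode` classes are hypothetical, and plain dictionaries stand in for the cache and the persistent store. The point is that the application nodes hold no session data, so any node can serve any request.

```python
import uuid


class SessionServer:
    """Hypothetical central session server: a cache in front of a persistent store."""

    def __init__(self):
        self.cache = {}       # stands in for an in-memory cache
        self.persistent = {}  # stands in for a database or file system

    def save(self, session_id, data):
        self.cache[session_id] = dict(data)
        self.persistent[session_id] = dict(data)  # write-through so a cache loss is survivable

    def load(self, session_id):
        if session_id in self.cache:
            return dict(self.cache[session_id])
        data = self.persistent.get(session_id)
        if data is None:
            return {}
        self.cache[session_id] = dict(data)  # repopulate the cache after a miss
        return dict(data)


class AppNode:
    """A stateless application node: all session data lives on the session server."""

    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions

    def login(self, user):
        session_id = uuid.uuid4().hex  # handed to the browser, e.g. in a cookie
        self.sessions.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        return self.sessions.load(session_id).get("user")


server = SessionServer()
node_a, node_b = AppNode("a", server), AppNode("b", server)
sid = node_a.login("alice")
print(node_b.whoami(sid))  # alice: a different node serves the same session
server.cache.clear()       # simulate losing the cache
print(node_a.whoami(sid))  # alice: recovered from the persistent store
```

Because no node owns any session, adding a node needs no replication traffic between nodes; the session server becomes the component that has to be made highly available.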
II. Efficient use of caching
Anyone working on Internet applications knows how important caching is. It appears in many scenarios: browser caches, reverse-proxy caches, page caches, local page caches, object caches, and so on.
Depending on how close it sits to the application, a cache can be classified as a local cache or a remote cache. A typical system uses one or the other; mixing the two makes keeping the local and remote copies of the data consistent considerably more cumbersome.
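As an illustration of a local read cache, here is a minimal sketch of an in-process LRU cache with a time-to-live, placed in front of a slower loader (which could query a remote cache or the database). The class name, the `loader` callback and the TTL policy are assumptions for the example; the TTL is one simple way to bound how stale a local copy can get.

```python
import time
from collections import OrderedDict


class LocalCache:
    """Hypothetical in-process LRU cache with a time-to-live (a sketch, not a library API)."""

    def __init__(self, loader, capacity=1000, ttl_seconds=60.0):
        self.loader = loader          # called on a miss, e.g. to query a remote cache or DB
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.entries = OrderedDict()  # key -> (value, expiry time), oldest first

    def get(self, key):
        now = time.monotonic()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            self.entries.move_to_end(key)  # mark as most recently used
            return hit[0]
        value = self.loader(key)           # miss or expired entry: reload
        self.entries[key] = (value, now + self.ttl)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


calls = []


def load_product(product_id):
    calls.append(product_id)  # records each trip to the slower backing store
    return {"id": product_id}


cache = LocalCache(load_product, capacity=2, ttl_seconds=30)
for product_id in (1, 1, 2, 3):
    cache.get(product_id)
print(calls)  # [1, 2, 3]: the second get(1) was a hit, and get(3) evicted key 1
```

A shorter TTL reduces staleness at the cost of more trips to the backing store; it does not make the local cache strictly consistent.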
Most of the time, "cache" means a read cache, but there is another kind: the write cache. For data with a low read-to-write ratio and modest durability requirements, such as product page-view counts or API call statistics, we can accumulate writes in an in-memory cache and persist them to the database later. This greatly reduces the write load on the database.
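A write cache for counters might look like the sketch below, assuming the database update can be expressed as a batch of `{key: delta}` increments (the `flush_fn` callback is hypothetical). Counts that have not been flushed are lost if the process dies, which is why this only suits data with low durability requirements.

```python
import threading
from collections import Counter


class WriteBehindCounter:
    """Accumulates counts in memory and persists them to storage in batches."""

    def __init__(self, flush_fn):
        self.flush_fn = flush_fn  # hypothetical: persists a {key: delta} batch to the database
        self.pending = Counter()
        self.lock = threading.Lock()

    def incr(self, key, n=1):
        with self.lock:
            self.pending[key] += n  # no database write on the hot path

    def flush(self):
        with self.lock:
            batch, self.pending = self.pending, Counter()
        if batch:
            self.flush_fn(dict(batch))  # one write per key instead of one per event


db = Counter()  # stands in for the database table
views = WriteBehindCounter(db.update)  # Counter.update adds the deltas
for _ in range(1000):
    views.incr("item:42")
views.incr("api:search", 3)
views.flush()  # in production this would run periodically, e.g. from a timer thread
print(db["item:42"], db["api:search"])  # 1000 3
```

Here 1001 increments reach the "database" as a single batch of two updates.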
III. Application splitting
Before discussing how to split an application, let's review the problems a system runs into as it grows; they show why splitting is so important for building a large system.
Early on, there are few users and little traffic, so all the logic can live in one system, running in a single process or application. As the user base grows, however, the load keeps rising. At the same time, new features are added to meet user needs and the system becomes more and more complex. It grows hard to maintain and hard to extend, and its scalability and availability suffer.
The sensible way to solve these problems is to split the system, which is also a form of decoupling. We divide the original system into subsystems according to some criterion, such as business relatedness, with each subsystem responsible for a different function. Each subsystem can then be extended and maintained on its own, which improves extensibility and maintainability. Horizontal scaling (scale-out) also improves greatly: we can scale just the subsystems under pressure without affecting the others, instead of scaling the entire system, at much greater cost, every time the load rises. Finally, coupling between subsystems drops, so when one subsystem is temporarily unavailable the system as a whole remains available, which greatly improves overall availability.
A large Internet application must therefore be split, because only splitting improves its extensibility, maintainability, scalability, and availability. Splitting also brings problems, though: how do the subsystems communicate, and by what means? Communication is either synchronous or asynchronous. Here we cover synchronous communication; asynchronous communication is covered in its own section below. Because subsystems must talk to each other, a high-performance remote call framework becomes essential.
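The core idea of a remote call framework (a client-side stub serializes the method name and arguments, a transport carries the bytes, and the server dispatches the call to a registered service) can be sketched as follows. Everything here is hypothetical: JSON is used as the wire format, and the "network" is a direct function call so the example runs in one process. A real framework would add connection pooling, timeouts, service discovery and a more efficient serialization.

```python
import json


class RpcServer:
    """Dispatches serialized calls to registered service objects."""

    def __init__(self):
        self.services = {}

    def register(self, name, obj):
        self.services[name] = obj

    def handle(self, payload: bytes) -> bytes:
        req = json.loads(payload)
        try:
            if req["method"].startswith("_"):
                raise AttributeError("private methods cannot be called remotely")
            method = getattr(self.services[req["service"]], req["method"])
            return json.dumps({"result": method(*req["args"])}).encode()
        except Exception as exc:  # report failures to the caller instead of crashing the server
            return json.dumps({"error": repr(exc)}).encode()


class RpcProxy:
    """Client-side stub: turns ordinary method calls into remote calls."""

    def __init__(self, service, transport):
        self._service = service
        self._transport = transport  # bytes -> bytes; a real framework would use TCP

    def __getattr__(self, method):
        def call(*args):
            payload = json.dumps(
                {"service": self._service, "method": method, "args": list(args)}
            ).encode()
            reply = json.loads(self._transport(payload))
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return call


class InventoryService:  # hypothetical subsystem
    def stock(self, item_id):
        return {"item": item_id, "stock": 7}


server = RpcServer()
server.register("inventory", InventoryService())
inventory = RpcProxy("inventory", server.handle)  # in-process "network" for the demo
print(inventory.stock(42))  # {'item': 42, 'stock': 7}
```

The caller writes `inventory.stock(42)` as if it were a local call, which is what makes such a framework convenient, and also why a synchronous remote call ties the caller to the availability of the callee.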
Those are the benefits of splitting, but it inevitably brings new problems as well. Besides communication between subsystems, the most important is the dependencies between systems: the more systems there are, the more complex their dependencies become. This calls for closer attention to the splitting criteria. Dependent systems can, for example, be split vertically so that each system's functions are kept as separate as possible, which is the system verticalization our company is currently working on. Circular dependencies between systems also need great care, because they can cause a chain of systems to fail to start.
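One way to catch circular dependencies before they cause startup failures is to treat subsystem dependencies as a graph and compute a startup order from it. The sketch below assumes a hypothetical map from each subsystem to the subsystems it calls synchronously, and uses the standard library's `graphlib` (Python 3.9+), which raises `CycleError` when the graph has a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(dependencies):
    """Return an order in which subsystems can start, or raise if dependencies form a cycle.

    dependencies maps each subsystem to the set of subsystems it depends on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        cycle = err.args[1]  # graphlib reports the nodes that form the cycle here
        raise ValueError("circular dependency: " + " -> ".join(cycle)) from err


deps = {"trade": {"user", "item"}, "item": {"user"}, "user": set()}  # hypothetical subsystems
print(startup_order(deps))  # ['user', 'item', 'trade']

deps["user"] = {"trade"}  # user now also calls trade, creating a cycle
try:
    startup_order(deps)
    cycle_detected = False
except ValueError as err:
    cycle_detected = True
    print(err)  # names the subsystems that form the cycle
```

Running a check like this whenever a new dependency is introduced surfaces cycles at design time rather than during a production restart.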
In short, to keep a large system maintainable, extensible, and scalable, we must split it, and splitting inevitably means managing both the communication and the dependencies between systems.
IV. Database splitting
In the previous section on application splitting, we said that a large Internet application needs to be split well. That section covered only application-level splitting; an equally important aspect is how to split storage. This section therefore covers how to split the storage system, usually an RDBMS.
With that established, let's review the problems an Internet application meets as it grows from its early days, and see through them why splitting the RDBMS is important.
When the system first goes online there are few users, and all the data lives in a single database, which easily copes with the light load. Then, thanks to the hard work and relentless promotion of the operations team, we discover one day that the number of users has shot up, and the database can no longer keep up: one day, just when everyone is relaxing, it goes down. We engineers investigate and find that the read load on the database is too high. The well-known fix is read/write splitting: we configure one server as the master and add several slave nodes, so that the read load is spread across the slaves. The system recovers and runs normally again. The good times don't last, though. One day we find that the master can't keep up either: its load stays high and it could fall over at any moment. Now we need vertical partitioning (splitting into separate databases): product information, user information, and transaction information go into different databases, and the product database can itself use master/slave replication. With the write load of each functional area on its own servers, database load finally returns to normal. But can we relax now?
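Read/write splitting is usually hidden behind a router that picks a database per statement. The sketch below is a simplified, hypothetical router: it sends plain `SELECT` statements to the slaves in round-robin order and everything else (including `SELECT ... FOR UPDATE`, which needs the master's locks) to the master. The server names are placeholders.

```python
import itertools


class ReadWriteRouter:
    """Sends writes to the master and spreads plain reads across slaves round-robin."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves)

    def route(self, sql: str):
        text = sql.strip().rstrip(";").lower()
        is_read = text.startswith("select") and not text.endswith("for update")
        if is_read and self.slaves:
            return next(self._next_slave)
        return self.master  # writes, locking reads, and reads when no slave is configured


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
queries = [
    "SELECT * FROM item WHERE id = 1",
    "UPDATE item SET stock = stock - 1 WHERE id = 1",
    "SELECT * FROM item WHERE id = 2",
    "SELECT stock FROM item WHERE id = 1 FOR UPDATE",
]
print([router.route(q) for q in queries])  # ['slave-1', 'master', 'slave-2', 'master']
```

Because slaves replicate asynchronously, a read issued right after a write may not see it yet; a real router typically also lets the caller force specific reads to the master.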
No. This isn't just my opinion; it is what those who came before learned from experience. As users keep growing, some tables become extremely large, for example the friend-relationship table or the store settings table. Reading or writing these tables becomes very expensive for the database, so at this point we need horizontal partitioning (commonly called sharding).
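Horizontal partitioning needs a rule that maps each row to a shard. A minimal sketch, assuming the friend-relationship table is sharded by user ID into 16 physical tables with a hypothetical `friend_relation_NN` naming scheme:

```python
import zlib


def shard_index(key, shard_count):
    """Map a sharding key to a shard; must give the same answer in every process."""
    if isinstance(key, int):
        return key % shard_count
    # Python's built-in hash() of str is randomized per process, so use a stable checksum.
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def friend_table(user_id, shard_count=16):
    """Hypothetical naming scheme: friend_relation_00 .. friend_relation_15."""
    return "friend_relation_%02d" % shard_index(user_id, shard_count)


print(friend_table(10086))  # friend_relation_06
```

With plain modulo, changing the shard count remaps most keys, so the count is usually chosen generously up front; consistent hashing or a lookup directory are common alternatives when shards must be added later.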
All of this comes down to one fact: the database is the hardest layer of the system to scale. A large Internet application inevitably moves from a single database server, to master/slave replication, to vertical partitioning (separate databases), and then to horizontal partitioning (split tables, i.e. sharding). Master/slave replication and vertical partitioning are relatively easy and have little impact on the application, but splitting tables causes some tricky problems: queries can no longer join data across partitions, and the load has to be balanced across shards. At this point a general-purpose DAL (data access layer) framework is needed to shield the application logic from the underlying storage, so that access to the underlying data is transparent to the application.
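The sketch below shows, under simplifying assumptions, what such a DAL hides from the caller. It is hypothetical: each dictionary stands in for one shard, rows are placed by `user_id % shard_count`, and queries that lack the sharding key fall back to scatter-gather across all shards. That fallback is exactly the kind of query (like cross-shard joins) that becomes expensive after sharding.

```python
class ShardedDal:
    """Hypothetical DAL: callers never see which shard holds a row."""

    def __init__(self, shard_count):
        # Each dict stands in for one database (or physical table) holding a slice of the rows.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert(self, user_id, row):
        self._shard_for(user_id)[user_id] = row  # the sharding key picks exactly one shard

    def get(self, user_id):
        return self._shard_for(user_id).get(user_id)

    def find(self, predicate):
        # Without the sharding key the DAL must scan every shard and merge the results
        # (scatter-gather); this is why such queries, and joins across shards, are costly.
        matches = []
        for shard in self.shards:
            matches.extend(row for row in shard.values() if predicate(row))
        return matches


dal = ShardedDal(shard_count=4)
for user_id, city in [(1, "Hangzhou"), (6, "Beijing"), (11, "Hangzhou")]:
    dal.insert(user_id, {"id": user_id, "city": city})
print(dal.get(6))  # {'id': 6, 'city': 'Beijing'}: only one shard is touched
print(sorted(row["id"] for row in dal.find(lambda row: row["city"] == "Hangzhou")))  # [1, 11]
```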
V. Asynchronous communication
In the section on application splitting, we said that a large system must be split to meet its scalability needs, and that once it is split, how the subsystems communicate becomes the first problem to solve. There we covered synchronous communication through a remote call framework; this section covers asynchronous communication. Asynchronous communication brings in message middleware. It also affects the scalability of the system, and it decouples the subsystems as much as possible.
One point about asynchronous communication needs attention: whether to make an interaction asynchronous must be decided by the characteristics of the business. Asynchrony suits loosely coupled interactions; for business systems that are closely related to each other, synchronous communication is still the better choice.
So what does asynchrony bring to a system? First, suppose a system has two subsystems, A and B. If A and B communicate synchronously, scaling the system means scaling A and B together, which constrains scale-out of the whole system. Second, synchronous calls hurt availability. Logically, if A calls B synchronously, then A can be available only if B is available; the contrapositive is that if B is unavailable, A is unavailable too. With independent failures, chaining two services that are each 99.9% available leaves A at roughly 0.999 × 0.999 ≈ 99.8%. Third, asynchronous communication greatly improves response time: each request returns sooner, which improves the user experience. Asynchrony therefore improves the scalability and availability of the system and greatly shortens request response time, although the total processing time of a request may not decrease.
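The decoupling can be illustrated with an in-process queue standing in for message middleware. This is a minimal sketch with hypothetical subsystems: A (`place_order`) publishes an event and returns immediately, and B (`loyalty_worker`) consumes events at its own pace. If B is slow or temporarily down, A keeps accepting requests and the messages simply wait in the queue.

```python
import queue
import threading

events = queue.Queue()  # stands in for message middleware
processed = []


def place_order(order_id):
    """Subsystem A: publishes an event instead of calling B synchronously."""
    events.put({"type": "order_placed", "order_id": order_id})
    return "accepted"  # responds without waiting for B


def loyalty_worker():
    """Subsystem B: consumes events independently of A."""
    while True:
        event = events.get()
        if event is None:  # sentinel used to stop the demo worker
            events.task_done()
            break
        processed.append(event["order_id"])  # e.g. award loyalty points
        events.task_done()


worker = threading.Thread(target=loyalty_worker)
worker.start()
print([place_order(i) for i in range(3)])  # ['accepted', 'accepted', 'accepted']
events.put(None)
worker.join()
print(processed)  # [0, 1, 2]
```

A's response time no longer includes B's work, and A and B can be scaled separately; the total time until the loyalty points are awarded is not reduced.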
VI. Unstructured data storage
In a large Internet application, not all data is structured. Configuration files, a user's activity feed, and transaction snapshots, for example, are generally not suited to an RDBMS; they fit a key-value structure better. Another class of data is very large in volume but has low real-time requirements, and it too needs a different storage mechanism. Then there are static files such as product images and product descriptions: because they are large, putting them in the RDBMS causes read-performance problems that also slow down reads of other data. These files therefore need to be stored separately as well, and Internet applications generally keep them in a distributed file system.
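The separation of large static files from relational data can be sketched as follows. The `DistributedFileStore` class is a toy, hypothetical stand-in for a real distributed file system: it places each file on one of several storage nodes using a content-derived ID, and the relational row stores only that short ID.

```python
import hashlib


class DistributedFileStore:
    """Toy stand-in for a distributed file system that spreads files across storage nodes."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]  # each dict stands in for one storage server

    def _node_for(self, file_id):
        return self.nodes[int(file_id, 16) % len(self.nodes)]

    def put(self, data: bytes) -> str:
        # A content-derived ID means identical files are stored only once.
        file_id = hashlib.sha1(data).hexdigest()
        self._node_for(file_id)[file_id] = data
        return file_id

    def get(self, file_id: str) -> bytes:
        return self._node_for(file_id)[file_id]


dfs = DistributedFileStore(node_count=3)
image_id = dfs.put(b"fake image bytes")
# The relational row keeps only the short file ID, not the image itself.
product_row = {"id": 42, "title": "Tea set", "image_id": image_id}
print(dfs.get(product_row["image_id"]) == b"fake image bytes")  # True
```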
As the Internet has developed, one concept that has grown popular in the industry since the second half of 2008 is NoSQL. According to the CAP theorem, consistency, availability, and partition tolerance cannot all be satisfied at once; at most two of them can hold at the same time. Traditional relational databases use ACID transactions, which emphasize strong consistency at some cost to availability. Internet applications, however, tend to care more about availability than about strict consistency, so for such data we avoid ACID transactions and use the BASE approach instead. BASE stands for Basically Available, Soft state, Eventually consistent: by settling for eventual consistency, the system gains availability. Many NoSQL products today are built on this kind of trade-off. Products such as Cassandra (originally developed at Facebook), Apache HBase, and Google Bigtable are well suited to unstructured data such as key-value data, and a major advantage they share is horizontal scalability. Our company is also studying and adopting some mature NoSQL products.
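Eventual consistency can be sketched with replicas that acknowledge a write locally and receive it at the other replicas later. This toy model is an assumption for illustration, not how any named product is implemented: a counter stands in for timestamps, conflicts are resolved by "last write wins" (one common rule), and replication is an explicit backlog that is delivered when `replicate()` is called.

```python
import itertools


class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:  # last write wins
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes succeed on one replica and propagate to the others later."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.backlog = []                 # replication messages not yet delivered
        self.clock = itertools.count(1)  # stands in for timestamps / version numbers

    def write(self, key, value, via=0):
        version = next(self.clock)
        self.replicas[via].apply(key, version, value)  # acknowledged immediately (available)
        for i in range(len(self.replicas)):
            if i != via:
                self.backlog.append((i, key, version, value))

    def replicate(self):
        for i, key, version, value in self.backlog:
            self.replicas[i].apply(key, version, value)
        self.backlog.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("cart:7", "tea", via=0)
print(store.replicas[2].read("cart:7"))  # None: replica 2 has not caught up yet (soft state)
store.write("cart:7", "tea set", via=1)  # a later write arrives at a different replica
store.replicate()
print([r.read("cart:7") for r in store.replicas])  # ['tea set', 'tea set', 'tea set']
```

Between writes and replication, different replicas can return different answers; once replication completes, they converge on the same value.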
VII. Monitoring and alerting
For large systems, the only thing that is reliable is that all parts of the system are unreliable.
A large distributed system inevitably involves many kinds of devices: network switches, commodity PCs, network cards, hard disks, memory, and so on. With so many of them, the probability that something fails somewhere becomes high, so we need to monitor the state of the system. Monitoring comes in coarse and fine granularity. At the coarse level, we monitor the application system as a whole: current network traffic, memory usage, I/O and CPU load, the request load on each service, service response times, and so on. At the fine level, we track things such as the number of visits to a particular feature or URL, the page views (PV) of each page, the bandwidth each page uses per day, page rendering time, and how much bandwidth static resources consume each day. A monitoring system is therefore essential.
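Fine-grained monitoring boils down to recording per-URL samples and aggregating them. A minimal, hypothetical sketch that tracks page views, response time and bytes served per URL:

```python
import statistics
from collections import defaultdict


class PageMetrics:
    """Fine-grained monitoring sketch: page views, latency and bandwidth per URL."""

    def __init__(self):
        self.samples = defaultdict(list)  # url -> [(latency_ms, bytes_sent), ...]

    def record(self, url, latency_ms, bytes_sent):
        self.samples[url].append((latency_ms, bytes_sent))

    def report(self, url):
        rows = self.samples.get(url, [])
        if not rows:
            return {"pv": 0, "avg_ms": None, "max_ms": None, "bytes": 0}
        latencies = [latency for latency, _ in rows]
        return {
            "pv": len(rows),                         # page views in this window
            "avg_ms": statistics.mean(latencies),
            "max_ms": max(latencies),
            "bytes": sum(size for _, size in rows),  # bandwidth used by this page
        }


metrics = PageMetrics()
for latency_ms in (120, 80, 100):
    metrics.record("/item/42", latency_ms, bytes_sent=50_000)
print(metrics.report("/item/42"))  # pv 3, average 100 ms, max 120 ms, 150000 bytes
```

A production system would aggregate over time windows and ship the numbers to a central store rather than keep every sample in memory.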
Beyond having a monitoring system, it is even more important to combine it with an alerting system. For example, the system should raise an alert automatically when a page's response time increases, when CPU or memory usage on a server suddenly spikes, or when large numbers of concurrent requests are being dropped. Combining monitoring with alerting lets us respond quickly to problems in the system and improves its stability and availability.
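The link between monitoring and alerting can be as simple as threshold rules evaluated against the latest metric snapshot. The rule format, metric names and thresholds below are hypothetical, and `notify` stands in for whatever channel (SMS, email, pager) a real system would use.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str       # hypothetical metric name, e.g. "cpu_percent"
    threshold: float  # alert when the latest value is above this
    message: str


def evaluate(rules, snapshot, notify):
    """Check the latest metric snapshot against every rule; call notify for each breach."""
    fired = []
    for rule in rules:
        value = snapshot.get(rule.metric)
        if value is not None and value > rule.threshold:
            notify(f"{rule.message}: {rule.metric}={value} > {rule.threshold}")
            fired.append(rule.metric)
    return fired


rules = [
    AlertRule("item_page_p95_ms", 500, "Item page slow"),
    AlertRule("cpu_percent", 85, "CPU high on web-01"),
    AlertRule("dropped_requests_per_min", 100, "Requests being dropped"),
]
alerts = []
fired = evaluate(rules, {"item_page_p95_ms": 320, "cpu_percent": 93.5}, alerts.append)
print(alerts)  # ['CPU high on web-01: cpu_percent=93.5 > 85']
```

Real alerting usually also requires a breach to persist for some time before firing, to avoid alert storms from brief spikes.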
VIII. Unified configuration management
A large distributed application usually has many nodes. If adding or removing a node means changing the configuration of every other node, the system becomes hard to maintain and manage, and errors creep in more easily. In addition, many systems in a cluster share much of the same configuration; without unified management, that configuration must be maintained separately on every system, which makes configuration management cumbersome. Unified configuration management solves these problems: when a node is added or removed, the configuration management system notifies every node to update its configuration, keeping the configuration consistent across all nodes. This is both convenient and less error-prone.
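The push model described above can be sketched with a single configuration center that nodes subscribe to. This in-process version is hypothetical (the class names, keys and addresses are placeholders); in a real deployment the center is a separate, replicated service, and coordination services such as Apache ZooKeeper, with their watch notifications, are commonly used for this role.

```python
class ConfigCenter:
    """Single source of truth for shared settings; pushes changes to subscribed nodes."""

    def __init__(self):
        self.config = {}
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)
        callback(dict(self.config))  # a joining node receives the current config immediately

    def unsubscribe(self, callback):
        self.subscribers.remove(callback)  # a removed node stops receiving updates

    def set(self, key, value):
        self.config[key] = value
        for callback in self.subscribers:
            callback(dict(self.config))  # every node gets the same snapshot


class Node:
    def __init__(self, name, center):
        self.name = name
        self.config = {}
        center.subscribe(self.on_config)

    def on_config(self, config):
        self.config = config


center = ConfigCenter()
center.set("cache.servers", ["10.0.0.1:11211"])
node_a = Node("a", center)
node_b = Node("b", center)
center.set("cache.servers", ["10.0.0.1:11211", "10.0.0.2:11211"])  # e.g. a cache node was added
center.unsubscribe(node_b.on_config)  # node b leaves the cluster
center.set("timeout_ms", 500)
print(node_a.config)  # both settings, pushed without editing any node by hand
```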