Scaling a web system to hundreds of millions of visits: from a standalone server to a distributed cluster


As a web system grows from 100,000 daily visits to 10 million, or even past 100 million, the pressure on it keeps increasing, and along the way we run into many problems. To cope with this performance pressure, we need to build multiple levels of caching into the web architecture. Different stages of growth bring different problems, which we solve by introducing different services and architectures.

Web Load Balancing

Web load balancing, simply put, assigns "work tasks" to the servers in our cluster. Choosing an appropriate allocation strategy is important for protecting the web servers at the back end.

There are many load-balancing strategies; let's discuss them starting from the simplest.

1. HTTP redirection

When the user sends a request, the web server returns a new URL by setting the Location header in the HTTP response, and the browser then requests that new URL; this is essentially a page redirect, and the redirect is what achieves the "load balancing". For example, when we download a PHP source package and click the download link, the site returns a download address near us to compensate for differing download speeds across countries and regions. The HTTP status code for the redirect is 302, as shown in the following figure:

If you use PHP code to implement this functionality, here's how:
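A minimal sketch of what such a redirect might look like in PHP; the host names below are hypothetical stand-ins for real download mirrors, and the random choice stands in for whatever real policy (geography, load) the site applies:

```php
<?php
// HTTP-redirect load balancing: pick a back-end server and send the
// browser there with a 302. Host names here are illustrative only.
$servers = array(
    'http://download1.example.com',
    'http://download2.example.com',
    'http://download3.example.com',
);

// Random choice as the simplest policy; a real deployment could
// weigh geography, current load, and so on.
$target = $servers[array_rand($servers)];
header('Location: ' . $target . $_SERVER['REQUEST_URI'], true, 302);
exit;
```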

This kind of redirect is very easy to implement and can be combined with all sorts of custom policies. However, it performs poorly under large-scale traffic, and the user experience suffers: every actual request costs an extra redirect round trip, which increases network latency.

2. Reverse Proxy Load Balancing

The core job of a reverse proxy is to forward HTTP requests; it acts as a relay between the browser and the back-end web servers. Because it works at the HTTP layer (the application layer, layer seven of the seven-layer OSI model), it is also called "layer-7 load balancing". Many programs can act as a reverse proxy; the most common is Nginx.

Nginx is a very flexible reverse proxy: forwarding strategies, per-server traffic weights, and so on can all be customized freely. One common problem with reverse proxying concerns the session data stored on a web server: because the usual load-balancing strategies distribute requests more or less randomly, requests from the same logged-in user cannot be guaranteed to land on the same web machine, so the session may not be found.

There are two main types of solutions:

1. Configure the reverse proxy's forwarding rules so that requests from the same user always land on the same machine (for example, by analyzing a cookie). Complex forwarding rules consume more CPU and increase the proxy server's burden.

2. Store session data and the like in a separate, dedicated service such as Redis or Memcached; this is the recommended approach.
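As a sketch of the weighted forwarding and affinity ideas above, an Nginx configuration might look like this (addresses are illustrative; note that `ip_hash` pins clients by IP rather than by cookie):

```nginx
# Hypothetical back-end pool: weights steer more traffic to the
# stronger machine; ip_hash keeps one client on one server,
# easing the session-affinity problem.
upstream web_backend {
    ip_hash;
    server 10.0.0.11:8080 weight=3;
    server 10.0.0.12:8080 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://web_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```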

The reverse proxy can also have caching enabled; doing so increases the proxy's burden, so use it with care. This load-balancing strategy is simple to implement and deploy, and it performs well. However, it has a "single point of failure" problem: if the proxy goes down, it causes a lot of trouble. Moreover, as the number of back-end web servers keeps growing, the proxy itself may become the system's bottleneck.

3. IP Load Balancing

An IP load balancer works at the network layer (modifying IPs) and the transport layer (modifying ports, layer four), which gives much higher performance than working at the application layer (layer seven). The principle is that it modifies the IP address and port information of packets at the IP layer to achieve load balancing, so this approach is also known as "layer-4 load balancing". The most common implementation is LVS (Linux Virtual Server), which is built on IPVS (IP Virtual Server).

When the load-balancing server receives a client's IP packet, it rewrites the packet's destination IP address or port, then delivers it, otherwise intact, into the internal network, where it flows to the actual web server. After the actual server finishes processing, its response packet passes back through the load balancer, which rewrites the source address to the virtual IP before the packet is finally returned to the client.

The approach just described is called LVS-NAT. LVS also supports LVS-DR (direct routing) and LVS-TUN (IP tunneling). All three are LVS modes with certain differences between them; for reasons of space we won't go into detail.
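An LVS-NAT setup of the kind described above is typically configured with ipvsadm; a sketch with illustrative addresses (run as root on the balancer, where 192.168.0.100 is the virtual IP):

```shell
# Add the virtual service on the VIP, round-robin scheduling.
ipvsadm -A -t 192.168.0.100:80 -s rr

# Add two real servers behind it; -m selects NAT (masquerading) mode.
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.11:80 -m
ipvsadm -a -t 192.168.0.100:80 -r 10.0.0.12:80 -m
```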

The performance of IP load balancing is much higher than that of an Nginx reverse proxy: it only processes packets up to the transport layer, does not parse their contents any further, and forwards them straight to the actual servers. However, it is more complex to configure and set up.

4. DNS Load Balancing

DNS (Domain Name System) provides domain-name resolution: a domain name is really an alias for a server, the actual mapping being an IP address, and resolution is the process by which DNS completes the name-to-IP mapping. A single domain name can be configured to map to multiple IPs, so DNS itself can also serve as a load-balancing service.

This load-balancing strategy is simple to configure and performs excellently. However, rules cannot be defined freely, changing a mapped IP when a machine fails is cumbersome, and DNS propagation delay is a problem.
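In zone-file terms, DNS load balancing is just multiple A records for one name, which resolvers rotate through (round robin). A sketch with illustrative addresses and a short TTL to limit the propagation-delay problem:

```
; example.com zone fragment: three A records for the same name.
; TTL of 300 s keeps stale answers short-lived when an IP changes.
example.com.   300  IN  A  203.0.113.10
example.com.   300  IN  A  203.0.113.11
example.com.   300  IN  A  203.0.113.12
```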

5. DNS/GSLB Load Balancing

The common CDN (Content Delivery Network) implementation goes one step beyond mapping a single domain to multiple IPs: a GSLB (Global Server Load Balancing) service maps the domain name to an IP according to specified rules, usually geographic location, returning an IP near the user and reducing the hops consumed as traffic crosses routing nodes.

In the lookup shown in the figure, the actual process is that the LDNS (Local DNS) first queries a root name server, which points it to the top-level-domain name servers (for example, .com); from those it obtains the authoritative DNS for the given domain name, and from that it obtains the actual server IP.

In a web system, a CDN is generally used to solve the loading of large static resources (HTML, JS, CSS, images, and so on), bringing content whose delivery depends heavily on network download speed as close to the user as possible and improving the user experience.

For example, when I access a picture on imgcache.gtimg.cn (Tencent's self-built CDN; the qq.com domain is not used so that HTTP requests don't carry unnecessary cookie information), the IP I get is 183.60.217.90.

Like plain DNS load balancing, this approach not only performs excellently but also supports multiple configurable policies. However, building and maintaining it is very expensive, so only first-tier Internet companies build their own CDN services; small and medium companies generally use a third-party CDN.

Establishing and Optimizing the Web System's Caching Mechanisms

Having just covered the web system's external network environment, we now turn to the performance of the web system itself. As our site's traffic grows, we will face many challenges, and solving them is not merely a matter of adding machines: establishing and using the right caching mechanisms is fundamental.

In the beginning, our web architecture might look like this, with only one machine at each tier.

We start by looking at the most fundamental layer: data storage.

First, using the MySQL database's internal caches

MySQL's caching mechanisms start inside MySQL itself; the discussion below is based on the most common storage engine, InnoDB.

1. Build the right Index

The simplest step is to build indexes. When a table is relatively large, an index makes data retrieval fast, but it also has costs. First, it takes up a certain amount of disk space; composite indexes are the most prominent case and must be used with caution, as they can produce an index even larger than the source data. Second, inserts, updates, and deletes take longer once an index exists, because the index must be updated too. Of course, in practice most of our system's operations are SELECT queries, so indexes still bring a significant improvement in system performance.
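As a sketch of the trade-off above, against a hypothetical `orders` table, indexing the columns the hot queries actually filter on:

```sql
-- Single-column index for lookups by user.
ALTER TABLE orders ADD INDEX idx_user_id (user_id);

-- Composite index: speeds up WHERE user_id = ? AND status = ?,
-- but takes more disk space and slows inserts/updates/deletes,
-- so add it only when the query pattern justifies it.
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);
```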

2. Database connection thread pool cache

If every database request had to create and destroy its own connection, that would undoubtedly be a huge overhead for the database. To reduce it, you can configure thread_cache_size in MySQL, which sets how many threads are kept around for reuse: when there aren't enough threads, new ones are created; when too many sit idle, they are destroyed.

A more radical approach is pconnect (persistent database connections): once created, a connection is kept alive for a long time. But with heavy traffic and many machines, this usage easily "exhausts the database connections", because connections are not recycled and the database's max_connections limit is eventually reached. For this reason, persistent connections usually require a "connection pool" service between the CGI processes and MySQL, to stop the CGI machines from "blindly" creating connections.

There are many ways to implement a database connection-pool service in PHP; I recommend building it with Swoole (a PHP networking extension).
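A minimal sketch of the pooling idea, assuming the Swoole extension is installed; the class and factory are illustrative, using Swoole's coroutine `Channel` as a bounded queue of pre-created connections:

```php
<?php
// Tiny connection pool: a Channel of fixed capacity caps the number
// of live connections, so CGI workers can never exceed the limit.
use Swoole\Coroutine\Channel;

class MysqlPool
{
    private $pool;

    public function __construct(int $size, callable $factory)
    {
        $this->pool = new Channel($size);
        for ($i = 0; $i < $size; $i++) {
            $this->pool->push($factory());   // pre-create $size connections
        }
    }

    public function get()
    {
        return $this->pool->pop();           // blocks when the pool is empty
    }

    public function put($conn): void
    {
        $this->pool->push($conn);            // return the connection for reuse
    }
}
```

Workers borrow with `get()`, run their queries, and must `put()` the connection back; the pool size, not the number of workers, then bounds the load on max_connections.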

3. InnoDB Cache Settings (innodb_buffer_pool_size)

innodb_buffer_pool_size sets the memory buffer that stores indexes and data. If the machine is dedicated to MySQL, it is generally recommended to set this to about 80% of physical memory. In workloads that fetch table data, it reduces disk I/O; broadly speaking, the larger the value, the higher the cache hit rate.
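Both settings discussed above live in my.cnf; a sketch for a hypothetical dedicated 32 GB MySQL machine (all values illustrative and workload-dependent):

```ini
# my.cnf fragment for a MySQL-only machine with 32 GB RAM
[mysqld]
thread_cache_size       = 64     # threads kept around for connection reuse
innodb_buffer_pool_size = 25G    # roughly 80% of physical memory
max_connections         = 2000   # ceiling that persistent connections can exhaust
```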

4. Splitting into multiple databases/tables/partitions

A single MySQL table generally handles data volumes in the millions of rows; as the data grows beyond that, performance drops noticeably. So when we anticipate the volume exceeding this magnitude, splitting into multiple databases, tables, or partitions is recommended. The best approach is to design the sharded storage model at the very start of the service, eliminating the risk in the mid and late stages at the root. The trade-off is that some conveniences, such as cross-shard list queries, are sacrificed, and maintenance becomes more complex. But once the data volume reaches tens of millions of rows or more, you will find it was all worthwhile.
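As the simplest of the three techniques, MySQL's built-in partitioning can be declared at table-creation time; a sketch with an illustrative schema (note the partitioning column must be part of every unique key, hence the composite primary key):

```sql
-- Hash-partition a large table by user_id across 16 partitions,
-- so each partition stays well under the size where scans degrade.
CREATE TABLE orders (
    id      BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    amount  DECIMAL(10,2),
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH(user_id) PARTITIONS 16;
```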

Second, building out multiple MySQL services

A single MySQL machine is in fact a high-risk single point: if it goes down, our web service becomes unavailable. And as the web system's traffic keeps increasing, one day we find that a single MySQL server can no longer carry the load and we need more MySQL machines. Introducing multiple MySQL machines brings many new problems.

1. Build MySQL master-slave replication, with the slave as a backup

This approach exists purely to solve the "single point of failure" problem: when the master fails, we switch to the slave. However, it is somewhat wasteful, since the slave sits idle.

2. MySQL read/write separation: write to the master, read from the slave.

The two databases separate reads and writes: the master handles write-type operations, the slave handles reads. If the master fails, read operations are still unaffected, and all reads and writes can be temporarily switched over to the slave (watch the traffic, though: if it is too heavy, the slave may be dragged down).
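On the application side, read/write separation can be as simple as routing by operation type; a sketch in PHP with illustrative DSNs and credentials (PHP 7.4+ for the `??=` operator):

```php
<?php
// Read/write splitting: writes go to the master, reads to the slave.
// Hosts, database name, and credentials below are placeholders.
function db(string $op): PDO
{
    static $master = null, $slave = null;
    if ($op === 'write') {
        return $master ??= new PDO('mysql:host=10.0.0.21;dbname=app', 'rw_user', 'secret');
    }
    return $slave ??= new PDO('mysql:host=10.0.0.22;dbname=app', 'ro_user', 'secret');
}

// Writes hit the master; the read may lag behind by the replication delay.
db('write')->exec("UPDATE users SET name = 'x' WHERE id = 1");
$rows = db('read')->query('SELECT * FROM users WHERE id = 1')->fetchAll();
```

Note the comment on the last line: because replication is asynchronous, a read issued right after a write may not yet see it, which is exactly the synchronization problem discussed below.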

3. Master-master replication.

The two MySQL servers are each other's slave and, at the same time, each other's master. This scheme not only splits the traffic load but also solves the "single point of failure" problem: if either fails, the other is still available.

However, this scheme can only be used with two machines. If the business grows quickly, you can split off individual business lines and set up a master-master pair for each.

Third, data synchronization between MySQL database machines

Every time we solve one problem, new ones are born from the old solution. With multiple MySQL machines, data between the libraries is very likely to lag at business peaks; network conditions, machine load, and so on also affect the replication delay. We once encountered a special scenario with close to 100 million daily visits in which the slave took days to catch up with the master; in that situation the slave basically loses its usefulness.

So, solving the synchronization problem is the point we need to focus on next.

1. MySQL's built-in multi-threaded replication

Starting with MySQL 5.6, master-to-slave synchronization can use multiple threads. The restriction, however, is significant: parallelism is only per database (schema). MySQL replicates through the binlog, and the master writes binlog operations sequentially; in particular, SQL that changes table structure affects the SQL statements that follow it, so the data within one database must still be applied by a single thread.
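Enabling this on the slave is a one-line my.cnf change (value illustrative); with per-schema parallelism, it only helps when writes are spread across several databases:

```ini
# my.cnf on the slave, MySQL 5.6+: up to 4 applier threads,
# at most one per database (schema).
[mysqld]
slave_parallel_workers = 4
```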

2. Parse the binlog yourself and write with multiple threads.

Taking the table as the unit, parse the binlog and synchronize several tables simultaneously. This speeds up replication, but if there are structural relationships or data dependencies between tables, a write-ordering problem arises. The approach therefore suits stable, relatively independent tables.

Most first-tier Internet companies in China speed up replication this way. A more radical variant is to parse the binlog and write directly, ignoring even the table as the unit. But that approach is complex and its applicability is narrower: it only suits databases with special characteristics (no table-structure changes, no data dependencies between tables, and so on).

Fourth, creating a cache between the web server and the database

In fact, the problem of heavy traffic cannot be solved at the database level alone. Following the "80/20 rule", 80% of requests concentrate on 20% of the data, the hot data. So we should establish a caching mechanism between the web server and the database, either as a disk cache or as an in-memory cache; through it, the majority of hot-data queries are intercepted before they reach the database.

1. Static page caching

When a user visits a page, most of its content may stay unchanged for a long time; a news article, for example, is almost never edited once published. In such cases, the static HTML page that the CGI generates can be cached on the web server's local disk. Apart from the first request, which goes through the dynamic CGI and queries the database, subsequent requests return the local disk file directly.
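A sketch of that flow in PHP; the cache path is illustrative, and `render_article_from_db()` is a hypothetical function standing in for the CGI logic that queries MySQL and builds the HTML:

```php
<?php
// Disk-level page cache for a news article.
$articleId = (int)($_GET['id'] ?? 0);
$cacheFile = "/data/html_cache/news_{$articleId}.html";

if (is_file($cacheFile)) {
    readfile($cacheFile);                        // cache hit: serve the static copy
    exit;
}

$html = render_article_from_db($articleId);      // hypothetical: DB query + templating
file_put_contents($cacheFile, $html);            // first visit: write the static copy
echo $html;
```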

This seems perfect while the web system is small. But once it grows, say to 100 web servers, there are 100 copies of these disk files, which wastes resources and is hard to maintain. At that point you might think of consolidating the cache onto one server, and the next caching approach does exactly that.

2. Single-machine memory cache

From the static-page example we know that keeping the "cache" on the web machine itself is hard to maintain and brings more problems (although PHP's APC extension can manipulate the web server's local memory through key/value operations). Therefore, the in-memory cache service we choose to build must also be a separate service.
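The usual pattern against such a shared cache service is cache-aside: try the cache, fall back to the database, then populate the cache. A sketch using PHP's Memcached extension; the host and key scheme are illustrative, and `query_article_from_db()` is a hypothetical DB helper:

```php
<?php
// Cache-aside lookup against a single, shared Memcached server.
$mc = new Memcached();
$mc->addServer('10.0.0.30', 11211);

function getArticle(Memcached $mc, int $id): array
{
    $key = "article:$id";
    $hit = $mc->get($key);
    if ($hit !== false) {
        return $hit;                      // hot data served straight from memory
    }
    $row = query_article_from_db($id);    // hypothetical: the real database query
    $mc->set($key, $row, 300);            // populate the cache, expire after 5 min
    return $row;
}
```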
