Building a Highly Scalable Web Architecture and Distributed System (Part II)

Source: Internet
Author: User
Tags: image hosting

Http://www.csdn.net/article/2013-01-21/2813784-Building-Scalable-Web-Architecture

Kate Matsudaira, the author of this article, is a Vice President of Engineering who has worked at top IT companies such as Sun Microsystems, Microsoft, and Amazon. She has extensive engineering and team-management experience, having served as a programmer, project manager, product manager, and people manager. She focuses on building and operating large web applications and websites, and is currently concentrating on SaaS (software as a service) applications and cloud computing.

This article, originally published in The Architecture of Open Source Applications (AOSA), describes how to build a scalable distributed system. We have translated it and are sharing it in two installments. In the previous installment, "Practice of Building Highly Scalable Web Architectures and Distributed Systems," we covered the core elements to consider when designing a distributed system: availability, performance, reliability, scalability, manageability, and cost. In this installment, we introduce how to design scalable data access, including load balancers, proxies, global caches, and distributed caches.

Building Blocks for Fast and Scalable Data Access

Having discussed the core considerations for designing a distributed system, let's move on to the harder part: scaling access to the data.

Most simple web applications, such as LAMP-stack applications, look like Figure 5:

Figure 5: Simple Web Applications

As the system grows, it faces two main challenges: scaling access to the application servers and to the database. In a highly scalable application design, the application (or web) server is typically kept minimal and often embodies a shared-nothing architecture, which makes the application server layer horizontally scalable. As a result of this design, the heavy lifting is pushed down the stack to the database server and supporting services; it is at this layer that the real scalability and performance challenges come into play.

The rest of this article focuses on some common strategies and methods for making these services fast and scalable by providing fast access to data.

Figure 6 over-simplified Web Applications

Most systems can be oversimplified to Figure 6, which is a very good starting point. If you have a lot of data, you want fast and easy access to it, like a stash of candy in the top drawer of your desk. Though oversimplified, this hints at two hard problems: scalable storage and fast data access.

For this example, let's assume there are many terabytes (TB) of data, and users need to randomly access small portions of it (see Figure 7). This is very similar to locating an image file on a file server in the image-hosting application discussed in this series.

Figure 7 access specific data

This is a big challenge: it is very costly to load terabytes of data into memory, so access translates directly into disk I/O. Reading from disk is many times slower than reading from memory: memory access is as fast as Chuck Norris, whereas disk access is slower than the line at the DMV. This speed difference really adds up for large data sets; in real numbers, memory access can be as little as 6 times faster for sequential reads, and around 100,000 times faster for random reads, than reading from disk (see "The Pathologies of Big Data"). Moreover, even with unique IDs, knowing where to find that small piece of data can be an arduous task.

Fortunately, there are many options that make this problem easier. Here are four of the more important ones: caches, proxies, indexes, and load balancers. We will discuss each of these four topics below to show how they can be used to speed up data access.

Cache

Caches take advantage of the locality-of-reference principle: recently requested data is likely to be requested again. When the CPU needs to read data, for example, it first looks in its cache; if the data is found there, it is read and returned immediately. If not, the data is fetched from the (slower) main memory and passed to the CPU, and the block containing it is also copied into the cache so that subsequent reads of that block can be served from the cache without going back to memory. Caches are used at almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: it has a limited amount of space, but it is faster than accessing the original data source. Caches can exist at every layer of an architecture, but they are most often found at the layer nearest to the front end, where they can return data quickly without taxing downstream layers.

So how can caches be used to make data access faster in our API example? In this case, a cache can be inserted in many places. One option is to insert a cache on the request-layer node, as shown in Figure 8.

Figure 8 insert cache at the request layer node

Placing a cache directly on a request-layer node enables local storage of response data. Each time a request is made to the service, the node quickly returns the locally cached data if it exists; if it is not in the cache, the node queries the data from disk. The cache on a request-layer node can live in memory (which is very fast) or on the node's local disk (still faster than going to network storage).
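As a sketch of the behavior just described, here is a minimal request-node cache in Python. The read_from_disk() helper and the TTL value are illustrative assumptions, not part of the original article.

```python
import time

cache = {}           # in-memory cache on the request-layer node
CACHE_TTL = 60       # seconds; hypothetical expiry policy

def read_from_disk(key):
    """Placeholder for the slow path (local disk or network storage)."""
    time.sleep(0.05)                 # simulate slow I/O
    return f"value-for-{key}"

def get(key):
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < CACHE_TTL:
            return value             # cache hit: fast path
    value = read_from_disk(key)      # cache miss: slow path
    cache[key] = (value, time.time())
    return value
```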

 

Figure 9 multiple caches

But what happens when you expand to many nodes (Figure 9)? If the request layer is expanded to multiple nodes, each node can still host its own cache. However, if your load balancer randomly distributes requests across nodes, the same request will land on different nodes, so cache misses will increase. Two ways to overcome this problem are global caches and distributed caches.

Global Cache

A global cache is just what it sounds like: all nodes use the same single cache space. This involves adding a server, or a file store of some kind, that is faster than the original store and is accessible by all request-layer nodes. Each request node queries the cache in the same way it would a local one. This kind of caching scheme can get a bit complicated, because a single cache is easy to overwhelm as the number of clients and requests grows, but it is very effective in some architectures (particularly ones with specialized hardware that makes the global cache very fast, or with a fixed dataset that needs to be cached).

Figures 10 and 11 depict two common forms of global caches. In Figure 10, when a response is not found in the cache, the cache itself is responsible for retrieving the missing data from the underlying store. In Figure 11, it is the responsibility of the request nodes to retrieve any data that is not found in the cache.

Figure 10 global cache where the cache is responsible for retrieval

Figure 11 global cache where request nodes are responsible for retrieval

Most applications that use a global cache tend to use the first type, where the cache itself manages eviction and fetching of data, to prevent a flood of client requests for the same data. However, in some cases the second implementation makes more sense. For example, if the cache is used to store very large files, a low cache hit rate would cause the cache buffer to become overwhelmed with cache misses; in this situation it helps to keep a large percentage of the total data set (or the hot data set) in the cache.

Distributed cache

In a distributed cache, each node owns part of the cached data in its memory. As shown in Figure 12, each node holds its own slice of the cache: if a refrigerator acts as a cache for the grocery store, a distributed cache is like storing your food in several places (your refrigerator, cupboards, and lunch box), so that when you want a snack it is convenient to grab it without a special trip to the store. Typically, the cache is divided up using a consistent hashing function, so that if a request node is looking for a certain piece of data it quickly knows where to look within the distributed cache to determine whether that data is available. In this case, each node has a small piece of the cache and sends requests to the other nodes for data before going to the origin. Therefore, one of the advantages of a distributed cache is that cache space can be increased simply by adding nodes to the request pool, reducing the access load on the database.
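To make the "which node owns this key" step concrete, here is a minimal consistent-hashing sketch in Python (not from the original article); the node names and the number of virtual replicas are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps cache keys to nodes; adding/removing a node only remaps nearby keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []          # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42:profile"))   # deterministic owner for this key
```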

Of course, a distributed cache also has shortcomings. For example, if a node fails or goes down, the data it cached is lost until it is fetched again from the origin.

Figure 12 distributed cache

The great thing about caches is that they usually make things much faster (implemented correctly, of course). The caching method you choose simply lets you serve even more requests quickly. However, all of this caching comes at the cost of maintaining additional storage space, typically in the form of expensive memory; nothing is free.

Memcached is one of the most popular open-source caches (it can work both as a local cache and as a distributed cache); however, there are many other options (including many language- or framework-specific ones).

Memcached is used on many large websites and is very powerful. It is essentially an in-memory key/value store, optimized for data storage and fast lookups (O(1)) based on a hash map.
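As a quick illustration of that key/value interface, here is a minimal cache-aside example using the pymemcache client (one of several Python memcached clients). The server address, key naming, and load_user_from_db() helper are assumptions for the sketch, not part of the original article.

```python
import json
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))              # assumed memcached address

def load_user_from_db(user_id):
    """Hypothetical slow path that hits the database."""
    return {"id": user_id, "name": "example"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit served from memcached
    user = load_user_from_db(user_id)
    mc.set(key, json.dumps(user), expire=300)  # cache for 5 minutes
    return user
```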

Facebook uses several different types of caching to improve the performance of its site (see "Facebook Caching and Performance"). At the language level it uses $GLOBALS and APC caching (provided in PHP via function calls), which helps make intermediate function calls and results much faster (most languages have similar libraries for improving web-page performance). Facebook also uses a global cache distributed across many servers (see "Scaling memcached at Facebook"), which gives it much better performance and throughput for user profile data, and provides one central place to update data (this is very important, because cache invalidation and consistency maintenance are very big challenges when you are running thousands of servers).

Now let's talk about what to do when the data isn't in the cache…

Proxy

At a basic level, a proxy server is an intermediate piece of hardware or software that receives requests from clients and relays them to the back-end origin servers. Typically, proxies are used to filter requests, log requests, or sometimes transform them (by adding or removing headers, encrypting/decrypting, or compressing).

Figure 13 Proxy Server

Proxies are also immensely helpful for coordinating requests from multiple servers, providing opportunities to optimize request traffic from a system-wide perspective. One way to use a proxy to speed up data access is to collapse identical (or similar) requests into a single request, and then return the single result to the requesting clients. This is known as collapsed forwarding.

This is the kind of behavior you see in a LAN proxy, where clients do not need their own IP addresses to connect to the Internet, and the LAN collapses requests from multiple clients for the same content. It is easy to get confused here, because many proxies are also caches (it is a very logical place to put a cache), but not all caches act as proxies.

Figure 14 using a proxy server to collapse requests

Another great way to use a proxy is to collapse not just requests for the same data, but also requests for data that is spatially close together in the origin store. Employing such a strategy maximizes data locality for the requests, which can reduce request latency. For example, say a group of nodes request parts of B: partB1, partB2, and so on. We can set up the proxy to recognize the spatial locality of the individual requests, collapse them into a single request, and return only bigB, greatly reducing the number of reads from the data origin (as shown in Figure 15). Proxies are especially useful under high load, or when you have limited caching, since they can essentially batch several requests into one.

Figure 15 using a proxy to collapse spatially close data requests
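Below is a minimal, single-process sketch of collapsed forwarding for identical requests (spatial collapsing follows the same idea, just with a smarter grouping rule). It is not from the original article: concurrent requests for the same key share one in-flight fetch instead of each hitting the origin, and fetch_from_origin() is an illustrative assumption.

```python
import threading
import time

_inflight = {}                 # key -> (event, result holder)
_lock = threading.Lock()

def fetch_from_origin(key):
    """Hypothetical slow origin read (disk or remote service)."""
    time.sleep(0.1)
    return f"data-for-{key}"

def collapsed_get(key):
    with _lock:
        entry = _inflight.get(key)
        if entry is None:
            # First requester becomes the leader: start the real fetch.
            entry = (threading.Event(), {})
            _inflight[key] = entry
            leader = True
        else:
            leader = False
    event, holder = entry
    if leader:
        holder["value"] = fetch_from_origin(key)
        with _lock:
            del _inflight[key]
        event.set()
    else:
        event.wait()           # followers reuse the leader's result
    return holder["value"]
```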

If you are looking at proxies for your system, here are a few options to consider: Squid and Varnish have both been thoroughly road-tested and are widely used on many large production websites. These proxy solutions offer many optimizations for client-server communication. Installing one of them as a reverse proxy at the web server layer can considerably improve web server performance and reduce the work required to handle incoming client requests.

Index

Using an index to quickly access data is a well-known strategy for optimizing data access; probably the most famous example is the database index.

Figure 16 Index

An index acts like a table of contents for your data: it maps pieces of data to their corresponding storage locations, just as a book's table of contents lets you jump to the right page. For example, say you are looking for a piece of data, part 2 of B. How will you know where to find it? If you have an index sorted by data type, say data A, data B, data C, it will tell you the location of data B at the origin; then you only have to seek to that location and read the part of B you need (see Figure 16).

These indexes are often stored in memory, or somewhere very local to the incoming client request. Berkeley DBs (BDBs) and tree-like data structures are commonly used to store data in ordered lists, which is ideal for access with an index.
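As a toy illustration of the idea, here is a minimal sorted key-to-offset index in Python (not from the original article); the key names, offsets, and data file are illustrative assumptions.

```python
import bisect

class SimpleIndex:
    """Sorted (key -> byte offset) index over a data file; a toy sketch."""

    def __init__(self):
        self._keys = []      # sorted keys
        self._offsets = []   # offsets parallel to _keys

    def add(self, key, offset):
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._offsets.insert(pos, offset)

    def lookup(self, key):
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._offsets[pos]
        return None

index = SimpleIndex()
index.add("B.part2", 1_048_576)        # data B, part 2 lives at this byte offset
offset = index.lookup("B.part2")
# with open("datafile", "rb") as f: f.seek(offset)  # then read only what's needed
```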

Often there are many layers of indexes that serve as a map, moving you from one location to the next, and so on, until you get the specific piece of data you want (see Figure 17).

Figure 17 multi-layer index

Indexes can also be used to create several different views of the same data. For large datasets, this is a great way to define different filters and sorts without creating many additional copies of the data.

For example, imagine that the image hosting system from earlier is actually hosting images of book pages, and the service allows clients to query the text in those images, searching all the book content about a topic, in the same way that search engines let you search HTML content. In this case, all those book images take many, many servers to store, and finding one page to render to the user can be involved. First, inverted indexes for querying arbitrary words and word tuples need to be easily accessible; then there is the challenge of navigating to the exact page and location within that book, and retrieving the right image for the results. So in this case, the inverted index maps to a location (such as book B), and book B may then contain an index with all the words, locations, and number of occurrences in each part.

An intermediate index of this kind contains just the words, locations, and information for book B. This nested index architecture allows each index to take up less space than if all of that information had to be stored in one big inverted index. And this is key in large-scale systems, because even compressed, these indexes can get quite big and expensive to store.
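To illustrate the basic structure, here is a minimal inverted index sketch in Python (not from the original article); the sample documents and identifiers are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """documents: {doc_id: text}. Returns word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index

books = {
    "book-A": "the quick brown fox",
    "book-B": "the lazy dog and the quick cat",
}
index = build_inverted_index(books)
print(dict(index["quick"]))   # {'book-A': [1], 'book-B': [5]} -> which book, and where
```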

For example, let's assume there are 100,000,000 books in the world (see the Inside Google Books blog), and that each book is only 10 pages long with 250 words per page; that means there are 250 billion words. If we take an average of 5 characters per word, and each character takes 8 bits (or 1 byte, even though some characters actually take 2 bytes), then at 5 bytes per word an index containing each word only once would take over a terabyte of storage. And the index may also contain other information, such as word tuples, data locations, and counts of occurrences, which adds up very quickly.
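A quick back-of-the-envelope check of that estimate, using the same numbers assumed in the text:

```python
books = 100_000_000       # estimated books in the world
pages_per_book = 10       # simplifying assumption from the text
words_per_page = 250
bytes_per_word = 5        # ~5 characters per word, 1 byte each

total_words = books * pages_per_book * words_per_page     # 250 billion words
index_bytes = total_words * bytes_per_word                # storing each word once
print(total_words)                 # 250_000_000_000
print(index_bytes / 1e12, "TB")    # 1.25 TB -> "over a terabyte"
```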

Being able to find data quickly and easily is critical, and indexes are a simple and effective tool for achieving this.

Load Balancer

Another key part of any distributed system is the load balancer. Load balancers are a principal component of almost every architecture: their role is to distribute incoming network requests across the available servers in a cluster, so that multiple nodes can transparently service the same function and visitors get the best possible experience from the hardware.

Figure 18 Server Load balancer

There are many different algorithms for servicing requests, including picking a random node, round robin, or even selecting a node based on specific criteria such as memory or CPU utilization. Load balancers can be implemented in hardware or in software; HAProxy is a widely used open-source software load balancer.
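Here is a minimal sketch of those selection strategies in Python (not from the original article); the backend names and utilization figures are illustrative assumptions.

```python
import itertools
import random

BACKENDS = ["app-1:8080", "app-2:8080", "app-3:8080"]   # assumed node names

# Round robin: cycle through the backends in order.
_rr = itertools.cycle(BACKENDS)

def pick_round_robin():
    return next(_rr)

# Random selection: each request goes to a uniformly chosen backend.
def pick_random():
    return random.choice(BACKENDS)

# Criteria-based pick, e.g. least reported CPU utilization.
def pick_least_loaded(cpu_utilization):
    return min(cpu_utilization, key=cpu_utilization.get)

print(pick_round_robin(), pick_random(),
      pick_least_loaded({"app-1:8080": 0.71, "app-2:8080": 0.34, "app-3:8080": 0.55}))
```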

In a distributed system, the load balancer usually sits at the very front of the system, so that all incoming requests are routed accordingly. In a complex distributed system, it is not uncommon for a request to be routed through multiple load balancers, as shown in Figure 19:

Figure 19 multiple load balancers

Like proxies, some load balancers can also route requests differently depending on the type of request or the target server cluster. (Technically, these are also known as reverse proxies.)

One of the challenges facing load balancers is managing user-session-specific data. On an e-commerce website, when you only have one server it is very easy to let users put goods into their shopping cart and have them persist between visits (which is important, because users are much more likely to buy a product if it is still in their cart when they return). However, if a user is routed to one node for a session and to a different node on the next visit, the cart can become inconsistent, because the new node may be missing the items placed in the cart on the original node (wouldn't you be annoyed if you put a 6-pack of Mountain Dew in your cart, logged back in, and found the cart empty?). One way around this is to use sticky sessions, so that the user is always routed to the same node, but it then becomes much harder to take advantage of reliability features such as automatic failover. In that case, the user's shopping cart will always have its contents, but if the sticky node becomes unavailable, special handling is required and the assumption about the stored cart contents no longer holds (hopefully that assumption is not built into the application). Of course, there are many other ways to solve this problem, such as the services mentioned in this article and others not covered here (browser caches, cookies, and URL rewriting).
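As one possible sticky-session scheme (a sketch, not the article's prescribed method), the load balancer can hash the session ID so that a given session always maps to the same backend; the node names and session cookie value here are assumptions.

```python
import hashlib

BACKENDS = ["app-1:8080", "app-2:8080", "app-3:8080"]

def backend_for_session(session_id):
    """The same session ID always hashes to the same backend (until the pool changes)."""
    digest = hashlib.sha1(session_id.encode()).hexdigest()
    return BACKENDS[int(digest, 16) % len(BACKENDS)]

print(backend_for_session("sess-4f2a"))   # stable across requests for this session
# Note: if a backend is removed, sessions remap and cart state held only on the old
# node is lost, which is exactly the failover caveat described above.
```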

In larger systems there are all sorts of scheduling and load-balancing algorithms, from simple ones like random choice or round robin to more sophisticated mechanisms that take utilization and capacity into account. All of these algorithms distribute traffic and requests, and can provide helpful reliability tools such as automatic failover or automatic removal of a bad node (for example, when it becomes unresponsive). However, these advanced features can make problem diagnosis cumbersome. For example, under high load a load balancer will remove nodes that are slow or timing out (because of too many requests), but removing a node only redistributes those requests and increases the load on the remaining nodes. In these situations extensive monitoring is important, because overall system traffic and throughput may look like they are decreasing (since each node is serving fewer requests) while individual nodes are becoming maxed out.

Load balancers are an easy way to expand system capacity, and like the other techniques in this article, they play an essential role in distributed system architecture. Load balancers also provide the critical function of testing the health of a node: if a node is unresponsive or overloaded, it can be removed from the pool handling requests, taking advantage of the redundancy of the different nodes in the system.

Queue

So far we have discussed many ways to read data quickly, but another important part of scaling the data layer is effective management of writes. When systems are simple, with minimal processing load and small databases, writes are predictably fast. However, in larger and more complex systems, writes can take an almost non-deterministically long time. For example, data may have to be written to several places on different servers or indexes, or the system may simply be under high load. In cases where writes, or any task for that matter, may take a long time, achieving performance and availability requires building asynchrony into the system; a common way to do that is with queues.

Figure 20 synchronization request

Imagine a system in which each client sends a request to a remote server; the server should receive and complete the task as quickly as possible and return the result to the corresponding client. In a small system, where one server (or logical server) can service incoming clients just as fast as the requests arrive, this works perfectly. However, when the server receives more requests than it can handle, each client is forced to wait for the other clients' requests to finish before its own request is processed and completed. This is an example of a synchronous request, as shown in Figure 20.

This kind of synchronous behavior can severely degrade client performance: the client is forced to wait, effectively doing nothing, until its request is answered. Adding more servers to address the load does not solve the problem either; even with the most effective load balancing, it is extremely difficult to ensure the even and fair distribution of work needed to maximize client performance. Furthermore, if the server handling the request is unavailable or fails, the clients upstream will also fail. Solving this problem effectively requires an abstraction between the client's request and the actual work performed to service it.

Figure 21 use a queue to manage requests

Queues are as simple as they sound: a task comes in, it is added to the queue, and workers pick up the next task as they have capacity to process it (see Figure 21 and the sketch below). These tasks may be simple or complex, for example generating a thumbnail preview image for a document. When clients submit task requests to a queue, they are no longer forced to wait for the results; instead, they need only an acknowledgement that the request was properly received.
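Here is a minimal in-process producer/worker sketch using Python's standard queue module (not from the original article); the number of workers and the simulated work are illustrative assumptions.

```python
import queue
import threading
import time

task_queue = queue.Queue()

def worker(worker_id):
    """Workers pull the next task off the queue whenever they have capacity."""
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: shut this worker down
            break
        time.sleep(0.1)           # stand-in for real work (e.g. generating a thumbnail)
        print(f"worker {worker_id} finished {task}")
        task_queue.task_done()

workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

# Clients enqueue tasks and carry on immediately;
# they only need acknowledgement that the request was accepted.
for n in range(10):
    task_queue.put(f"task-{n}")

task_queue.join()                 # wait for all queued work to complete
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
```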

Queues enable clients to work in an asynchronous manner, providing a strategic abstraction between a client's request and its response. In a synchronous system, by contrast, there is no differentiation between request and reply, so they cannot be managed separately. In an asynchronous system, the client requests a task, the service responds with a message acknowledging that the task was received, and the client can then periodically check the status of the task, requesting the result only once it has completed. While the client is waiting for an asynchronous request to complete, it is free to perform other work, even making asynchronous requests to other services. The latter is an example of how messages and queues are leveraged in distributed systems.

Queues also provide some protection from service interruptions and failures. For instance, it is quite easy to create a highly robust queue that retries requests that failed because of transient server failures. It is preferable to use a queue to enforce quality-of-service guarantees rather than exposing clients directly to intermittent service outages, which would require complicated and often inconsistent client-side error handling.

Queues are fundamental for managing distributed communication between the different parts of any large-scale distributed system, and there are many ways to implement them. There are quite a few open-source queues, such as RabbitMQ, ActiveMQ, and Beanstalkd, but some systems also use services like ZooKeeper, or even data stores like Redis.

Summary

Designing efficient distributed systems with fast access to lots of data is exciting. This article has covered just a few practical examples; we hope you find them helpful.
