Scalable Web architecture and distributed systems


Open source software has become a fundamental building block for some of the largest web sites. As those sites have grown, best practices and guiding principles have emerged from their architectures. The purpose of this article is to introduce some of the key issues to consider when designing large web sites, as well as some of the building blocks used to achieve these goals.

This article focuses on web systems, although some of the guidelines apply to other distributed systems as well.

1.1. Design principles for Web distributed systems

What does it mean to build and operate a scalable Web site or application? At its most basic level, it is simply users connecting to remote resources over the Internet; the part that makes the system scalable is that the resources, or access to those resources, are distributed across multiple servers.

Like most things in life, taking the time to plan ahead when building a Web service can help in the long run; understanding the considerations and trade-offs behind big websites leads to smarter decisions when creating smaller sites. Below are some of the key principles that influence the design of large-scale web systems:

    • Availability: For many companies, the uptime of a website is critical to both reputation and revenue. For some large online retail systems, even a minute of downtime can cost thousands or millions of dollars, so designing the system to be constantly available and resilient to failure is both a fundamental business and a technical requirement. High availability in distributed systems requires careful consideration of redundancy for key components, rapid recovery from partial system failures, and graceful degradation when problems occur.
    • Performance: Website performance has become an important consideration for most sites. The speed of a site affects usage and user satisfaction, as well as search engine rankings, all of which influence revenue and retention. As a result, creating a system that is optimized for fast responses and low latency is critical.
    • Reliability: A system needs to be reliable, such that a request for data consistently returns the same data. If the data changes or is updated, then that same request should return the new data. Users need to know that if something is written to the system, or stored, it will persist and can be relied upon to be in place for future retrieval.
    • Scalability: When it comes to any large distributed system, size is just one aspect of scale that needs to be considered. Just as important is the effort required to increase capacity to handle greater amounts of load, commonly referred to as the scalability of the system. Scalability can refer to many different parameters of the system: how much additional traffic it can handle, how easy it is to add more storage capacity, or even how many more transactions can be processed.
    • Manageability: Designing a system that is easy to operate is another important consideration. The manageability of the system equates to the scalability of operations: maintenance and updates. Things to consider for manageability are the ease of diagnosing and understanding problems when they occur, the ease of making updates or modifications, and how simple the system is to operate (i.e., does routine operation run without failures or exceptions?).
    • Cost: Cost is an important factor. This obviously includes hardware and software costs, but it is also important to consider the other costs of deploying and maintaining the system: the amount of developer time needed to build it, the amount of operational effort required to run it, and even the training required. Cost is the total cost of ownership.

Each of these principles provides the basis for decisions when designing a distributed web architecture. However, they can also be at odds with one another, such that achieving one objective comes at the cost of another. A basic example: choosing to address capacity by simply adding more servers (scalability) comes at the price of manageability (you have one more server to operate) and cost (the price of the servers).

When designing any Web application, it is important to consider these key principles, even if only to acknowledge that a design may sacrifice one or more of them.

1.2. The basics

When it comes to system architecture there are a few things to consider: what are the right pieces, how do these pieces fit together, and what are the right trade-offs. Investing in scaling before it is needed is generally not a smart business decision; however, some forethought in the design can save substantial time and resources in the future.

This section focuses on some of the core factors that are central to almost all large web applications: services, redundancy, partitions, and failure handling. Each of these factors involves choices and compromises, particularly in the context of the principles described in the previous section. In order to explain these in detail, it is best to start with an example.

Example: Image hosting App

At some point you have probably uploaded an image online. For big sites that host and deliver large numbers of images, it is a challenge to build an architecture that is both cost-effective and low latency (so images can be retrieved quickly).

Imagine a system where users can upload their images to a central server, and the images can be requested via a web link or API, just like Flickr or Picasa. For simplicity, let's assume that this application has two key parts: the ability to upload (write) an image to the server, and the ability to query for an image. While we certainly want uploads to be efficient, we care most about very fast delivery when someone requests an image (for example, when images are requested for a web page or by another application). This delivery capability is very similar to what a web server or CDN edge server provides (a CDN server is typically used to store content in many locations so that content is geographically or physically closer to users, resulting in faster access).

Other important aspects of the system:

    • There is no limit to the number of images that will be stored, so storage scalability, in terms of image count, needs to be considered.
    • There needs to be low latency for image downloads and requests.
    • If a user uploads an image, the image should always be there (data reliability for images).
    • The system should be easy to maintain (manageability).
    • Since image hosting does not have high profit margins, the system needs to be cost-effective.

Figure 1.1 is a simplified function diagram.

Figure 1.1: Simplified architecture for picture host applications

In this image hosting example, the system must be perceptibly fast, its data stored reliably, and all of these attributes highly scalable. Building a small version of this application that runs on a single server would be trivial and easy to deploy; however, that is not the interesting part of this section. Let's assume that we want to build something that could grow as big as Flickr.

Service

When it comes to designing a scalable system, it is helpful to decouple functionality and think of each part of the system as its own service with a clearly defined interface. In practice, systems designed this way are said to have a Service-Oriented Architecture (SOA). For these types of systems, each service has its own distinct functional context and interacts with anything outside of that context through an abstract interface, typically the public-facing API of another service.

Deconstructing a system into a set of complementary services decouples the operation of those pieces from one another. This abstraction helps establish clear relationships between a service, its underlying environment, and the consumers of that service. Creating these clear delineations helps isolate problems, but also allows each piece to scale independently of the others. This sort of service-oriented design for systems is very similar to object-oriented design for programming.

In our example, requests to upload and retrieve images are handled by the same server; however, as the system needs to scale, there is good reason to break these two functions into their own services.

Fast-forward and assume that the service is in heavy use; such a scenario makes it easy to see how much longer reads will take when they are affected by write operations (since the two functions compete for shared resources). Depending on the architecture, this effect can be substantial. Even if upload and download speeds were identical (which is not true of most IP networks, since most are designed with a download-to-upload ratio of at least 3:1), reads are typically served from cache, whereas writes ultimately have to go to disk (and may be written several times before reaching eventual consistency). Even if everything is in memory or read from disks (such as SSDs), database writes are almost always slower than reads. (Pole Position is an open-source DB benchmark tool, http://polepos.org/; test results are at http://polepos.sourceforge.net/results/Polepositionclientserver.pdf.)

Another potential problem with this design is that a web server like Apache or lighttpd typically has an upper limit on the number of simultaneous connections it can maintain (defaults are around 500, but it can go much higher), and under high traffic those connections can quickly be consumed by writes. Since reads can be asynchronous, or can take advantage of other performance optimizations like gzip compression or chunked transfer encoding, the web server can switch between serving many read requests quickly, handling far more requests per second than its maximum connection count (with Apache and a maximum of 500 connections, it is not unusual to serve several thousand read requests per second). Writes, on the other hand, tend to hold the connection open for the duration of the upload, so on most home networks uploading a 1 MB file could take more than 1 second, meaning that web server could only handle 500 such concurrent writes.

Planning for this sort of bottleneck makes a good case for splitting reads and writes of images into their own services, shown in Figure 1.2. This allows us to scale either one independently (since it is likely our reads will always be far more frequent than writes), but also helps clarify what is going on at each point. Finally, this separates future concerns, making it easier to troubleshoot and isolate problems such as slow reads.

The advantage of this approach is that we can solve problems for each module independently; we do not have to worry about writing and retrieving new images in the same context. Both services still use the global corpus of images, but they are free to optimize their own performance through service-appropriate interfaces (for example, queuing up requests, or caching popular images, as discussed below). From a maintenance and cost perspective, each service can scale independently as needed; imagine if they were combined and intermingled: one could inadvertently affect the performance of the other.

Of course, the above example works well when you use two different endpoints (in fact, this is very similar to cloud storage providers and content delivery networks). There are many ways to address these types of bottlenecks, though, and each comes with its own trade-offs.

For example, Flickr solves this read/write problem by distributing users across different shards, where each shard can handle only a set number of users, and as users increase, more shards are added to the cluster (see the Flickr scaling presentation, http://mysqldba.blogspot.com/2008/04/mysql-uc-2007-presentation-file.html). In the first example, it is easier to scale hardware based on actual usage (the number of reads and writes across the whole system), whereas Flickr scales with its user base (assuming equal resource usage per user). In the former, an outage or problem with one of the services degrades functionality across the whole system (for example, no one can write files), whereas an outage with one of Flickr's shards only affects the users on that shard. In the first example, it is also easier to perform operations across the whole dataset, for example, updating the write service to include new metadata or searching across all image metadata, whereas with the Flickr architecture each shard would need to be updated or searched (or a separate indexing service would need to be created to collate the metadata and work out which shard actually holds the result).

Redundancy

To gracefully handle failure, a web architecture must have redundancy for its services and data. For example, if there is only one copy of a file stored on a single server, losing that server means losing the file forever. Losing data is a bad thing, and the common approach is to create multiple, or redundant, copies.

The same principle applies to services as well. If there is a core piece of functionality for an application, ensuring that multiple copies or versions run simultaneously protects against a single point of failure.

Creating redundancy in a system removes single points of failure and provides backup functionality to fall back on in a crisis. For example, if two instances of a service are running in production and one fails or degrades, the system can fail over (failover) to the healthy copy. Failover can happen automatically or require manual intervention.
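
As a rough illustration, here is a minimal failover sketch in Python; the two service endpoints are hypothetical, and a real deployment would usually rely on a load balancer or health checks rather than client-side logic like this.

    import urllib.request
    import urllib.error

    SERVICE_URLS = [
        "http://image-read-primary.example.com",   # hypothetical primary endpoint
        "http://image-read-backup.example.com",    # hypothetical redundant copy
    ]

    def fetch_image(path: str, timeout: float = 2.0) -> bytes:
        last_error = None
        for base in SERVICE_URLS:                  # try the primary, then the backup
            try:
                with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, OSError) as err:
                last_error = err                   # degrade gracefully: try the next copy
        raise RuntimeError("all redundant copies failed") from last_error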

Another key part of service redundancy is creating a shared-nothing architecture. With this architecture, each node is able to operate independently of the others, and there is no central "brain" managing state or coordinating activities. This helps greatly with scalability, since new nodes can be added at any time without special conditions or knowledge. More importantly, there is no single point of failure in such systems, so they are much more resilient to failure.

For example, in our image hosting application, all images would have redundant copies on other hardware (ideally in a different geographic location, in case of a major disaster at the data center such as an earthquake or fire), and the services providing access to the images (see Figure 1.3), including all potential service requests, would also be redundant. (Load balancers are a great way to make a service redundant, but there is more to it than just load balancing, as discussed below.)

Figure 1.3: Using redundant picture storage

Partition

There may be very large data sets that cannot fit on a single server. It may also be the case that an operation requires too many computing resources, degrading performance and making it necessary to add capacity. In either case you have two choices: scale vertically or horizontally.

Vertical scaling means adding more resources to an individual server. For a very large data set, this might mean adding more (or bigger) hard drives so a single server can hold the entire data set. For a compute operation, it could mean moving the computation to a server with a faster CPU or more memory. In each case, vertical scaling aims to make the individual server capable of handling more on its own.

To scale horizontally, on the other hand, you add more nodes. In the case of a large data set, this might mean using a second server to store part of the data; for a compute operation, it would mean splitting the computation or load across additional nodes. To take full advantage of horizontal scaling, it should be included as an intrinsic design principle of the system architecture; otherwise it can be quite cumbersome to modify and separate out the pieces later.

When it comes to horizontal scaling, one of the more common techniques is to break up your services into partitions, or shards. Partitions can be distributed such that each logical set of functionality is separate; this could be done by geographic boundaries, or by criteria like paying versus non-paying users. The advantage of these schemes is that they provide a service or data store with added capacity.

In our image server example, it is possible that the single file server used to store images could be replaced by multiple file servers, each containing its own unique set of images (see Figure 1.4). Such an architecture allows the system to fill each file server with images, adding more servers as the disks fill up, just as one would add hard drives. The design requires a naming scheme that ties an image's filename to the server containing it. An image's name could be formed from a consistent hashing scheme mapped across the servers. Alternatively, each image could be assigned an incremental ID, so that when a client requests an image, the image retrieval service only needs to maintain the range of IDs (rather like an index) mapped to each server.

Figure 1.4: Picture storage services using redundancy and partitioning
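
To make the naming schemes above concrete, here is a minimal sketch; the server names, ID ranges, and the simple hash-modulo mapping are illustrative assumptions (a production system would more likely use consistent hashing, shown later in the caching section).

    import hashlib

    FILE_SERVERS = ["img-server-1", "img-server-2", "img-server-3"]   # hypothetical hosts

    def server_by_hash(filename: str) -> str:
        """Hash the image filename and map it onto one of the file servers."""
        digest = int(hashlib.md5(filename.encode()).hexdigest(), 16)
        return FILE_SERVERS[digest % len(FILE_SERVERS)]

    # Alternative: each server owns a contiguous range of incremental image IDs.
    ID_RANGES = [
        (1, 1_000_000, "img-server-1"),
        (1_000_001, 2_000_000, "img-server-2"),
        (2_000_001, 3_000_000, "img-server-3"),
    ]

    def server_by_id(image_id: int) -> str:
        for low, high, server in ID_RANGES:
            if low <= image_id <= high:
                return server
        raise KeyError("no server assigned for this ID range yet")

    print(server_by_hash("cat.jpg"))    # e.g. img-server-2
    print(server_by_id(1_500_000))      # img-server-2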

Of course, distributing data or functionality across multiple servers is challenging. One of the key issues is data locality; in distributed systems, the closer the data is to the operation or point of computation, the better the performance of the system. One potential problem, therefore, is that when data is spread across multiple servers it may not be local when needed, forcing the servers to perform a costly fetch of the required information across the network.

Another potential issue is inconsistency. When several different services read and write to the same shared resource, there is a chance of race conditions: some data is supposed to be updated, but a read happens just before the update, and in those cases the data is inconsistent. For example, a race condition could occur in the image hosting scenario if one client sent a request to rename an image titled "Dog" to "little guy" while another client sent a request to read the image. Whether the second client would see the title "Dog" or "little guy" is unclear.

There are certainly some obstacles associated with partitioning, but partitioning allows each problem to be split, by data, load, usage patterns, and so on, into manageable chunks. This can greatly help with scalability and manageability, but it is not without risk. There are many ways to mitigate risk and handle failures, but they are beyond the scope of this article; if you are interested, there is plenty of further reading on fault tolerance and failure detection.

1.3. Building an efficient and scalable data access module

Having covered some of the core considerations in designing distributed systems, let's now talk about the hard part: scalable access to the data.

Most simple web applications, such as a LAMP stack, look something like Figure 1.5.

Figure 1.5: Simple Web Application

As they grow, there are two main challenges: scaling the application servers and scaling the database. In a highly scalable application, the application servers are typically minimized and often embody a shared-nothing architecture. (Note: shared-nothing is a distributed computing architecture in which there is no centrally stored state and no resource contention across the system; it scales very well and is widely used in web applications.) This makes the application server layer of the system horizontally scalable. As a result of this design, the heavy lifting is pushed down to the database server and supporting services; it is at this layer where the real scaling and performance challenges come into play.

The rest of this article is devoted to some of the more common strategies and methods for making these types of services faster by providing fast access to data.

Figure 1.6: Oversimplified Web Application

Most systems can be oversimplified to Figure 1.6, which is a great place to start. If you have a lot of data and you want fast and easy access to it, it is like having a stash of candy in the top drawer of your desk. Though greatly oversimplified, the previous statement hints at two hard problems: scalability of storage and fast access to data.

For the sake of this section, let's assume you have many terabytes (TB) of data and you want to let users access small portions of that data at random (see Figure 1.7). This is similar to locating an image file somewhere on a file server in the image hosting example.

Figure 1.7: Accessing specific data

This is particularly challenging because loading terabytes of data into memory is not feasible, and access translates directly to disk IO. Reading from disk is many times slower than reading from memory: memory access is as fast as Chuck Norris, whereas disk access is like a heavy truck. This speed difference really adds up for large data sets; in real numbers, memory access is at least 6 times faster for sequential reads, and around 100,000 times faster for random reads, than reading from disk (see "The Pathologies of Big Data", http://queue.acm.org/detail.cfm?id=1563874). Moreover, even with unique IDs, solving the problem of knowing where to find that small piece of data in such a large store can be an arduous task. It is like trying to get that last Jolly Rancher from your candy stash without looking.

Thankfully, there are many options to make this easier; four of the more important ones are caches, proxies, indexes, and load balancers. The rest of this section discusses how each of these concepts can be used to make data access a lot faster.

Cache

Caches take advantage of the locality-of-reference principle: recently requested data is likely to be requested again. They are used in almost every layer of computing: hardware, operating systems, web browsers, web applications, and more. A cache is like short-term memory: it has a limited amount of space, but is typically faster than the original data source and contains the most recently accessed items. Caches can exist at all levels of an architecture, but are often found at the level nearest the front end, where they can return data quickly without taxing downstream tiers.

How can we use a cache to make data access faster in our API example? In this case, there are a couple of places where you can insert a cache. One option is to insert a cache on your request layer node, as in Figure 1.8.

Figure 1.8: Inserting a cache on your request layer node

By placing a cache directly on a request layer node, local storage of response data becomes possible. Each time a request is made to the service, the node will quickly return locally cached data if it exists. If the data is not in the cache, the request node will fetch it from disk. The cache on a request layer node can live both in memory and on the node's local disk (which is still faster than going to network storage).
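
As an illustration of such a request-node cache, here is a minimal LRU sketch in Python; the load_from_disk callback stands in for whatever backing store the node uses and is an assumption, not part of the original example.

    from collections import OrderedDict

    class RequestNodeCache:
        def __init__(self, capacity: int = 1024):
            self.capacity = capacity
            self.entries = OrderedDict()            # LRU order: oldest entries first

        def get(self, key: str, load_from_disk) -> bytes:
            if key in self.entries:                 # cache hit: serve locally
                self.entries.move_to_end(key)
                return self.entries[key]
            value = load_from_disk(key)             # cache miss: go to the backing store
            self.entries[key] = value
            if len(self.entries) > self.capacity:   # evict the least recently used entry
                self.entries.popitem(last=False)
            return value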

Figure 1.9: Multiple caches

What happens when you expand this to many nodes? As Figure 1.9 shows, if the request layer is expanded to multiple nodes, each node may still have its own cache. However, if your load balancer randomly distributes requests across the nodes, the same request will go to different nodes, increasing cache misses. Two choices for overcoming this hurdle are global caches and distributed caches.

Global cache

A global cache is just as it sounds: all the nodes use the same single cache space. This involves adding a server, or a file store of some sort, that is faster than the original store and accessible by all the request layer nodes. Each request node queries the cache in the same way it would a local one. This kind of caching scheme can get a bit complicated, because a single cache is easily overwhelmed as the number of clients and requests increases, but it is very effective in some architectures (particularly those with specialized hardware that makes the global cache very fast, or with a fixed data set that needs to be cached).

There are two common forms of global caches, depicted in the diagrams below. In Figure 1.10, when a response is not found in the cache, the cache itself retrieves the missing data from the underlying store. In Figure 1.11, when data is not found in the cache, the request nodes themselves retrieve it from the underlying store.

Figure 1.10: Global cache where the cache is responsible for retrieval

Figure 1.11: Global cache where request nodes are responsible for retrieval

Most applications leveraging global caches tend to use the first type, where the cache itself manages retrieval, preventing a flood of requests for the same data from the clients. However, there are some cases where the second implementation makes more sense. For example, if the cache is being used for very large files, a low cache hit rate would cause the cache buffer to be overwhelmed with misses; in this situation it helps to have a large percentage of the total data set (or the hot data set) in the cache. Another example is an architecture where the files stored in the cache are static and should never be evicted. (This could be because of application requirements around data latency: certain pieces of data need to be very fast for large data sets, and the application logic understands the eviction strategy or hot spots better than the cache does.)

Distributed cache

In a distributed cache (Figure 1.12), each node owns part of the cached data. If the refrigerator acts as a cache for the grocery store, a distributed cache is like putting your food in several places, your fridge, cupboard, and lunch box, in convenient spots so you don't have to run to the store for a snack. The cache is typically divided up using a consistent hashing function, so that when a request node is looking for a certain piece of data, it can quickly know where to look within the distributed cache to determine whether that data is available. In this case, each node holds a small piece of the cache, and will send a request to another node for the data before going to the origin. Therefore, one of the advantages of a distributed cache is that cache space can be increased simply by adding new nodes to the request pool.
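
Below is a minimal sketch of the consistent-hashing idea used to split a distributed cache; the node names are made up, and a real system would use a hardened client library implementation. Each key maps to the first node whose position on the hash ring is at or after the key's position, so adding a node only remaps roughly 1/N of the keys.

    import bisect
    import hashlib

    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, nodes, replicas: int = 100):
            # Place several virtual points per node on the ring for smoother balance.
            self.ring = sorted((_hash(f"{n}#{i}"), n)
                               for n in nodes for i in range(replicas))
            self.points = [p for p, _ in self.ring]

        def node_for(self, key: str) -> str:
            idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
            return self.ring[idx][1]

    ring = HashRing(["cache-a", "cache-b", "cache-c"])   # hypothetical cache nodes
    print(ring.node_for("image:12345"))                  # which node holds this key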

One drawback of distributed caching is remedying a missing node. Some distributed caches get around this by storing multiple copies of the data on different nodes; however, you can imagine how this logic gets complicated quickly, especially when you add or remove nodes from the request layer. Even if a node disappears and part of the cache is lost, the requests can still be served from the origin data store, so it is not necessarily catastrophic!

Figure 1.12: Distributed cache

The great thing about caches is that they make access faster (implemented correctly, of course); the scheme you choose just lets you make it faster for even more requests. However, all this caching comes at the cost of extra storage space, typically in the form of expensive memory; nothing is free. Caches make things faster and, moreover, provide system functionality under high load conditions that would otherwise degrade the servers.

One popular open source cache is Memcached (http://memcached.org/), which can work both as a local cache and as a distributed cache; there are, of course, other options as well (including many language- and framework-specific ones).

Memcached is used in many large web sites; even though it is very powerful, it is simply an in-memory key-value store, optimized for arbitrary data storage and fast retrieval (O(1)).

Facebook uses several different types of caching to improve site performance (see "Facebook Caching and Performance"). At the language level they use $GLOBALS and APC caching (provided in PHP via built-in function calls), which helps make intermediate function calls and results much faster (most languages have such libraries for improving web page performance). The global cache Facebook uses is distributed across many servers (see "Scaling memcached at Facebook"), so that a single function call accessing the cache can make many requests in parallel for data stored on different Memcached servers. This gives them much higher performance and throughput for cached user data, and one central place to update data (which is important, since cache invalidation and maintaining consistency are a big challenge when you are running thousands of servers).

Now let's discuss what to do when the data is not in the cache ...

Proxy

Simply put, a proxy server is a piece of hardware or software that sits between a client and a server, receives requests from clients, and forwards them to servers. Proxies are typically used to filter requests, log requests, or transform requests (adding/removing headers, encrypting/decrypting, compressing, and so on).

Figure 1.13: Proxy Server

A proxy server is also useful when coordinating requests from multiple servers, as it allows us to optimize request traffic from a system-wide perspective. Collapsed forwarding is one such technique: the proxy collapses multiple identical or similar requests into a single request, and then sends the single result back to each of the requesting clients.

Suppose several nodes want to request the same data, and it is not in the cache. When these requests go through the proxy, the proxy can merge them into one using collapsed forwarding, so the data needs to be read from disk only once (see Figure 1.14). There is some cost to this technique, since each request can incur slightly higher latency, and some requests are delayed while waiting to be merged with others. Nevertheless, it can improve performance in high-load environments, especially when the same data is requested over and over. Collapsed forwarding is somewhat like a cache, except that it does not store the data; it acts as a proxy for the clients and optimizes their requests to some degree.
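
The following is a minimal sketch of the collapsed-forwarding idea: concurrent requests for the same key share a single fetch from the origin. The fetch_fn callback is a placeholder for the actual disk or origin read, and error handling is deliberately simplified.

    import threading

    class CollapsingProxy:
        def __init__(self, fetch_fn):
            self.fetch_fn = fetch_fn          # reads the data from the origin (e.g. disk)
            self.lock = threading.Lock()
            self.in_flight = {}               # key -> (event, result holder)

        def get(self, key):
            with self.lock:
                entry = self.in_flight.get(key)
                leader = entry is None
                if leader:                    # first requester becomes the leader
                    entry = (threading.Event(), {})
                    self.in_flight[key] = entry
            event, result = entry
            if leader:
                try:
                    result["value"] = self.fetch_fn(key)   # single read for all waiters
                finally:
                    with self.lock:
                        del self.in_flight[key]
                    event.set()
            else:
                event.wait()                  # followers wait for the leader's result
            return result["value"]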

In a LAN proxy, for example, clients do not need their own IP addresses to reach the Internet, and the proxy collapses requests from clients for the same content. It is easy to get confused here, because many proxies are also caches (it is indeed a very logical place to put a cache), but not all caches act as proxies.

Figure 1.14: Merging requests by proxy

Another way to use a proxy is not just to collapse requests for the same data, but also to collapse requests for data that is spatially close together in the storage source (typically consecutive on disk). This strategy maximizes data locality for the requests, which can reduce request latency. For example, suppose a group of nodes request parts of B: partB1, partB2, and so on. We can set up the proxy to recognize the spatial locality of the individual requests, collapse them into a single request, and return only bigB, greatly minimizing reads from the data origin (see Figure 1.15). This makes a huge difference in request time when you are randomly accessing terabytes of data. Proxies are especially helpful under high load, or when caching is limited, because they can essentially batch several requests into one.

Figure 1.15: Using a proxy to collapse requests for data that is spatially close together

It is worth noting that proxies and caches can be used together, but it is usually best to put the cache in front of the proxy, for the same reason that it is best to let the faster runners start first in a crowded marathon race. Because the cache serves data from memory, it is very fast, and it does not mind multiple requests for the same result. But if the cache were located on the other side of the proxy server, there would be additional latency with every request before it reached the cache, which could hinder performance.

If you are looking to add a proxy to your system, there are many options to consider; Squid and Varnish have both been road tested and are used extensively in many production web sites. These proxy solutions offer many optimizations for client-server communication. Installing one of them as a reverse proxy at the web server layer (reverse proxies are explained in the load balancer section below) can improve web server performance considerably and reduce the amount of work required to handle incoming client requests.

Index

Using an index for fast access to data is a well-known strategy for optimizing data access performance; most of us first learned about indexes in the context of databases. An index trades increased storage overhead and slower writes (since you must both write the data and update the index) for the benefit of faster reads.

Just as with a traditional relational data store, you can apply this concept to larger data sets. The trick with indexes is that you must carefully consider how users will access your data. If the data set is many TBs in size but each payload is small (perhaps only 1 KB), indexes are a necessity for optimizing data access. Finding such a small payload in such a large data set is a real challenge, since you cannot possibly traverse all the data in any reasonable time. It is even likely that such a large data set is spread over several (or many) physical devices, which means you need some way to find the correct physical location of the desired data. Indexes are the best way to do this.

Figure 1.16: Indexes

An index can be used like a table of contents: each entry points to where your data lives. For example, if you are looking for the second part of data B, how do you know where to look? If you have an index sorted by data type (data A, B, C), it tells you where data B starts. Then you just seek to that location and read the second part of B that you want. (See Figure 1.16.)
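
As a toy illustration of this "table of contents" idea, the sketch below keeps an in-memory index of byte offsets and seeks directly to the requested part; the file name, offsets, and part size are all made-up values.

    # Hypothetical index: where each data set starts inside one very large file.
    INDEX = {
        "A": 0,              # data A starts at byte 0
        "B": 1_000_000_000,  # data B starts 1 GB into the file
        "C": 2_500_000_000,
    }

    def read_part(key: str, part: int, part_size: int = 4096) -> bytes:
        offset = INDEX[key] + (part - 1) * part_size
        with open("bigdata.bin", "rb") as f:   # placeholder file name
            f.seek(offset)                     # jump directly to the wanted bytes
            return f.read(part_size)           # no need to scan terabytes of data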

These indexes are often stored in memory, or somewhere very local to the incoming client request. Berkeley DBs (BDBs) and tree-like data structures, which store data in ordered lists, are ideal for storing indexes.

Often an index has many layers, serving as a map that points you from one location to the next, until you get the specific piece of data you want. (See Figure 1.17.)

Figure 1.17: Many layers of indexes

Indexes can also be used to create several different views of the same data. For large data sets, this is a great way to define different filters and sorts without having to create many additional copies of the data.

Imagine, for example, that the image storage system is actually storing images of every page of a book, and that the service lets clients query the text in those images, searching the contents of all books on a given topic, just as search engines let you search HTML content. In this case, all of those book images take a great deal of server storage, and finding one page to show the user is somewhat involved. First, inverted indexes for querying arbitrary words or word tuples need to be easily accessible; then there is the challenge of navigating to the exact page and location within that book and retrieving the right image as the result. So, in this situation, the inverted index would map to a location (for example, book B), and then B would contain an index of all the words, their locations, and the number of occurrences in each part.

An inverted index, which could represent Index1 above, might look like the following; each word or word tuple points to the books that contain it.

Word(s)          Book(s)
being awesome    Book B, Book C, Book D
always           Book C, Book F
believe          Book B

The intermediate index would look similar, but would contain only the words, locations, and occurrence information for book B. This nested index architecture allows each sub-index to take up far less space than if all of that information had to be kept in one big inverted index.
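
A minimal sketch of this nested scheme might look like the following; the sample books and pages are invented, and a real index would also be compressed and store richer occurrence data.

    from collections import defaultdict

    books = {
        "Book B": {1: "always believe in being awesome", 2: "being awesome"},
        "Book C": {1: "always being awesome"},
    }

    word_to_books = defaultdict(set)                              # top-level inverted index
    book_word_locations = defaultdict(lambda: defaultdict(list))  # per-book intermediate index

    for book, pages in books.items():
        for page, text in pages.items():
            for word in text.split():
                word_to_books[word].add(book)
                book_word_locations[book][word].append(page)

    print(sorted(word_to_books["awesome"]))          # ['Book B', 'Book C']
    print(book_word_locations["Book B"]["awesome"])  # pages [1, 2] within Book B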

This is a key point for large systems because, even compressed, these indexes get too large and too expensive to store. Assume we have a lot of the world's books, 100,000,000 of them (see the "Inside Google Books" blog post), each book has only 10 pages (just to make the math easier), and each page has 250 words: that is 250 billion words. If we assume 5 characters per word, with each character taking 8 bits (or 1 byte, even though some characters take 2 bytes), so 5 bytes per word, then an index containing each word only once takes over 1,000 GB of storage. You can see how much more storage an index needs once it also contains other information such as word tuples, data locations, and occurrence counts.
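
The back-of-the-envelope numbers above can be checked directly, using the assumptions stated in the text:

    books = 100_000_000
    words = books * 10 * 250      # 250,000,000,000 words (250 billion)
    index_bytes = words * 5       # 5 bytes per word, each word stored once
    print(index_bytes / 1e9)      # ~1250 GB, i.e. over 1000 GB just for the words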

Creating these intermediate indexes and representing the data in smaller sections makes big data problems tractable: data can be spread across many servers and still be accessed quickly. Indexes are a cornerstone of information retrieval and the basis of modern search engines. Of course, this is only a brief introduction, and there is much more in-depth work on making indexes smaller, faster, and richer in information (such as relevancy), and on updating them seamlessly. (There are manageability challenges with race conditions, and with the sheer number of updates required to add new or changed data, particularly where relevancy or scoring is involved.)

Being able to find your data quickly and easily is important, and indexes are an effective and simple tool to achieve this.

Load Balancer

Finally, another critical piece of any distributed system is the load balancer. Load balancers are an integral part of many architectures because their role is to distribute load across a set of nodes responsible for servicing requests. This allows multiple nodes in the system to transparently serve the same function (see Figure 1.18). Their main purpose is to handle a large number of simultaneous connections and route those connections to request processing nodes, which lets the system scale to handle more requests simply by adding new nodes.

Figure 1.18: Load Balancer

There are many algorithms for distributing these requests, including picking a node at random, round robin, or even selecting a node based on specific criteria such as memory or CPU utilization. Load balancers can be implemented in software or as hardware appliances. One open source software load balancer that has received wide adoption is HAProxy.
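
For illustration, here are two of the simplest scheduling strategies mentioned above, round robin and random selection, over a hypothetical pool of backend nodes; real load balancers such as HAProxy implement these and far more sophisticated policies.

    import itertools
    import random

    NODES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]   # hypothetical backends

    round_robin = itertools.cycle(NODES)

    def pick_round_robin() -> str:
        return next(round_robin)      # node 1, 2, 3, 1, 2, 3, ...

    def pick_random() -> str:
        return random.choice(NODES)   # uniformly random choice per request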

In a distributed system, a load balancer is often found at the very front of the system, so that all incoming requests are routed accordingly. In more complex distributed systems, it is not uncommon for a single request to be routed through multiple load balancers, as shown in Figure 1.19.

Figure 1.19: Multi-load Balancer

Like proxies, some load balancers can also route requests differently depending on the type of request (technically, these are known as reverse proxies).

One challenge for load balancers is managing data associated with a user's session. In an e-commerce site, if you have only one client it is easy to save the user's items in a shopping cart and have them still be there on the next visit (which matters, because a user who returns to find the product still in the cart is much more likely to buy it). However, if a user is routed to one node for a session and then to a different node on the next visit, there can be inconsistency, since the new node may not have the contents of that user's cart. (Wouldn't you be annoyed if you put a six-pack of drinks in your cart and came back to find it empty?) One way to solve this is to make sessions sticky, so that the same user is always routed to the same node, but then it is hard to take advantage of reliability features like automatic failover. With sticky sessions the user's cart always has its contents, but if the sticky node fails, a special case arises and the assumption that the contents are always there no longer holds (hopefully that assumption was not built into the application). Of course, this problem can also be solved using other strategies and tools described in this article, such as services, as well as others not covered here (like browser caches, cookies, and URL rewriting).

If a system has only a handful of nodes, a scheme like round robin DNS may make more sense, since load balancers can be expensive and add an unneeded layer of complexity. In larger systems there are all sorts of scheduling and load-balancing algorithms, from simple ones like random choice or round robin to more sophisticated mechanisms that take utilization and processing capacity into account. All of these algorithms distribute traffic and requests, and they can provide helpful reliability tools such as automatic failover or automatic removal of a bad node (for example, one that stops responding). However, these advanced features can make problem diagnosis difficult. For example, in a high-load situation, a load balancer might remove a node that is slow or timing out (because it is handling too many requests), but that only makes things worse for the remaining nodes. Extensive monitoring is important here, because overall system traffic and throughput may look as if they are dropping (since the nodes are serving fewer requests) while the individual nodes are becoming maxed out.

A load balancer is a simple way to expand your system's capacity, and like the other techniques described in this article, it plays a fundamental role in distributed system architecture. Load balancers also provide the critical function of checking the health of a node: if a node is unresponsive or overloaded, it can be removed from the pool of nodes handling requests, and the redundant nodes elsewhere in the system take over.

Queue

So far we have covered many ways to read data faster, but another important part of scaling the data layer is effective management of writes. When systems are simple, with minimal processing loads and small databases, writes can be predictably fast; however, in more complex systems writes can take an almost non-deterministically long time. For example, data might have to be written to several places in different databases or indexes, or the system might simply be under high load. In cases where writes, or any task for that matter, may take a long time, achieving performance and availability requires building asynchrony into the system; a common way to do that is with queues.

Figure 1.20: Synchronous request

Imagine a system in which each client requests a task to be serviced remotely. Each client sends its request to the server, the server completes the tasks as quickly as possible, and returns the results to the respective clients. In a small system, where one server (or logical service) can serve incoming clients just as fast as the requests come in, this situation works fine. However, when the server receives more requests than it can handle, each client is forced to wait for the other clients' requests to finish before a response is generated. This is an example of a synchronous request, shown in Figure 1.20.

This kind of synchronous behavior can severely degrade client performance; the client is forced to wait, effectively performing zero work, until its request is answered. Adding additional servers to absorb the load does not solve the problem either; even with effective load balancing in place, it is extremely difficult to ensure the equal and fair distribution of work needed to maximize client performance. Furthermore, if the server handling a request is unavailable or fails, the clients upstream will fail too. Solving this problem effectively requires creating an abstraction between the client's request and the actual work performed to service it.

Figure 1.21: Managing Requests with queues

Enter queues. A queue is as simple as it sounds: a task comes in, is added to the queue, and workers pick up the next task whenever they have the capacity to process it. (See Figure 1.21.) These tasks may represent something as simple as a write to a database, or something as complex as generating a thumbnail preview for a document. When clients submit task requests to a queue, they are no longer forced to wait for the results; they only need confirmation that the request was received correctly. This acknowledgment can later serve as a reference to the result of the work when the client requires it.

Queues enable clients to work asynchronously, providing a strategic abstraction between a client's request and its response. In a synchronous system, by contrast, there is no differentiation between request and reply, so they cannot be managed separately. In an asynchronous system, the client requests a task, the service responds with an acknowledgment that the task was received, and the client can then periodically check the status of the task, requesting the result once it has completed. While the client is waiting for an asynchronous request to complete, it is free to perform other work, even making asynchronous requests of other services. The latter is an example of how queues and messages are leveraged in distributed systems.
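
Here is a minimal sketch of this pattern using Python's standard library; the task IDs and the in-memory results store are illustrative stand-ins for what a real queueing system such as RabbitMQ would provide.

    import queue
    import threading
    import uuid

    tasks = queue.Queue()
    results = {}

    def submit(task_payload) -> str:
        task_id = str(uuid.uuid4())
        tasks.put((task_id, task_payload))   # the client only waits for this enqueue
        return task_id                       # acknowledgment the client can poll with later

    def worker():
        while True:
            task_id, payload = tasks.get()
            results[task_id] = f"processed {payload}"   # e.g. generate a thumbnail
            tasks.task_done()

    threading.Thread(target=worker, daemon=True).start()

    ack = submit("image-123.jpg")
    tasks.join()                             # in a real system the client would poll instead
    print(results[ack])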

Queues also provide protection against service outages and failures. For example, it is quite easy to create a highly robust queue that can retry service requests that failed due to transient server failures. It is preferable to use a queue to enforce quality-of-service guarantees than to expose clients directly to intermittent service outages, which would require complicated and often inconsistent client-side error handling.

Queues are fundamental for managing distributed communication between different parts of any large-scale distributed system, and there are many ways to implement them. There are quite a few open source queues such as RabbitMQ, ActiveMQ, and Beanstalkd, but some systems also use services like ZooKeeper, or even data stores like Redis.

1.4. Conclusion

Designing efficient systems with fast access to lots of data is fun, and there are plenty of great tools for building all kinds of applications. This article covered just a few examples and barely scratches the surface, but there is much more out there, and there will only continue to be more innovation in this space.
