Architectural layering of standard web systems
1. Architecture Layer Diagram
The diagram describes the components of a Web system architecture and gives the technical components/services that implement each layer. Note the following points:
The architecture is flexible: depending on requirements, not every layer needs to be used. For example, a simple CRM system may not need a K-V cache in the early stages of the product, and a system with little traffic may run on a single business server, in which case no load-balancer layer is needed.
The communication layer between business systems does not include traditional HTTP requests. HTTP request/response latency is high, and much of the exchange has nothing to do with the actual request (this is described in more detail in a later section). Traditional HTTP is therefore ill-suited to communication between two high-load systems; it is used mostly for requests from clients (Web, iOS, Android, etc.) to the server.
We place the cache used by business code in the data storage layer, because a K-V store such as Redis is essentially a key-value database. Why Redis is fast enough to serve as a cache is explained in detail later in the article.
Also note that the layers in the diagram above are not strictly chained (for example, the load layer does not necessarily forward every request to the business layer); in general, layers can be accessed across one another. For example, when an HTTP request asks for an image, the load layer does not send it to the business layer but goes directly to the deployed distributed file system, finds the image, and returns it. Likewise, when LVS is used to load-balance MySQL, the load layer works directly with the data storage layer.
2. Load Distribution Layer
The concept of load balancing is actually very broad: it means distributing external processing pressure to internal processing nodes according to some rule or mechanism. We meet load balancing all the time in daily life, for example traffic control during rush hour, air traffic flow control by the civil aviation authority, and the queue-number system at bank counters.
What we call the load distribution layer here is load balancing in the narrow sense: balancing implemented in software on computer systems. A large (100 million+ daily PV) or medium (10 million+ daily PV) Web business system cannot have just one server handling a business service; multiple servers provide the same service at the same time. We therefore need an architecture, designed around the business model, that distributes requests from external clients across the available business nodes, as shown in the following figure:
The load layer can also route different types of requests to different servers according to request rules. For example, if an HTTP request asks for an image, the load layer goes directly to the image storage medium to fetch it; if an HTTP request submits an order, the load layer forwards it to a designated order service node according to its rules.
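A minimal Python sketch of the two jobs described above, assuming hypothetical node names and simple path-based rules: image requests go straight to the storage tier, order submissions go to the order nodes, and everything else is spread round-robin across generic business nodes. Real load balancers such as Nginx express the same ideas in configuration (location rules plus upstream groups, which are round-robin by default); this illustrates the idea and is not how Nginx is implemented.

```python
import itertools
from typing import Iterable


class Dispatcher:
    """Hypothetical layer-7 dispatcher: rule-based routing plus round-robin."""

    IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif")

    def __init__(self, image_store: str, order_nodes: Iterable[str],
                 default_nodes: Iterable[str]) -> None:
        self.image_store = image_store
        # itertools.cycle yields the nodes in order, forever: simple round-robin.
        self.order_pool = itertools.cycle(list(order_nodes))
        self.default_pool = itertools.cycle(list(default_nodes))

    def route(self, path: str) -> str:
        if path.lower().endswith(self.IMAGE_SUFFIXES):
            return self.image_store          # bypass the business layer entirely
        if path.startswith("/order/"):
            return next(self.order_pool)     # dedicated order service nodes
        return next(self.default_pool)       # everything else


dispatcher = Dispatcher("dfs-1", ["order-1", "order-2"], ["web-1", "web-2", "web-3"])
for p in ["/img/logo.png", "/order/submit", "/order/submit", "/order/submit", "/home"]:
    print(p, "->", dispatcher.route(p))
```

Round-robin is the simplest policy; weighted or least-connections policies plug into the same `route` shape.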
Different business requirements call for different load-layer solutions, and choosing among them tests the architect. For example, at the time of writing Nginx handled only HTTP, an application-layer protocol of the TCP/IP stack; to balance raw TCP traffic you needed the third-party tcp-proxy-module. For load balancing directly at the TCP layer, HAProxy is the better choice.
Common load-layer architectures include:
- Standalone Nginx or HAProxy
- LVS (DR) + Nginx
- DNS round-robin + LVS + Nginx
- Intelligent DNS (DNS routing) + LVS + Nginx
Later articles will detail each of these load-layer architectures and their variants.
3. Business Service Layer and Communication Layer
3.1, Overview
Put simply, this is our core business layer: order services, record management, diagnosis and treatment, payment, logging, and so on, as shown in the following figure:
Obviously, in large and medium-sized systems these services cannot stand alone. The usual design requirement is decoupling between subsystems: apart from underlying support systems (such as the user permission system), subsystem X1 must be able to work without knowing that its logical peer X2 exists. To complete more complex business processes, however, calls between subsystems are essential. For example, when business A succeeds it invokes business B; when A fails it invokes business C; or in some cases A and D form an inseparable whole, so the larger process succeeds only if both succeed and fails if either fails. This is shown in the following figure:
This makes the communication layer between businesses an unavoidable topic. In later articles we will explain business communication layer technology, especially the considerations in selecting it, covering the principles and usage of Alibaba's Dubbo framework, AMQP-based message queues, and Kafka.
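To make the decoupling concrete, here is a minimal Python sketch in which business A publishes the outcome of its work as an event and returns immediately, while B (on success) and C (on failure) react to it. The standard-library `queue.Queue` stands in for a real broker such as an AMQP message queue or Kafka; the topic names, handlers, and the even/odd success rule are all hypothetical. A never calls B or C directly, so it does not need to know they exist.

```python
import queue
import threading

# Hypothetical in-process stand-in for a message broker (AMQP, Kafka, ...).
events = queue.Queue()
handled = []


def business_a(order_id: int) -> None:
    """Does its own work, publishes the outcome, and returns immediately."""
    ok = order_id % 2 == 0  # pretend even-numbered orders succeed
    events.put(("a.succeeded" if ok else "a.failed", {"order_id": order_id}))


def business_b(payload: dict) -> None:
    handled.append(("B", payload["order_id"]))


def business_c(payload: dict) -> None:
    handled.append(("C", payload["order_id"]))


# Subscriptions live with the consumer side, not inside business A.
SUBSCRIBERS = {"a.succeeded": business_b, "a.failed": business_c}


def consumer() -> None:
    while True:
        topic, payload = events.get()
        if topic == "stop":
            break
        SUBSCRIBERS[topic](payload)


worker = threading.Thread(target=consumer)
worker.start()
for oid in (1, 2, 3):
    business_a(oid)
events.put(("stop", {}))
worker.join()
print(handled)  # [('C', 1), ('B', 2), ('C', 3)]
```

The "A and D succeed or fail together" case needs more than fire-and-forget messaging (for example a transaction or compensation scheme), which is part of what the frameworks named above address.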
3.2, A word on HTTP requests
Some readers may ask why HTTP is not listed as a way for business systems to call one another; after all, many companies use it for exactly that. Let's look at a diagram of an HTTP call. (Note that the diagram ignores HTTP client caching and DNS resolution, and starts from establishing a reliable TCP connection.)
From this we can see the following problems:
- Technically, an HTTP call establishes a TCP connection when the call is needed, sends the request, waits for the response, and may then close the connection. A lot of time is thus spent on mechanics unrelated to the business.
- In addition, the headers sent and received carry no business data. With little traffic this overhead is acceptable, but when bandwidth is tight that space is valuable.
- Standalone HTTP requests lack anything like the "governance center" of an SOA architecture, so plain HTTP can hardly guarantee contextual consistency when several businesses are linked together. You can of course code this yourself, but is that really reasonable?
- Finally, components such as Apache HttpComponents now provide HTTP connection pooling to cut TCP connection setup time, but this only optimizes HTTP as an inter-business channel; the other problems remain.
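A rough illustration of the header-overhead point, as a Python sketch: it builds a minimal HTTP/1.1 request by hand for a small, hypothetical order payload and compares header bytes with body bytes. It ignores the TCP handshake, the response headers, and the extra headers real clients usually add (User-Agent, Accept, cookies), all of which make the ratio worse.

```python
def http_request_sizes(host: str, path: str, body: bytes) -> tuple[int, int]:
    """Return (header_bytes, body_bytes) for a minimal HTTP/1.1 POST request."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line ends the header section
    ).encode("ascii")
    return len(head), len(body)


# Hypothetical host name and payload, for illustration only.
header_bytes, body_bytes = http_request_sizes(
    "order-service.internal", "/orders", b'{"orderId": 42}')
print(f"headers: {header_bytes} B, body: {body_bytes} B, "
      f"overhead: {header_bytes / (header_bytes + body_bytes):.0%}")
```

For small business messages the headers alone can outweigh the payload several times over.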
For these reasons, this article does not recommend HTTP for communication/invocation between business systems; HTTP is recommended only for client requests to services, from Web, iOS, Android, and similar clients.
4. Data storage Layer
Data storage is another key topic of this series. The initial data before a business computation, the intermediate data during it, and the results afterward all need to be stored. We first use a mind map to explain the basic classification of data storage along several dimensions.
4.1, Principles of file storage
We explain the most basic principles of a file system by creating an ext4 file system on CentOS 6.5.
- First, partition the local hard disk with the fdisk command (that is, define the range of sectors to be managed), as shown in the figure:
- Then create the desired file system (ext3, ext4, XFS, Btrfs, etc.) on this partition with the mkfs command, as shown in the figure:
- Finally, mount the file system at the specified path, as shown in the figure:
Check the mount information with the df command, as shown in the figure:
What does this creation process tell us?
A block is the smallest unit the upper-level file system operates on, and each block corresponds to one or more physical sectors on the disk (a sector is traditionally 512 bytes, while ext4 typically uses 4 KB blocks, i.e., 8 such sectors). A single SATA drive usually has several read/write heads (the number depends on the number of platters) and many physical sectors (the sector size is set by the disk manufacturer and cannot be changed).
Operations on a single sector are serialized, so access to a physical block is serialized too: while a head is reading from a sector, the drive controller does not allow it to write to that same sector at the same time.
Because the upper-level file system (ext, NTFS, Btrfs, XFS) encapsulates the physical blocks beneath it, the OS does not manipulate physical disk blocks directly, and a user running a command such as ls does not need to care how files are laid out in physical blocks. This is also why different file systems have different features (some support snapshots, some support data recovery): they organize the underlying physical blocks according to different specifications.
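The block abstraction is visible from user space. As a small Python sketch (Unix only, since `os.statvfs` wraps the POSIX `statvfs()` call), the function below reports the block size and capacity of the file system holding a path, roughly what df summarizes. The sectors-per-block figure assumes 512-byte sectors, which is an assumption: Advanced Format drives use 4 KB physical sectors.

```python
import os

ASSUMED_SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def describe_filesystem(path: str = "/") -> dict:
    """Summarize block-level facts the file system reports for `path`."""
    st = os.statvfs(path)
    frag = st.f_frsize or st.f_bsize  # fundamental block size used for the counts
    return {
        "block_size": st.f_bsize,  # preferred I/O block size
        "fundamental_block_size": frag,
        "sectors_per_block": frag // ASSUMED_SECTOR_SIZE,
        "total_bytes": st.f_blocks * frag,
        "free_bytes": st.f_bfree * frag,
        "available_bytes": st.f_bavail * frag,  # free space usable by non-root users
    }


if __name__ == "__main__":
    for key, value in describe_filesystem("/").items():
        print(f"{key:>24}: {value}")
```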
4.2. Block storage and file storage
The previous section described the simplest, most primitive arrangement of physical blocks and file-system format. As demand for storage capacity and data safety grows, however, this arrangement clearly cannot meet the needs of the storage environment. The two requirements are:
- Storage capacity can be expanded smoothly without destroying the data already stored and without compromising the stability of the whole storage system.
- Files can be shared, so that multiple servers can read and write the same stored data through the file system.
To address both problems, let's first extend the diagram from the previous section, as shown in the figure:
Clearly, the answer to both questions in the diagram is yes, and these are the problems that block storage systems set out to address.
4.2.1, Block storage systems
Let's start with block storage. In the simplest case described earlier, the disk sits in the local physical machine, and physical block I/O commands travel through the southbridge on the local motherboard. To obtain larger disk space and guarantee data throughput, we need to move the disk media out of the local machine and transmit the physical block I/O commands over the network:
Although the disk media are now separate from the local machine, the essence is unchanged: block I/O commands are still transferred directly. What used to travel through the local southbridge now travels over fiber; only the I/O transfer inside the machine has become a network transfer, governed by a communication protocol (e.g., FC or SCSI).
The file system mapping is still done locally, not on a remote file system. As mentioned above, block operations are sequential (a sector is not read while it is being written), and because block operations are low-level physical operations, they cannot actively notify the file-logic layer above them. Multiple physical hosts therefore cannot share files through this technology.
Block storage systems address the need for large physical storage space, high data throughput, and strong stability at the same time. The server using the file system on top is certain that no other server can read or write the physical blocks reserved for it; in other words, it treats this huge storage volume as if it were local storage on its own physical machine.
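Why plain block storage cannot be shared is easiest to see in a toy model. In this Python sketch (all class names are hypothetical), the "remote disk" understands only "read block n" and "write block n", while each host keeps its own file-to-block mapping and free-block list, just as a local file system would. Two hosts attached to the same device both believe block 0 is free, so the second write silently destroys the first host's file.

```python
BLOCK_SIZE = 4  # bytes per block (tiny, for illustration only)


class BlockDevice:
    """Hypothetical remote disk: it only understands block reads and writes."""

    def __init__(self, num_blocks: int) -> None:
        self.data = bytearray(BLOCK_SIZE * num_blocks)

    def read_block(self, n: int) -> bytes:
        return bytes(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = payload


class HostFileSystem:
    """Hypothetical host-local file system: the mapping lives on this host only."""

    def __init__(self, device: BlockDevice, num_blocks: int) -> None:
        self.device = device
        self.free = list(range(num_blocks))  # this host's private free-block list
        self.files: dict = {}                # filename -> block number

    def write_file(self, name: str, payload: bytes) -> None:
        block = self.free.pop(0)
        self.files[name] = block
        self.device.write_block(block, payload)

    def read_file(self, name: str) -> bytes:
        return self.device.read_block(self.files[name])


disk = BlockDevice(num_blocks=8)
host_a = HostFileSystem(disk, 8)
host_b = HostFileSystem(disk, 8)

host_a.write_file("a.txt", b"AAAA")
host_b.write_file("b.txt", b"BBBB")  # host B also believes block 0 is free
print(host_a.read_file("a.txt"))     # b'BBBB': host A's file was overwritten
```

Nothing at the block level tells host A that its block was reused; that knowledge would have to live in the file-logic layer, which each host keeps privately.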
With the development of technology, some solutions now carry standard SCSI commands over TCP/IP (such as iSCSI) to reduce the cost of building a block storage system. This trade-off, however, comes at the expense of overall data throughput. Choose the technology according to the actual business requirements.
4.2.2, File storage systems
What if the file system itself is moved from the local physical machine onto the remote network? That is a file storage system; typical examples include FTP, NFS, and NAS:
The key to a file storage system is that the file system is not local. Clients access the remote file system over the network, and the remote file system carries out the data operations by issuing block I/O commands.
Local file systems such as NTFS, ext, and XFS generally do not allow direct network access, so a file storage system wraps the server's file system in a network protocol layer: the NFS protocol, the FTP protocol, or the protocols a NAS device exposes (note that we are talking about protocols here). Clients manipulate the server's file system through that protocol.
A file storage system mainly solves the problem of file sharing: the network file protocol lets multiple clients share the file structure on the server. As the overall diagram shows, its read/write speed and data throughput cannot match those of a block storage system (because that is not the primary problem it sets out to solve).
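By contrast, in a file storage system the file-to-block mapping lives in one place, the server, and clients speak in file paths. A minimal Python sketch, with a hypothetical `FileServer` class standing in for an NFS/FTP server and direct method calls standing in for the network protocol: because every client goes through the same server-side namespace, a file written by one client is immediately visible to another. The single lock also hints at why throughput is lower than raw block access, since every operation funnels through the server.

```python
import threading


class FileServer:
    """Hypothetical file server: it owns the file system; clients use paths."""

    def __init__(self) -> None:
        self._files: dict = {}         # path -> contents (the server-side namespace)
        self._lock = threading.Lock()  # one server serializes all clients

    def write(self, path: str, payload: bytes) -> None:
        with self._lock:
            self._files[path] = payload

    def read(self, path: str) -> bytes:
        with self._lock:
            return self._files[path]


class Client:
    """Stands in for an NFS/FTP client; method calls replace the network protocol."""

    def __init__(self, server: FileServer) -> None:
        self.server = server

    def save(self, path: str, payload: bytes) -> None:
        self.server.write(path, payload)

    def load(self, path: str) -> bytes:
        return self.server.read(path)


server = FileServer()
client_a, client_b = Client(server), Client(server)
client_a.save("/share/a.txt", b"AAAA")
client_b.save("/share/b.txt", b"BBBB")
print(client_b.load("/share/a.txt"))  # b'AAAA': both clients see one namespace
```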
From the above it is clear that a file storage system is certainly not the first choice under heavy read/write load, while choosing a block storage system brings the double pressure of cost and operations (a SAN is complex to build and the equipment is expensive). In real production environments we often face heavy read load and also need to share files. How do we solve this?
4.3. Object Storage System
Object storage is designed for exactly this need: it offers the high throughput and stability of block storage together with the network sharing of file storage, at low cost. Typical object storage systems include MFS, Swift, Ceph, Ozone, and others. Below we briefly describe the characteristics of object storage; a later article will pick one object storage system and explain it in detail.
An object storage system is always a distributed file system, but a distributed file system is not necessarily an object storage system.
A file is described by several attributes, such as its name, storage location, size, current status, and number of replicas. Object storage pulls these attributes out and stores them on dedicated servers (metadata servers). To access a file, the client first asks a metadata node for the file's basic information and then reads the content from the storage nodes it points to.
As a distributed system, it must deal with data consistency, resource contention, and node failures, so an object storage system generally has monitoring/coordination nodes. Different object storage systems support different numbers of metadata nodes and monitoring/coordination nodes, but the general trend is toward decentralization.
OSD (object-based storage device) nodes store the file content. Note that although both OSD nodes and block storage ultimately rely on block I/O, the layers above are completely different: an OSD node stores data through its own local file system, whereas block storage commands bypass any local file system and operate directly on physical blocks.
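The read path described above, metadata lookup first and data fetch second, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class names are hypothetical, files are split into fixed-size chunks placed round-robin across OSDs, and replication, coordination nodes, and failure handling are omitted. Real systems differ in the details; Ceph, for example, computes object placement with its CRUSH algorithm rather than storing every location in a metadata table.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Attributes pulled out of the file and kept on the metadata server."""
    name: str
    size: int
    chunks: list = field(default_factory=list)  # ordered (osd_index, object_id)


class MetadataServer:
    def __init__(self) -> None:
        self.table: dict = {}

    def put(self, meta: FileMeta) -> None:
        self.table[meta.name] = meta

    def lookup(self, name: str) -> FileMeta:
        return self.table[name]


class OSD:
    """Object-based storage device: stores opaque objects by id."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def put(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def get(self, object_id: str) -> bytes:
        return self.objects[object_id]


def write_file(name: str, data: bytes, mds: MetadataServer, osds: list,
               chunk_size: int = 4) -> None:
    meta = FileMeta(name, len(data))
    for n, start in enumerate(range(0, len(data), chunk_size)):
        osd_index = n % len(osds)  # assumption: round-robin placement
        object_id = f"{name}#{n}"
        osds[osd_index].put(object_id, data[start:start + chunk_size])
        meta.chunks.append((osd_index, object_id))
    mds.put(meta)


def read_file(name: str, mds: MetadataServer, osds: list) -> bytes:
    meta = mds.lookup(name)  # step 1: ask the metadata server
    return b"".join(osds[i].get(obj) for i, obj in meta.chunks)  # step 2: read OSDs


mds = MetadataServer()
osds = [OSD() for _ in range(3)]
write_file("report.log", b"hello object storage", mds, osds)
print(read_file("report.log", mds, osds))
```

Because file content is spread over several OSDs, reads of different chunks can be served by different machines in parallel, which is where the throughput advantage over a single file server comes from.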
In a later article we will pick a popular object storage system, explain it in detail, and describe the three core concepts of distributed storage systems and their trade-offs (CAP): Consistency, Availability, and Partition tolerance.
4.4. Database storage
Since this article has already described many storage tiers, it does not give an overview of database storage technologies, familiar or not, here.
Later in this series I will use MySQL to explain several common architectures and performance-tuning points, as well as how core storage engines such as InnoDB work. These architectures mainly address core issues such as the MySQL single-machine I/O bottleneck, in-datacenter disaster recovery, database stability, and cross-datacenter disaster recovery.
In subsequent articles I will also pick a currently popular data caching system and explain its working principles, core algorithms, and architecture, so that readers can design storage clusters suited to their own business. There will also be in-depth introductions to the non-relational databases Cassandra, HBase, and MongoDB.
5. Criteria for evaluating an architecture
How do we judge whether the top-level design of a service system is good? Setting aside the stock phrases (stability, robustness, security, and so on), here are a few evaluation points I have summarized from real work.
5.1. Construction cost
Any system architecture incurs construction costs when it is put into production. Costs obviously differ between companies and organizations (they include design, asset procurement, operations and maintenance, and third-party services), so building a system that meets business needs and fits the scale of traffic on a limited budget is a complex problem. Under this constraint, the architect must also avoid over-design.
5.2. Scalability and planning
As the business develops, the whole system needs upgrading (this includes upgrading existing modules, merging existing modules, adding new business modules, or increasing data throughput while module functions stay the same). Scaling the system horizontally and vertically as fast as possible, with the least work and minimal disruption to existing business, is a complex problem. A good architecture can be upgraded without users noticing, or needs only a short outage when certain critical subsystems are upgraded.
5.3. Attack resistance
Attacks target the weakest part of a system and may come from outside (such as DoS/DDoS attacks) or from inside (password intrusion). A well-architected system is not an "absolutely unbreakable" system but one that is "well defended and well hidden". Defense means guarding against possible attacks, including periodically simulating the attacks the system may encounter; hiding means using various means to protect the system's key information: management details, root privileges, physical location, firewall parameters, and user identities.
5.4. Disaster recovery
A good architecture should consider disaster tolerance at different levels:
- Cluster disaster recovery: when a service node in the cluster fails, another host in the cluster immediately takes over its work, and the failed node can be removed.
- Distributed disaster recovery: a distributed system generally assumes that single-point or multi-point failures can occur anywhere at any time. When such failures occur, the system as a whole keeps providing service, and once the failed parts are restored (automatically or manually), the system accepts them back.
- Geo-disaster recovery (datacenter-level disaster): when a datacenter suffers a physical disaster (a severed network, war damage, an earthquake, etc.), a backup system in a distant location detects the disaster, actively takes over system operations, and notifies the operators (depending on the system's operational requirements, there may be more than one backup system). The biggest challenge of geo-disaster recovery is guaranteeing data integrity.
5.5. Business adaptability
In the final analysis, a system architecture serves the business, so architectural choices must be made with serving the current business as the premise. In the business communication layer mentioned above, whether to choose an SOA component or a message-queue component, and which message queue to choose, is a business-driven decision. For example, suppose business A is a Web front-end service that must report the result of a customer's operation promptly, while business B is under heavy load and cannot return a result to A within milliseconds when A calls it. An AMQP-style message queue fits this scenario. Two further points: first, the industry often has several different solutions for the same business scenario, and the architect must understand the characteristics of each to make the right choice; second, the industry already offers plenty of solutions, so unless the business has special requirements, architects must not reinvent the wheel.
5.6. Ease of maintenance
A service system needs continuous investment from the operations team from day one. The knowledge the operations team needs obviously depends on the system's complexity and the number of physical machines. When designing the top-level architecture, architects must also consider the system's operational difficulty and cost.
6. Other notes
In the following articles we will explain the detailed architecture of the load layer, business layer, business communication layer, and data storage layer in depth, including core algorithms, setup principles, and setup examples. The next article starts with the system load layer.
Many systems also analyze stored data to produce analytical results, which involves the architecture of a data analysis layer. The Hadoop ecosystem is currently the industry's widely recognized efficient, stable, and scalable data analysis ecosystem. For now, this series will not cover the architecture design and development of the data analysis layer; it will be covered separately later.
Next, let's dive straight into a detailed look at load-layer technology!
Article reprinted from: http://blog.csdn.net/yinwenjie/article/details/46480485
Thanks to the author for his selfless devotion.