Large-Scale Website Technical Architecture (I): Large-Scale Website Architecture Evolution
Each pattern describes a problem that recurs around us, together with the core of the solution to that problem. This way, you can use the solution again and again without having to repeat the work.
Website architecture patterns exist to solve the series of problems and challenges that large-scale websites face, such as high concurrent access, massive data, and high-reliability operation. To this end, many solutions have been proposed in practice to achieve architectural goals such as high performance, high reliability, scalability, extensibility, and security.
1. Layering
Layering is the most common architectural pattern in enterprise applications. It divides the system into several parts along the horizontal dimension. Each part has a relatively simple, single responsibility, and the parts form a complete system through the upper layers depending on and calling the lower layers.
In a layered website architecture, the three common layers are the application layer, the service layer, and the data layer. The application layer is responsible for business logic and view presentation; the service layer provides service support for the application layer; the data layer provides data storage and access services such as databases, caches, files, and search engines.
Layering is a logical division. Physically, all three layers can be deployed on the same machine, but as the website's business grows, the layered modules need to be deployed separately, with the three layers on different servers, so that the site has more computing resources to handle ever-increasing user traffic.
So although the original goal of a layered architecture is a clear logical structure that makes the software easier to develop and maintain, as the website grows, layering also becomes critical to its evolution toward a distributed system that supports high concurrency.
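The dependency direction described above can be sketched in a few lines. This is only an illustration: the class names `ApplicationLayer`, `ServiceLayer`, and `DataLayer`, and the in-memory dictionary standing in for a database, are assumptions made for the example and do not come from any particular framework.

```python
class DataLayer:
    """Data layer: storage access (a dict stands in for a real database)."""

    def __init__(self):
        self._users = {1: "alice", 2: "bob"}

    def find_user_name(self, user_id):
        return self._users.get(user_id)


class ServiceLayer:
    """Service layer: business logic; depends only on the data layer."""

    def __init__(self, data):
        self._data = data

    def greeting_for(self, user_id):
        name = self._data.find_user_name(user_id)
        return f"Hello, {name}!" if name else "Hello, guest!"


class ApplicationLayer:
    """Application layer: presentation; depends only on the service layer."""

    def __init__(self, service):
        self._service = service

    def render_home(self, user_id):
        return f"<h1>{self._service.greeting_for(user_id)}</h1>"


# Each layer only knows the layer directly beneath it.
app = ApplicationLayer(ServiceLayer(DataLayer()))
print(app.render_home(1))
```

Because each layer talks only to the one below it, any layer could later be moved behind a remote call and onto its own server without changing the layers above.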
2. Separation
If layering slices the software horizontally, separation slices it vertically.
The larger the website, the more complex its functions and the more kinds of services and data it handles. Separating these different functions and services and packaging them into high-cohesion, low-coupling modules not only makes the software easier to develop and maintain, but also makes it easier to deploy the modules in a distributed way, improving the site's concurrent processing capacity and its ability to add new features.
Large websites separate at a fine granularity. For example, in the application layer, different businesses such as shopping, forums, search, and ads are split into separate applications, each owned by an independent team and deployed on different servers.
3. Distributed
For large websites, one of the main purposes of layering and separation is to make distributed deployment of the resulting modules easier: different modules are deployed on different servers and work together through remote calls. Distribution means that more computers can be used to do the same job. The more computers there are, the more CPU, memory, and storage resources are available, the more concurrent traffic and data can be handled, and the more users can be served.
In website applications, there are several common distributed schemes.
Distributed applications and services: deploying the layered and separated application and service modules in a distributed way can improve site performance and concurrency, speed up development and release, and reduce database connection resource consumption.
Distributed static resources: static resources such as JS, CSS, and logo images are deployed separately under their own domain names, which is what people usually call the separation of static and dynamic content. Deploying static resources separately reduces the load on the application servers, and using a separate domain name speeds up concurrent loading in the browser.
Distributed data and storage: large websites need to process massive amounts of data measured in petabytes. A single computer cannot provide that much storage space, so the data must be stored in a distributed way.
Distributed computing: websites commonly use Hadoop and its MapReduce distributed computing framework for this kind of batch computation. Its defining characteristic is moving the computation rather than the data: the program is distributed to the location of the data, which speeds up the computation.
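The idea of moving the computation to the data can be illustrated with a single-process word count written in the MapReduce style. This is a toy sketch, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase`, and the two in-memory "partitions" standing in for data stored on different nodes, are assumptions for the example.

```python
from collections import defaultdict


def map_phase(partition):
    """Would run on the node that stores this partition; emits (word, 1)."""
    return [(word, 1) for line in partition for word in line.split()]


def shuffle(pairs_per_node):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in pairs_per_node:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a final count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands in for the data held by one storage node.
partitions = [["a b a"], ["b c"]]
counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(counts)
```

Only the small `(word, 1)` pairs cross node boundaries in the shuffle step; the raw lines never leave the partition that holds them.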
4. Cluster
For modules where user access is concentrated, the independently deployed servers also need to be clustered: multiple servers deploy the same application to form a cluster, and a load balancer in front of them lets the cluster serve requests as a single unit.
A server cluster provides more concurrent capacity for the same service, so when more users arrive, new machines only need to be added to the cluster. When one of the servers fails, the load balancer's failover mechanism can route requests to the other servers in the cluster, which improves the availability of the system.
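A minimal sketch of the load-balancing and failover behaviour described above, assuming a simple round-robin policy. The `LoadBalancer` class and the server names are hypothetical; real load balancers also perform health checks, which are reduced here to explicit `mark_down` calls.

```python
class LoadBalancer:
    """Round-robin selection that skips servers marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._down = set()
        self._next = 0

    def add_server(self, server):
        # Scaling out: a new machine simply joins the rotation.
        self._servers.append(server)

    def mark_down(self, server):
        # Failover: a failed machine stops receiving requests.
        self._down.add(server)

    def pick(self):
        # Try each server at most once per call.
        for _ in range(len(self._servers)):
            server = self._servers[self._next % len(self._servers)]
            self._next += 1
            if server not in self._down:
                return server
        raise RuntimeError("no healthy server available")


lb = LoadBalancer(["app-1", "app-2"])
print(lb.pick(), lb.pick())  # requests alternate between the two servers
lb.mark_down("app-2")
print(lb.pick(), lb.pick())  # all requests now go to the remaining server
```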
5. Cache
The purpose of caching is to reduce the computation the server has to do by returning stored data directly to the user. In today's software design, caches are everywhere. Concrete implementations include CDNs, reverse proxies, local caches, distributed caches, and so on.
There are two preconditions for using a cache. First, data access is unevenly distributed: some data is accessed far more often than the rest, and that hot data should be placed in the cache. Second, the data stays valid for a certain period of time and does not expire quickly; otherwise the cached copy becomes stale, causing dirty reads and affecting the correctness of the results.
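Both preconditions show up in a small time-to-live cache. The sketch below is illustrative only: `TTLCache`, `get_or_load`, and the injectable `clock` parameter are names invented for the example, and the loader function stands in for an expensive database or service call.

```python
import time


class TTLCache:
    """Serve repeated reads from memory until an entry's time-to-live ends."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: the loader is not called
        value = loader(key)  # miss or expired: reload from the source
        self._entries[key] = (value, now + self._ttl)
        return value


calls = []


def load_product(key):
    calls.append(key)  # record every trip to the slow data source
    return f"product:{key}"


cache = TTLCache(ttl_seconds=60)
cache.get_or_load(42, load_product)
cache.get_or_load(42, load_product)
print(len(calls))  # the second read was served from the cache
```

The time-to-live bounds how stale a cached value can be, which is why data that changes faster than the TTL is a poor fit for caching.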
6. Asynchronous
With asynchronous processing, businesses do not communicate through synchronous calls. Instead, a business operation is divided into multiple phases, and the phases cooperate asynchronously by sharing data.
Within a single server, this can be implemented with multiple threads sharing an in-memory queue; in a distributed system, it can be implemented with a distributed message queue.
The typical asynchronous architecture is the producer-consumer pattern, in which the two sides never call each other directly.
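The single-server variant can be sketched with Python's standard `queue` and `threading` modules. The order IDs, the `STOP` sentinel, and the "shipped" processing step are assumptions made for illustration.

```python
import queue
import threading

orders = queue.Queue()  # the shared data both sides communicate through
processed = []
STOP = object()         # sentinel that tells the consumer to exit


def consumer():
    """Take orders off the queue until the sentinel arrives."""
    while True:
        order = orders.get()
        if order is STOP:
            break
        processed.append(f"shipped {order}")


worker = threading.Thread(target=consumer)
worker.start()

# The producer only enqueues work; it never calls the consumer directly.
for order_id in ("order-1", "order-2", "order-3"):
    orders.put(order_id)
orders.put(STOP)

worker.join()
print(processed)
```

In a distributed system the in-process `queue.Queue` would be replaced by a message queue service, while the producer and consumer code keeps the same shape.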
7. Redundancy
A website needs to run continuously, 24 hours a day, 7 days a week, so it must have redundancy so that it stays accessible when a machine goes down. High availability is achieved through redundancy by deploying at least two servers as a cluster. Besides regular backups, databases need both cold and hot backups. Disaster recovery data centers can even be deployed on a global scale.
8. Automation
There are automated release processes, automated code management, automated testing, automated security detection, automated deployment, automated monitoring, automated alarms, automated failover, automated failure recovery, and more.
9. Security
A website's security architecture follows many patterns: identity is authenticated by password and mobile verification code; login and transactions require encrypted network communication; to keep bots from abusing resources, CAPTCHAs are used to tell them apart from real users; common attacks such as XSS and SQL injection are countered by encoding (escaping) input; and spam needs filtering.
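Two of these defences can be shown with the Python standard library. This is a sketch under stated assumptions: the `users` table, `find_user`, and `render_comment` are hypothetical, and for SQL injection the example uses parameter binding (a closely related technique) rather than manual encoding of the input.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")


def find_user(name):
    # The ? placeholder binds the input as data, so it is never parsed as SQL.
    cursor = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


def render_comment(text):
    # Encode user-supplied text before placing it in HTML to defuse XSS.
    return "<p>" + html.escape(text) + "</p>"


print(find_user("alice"))
print(find_user("' OR '1'='1"))  # the injection attempt matches no row
print(render_comment("<script>alert(1)</script>"))
```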