The technical points of large-scale website architecture can be explained along the dimensions of its architectural elements: performance, availability, scalability, extensibility, and security. A second, more intuitive way to organize the material is by the technologies used at each architectural layer.
The site system architecture hierarchy is as follows:
This architecture is divided into 8 layers. The data center (machine room) architecture is the physical foundation of all the layers above it. The security architecture and the data collection and monitoring architecture cut across every layer; they address the security and monitoring concerns of the five business-related layers and need constant attention.
A large website can currently be divided into five parts: front end, application layer, service layer, storage layer, and background. In short, the front end holds static pages that involve no business logic; it exists so that the client gets a timely response and can show static content. The application layer is where business logic is processed: applications are deployed here, and users' actual business requests are handled here. The service layer sits between the application layer and the storage layer and mainly provides distributed services; a distributed cache, for example, relieves pressure on the storage layer and answers application-layer requests faster. The storage layer holds the business data, including relational databases, non-relational databases, and files, with data synchronized among them. These four layers implement real-time business functions, so a problem in any of them directly affects the user experience. Behind them is the background, which does not communicate with users directly; instead it uses search engines, data warehouses, recommendation systems, and the like to derive more valuable information from the business data and support the business from behind the scenes.
1. Front-end architecture
(7 technologies: browser optimization, CDN, dynamic/static separation, independent deployment of static resources, image services, reverse proxy, and DNS)
The front end is the part of the path a user request travels before it reaches the site's application servers. It usually contains no business logic and does not process dynamic content. What a user sees immediately after entering a URL belongs to this part; what loads gradually afterwards is content from the business layers behind it. Because front-end content is static and does not change, it can be prepared in advance, placed in many locations such as a CDN, and served almost instantly. Several architectural techniques achieve this effect:
Browser optimization
The goal is not to optimize the browser itself but to optimize the response page so that the browser loads and displays it faster. Common techniques are page caching, merging resources to reduce the number of HTTP requests, and page compression.
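As a rough illustration of page compression and caching, the sketch below compresses a page body with gzip and attaches a Cache-Control header. The helper name build_response and the one-hour max_age are assumptions made for this example, not something the article prescribes.

```python
import gzip

def build_response(body, max_age=3600):
    """Compress a page body and attach caching headers (hypothetical helper)."""
    compressed = gzip.compress(body.encode("utf-8"))
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # Lets the browser reuse its cached copy for max_age seconds.
        "Cache-Control": f"public, max-age={max_age}",
    }
    return headers, compressed

# A repetitive sample page; real pages compress less dramatically.
page = "<html>" + "<p>hello</p>" * 500 + "</html>"
headers, payload = build_response(page)
print(len(page.encode("utf-8")), "->", len(payload), "bytes")
```

In practice a web server or reverse proxy usually handles compression and cache headers; the point here is only what the response ends up carrying.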
CDN
A content distribution network is deployed in network operators' machine rooms. Static page content is distributed to the CDN server nearest the user, so the user obtains it over the shortest path.
Dynamic/static separation and independent deployment of static resources
Static resources such as JS and CSS files are deployed on a dedicated server cluster, separated from the web application's dynamic content service, and served from a dedicated second-level domain name.
Image services
The images meant here are not the site logo, button icons, and the like; those are static resources as described above and should be deployed together with JS and CSS. Images here are user-uploaded pictures such as product images and user avatars. An image service likewise uses an independently deployed image server cluster and an independent second-level domain name.
Reverse proxy
Deployed in the site's machine room in front of the application servers, static resource servers, and image servers, a reverse proxy provides page caching.
DNS
The Domain Name System resolves a domain name to an IP address. DNS can be used for load balancing, and configuring a CDN also requires changing DNS so that the domain name resolves to the CDN servers.
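DNS-based load balancing can be pictured as a resolver that rotates through the addresses registered for a name. The toy class below is only a sketch of that idea; RoundRobinResolver, the domain www.example.com, and the 192.0.2.x addresses are placeholders invented for illustration, and a real deployment would configure this in the DNS server rather than in application code.

```python
from collections import deque

class RoundRobinResolver:
    """Toy resolver that hands out a name's addresses in rotation."""

    def __init__(self, records):
        # name -> deque of IP addresses (hypothetical record set)
        self._records = {name: deque(ips) for name, ips in records.items()}

    def resolve(self, name):
        ips = self._records[name]
        ip = ips[0]
        ips.rotate(-1)  # move the address just served to the back
        return ip

resolver = RoundRobinResolver({"www.example.com": ["192.0.2.1", "192.0.2.2"]})
answers = [resolver.resolve("www.example.com") for _ in range(3)]
print(answers)
```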
2. Application Layer Architecture
(7 technologies: development framework, page rendering, load balancing, session management, static generation of dynamic pages, business splitting, and virtualized servers)
The application layer is where the site's main business logic is handled. It is the code, typically written in PHP, Java, or similar technologies, that implements the site's logic and page framework, and it is the main layer where interactive business is implemented. The corresponding architecture technologies are:
Development framework
Website business changes constantly, and most site engineers spend their time, often overtime, developing business features, so a good development framework is crucial. A good framework should separate concerns so that designers and developers can each do their own job and collaborate easily. It should also build in some security policies to protect against web attacks.
Page rendering
Page rendering integrates the separately developed and maintained dynamic content and static page templates into the complete page that is finally shown to the user.
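A minimal sketch of this step, assuming a static template with two placeholders and a dictionary of dynamic data. The template text, the render function, and the product fields are all made up for illustration; real sites normally use a full template engine.

```python
import html
from string import Template

# Static template, maintained separately from the dynamic data.
PAGE = Template("<html><body><h1>$title</h1><p>Price: $price</p></body></html>")

def render(product):
    """Merge dynamic product data into the static template."""
    return PAGE.substitute(
        title=html.escape(product["title"]),  # escape user-visible text
        price=html.escape(product["price"]),
    )

print(render({"title": "Tea & Cake", "price": "9.90"}))
```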
Load Balancing
Multiple application servers are organized into a cluster, and load balancing distributes user requests across them to cope with the high concurrent load produced by large numbers of simultaneous users.
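One possible distribution policy is "least connections": send each new request to the server currently handling the fewest. The article does not name a specific algorithm, so the class below is just one illustrative choice, with hypothetical server names.

```python
class LeastConnectionsBalancer:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["app1", "app2"])
first = lb.acquire()   # app1
second = lb.acquire()  # app2
lb.release(first)      # app1 finishes its request
third = lb.acquire()   # app1 again, since it is now the least loaded
print(first, second, third)
```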
Session Management
To make the application server cluster highly available, application servers are usually designed to be stateless and do not keep the context of a user's requests. Website business usually does need to maintain user session information, however, so a dedicated mechanism manages sessions so that application servers within a cluster, or even across clusters, can share them.
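The sketch below shows the idea with two stateless application servers that both read sessions from one shared store. SessionStore is an in-memory stand-in for whatever external session service a site would actually run, and all class and method names are assumptions for this example.

```python
import secrets

class SessionStore:
    """In-memory stand-in for an external store shared by all app servers."""

    def __init__(self):
        self._data = {}

    def create(self, attrs):
        session_id = secrets.token_hex(16)
        self._data[session_id] = dict(attrs)
        return session_id

    def load(self, session_id):
        return self._data.get(session_id)

class AppServer:
    """Stateless server: keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create({"user": user})

    def whoami(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None

store = SessionStore()
server_a = AppServer("app1", store)
server_b = AppServer("app2", store)
sid = server_a.login("alice")
print(server_b.whoami(sid))  # the second server sees the same session
```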
Static generation of dynamic pages
Dynamic pages with very heavy traffic and infrequent updates can be generated as static pages, so that static page optimizations such as reverse proxies, CDNs, and browser caches can accelerate user access.
Business splitting
A large, complex business is split into several smaller products that are developed, deployed, and maintained independently. Besides reducing system coupling, this also makes it easier to split the database by business. Splitting a relational database along business lines is technically fairly easy and works relatively well.
Virtualized servers
Virtualizing one physical server into multiple virtual servers lets a business with low concurrent traffic build a highly available application server cluster with fewer resources.
3. Service Layer Architecture
(4 technologies: distributed messaging, distributed services, distributed cache, and distributed configuration)
The service layer provides basic services that the application layer calls to complete the site's business. It sits between the application layer and the storage layer. As the name implies, it serves the application layer; it can also be a source of content in its own right and is often tied to the storage layer. A cache, for example, holds a processed portion of the storage layer's data so that the application layer can fetch it quickly. The service layer's architecture technologies are mainly distributed services:
Distributed messaging
A message queue provides asynchronous message delivery and low coupling between one business and another, and between businesses and services.
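The decoupling can be sketched with a producer that only enqueues messages and a consumer that processes them on its own thread. Here Python's in-process queue.Queue stands in for a real distributed message broker, and the order IDs and "shipped" handling are invented for the example.

```python
import queue
import threading

orders = queue.Queue()  # stand-in for a distributed message queue
processed = []

def consumer():
    """Handle messages independently of whoever produced them."""
    while True:
        message = orders.get()
        if message is None:  # sentinel: no more messages
            orders.task_done()
            break
        processed.append(f"shipped:{message}")
        orders.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# The producer returns immediately after enqueueing; it never calls
# the shipping logic directly.
for order_id in ("o1", "o2", "o3"):
    orders.put(order_id)
orders.put(None)

orders.join()
worker.join()
print(processed)
```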
Distributed services
Distributed services are high-performance, loosely coupled, easy to reuse, and easy to manage; they implement a service-oriented architecture (SOA) for the site.
Distributed cache
Caching large volumes of hot data on a scalable server cluster is an important means of optimizing site performance.
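One common way to keep such a cluster scalable is consistent hashing, which limits how many keys move when a cache server is added. The article does not say which scheme is used, so the ring below is an illustrative sketch only; HashRing, the replica count, and the cache1..cache4 names are assumptions.

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["cache1", "cache2", "cache3"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add("cache4")
after = {key: ring.node_for(key) for key in keys}
moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys moved to the new server")
```

Every key that changes owner moves to the newly added server; keys never shuffle between the existing servers, which is what keeps most of the cache warm during expansion.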
Distributed configuration
A system has many configuration parameters. Without distributed configuration, a change such as adding a new server to the distributed cache cluster means editing the cache server list on every application client and restarting the application servers. Distributed configuration provides dynamic push at run time: configuration changes are pushed to the application without restarting any server.
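The push model can be sketched as a configuration center that notifies subscribed clients whenever a value changes. ConfigCenter, CacheClient, and the key name cache.servers are hypothetical; a real system would push over the network rather than through in-process callbacks.

```python
class ConfigCenter:
    """Holds configuration and pushes changes to subscribers."""

    def __init__(self):
        self._values = {}
        self._listeners = {}

    def subscribe(self, key, callback):
        self._listeners.setdefault(key, []).append(callback)
        if key in self._values:
            callback(self._values[key])  # deliver the current value at once

    def publish(self, key, value):
        self._values[key] = value
        for callback in self._listeners.get(key, []):
            callback(value)

class CacheClient:
    """Application-side client whose server list updates without a restart."""

    def __init__(self, center):
        self.servers = []
        center.subscribe("cache.servers", self._on_change)

    def _on_change(self, servers):
        self.servers = list(servers)

center = ConfigCenter()
center.publish("cache.servers", ["cache1", "cache2"])
client = CacheClient(center)
center.publish("cache.servers", ["cache1", "cache2", "cache3"])
print(client.servers)
```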
4. Storage Layer Architecture
(4 technologies: distributed files, relational databases, NoSQL databases, and data synchronization)
The storage layer provides persistent storage, access, and management for data and files. It is what is usually called the database layer, but in a large website the data layer includes not only traditional relational databases but also distributed file storage, NoSQL databases, and data synchronization technology. The architecture technologies of the storage layer are:
Distributed files
Most files an online site must store, such as images, web pages, and videos, are relatively small, but they are very numerous and keep growing, so a distributed file system with well-designed scalability is needed.
Relational database
Most core business is built on relational databases, but relational databases support cluster scalability poorly. Adding routing to the application's data access layer sends each database access to a different physical database according to business configuration, which gives distributed access to relational databases.
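A minimal sketch of such routing, assuming a configuration that maps each table to a physical database. Two in-memory SQLite databases stand in for separate database servers, and the ROUTES mapping, table names, and helper functions are invented for illustration.

```python
import sqlite3

# Hypothetical business configuration: table name -> physical database.
ROUTES = {"users": "user_db", "orders": "order_db"}
DATABASES = {name: sqlite3.connect(":memory:") for name in sorted(set(ROUTES.values()))}

def connection_for(table):
    """Data access layer hook: choose the physical database for a table."""
    return DATABASES[ROUTES[table]]

def execute(table, sql, params=()):
    conn = connection_for(table)
    rows = conn.execute(sql, params).fetchall()
    conn.commit()
    return rows

execute("users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
execute("orders", "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
execute("users", "INSERT INTO users (name) VALUES (?)", ("alice",))
execute("orders", "INSERT INTO orders (user_id) VALUES (?)", (1,))
print(execute("users", "SELECT name FROM users"))
```

Business code only names the table; which physical database serves it is decided in one place, so a table can be moved by changing the configuration.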
NoSQL Database
Today's NoSQL databases each have strengths in memory management, data model, distributed cluster management, and other areas; judged by community activity, HBase is currently the strongest.
Data synchronization
Until distributed databases that share data globally are mature, a site with multiple data centers must synchronize data between them so that each holds a complete copy. In practice, to reduce load on the database, the database's transaction log (or the NoSQL write log) is shipped to the other data centers and replayed there to bring the data in sync.
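Log replay can be illustrated with a primary that appends every write to a log and a replica that applies log entries from its last known position. The DataCenter class, the ("set"/"del") log format, and the replay function are simplifications assumed for this sketch, not a real database's log format.

```python
class DataCenter:
    """Primary store that records every write in an append-only log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def write(self, key, value):
        self.rows[key] = value
        self.log.append(("set", key, value))

    def delete(self, key):
        self.rows.pop(key, None)
        self.log.append(("del", key, None))

def replay(log, replica, start=0):
    """Apply log entries from position start; return the new position."""
    for op, key, value in log[start:]:
        if op == "set":
            replica[key] = value
        else:
            replica.pop(key, None)
    return len(log)

primary = DataCenter()
primary.write("user:1", "alice")
primary.write("user:2", "bob")

replica = {}
position = replay(primary.log, replica)

primary.delete("user:2")
primary.write("user:3", "carol")
position = replay(primary.log, replica, start=position)  # only the new entries
print(replica == primary.rows)
```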
5. Background architecture
(3 technologies: search engine, data warehouse, and recommendation system)
Besides handling users' real-time requests, a website also has non-real-time data analysis to do in the background. The background architecture performs these non-real-time operations to support front-end business processing. Its main architecture technologies are:
Search engine
Even a site's internal search engine needs incremental and full data updates, index building, and so on. The background system performs these operations periodically.
Data Warehouse
The data warehouse provides data analysis and data mining services based on offline data.
Recommendation system
Social networking and shopping sites mine the relationships between people and between people and goods to discover potential relationships and shopping interests, and use them to provide personalized recommendations.
6. Data collection and monitoring architecture
(5 technologies: browser data collection, server business data collection, server performance data collection, system monitoring, and system alarms)
This layer monitors site traffic and system operation and supports both business decisions and operations management. Data collection and monitoring run through the whole life cycle of the business architecture. Once a business is online and running normally, collecting data and monitoring are the main work of operations staff; when a business system fails or has performance problems, the monitoring system is also needed to support troubleshooting and locate the problem before it can be solved. The main architectural techniques at this layer are:
Browser data collection
JS scripts embedded in the site's pages capture the user's browser environment and operation records, which are then used to analyze user behavior.
Server business data collection
Server business data is of two kinds: the logs of user request operations recorded on the server side, and business data from the running application, such as the number of messages waiting to be processed.
Server performance data collection
Server performance data such as system load, memory usage, and network card traffic is collected.
System Monitoring
The collected data is displayed graphically so that operations staff can watch the site's health; this alone is basic system monitoring. A more advanced approach automates operations based on the collected data, handling system anomalies automatically and achieving automated control.
System Alarms
If a monitored metric exceeds its preset normal threshold, for example when system load is too high, alarms are sent by email, text message, voice call, and similar channels so that an engineer can intervene.
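The threshold check itself is simple; a sketch follows. The metric names and threshold values are made up for this example, and the function only builds the alarm messages, leaving delivery by email, text message, or phone to whatever notification service a site uses.

```python
# Hypothetical thresholds for two collected metrics.
THRESHOLDS = {"load": 8.0, "memory_pct": 90.0}

def check(metrics, thresholds=THRESHOLDS):
    """Return one alarm message per metric that exceeds its threshold."""
    return [
        f"{name}={value} exceeds {thresholds[name]}"
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]

alarms = check({"load": 12.5, "memory_pct": 40.0})
print(alarms)
```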
7. Security Architecture
(2 technologies: Web attack protection and data protection)
The security architecture protects the site from attacks and from leaks of sensitive information. Security is a very important part of running the business and has two main aspects. The first is defending against attacks from outside, which affect the availability and performance of the site. The second is securing and protecting the site's internal data, which concerns the sensitive information held in the data layer. The main security architecture technologies are:
Web attacks
These are attacks launched in the form of HTTP requests; the most harmful are XSS and SQL injection. With appropriate measures, both are relatively easy to guard against.
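Two widely used measures are parameterized queries against SQL injection and output escaping against XSS. The sketch below shows both using Python's standard sqlite3 and html modules; the users table and the find_user and render_comment helpers are assumptions for illustration, and a real site would layer further defenses on top.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user(name):
    # The ? placeholder passes name as data, never as SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

def render_comment(text):
    # Escaping turns markup characters into harmless entities.
    return "<p>" + html.escape(text) + "</p>"

print(find_user("alice"))
print(find_user("' OR '1'='1"))  # injection attempt matches nothing
print(render_comment("<script>alert(1)</script>"))
```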
Data protection
Sensitive information is encrypted in transit and in storage to protect the site's and users' assets.
8. Data center architecture
(3 technologies: machine room architecture, cabinet architecture, and server architecture)
A large site may need on the order of 100,000 servers, so the physical architecture of the machine room deserves attention. With so many physical servers, planning is needed at the machine room, cabinet, and server levels to support all the layers above.
Machine room architecture
For a large site with 100,000 servers, if each server consumes about 2,000 yuan of electricity a year (including both the server's own power and the air conditioning), the site's annual machine room electricity bill is 200 million yuan. Data center energy consumption is an increasingly serious problem, and companies such as Google and Facebook tend to pick data center locations with good cooling and an ample power supply.
Cabinet architecture
This covers cabinet size, network cable layout, indicator light specifications, uninterruptible power supplies, voltage specifications (48 V DC or 220 V civil AC), and a series of similar questions.
Server architecture
Because large sites buy servers in such volume, most use custom-built servers instead of off-the-shelf machines. The hard drives, memory, and even the CPU are customized to the site's application needs, unnecessary peripheral interfaces (display output, mouse and keyboard input) are removed, and the physical layout is designed to aid cooling.
"System Architecture" large Web site Architecture Technology List