High Concurrent Access and Massive Data: A Technology Overview of Large Website Architecture
Posted by Lin Tao on 2016-4-19 12:12 | Category: WebServer | Tags: concurrency, massive data, high concurrency
The challenges of large websites come mainly from a huge user base, highly concurrent access, and massive data volumes. Even a simple business becomes tricky once it must handle petabytes (PB) of data and hundreds of millions of users. Large website architecture exists mainly to solve these problems.
Most of this content comes from the book "Large Website Technology Architecture", which is well worth reading and highly recommended.
1. Front-end architecture
The front end covers every link a user request passes through before it reaches the site's application servers. It usually contains no business logic and does not process dynamic content.
Browser optimization techniques
This does not mean optimizing the browser itself, but optimizing the pages the site returns so that the browser loads and displays them faster. Common techniques include page caching, merging files to reduce the number of HTTP requests, and page compression.
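A minimal sketch of two of these techniques, merging small files into one bundle and gzip-compressing a response body. The file names and page contents are hypothetical; in practice a build tool does the merging and the web server or reverse proxy does the compression.

```python
import gzip


def merge_assets(files: dict[str, str]) -> str:
    """Concatenate several small CSS/JS files into one bundle, so the page
    needs a single HTTP request instead of one per file."""
    return "\n".join(f"/* {name} */\n{body}" for name, body in files.items())


def compress_response(body: str) -> bytes:
    """Gzip-compress a response body, as a server would for a browser that
    sent "Accept-Encoding: gzip"."""
    return gzip.compress(body.encode("utf-8"))


# Hypothetical assets and page, for illustration only.
bundle = merge_assets({"reset.css": "* { margin: 0; }", "layout.css": ".main { width: 960px; }"})
html = "<ul>" + "".join(f"<li class='item'>Product {i}</li>" for i in range(200)) + "</ul>"
compressed = compress_response(html)
print(len(html.encode("utf-8")), "->", len(compressed), "bytes")

# The browser reverses the encoding transparently before rendering.
assert gzip.decompress(compressed).decode("utf-8") == html
```

Repetitive markup like this compresses very well, which is why text resources (HTML, CSS, JS) benefit most from compression.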
CDN
A content delivery network (CDN) is deployed in network operators' data centers. By distributing static content to the CDN server nearest to each user, it lets users fetch that content over the shortest path.
Static/dynamic separation: deploying static resources independently
Static resources such as JS and CSS files are deployed on a dedicated server cluster, separate from the web application's dynamic content services, and served from a dedicated (second-level) domain name.
Image service
Images here do not mean the website logo, button icons, and similar files; those are static resources as described above and should be deployed together with JS and CSS. Here they mean images uploaded by users, such as product pictures and user avatars. The image service should likewise be deployed on an independent image server cluster with its own (second-level) domain name.
Reverse Proxy
Deployed in the website's own data center in front of the application servers, static resource servers, and image servers, a reverse proxy provides a page caching service.
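The page-caching role of a reverse proxy can be sketched as a time-limited cache in front of the application server. This is a simplified in-process model, not how any particular proxy is implemented; the `app_server` function and the TTL value are hypothetical stand-ins.

```python
import time
from typing import Callable


class CachingReverseProxy:
    """Serve a cached copy of each page for `ttl` seconds and forward only
    cache misses to the application server behind it."""

    def __init__(self, backend: Callable[[str], str], ttl: float = 60.0):
        self.backend = backend
        self.ttl = ttl
        self._cache: dict[str, tuple[float, str]] = {}  # path -> (stored_at, body)
        self.backend_hits = 0

    def get(self, path: str) -> str:
        now = time.monotonic()
        entry = self._cache.get(path)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: the application server is not touched
        self.backend_hits += 1
        body = self.backend(path)
        self._cache[path] = (now, body)
        return body


def app_server(path: str) -> str:
    """Hypothetical application server that renders a page."""
    return f"<html>rendered {path}</html>"


proxy = CachingReverseProxy(app_server, ttl=30)
for _ in range(5):
    proxy.get("/index.html")
print(proxy.backend_hits)  # 1: only the first request reached the application
```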
DNS
The Domain Name System resolves domain names into IP addresses. DNS can be used for load balancing, and configuring a CDN also requires changing DNS so that the domain name resolves to the CDN servers.
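DNS load balancing is often done with round-robin answers: one name has several A records, and the resolver rotates their order on each query. The class below is a toy model of that behavior, not a DNS server; the addresses come from the 192.0.2.0/24 documentation range and are placeholders.

```python
class RoundRobinDNS:
    """Sketch of DNS round-robin: one name maps to several A records, and each
    answer rotates the list so successive clients land on different servers."""

    def __init__(self, records: dict[str, list[str]]):
        self.records = records
        self._offset: dict[str, int] = {}

    def resolve(self, name: str) -> list[str]:
        ips = self.records[name]
        k = self._offset.get(name, 0)
        self._offset[name] = (k + 1) % len(ips)
        return ips[k:] + ips[:k]  # many clients simply use the first address


dns = RoundRobinDNS({"www.example.com": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
for _ in range(4):
    print(dns.resolve("www.example.com")[0])  # .1, .2, .3, then .1 again
```

Real resolvers and clients cache answers, so the actual spread is coarser than this model suggests.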
2. Application Layer Architecture
The application layer is where the main business logic of the site is handled.
Development framework
Website business changes constantly, and most of a site's software engineers spend their time developing that business, so a good development framework is crucial. A good framework should separate concerns so that designers and developers can each focus on their own work and collaborate easily. It should also have security policies built in to protect against web attacks.
Page rendering
Integrates dynamic content with static page templates, which are developed and maintained separately, into the complete page finally displayed to the user.
Load Balancing
Multiple application servers are organized into a cluster, and load balancing techniques distribute user requests across the servers to cope with the high load that arises when large numbers of users access the site concurrently.
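One common load balancing policy is "least connections": send each new request to the server with the fewest requests in flight. The sketch below models only the selection logic; the server names are hypothetical, and real balancers (hardware, LVS, Nginx and others) offer several policies besides this one.

```python
class LeastConnectionsBalancer:
    """Route each request to the application server with the fewest active
    requests. Ties go to the server listed first."""

    def __init__(self, servers: list[str]):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["app1", "app2", "app3"])
a = lb.acquire()  # app1
b = lb.acquire()  # app2
lb.release(a)     # app1 finishes its request
c = lb.acquire()  # app1 again: it now has 0 active requests
print(a, b, c)
```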
Session Management
To build a highly available application server cluster, application servers are usually designed to be stateless, storing no user request context. Website business, however, usually needs to maintain user session information, so a dedicated mechanism is needed to manage sessions so that application servers within a cluster, or even across clusters, can share them.
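A minimal sketch of stateless application servers sharing sessions through a common store. Here the store is an in-process dict standing in for a shared session service (in production this role is often filled by a distributed cache); the class names are hypothetical.

```python
import secrets
from typing import Optional


class SessionStore:
    """Stand-in for a shared session store reachable by every app server."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def create(self, user: str) -> str:
        sid = secrets.token_hex(16)  # random session ID, e.g. sent back as a cookie
        self._data[sid] = {"user": user}
        return sid

    def get(self, sid: str) -> Optional[dict]:
        return self._data.get(sid)


class StatelessAppServer:
    """Keeps no per-user state; every request carries its session ID."""

    def __init__(self, name: str, store: SessionStore):
        self.name, self.store = name, store

    def login(self, user: str) -> str:
        return self.store.create(user)

    def whoami(self, sid: str) -> str:
        session = self.store.get(sid)
        return f"{self.name}: {session['user'] if session else 'anonymous'}"


store = SessionStore()
s1, s2 = StatelessAppServer("app1", store), StatelessAppServer("app2", store)
sid = s1.login("alice")  # login handled by app1
print(s2.whoami(sid))    # app2: alice (a different server still sees the session)
```

Because the servers hold nothing locally, the load balancer can send any request to any server, and a failed server loses no sessions.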
Staticizing dynamic pages
Dynamic pages that receive very heavy traffic but are updated infrequently can be staticized: a static page is generated from them, so that user access can be accelerated with the same optimizations used for static pages, such as reverse proxies, CDNs, and browser caches.
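Staticizing can be sketched as rendering a page once and writing it out as a plain HTML file that any static server, reverse proxy, or CDN can serve. The template, product fields, and output directory below are hypothetical; a real site would regenerate the file whenever the underlying data changes.

```python
import tempfile
from pathlib import Path
from string import Template

PAGE = Template("<html><h1>$name</h1><p>Price: $price</p></html>")


def publish_static_page(out_dir: Path, product: dict) -> Path:
    """Render a product page once and save it as a static file. Call again
    only when the product changes; reads never touch the application."""
    path = out_dir / f"product-{product['id']}.html"
    html = PAGE.substitute(name=product["name"], price=product["price"])
    path.write_text(html, encoding="utf-8")
    return path


out = Path(tempfile.mkdtemp())
page = publish_static_page(out, {"id": 42, "name": "Desk lamp", "price": "19.90"})
print(page.read_text(encoding="utf-8"))
```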
Business Split
Splitting complex, large businesses into smaller products that are developed, deployed, and maintained independently not only reduces system coupling but also makes it easier to split the database by business. Splitting a relational database along business lines is technically relatively easy and works relatively well.
Virtualized servers
Virtualizing one physical server into multiple virtual servers makes it easier to build highly available application server clusters with fewer resources for businesses with little concurrent traffic.
3. Service Layer Architecture
Provides basic services that the application layer calls to carry out the website's business.
Distributed messaging
Message queues enable asynchronous message sending and loosely coupled relationships between businesses, and between businesses and services.
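The decoupling a message queue provides can be sketched with Python's thread-safe `queue.Queue` standing in for a message broker: the producer publishes an event and returns immediately, and a separate consumer processes it later. The order and email services here are hypothetical, and a real deployment would use a networked broker rather than an in-process queue.

```python
import queue
import threading

order_events: queue.Queue = queue.Queue()  # stands in for a message-queue topic
sent_emails: list[str] = []


def place_order(order_id: str) -> str:
    """Producer: publish an event and return at once, without waiting for email delivery."""
    order_events.put(order_id)
    return f"order {order_id} accepted"


def email_service() -> None:
    """Consumer: drains events at its own pace; it knows nothing about the order code."""
    while True:
        order_id = order_events.get()
        if order_id is None:  # sentinel value used here to stop the worker
            break
        sent_emails.append(f"confirmation for {order_id}")


worker = threading.Thread(target=email_service)
worker.start()
print(place_order("A100"))
print(place_order("A101"))
order_events.put(None)
worker.join()
print(sent_emails)  # ['confirmation for A100', 'confirmation for A101']
```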
Distributed services
Provides high-performance, loosely coupled, easily reusable, and easily managed distributed services, implementing a service-oriented architecture (SOA) for the website.
Distributed cache
Providing caching for large volumes of hotspot data through a scalable server cluster is an important means of optimizing website performance.
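The text does not say how keys are spread across the cache cluster; consistent hashing is one widely used technique (swapped in here as an illustration) because adding a server moves only a fraction of the keys. The server names, the 100 virtual nodes per server, and the use of MD5 purely as a spreading hash are all assumptions of this sketch.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to cache servers on a hash ring. Each server gets `vnodes`
    virtual points so load spreads evenly; a new server only takes over the
    arcs that end at its own points, so most keys stay where they were."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._points: list[int] = []      # sorted hash positions on the ring
        self._owner: dict[int, str] = {}  # hash position -> server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            point = self._hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]


keys = [f"user:{n}" for n in range(10_000)]
ring = ConsistentHashRing(["cache1", "cache2", "cache3"])
before = {k: ring.lookup(k) for k in keys}
ring.add("cache4")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expected to be roughly a quarter
```

For comparison, with simple `hash(key) % N` placement, growing from 3 to 4 servers would remap about three quarters of the keys, turning most cache lookups into misses at once.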
Distributed configuration
The system needs to be configured with many parameters. When these parameters change, for example when a new cache server joins the distributed cache cluster, the application clients have to update their cache server list and the application servers have to be restarted. Distributed configuration provides a dynamic push service while the system is running, pushing configuration changes to applications without a server restart.
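The push model can be sketched as a publish/subscribe registry: application servers subscribe to a configuration key and receive a callback whenever its value changes. `ConfigCenter` and the `cache.servers` key are hypothetical names for illustration; a production system would run the config center as a separate, replicated service and push changes over the network.

```python
from typing import Any, Callable


class ConfigCenter:
    """Holds configuration values and pushes every change to subscribers at runtime."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._subscribers: dict[str, list[Callable[[Any], None]]] = {}

    def subscribe(self, key: str, callback: Callable[[Any], None]) -> None:
        self._subscribers.setdefault(key, []).append(callback)
        if key in self._values:
            callback(self._values[key])  # deliver the current value immediately

    def set(self, key: str, value: Any) -> None:
        self._values[key] = value
        for callback in self._subscribers.get(key, []):
            callback(value)


class AppServer:
    def __init__(self, config: ConfigCenter) -> None:
        self.cache_servers: list[str] = []
        config.subscribe("cache.servers", self._on_cache_servers)

    def _on_cache_servers(self, servers: list[str]) -> None:
        self.cache_servers = list(servers)  # swap in the new list; no restart needed


config = ConfigCenter()
config.set("cache.servers", ["cache1:11211", "cache2:11211"])
app = AppServer(config)
config.set("cache.servers", ["cache1:11211", "cache2:11211", "cache3:11211"])
print(app.cache_servers)
```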
4. Storage Layer Architecture
Provides persistent storage access and management services for data and files.
Distributed file system
Most files that an online website business needs to store are images, web pages, videos, and other relatively small files, but there are very many of them and their number often keeps growing, so a distributed file system designed for good scalability is needed.
Relational database
Most core business is built on relational databases, but relational databases support cluster scalability poorly. Adding routing to the application's data access layer sends each database access to a different physical database according to business configuration, enabling distributed access to relational databases.
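A sketch of such routing in the data access layer: first select the database group by business, then pick a physical database by hashing a shard key. The routing table and database names are hypothetical. A stable hash (`zlib.crc32`) is used on purpose, because Python's built-in `hash()` on strings is randomized per process and would route the same key differently after a restart.

```python
import zlib

# Hypothetical routing configuration: business -> physical databases (shards).
ROUTES = {
    "user": ["user_db_0", "user_db_1", "user_db_2", "user_db_3"],
    "order": ["order_db_0", "order_db_1"],
}


def route(business: str, shard_key: str) -> str:
    """Pick the physical database for a row: first by business, then by a
    stable hash of the shard key (for example a user ID)."""
    shards = ROUTES[business]
    return shards[zlib.crc32(shard_key.encode("utf-8")) % len(shards)]


print(route("user", "10086"))
print(route("order", "A100"))
```

Modulo placement like this makes adding shards expensive, since most rows move; real routing layers often use a lookup table or consistent hashing instead.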
NoSQL databases
A wide variety of NoSQL databases have emerged, each with its own strengths in memory management, data model, cluster distribution management, and so on. From the perspective of community activity, however, HBase is undoubtedly the best at present.
Data synchronization
Until distributed database technology that supports globally shared data matures, a site with multiple data centers must synchronize data among them to ensure that each data center holds complete data. In practice, to reduce the load on the database, the database's transaction log (or a NoSQL store's write-operation log) is shipped to the other data centers and replayed there to achieve synchronization.
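Log-based synchronization can be sketched as a primary that appends every write to an ordered log and a replica that fetches only the entries it has not yet applied and replays them in sequence. The `Primary`, `Replica`, and `LogEntry` names and the simple put/delete operations are hypothetical; real transaction logs carry far more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    seq: int
    op: str  # "put" or "delete"
    key: str
    value: Optional[str] = None


def apply(data: dict, entry: LogEntry) -> None:
    if entry.op == "put":
        data[entry.key] = entry.value
    elif entry.op == "delete":
        data.pop(entry.key, None)


class Primary:
    """Data center that accepts writes and appends every change to its log."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.log: list[LogEntry] = []

    def write(self, op: str, key: str, value: Optional[str] = None) -> None:
        entry = LogEntry(len(self.log) + 1, op, key, value)
        self.log.append(entry)
        apply(self.data, entry)


class Replica:
    """Remote data center that catches up by replaying log entries in order."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.applied_seq = 0

    def sync(self, primary: Primary) -> None:
        for entry in primary.log[self.applied_seq:]:  # ship only the unseen tail
            apply(self.data, entry)
            self.applied_seq = entry.seq


primary = Primary()
primary.write("put", "user:1", "alice")
primary.write("put", "user:2", "bob")
replica = Replica()
replica.sync(primary)
primary.write("delete", "user:1")
replica.sync(primary)
print(replica.data)  # {'user:2': 'bob'}
```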
5. Back-end Architecture
Besides real-time user requests, a website application also has non-real-time data analysis that must be processed in the background.
Search engine
Even a site's internal search engine must perform incremental and full data updates, build indexes, and so on. These tasks are run periodically by back-end systems.
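The difference between a full rebuild and an incremental update can be sketched with a tiny inverted index (term to document IDs). Whitespace tokenization and the sample documents are simplifying assumptions; a real search engine would use proper text analysis and on-disk index segments.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()  # naive tokenizer, for illustration only


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: defaultdict[str, set] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def full_rebuild(self, docs: dict[int, str]) -> None:
        """Full update: discard everything and index all documents from scratch."""
        self.postings.clear()
        self.docs = {}
        for doc_id, text in docs.items():
            self.upsert(doc_id, text)

    def upsert(self, doc_id: int, text: str) -> None:
        """Incremental update: remove the document's old terms, then index the new text."""
        old = self.docs.get(doc_id)
        if old is not None:
            for term in tokenize(old):
                self.postings[term].discard(doc_id)
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, term: str) -> list[int]:
        return sorted(self.postings.get(term.lower(), set()))


idx = InvertedIndex()
idx.full_rebuild({1: "red cotton shirt", 2: "blue cotton pants"})
idx.upsert(2, "blue denim pants")  # product description changed
print(idx.search("cotton"), idx.search("denim"))  # [1] [2]
```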
Data Warehouse
Provide data analysis and data mining services based on offline data.
Recommendation system
Social networking and shopping sites provide personalized recommendations by mining the relationships between people and between people and products, uncovering potential relationships and shopping interests.
6. Data Collection and Monitoring
Monitors website access and system operation, supporting operational decision-making and operations and maintenance management.
Browser data collection
Embeds JS scripts in the site's pages to capture the user's browser environment and interaction records, which are then used to analyze user behavior.
Server business data
Server business data is of two kinds: user request logs recorded on the server side, and business data collected while the application runs, such as the number of messages waiting to be processed.
Server Performance
Captures server performance data such as system load, memory usage, and network card traffic.
System Monitoring
The collected data is displayed graphically so that operations staff can monitor the site's health; this alone is just system monitoring. A more advanced approach automates operations on the basis of the collected data, handling system anomalies automatically and achieving automated control.
System Alarms
If the collected data exceeds a predetermined normal threshold, for example when system load is too high, alarms are sent by email, SMS, voice call, and similar channels so that engineers can intervene.
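Threshold alarming reduces to comparing each collected metric against its limit. The metric names and limits below are hypothetical, and `send_alert` only prints; it is a placeholder for whatever email, SMS, or voice-call channel a site actually uses.

```python
# Hypothetical thresholds; real values depend on the hardware and the business.
THRESHOLDS = {"load_avg": 8.0, "mem_used_pct": 90.0, "pending_messages": 10_000}


def check(metrics: dict[str, float], thresholds: dict[str, float] = THRESHOLDS) -> list[str]:
    """Return one alert message for every metric above its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"ALERT {name}={value} exceeds {limit}")
    return alerts


def send_alert(message: str) -> None:
    print(message)  # placeholder for an email / SMS / voice-call integration


sample = {"load_avg": 12.5, "mem_used_pct": 71.0, "pending_messages": 250}
for alert in check(sample):
    send_alert(alert)  # ALERT load_avg=12.5 exceeds 8.0
```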
7. Security Architecture
Protects the website against attacks and leaks of sensitive information.
Web attacks
These attacks are launched in the form of HTTP requests; the most harmful are XSS and SQL injection. With appropriate measures, however, both are relatively easy to guard against.
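Two standard measures, sketched with the Python standard library: parameterized SQL queries against injection, and HTML-escaping user content against XSS. The in-memory SQLite table and its contents are illustrative only.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")


def find_user(name: str) -> list[tuple]:
    # Parameterized query: the driver passes `name` as data, never as SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def render_comment(comment: str) -> str:
    # Escape user content before inserting it into HTML, so markup is displayed, not executed.
    return f"<p>{html.escape(comment)}</p>"


print(find_user("alice"))               # [('alice',)]
print(find_user("alice' OR '1'='1"))    # []: the injection attempt is just an odd name
print(render_comment("<script>alert(1)</script>"))
```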
Data protection
Sensitive information is encrypted in transit and in storage to protect the website and user assets.
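For one common kind of sensitive data, user passwords, storage protection is usually a salted, deliberately slow one-way hash rather than reversible encryption, so even a leaked database does not reveal the passwords. This sketch uses PBKDF2 from the standard library; the iteration count is an assumption to tune against current guidance and your hardware. Encryption in transit (typically TLS) is not shown.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; higher is slower for attackers and for you


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)  # unique random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
```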
8. Data Center Architecture
A large website may need on the order of 100,000 servers, so the physical architecture of its machine rooms deserves attention.
Machine room architecture
For a large website with 100,000 servers, if each server costs about 2,000 yuan a year in electricity (covering both the server itself and air conditioning), the site's annual machine room electricity bill comes to 200 million yuan. As data center energy consumption becomes an ever more serious issue, companies such as Google and Facebook tend to choose data center locations with good cooling conditions and ample power supply.
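The electricity estimate above works out as:

```latex
\[
  \underbrace{100{,}000}_{\text{servers}} \times
  \underbrace{2{,}000\ \text{yuan}}_{\text{per server per year}}
  = 2 \times 10^{8}\ \text{yuan per year}
  = 200\ \text{million yuan per year}
\]
```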
Rack architecture
Covers a range of issues including cabinet size, network cable layout, indicator light standards, uninterruptible power supply, and voltage specifications (48 V DC or 220 V mains AC).
Server architecture
Because large websites purchase servers at such scale, most of them use custom-built servers instead of buying off-the-shelf machines. Hard drives, memory, and even CPUs are customized to the site's application needs, unnecessary peripheral interfaces (display output, mouse and keyboard input) are removed, and the physical layout is designed to aid cooling.
Source: Large Website Architecture Technologies at a Glance: High Concurrent Access and Massive Data