Large Website Architecture for High Concurrent Access and Massive Data: A Technology List



Posted by Lin Tao, 2016-4-19 12:12. Category: WebServer. Tags: concurrency, massive data, high concurrency.


The challenges of large websites come mainly from huge numbers of users, high concurrent access, and massive data. Even a simple business feature becomes tricky once it has to deal with petabytes of data and hundreds of millions of users. Large website architecture exists mainly to solve these problems.



Most of the content comes from the book "Large Web Site Technology Architecture", which is well worth reading and highly recommended.


1. Front-end architecture


The front end is the part of the path that a user request travels before it reaches the site's application servers. It usually contains no business logic and does not process dynamic content.


Browser optimization Technology


This does not mean optimizing the browser itself. It means optimizing the response so that pages load and render faster in the browser. Common techniques are page caching, merging resources to reduce the number of HTTP requests, and page compression.
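
As a rough illustration of two of these techniques, the sketch below compresses a response body with gzip and attaches a browser-cache header. The function name build_response and the one-hour max-age are assumptions made for this example, not something the original text specifies.

```python
import gzip


def build_response(body, max_age_seconds=3600):
    """Compress a response body and attach browser-cache headers.

    Hypothetical helper: a real server would compress only when the
    request carries "Accept-Encoding: gzip".
    """
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        # Lets the browser reuse its cached copy for max_age_seconds.
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
    return headers, compressed


page = b"<html>" + b"<p>hello</p>" * 500 + b"</html>"
headers, payload = build_response(page)
```

Repetitive markup like this compresses well, which is why page compression usually pays off for HTML, JS, and CSS.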



CDN



A content delivery network is deployed in the data centers of network operators. Static page content is distributed to the CDN server nearest to the user, so that users obtain content over the shortest path.



Separation of dynamic and static content, independent deployment of static resources



Static resources such as JS and CSS files are deployed on a dedicated server cluster, separate from the web application's dynamic content service, and are served from a dedicated (second-level) domain name.



Image Services



Images here do not mean the site logo, button icons, and the like. Those files are static resources as described above and should be deployed together with JS and CSS. Images here means pictures uploaded by users, such as product photos and user avatars. The image service should likewise run on its own image server cluster and use its own (second-level) domain name.


Reverse Proxy


Deployed in the site's data center, in front of the application servers, static resource servers, and image servers, where it provides a page cache service.
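
The page cache role of a reverse proxy can be sketched as a small time-to-live cache placed in front of an origin fetch function. The names PageCache and fake_origin and the 60-second TTL are assumptions for illustration only; a production reverse proxy does far more than this.

```python
import time


class PageCache:
    """TTL cache sitting in front of a (hypothetical) origin fetch function."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin is not contacted
        body = self._fetch(url)  # miss or expired: ask the origin
        self._entries[url] = (now + self._ttl, body)
        return body


origin_calls = []


def fake_origin(url):
    """Stand-in for an application server; records every call it receives."""
    origin_calls.append(url)
    return f"<html>page for {url}</html>"


cache = PageCache(fake_origin, ttl_seconds=60.0)
first = cache.get("/index.html")
second = cache.get("/index.html")  # served from the cache
```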


DNS


The Domain Name System resolves a domain name to an IP address. DNS can also be used for load balancing, and configuring a CDN requires a DNS change as well, so that the domain name resolves to the CDN servers.



2. Application Layer Architecture



The application layer is where the main business logic of the site is handled.


Development framework


Website business requirements change constantly, and most website engineers spend their time, often overtime, developing business features, so a good development framework is crucial. A good framework should separate concerns so that designers and developers can each do their own job and collaborate easily. It should also have some security policies built in to guard against web attacks.


Page rendering


Dynamic content and static page templates, which are developed and maintained separately, are combined into a complete page that is finally displayed to the user.



Load Balancing



Multiple application servers are organized into a cluster, and load balancing distributes user requests across the servers to cope with the load created when large numbers of users access the site concurrently.
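
A minimal sketch of one distribution policy, round robin, is shown below. The text above does not say which policy a site should use, and the class name and server names are assumptions for this example.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.pick() for _ in range(6)]
```

Round robin is the simplest choice. Weighted or least-connections policies follow the same shape with a different pick method.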


Session Management


To make the application server cluster highly available, application servers are usually designed to be stateless and do not store the context of user requests. The site's business, however, usually needs to maintain user session information, so a dedicated mechanism is needed to manage sessions so that application servers within a cluster, or even across clusters, can share them.
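
One way to picture this is a session store shared by all application servers, so that any server can handle any request. In the sketch below an in-process dictionary stands in for what would normally be a separate store; SessionStore and AppServer are hypothetical names.

```python
import uuid


class SessionStore:
    """Stand-in for a shared session store reachable by every app server."""

    def __init__(self):
        self._data = {}

    def create(self, attributes):
        session_id = uuid.uuid4().hex
        self._data[session_id] = dict(attributes)
        return session_id

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Stateless server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def login(self, user):
        return self._store.create({"user": user})

    def whoami(self, session_id):
        session = self._store.load(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)
sid = server_a.login("alice")  # the session is created through one server
```

Because the session lives in the store, server_b can answer a request for a session that server_a created, and either server can be removed without losing sessions.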



Static generation of dynamic pages



Dynamic pages that receive very heavy traffic but are not updated frequently can be generated as static pages. They then benefit from static-page optimizations such as reverse proxies, CDNs, and browser caching, which speeds up user access.
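
A minimal sketch of the idea: render the page once when its data changes and write the result to a file that a static server, reverse proxy, or CDN can serve. The template, the publish_static helper, and the file naming are assumptions for this example.

```python
import tempfile
from pathlib import Path
from string import Template

PAGE_TEMPLATE = Template("<html><body><h1>$title</h1><p>$body</p></body></html>")


def publish_static(output_dir, slug, title, body):
    """Render the page once and write it out as a static HTML file.

    Note: string.Template does not escape HTML, so the values passed in
    are assumed to be trusted here.
    """
    html_text = PAGE_TEMPLATE.substitute(title=title, body=body)
    path = Path(output_dir) / f"{slug}.html"
    path.write_text(html_text, encoding="utf-8")
    return path


output_dir = tempfile.mkdtemp()
published = publish_static(output_dir, "item-42", "Blue Widget", "In stock")
```

The page would be regenerated by calling publish_static again whenever the underlying data is updated.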



Business Split



Splitting a large, complex business into smaller products that are independently developed, deployed, and maintained not only reduces system coupling but also makes it easier to split the database by business. Splitting a relational database by business is technically fairly easy, and the results are relatively good.



Virtualized servers



Virtualizing one physical server into multiple virtual servers makes it easier, for businesses with low concurrent access, to build a highly available application server cluster with fewer resources.


3. Service Layer Architecture


Provides basic services for the application layer to call in order to carry out the site's business.


Distributed messaging


A message queue lets businesses communicate with each other, and with services, by sending asynchronous messages, which keeps them loosely coupled.
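
The pattern can be sketched with an in-process queue standing in for a distributed message broker. The order and shipping roles below are hypothetical. The point is only that the producer never calls the consumer directly.

```python
import queue
import threading

order_queue = queue.Queue()
processed = []


def shipping_consumer():
    """Consumes order messages at its own pace, independent of the producer."""
    while True:
        message = order_queue.get()
        if message is None:  # sentinel: no more messages
            order_queue.task_done()
            break
        processed.append(f"shipped:{message['order_id']}")
        order_queue.task_done()


worker = threading.Thread(target=shipping_consumer)
worker.start()

# The producer only enqueues messages and does not wait for shipping.
for order_id in (1, 2, 3):
    order_queue.put({"order_id": order_id})
order_queue.put(None)
worker.join()
```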



Distributed services



Provides high-performance, loosely coupled, easily reused, and easily managed distributed services, implementing a service-oriented architecture (SOA) within the site.


Distributed cache


A scalable server cluster provides caching for large volumes of hotspot data. This is an important means of optimizing website performance.
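
The text above does not say how keys are assigned to cache servers. One common technique, shown here as an assumption rather than as the article's method, is consistent hashing, which keeps most keys on their existing server when the cluster grows. The class name, node names, and replica count are illustrative.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        if index == len(self._ring):
            index = 0
        return self._ring[index][1]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add("cache-4")  # scale the cluster out by one server
moved = [key for key in keys if ring.node_for(key) != before[key]]
```

After cache-4 joins, the only keys that change owner are the ones that now belong to cache-4. Every other key keeps hitting the server that already holds its data.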


Distributed configuration


A system has many parameters to configure. Without a configuration service, changing one of them means editing configuration and restarting: for example, when a new cache server joins the distributed cache cluster, the application clients must update their cache server list and the application servers must be restarted. Distributed configuration provides dynamic push at run time, delivering configuration changes to the application without restarting the server.
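
The push behavior can be sketched with a simple publish/subscribe arrangement, using the cache server list from the example above. ConfigCenter and CacheClient are hypothetical names, and a real configuration service would push over the network rather than through in-process callbacks.

```python
class ConfigCenter:
    """Holds configuration and pushes every change to subscribers."""

    def __init__(self, initial):
        self._values = dict(initial)
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)
        listener(dict(self._values))  # deliver the current config at once

    def update(self, key, value):
        self._values[key] = value
        for listener in self._listeners:
            listener(dict(self._values))  # push, no restart needed


class CacheClient:
    """Application-side client that tracks the cache server list."""

    def __init__(self):
        self.servers = []

    def on_config(self, config):
        self.servers = list(config["cache_servers"])


center = ConfigCenter({"cache_servers": ["cache-1", "cache-2"]})
client = CacheClient()
center.subscribe(client.on_config)
servers_at_start = list(client.servers)
center.update("cache_servers", ["cache-1", "cache-2", "cache-3"])
```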



4. Storage Layer Architecture



Provides persistent storage access and management services for data and files.


Distributed files


Most of the files an online site needs to store are pictures, web pages, videos, and other relatively small files, but there are a great many of them and the number usually keeps growing, so a distributed file system with good scalability is needed.


Relational database


Most core business is built on relational databases, but relational databases have poor support for cluster scalability. By adding routing to the data access layer of the application, database access can be routed to different physical databases according to business configuration, which gives distributed access to relational databases.
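
A minimal sketch of such routing in a data access layer follows. The routing table, host names, and class name are invented for illustration. In practice the table would come from the business configuration the text mentions.

```python
# Hypothetical business-to-database mapping.
ROUTING_TABLE = {
    "users": "db-users.internal",
    "orders": "db-orders.internal",
    "products": "db-products.internal",
}


class DataAccessRouter:
    """Chooses the physical database for a given business area."""

    def __init__(self, routing_table):
        self._table = dict(routing_table)

    def database_for(self, business):
        try:
            return self._table[business]
        except KeyError:
            raise LookupError(f"no database configured for {business!r}") from None


router = DataAccessRouter(ROUTING_TABLE)
orders_db = router.database_for("orders")
```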


NoSQL Database


Many NoSQL databases are now emerging, and each has its own strengths in memory management, data model, cluster and distribution management, and so on. Judged by community activity, however, HBase is undoubtedly the best at the moment.


Data synchronization


Until distributed database technology that supports globally shared data matures, sites with multiple data centers must synchronize data between them so that each data center holds a complete copy. In practice, to reduce pressure on the database, the database's transaction log (or the NoSQL write log) is shipped to the other data centers, which replay the log to bring their data in sync.
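
Log replay can be sketched as follows, with plain dictionaries standing in for the databases. The entry format (operation, key, value) and the class names are assumptions for this example.

```python
def apply_entry(database, entry):
    """Apply one logged write to a database (here a plain dict)."""
    op, key, value = entry
    if op == "put":
        database[key] = value
    elif op == "delete":
        database.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


class Replica:
    """Remote data center copy that catches up by replaying the log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def replay(self, log):
        for entry in log[self.applied:]:
            apply_entry(self.data, entry)
            self.applied += 1


primary = {}
log = []


def write(op, key, value=None):
    """Write to the primary and record the operation in its log."""
    entry = (op, key, value)
    apply_entry(primary, entry)
    log.append(entry)


write("put", "user:1", "alice")
write("put", "user:2", "bob")
replica = Replica()
replica.replay(log)
write("delete", "user:1")
write("put", "user:2", "robert")
replica.replay(log)  # only the two new entries are applied
```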


5. Back-end architecture


Besides handling users' real-time access requests, a website application also has non-real-time, back-end data analysis work to process.


Search engine


Even a site's internal search engine needs incremental and full data updates, index building, and so on. These operations are run periodically by the back-end system.


Data Warehouse


Provides data analysis and data mining services based on offline data.


Recommendation system


Social networking sites and shopping sites provide personalized recommendations by mining the relationships between people, and between people and goods, to uncover potential connections and shopping interests.




6. Data Collection and Monitoring


Monitors website access and system operation, supporting business decision-making and operations management.


Browser data collection


JavaScript embedded in the site's pages collects the user's browser environment and operation records, which are then used to analyze user behavior.


Server business data collection


Server business data is of two kinds: the logs of user request operations recorded on the server side, and business data collected while the application is running, such as the number of messages waiting to be processed.


Server performance data collection

Capture server performance data, such as system load, memory usage, network card traffic, and more.


System Monitoring


The collected data is displayed graphically so that business operations and maintenance staff can monitor the health of the site. That step alone is only basic system monitoring. A more advanced approach uses the collected data to automate operations, handling system anomalies automatically and achieving automated control.


System Alarms


If the collected data exceeds a preset normal threshold, for example when system load is too high, alarms are sent by message, SMS, voice call, or other channels so that an engineer can intervene.
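
The threshold check can be sketched as below. The metric names and limits are assumed values for illustration, and actually delivering the alarm (SMS, voice call, and so on) is left out.

```python
# Hypothetical thresholds; real values depend on the hardware and workload.
THRESHOLDS = {"load_average": 8.0, "memory_used_percent": 90.0}


def check_metrics(metrics, thresholds=THRESHOLDS):
    """Return one alarm message per metric that exceeds its threshold."""
    alarms = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alarms.append(f"{name}={value} exceeds threshold {limit}")
    return alarms


alarms = check_metrics({"load_average": 12.5, "memory_used_percent": 71.0})
```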


7. Security Architecture


Protects the website from attacks and from leaks of sensitive information.


Web attacks


These are attacks launched in the form of HTTP requests. The most harmful are XSS and SQL injection. With appropriate measures, however, both are relatively easy to guard against.
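
Two standard measures are parameterized queries against SQL injection and output escaping against XSS. The sketch below uses Python's sqlite3 and html modules. The table, the sample rows, and the function names are assumptions for illustration.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user(name):
    # The ? placeholder passes name as data, never as SQL text.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def render_comment(comment):
    # Escaping turns markup characters into harmless entities.
    return f"<p>{html.escape(comment)}</p>"


injection_attempt = "' OR '1'='1"
matches = find_user(injection_attempt)
safe_html = render_comment("<script>alert(1)</script>")
```

With the placeholder, the injection string is compared literally against the name column and matches no rows. With escaping, the script tag is displayed as text instead of being executed by the browser.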


Data protection


Sensitive information is encrypted in transmission and in storage, protecting the assets of the website and its users.


8. Data Center Architecture


Large sites need on the order of 100,000 servers, so the physical architecture of the data center deserves attention.


Server room architecture


For a large site with 100,000 servers, if each server costs about 2,000 yuan a year in electricity (covering both the server's own power and the air conditioning), the site's annual data center electricity bill comes to 200 million yuan. Data center energy consumption is an increasingly serious problem, and companies such as Google and Facebook tend to choose data center locations with good cooling conditions and power supply.
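
The arithmetic behind that figure, using only the numbers given in the text:

```latex
\[
100{,}000 \ \text{servers} \times 2{,}000 \ \text{yuan per server per year}
= 200{,}000{,}000 \ \text{yuan per year}
\]
```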


Cabinet architecture


Covers a series of issues including cabinet size, network cable layout, indicator light specifications, uninterruptible power supplies, and voltage specification (48 V DC or 220 V mains AC).


Server architecture


Because large sites purchase servers in such volume, most use custom-built servers instead of buying off-the-shelf machines. Based on the site's application needs, they customize the hard drives, memory, and even the CPU, remove unnecessary peripheral interfaces (display output, mouse and keyboard inputs), and arrange the internal layout to aid cooling.






When reprinting, please credit the source: reproduced from 26 points of the blog


