This article is a technical summary of what I have learned about the architecture of large distributed websites. It gives an overview of an architecture for a high-performance, highly available, scalable, and extensible distributed website, and offers an architectural reference. Part of it is reading notes and part is a summary of personal experience, so it should be a useful reference for large-scale distributed website architecture. (If you find it helpful, please recommend it. Thank you.) This blog will gradually publish a series of articles on large-scale distributed website architecture, design patterns, and architectural patterns. Discussion group: 466097527
The outline of this article is as follows:
- Characteristics of large websites
- Architecture goals of large websites
- Architecture patterns of large websites
- High-performance architecture
- High-availability architecture
- Scalable architecture
- Extensible architecture
- Security architecture
- Agile architecture
- Example of a large website architecture
I. Characteristics of large websites
- Many users, widely distributed
- High traffic, high concurrency
- Massive data, with high availability required of services
- Harsh security environment, vulnerable to network attacks
- Many features, fast-changing requirements, frequent releases
- Progressive growth from small to large
- User-centric
- Free service, paid experience
II. Architecture goals of large websites
- High performance: provide a fast access experience.
- High availability: the website's services are accessible at all times.
- Scalability: raise or lower processing capacity by adding or removing hardware.
- Security: provide secure access to the website, data encryption, secure storage, and similar policies.
- Extensibility: features and modules can be added or removed easily.
- Agility: adapt on demand and respond quickly.
III. Architecture patterns of large websites
- Layering: the system is generally divided into an application layer, service layer, data layer, management layer, and analysis layer.
- Partitioning: split by business, module, or function; for example, the application layer is split into the home page, the user center, and so on.
- Distribution: deploy applications separately (for example, on multiple physical machines) and have them cooperate through remote calls.
- Clustering: deploy multiple instances of one application, module, or function (for example, on multiple physical machines) and expose them to the outside through load balancing.
- Caching: speed up access by placing data as close as possible to the application or the user.
- Asynchrony: turn a synchronous operation into an asynchronous one. The client sends a request and does not wait for the server to finish processing it; when the server is done, the requester learns the result through a notification or by polling. This is generally the request--response--notification pattern.
- Redundancy: add replicas to improve availability, security, and performance.
- Security: have effective solutions for known problems, and establish detection and defense mechanisms for unknown or potential ones.
- Automation: hand repetitive work that needs no human involvement to tools and let machines do it.
- Agility: actively embrace changing requirements and respond quickly to business needs.
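The request--response--notification pattern from the asynchrony item can be sketched roughly as follows. This is only a minimal in-process illustration: the class name AsyncService, the ticket scheme, and the upper-casing "work" are assumptions made for the example, and a real site would more likely put a message queue or task system between client and server.

```python
import queue
import threading
import uuid


class AsyncService:
    """Hypothetical service: accepts requests at once, works in the background."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._results = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, payload, on_done=None):
        """Request + immediate response: hand back a ticket without waiting."""
        ticket = uuid.uuid4().hex
        self._tasks.put((ticket, payload, on_done))
        return ticket

    def poll(self, ticket):
        """Polling variant: the result, or None while the work is unfinished."""
        with self._lock:
            return self._results.get(ticket)

    def wait_idle(self):
        """Block until every submitted task has been processed (demo helper)."""
        self._tasks.join()

    def _run(self):
        while True:
            ticket, payload, on_done = self._tasks.get()
            result = payload.upper()  # stand-in for slow server-side work
            with self._lock:
                self._results[ticket] = result
            if on_done is not None:
                on_done(ticket, result)  # notification variant
            self._tasks.task_done()


if __name__ == "__main__":
    service = AsyncService()
    ticket = service.submit("order-1", on_done=lambda t, r: print("notified:", r))
    service.wait_idle()
    print("polled:", service.poll(ticket))
```

The caller gets its ticket back immediately and can either register a callback or poll later, which are the two ways of learning the result that the list item names.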
IV. High-performance architecture
The goal is user-centric: provide a fast web access experience. The main indicators are short response time, high concurrent processing capacity, high throughput, and stable performance.
Optimization can be divided into front-end optimization, application-layer optimization, code-level optimization, and storage-layer optimization.
Front-end optimization: the part that comes before the website's business logic.
Browser optimization: reduce the number of HTTP requests, use the browser cache, enable compression, place CSS and JS appropriately in the page, load JS asynchronously, and reduce cookie transmission.
CDN acceleration and reverse proxying.
Application-layer optimization: the servers that handle the website's business logic. Use caching, asynchrony, and clustering.
Code optimization: sound architecture, multithreading, resource reuse (object pools, thread pools, and so on), good data structures, JVM tuning, singletons, caching, and so on.
Storage optimization: caching, SSDs, fiber-optic transmission, optimized reads and writes, disk redundancy, distributed storage (HDFS), NoSQL, and so on.
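Caching appears at almost every layer above. As a rough illustration of the idea, here is a small read-through cache with expiry. TTLCache, get_or_load, and the loader function are hypothetical names for this sketch; a production site would normally use a distributed cache rather than a dictionary inside one process.

```python
import time


class TTLCache:
    """Hypothetical read-through cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit
        value = loader(key)                      # slow path, e.g. a database read
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def load_profile(user_id):
        print("loading from the database:", user_id)
        return {"id": user_id}

    cache = TTLCache(ttl_seconds=60)
    cache.get_or_load("u1", load_profile)  # goes to the loader
    cache.get_or_load("u1", load_profile)  # served from the cache
```

The second call never reaches the loader, which is where the saving in response time comes from. The expiry time bounds how stale a cached value can get.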
V. High-availability architecture
A large website should be accessible at all times and provide its services to the outside normally. Because large websites are complex and distributed, and run on inexpensive servers, open-source databases, and open-source operating systems, high availability is hard to guarantee; in other words, failures are unavoidable.
How to improve availability is therefore an urgent problem. First, availability has to be considered at the architecture level, from the planning stage on. The industry usually measures availability by the number of nines. For example, four nines (99.99%) allows roughly 53 minutes of unavailability per year.
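The downtime figure follows from simple arithmetic: a year has 365 × 24 × 60 = 525,600 minutes, and four nines leaves 0.01% of them, about 52.56 minutes. A small sketch, assuming a 365-day year and a helper name chosen for this example:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability of `nines` nines.

    Two nines means 99%, three nines 99.9%, four nines 99.99%, and so on.
    """
    unavailable_fraction = 10 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Each additional nine cuts the yearly budget by a factor of ten, which is why every extra nine is much harder to reach than the one before it.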
Different layers use different strategies; in general, redundant backup and failover are used to solve high-availability problems.
Application layer: generally designed to be stateless, so that it does not matter which server handles a given request. High availability is usually achieved with load balancing (which requires solving the session synchronization problem).
Service layer: load balancing, tiered management, fail-fast (timeout settings), asynchronous invocation, service degradation, idempotent design, and so on.
Data layer: redundant backup (cold standby, hot standby [synchronous or asynchronous], warm standby) and failover (confirmation, transfer, recovery). The well-known theoretical basis for data high availability is the CAP theorem (consistency, availability, partition tolerance), with data consistency further divided into strong consistency, user consistency, and eventual consistency.
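Two of the service-layer measures, fail-fast through timeouts and service degradation, can be combined in a few lines. The sketch below is only an illustration under assumptions: call_with_fallback and slow_recommendations are made-up names, and real systems usually rely on timeouts and circuit breakers provided by their RPC framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(func, timeout_seconds, fallback):
    """Fail fast: wait at most timeout_seconds, then degrade to the fallback.

    Note that the slow call keeps running in its worker thread; this sketch
    only stops the caller from waiting for it.
    """
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return fallback   # degraded answer instead of a hanging request
    except Exception:
        return fallback   # the dependency failed outright


def slow_recommendations():
    """Hypothetical downstream service that is responding slowly."""
    time.sleep(0.5)
    return ["personalized-1", "personalized-2"]


if __name__ == "__main__":
    defaults = ["bestseller-1", "bestseller-2"]
    print(call_with_fallback(slow_recommendations, 0.05, defaults))
```

When the call times out, the page is still served with the default list. That is the point of degradation: a worse answer delivered quickly is better than a request that blocks a thread.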
VI. Scalable architecture
Scalability means that the system's processing capacity can be raised or lowered by adding or removing hardware (servers) without changing the original architecture design.
Application layer: split the application vertically or horizontally, then load-balance each individual function (DNS, HTTP [reverse proxy], IP, or link-layer load balancing).
Service layer: similar to the application layer.
Data layer: database sharding, table partitioning, NoSQL, and so on. Common algorithms are hashing and consistent hashing.
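Consistent hashing is worth a concrete look, because it is what lets nodes be added or removed while only a fraction of the keys move. The following is a minimal sketch with virtual nodes. The class name, the default of 100 replicas per node, and the use of SHA-256 as the ring hash are assumptions for this example, not something the approach prescribes.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual nodes per physical node
        self._positions = []       # sorted positions on the ring
        self._owners = {}          # position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owners:   # practically impossible collision
                continue
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node):
        for i in range(self._replicas):
            pos = self._hash(f"{node}#{i}")
            if self._owners.get(pos) == node:
                del self._owners[pos]
                del self._positions[bisect.bisect_left(self._positions, pos)]

    def get_node(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        if not self._positions:
            raise LookupError("the ring has no nodes")
        idx = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["db1", "db2", "db3"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("db4")
    moved = sum(1 for k in keys if ring.get_node(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding db4")
```

With plain modulo hashing (hash(key) % node_count), going from three nodes to four would remap most keys. On the ring, the only keys that move are the ones that now belong to the new node.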
VII. Extensible architecture
Feature modules can be added or removed easily, which gives good extensibility at the code and module level.
Modularization and componentization: high cohesion and low coupling improve reusability and extensibility.
Stable interfaces: define stable interfaces; as long as the interface stays unchanged, the internal structure can change freely.
Design patterns: apply object-oriented ideas and principles and use design patterns for code-level design.
Message queues: in a modular system, modules interact through message queues, which decouples the dependencies between them.
Distributed services: turn common modules into services that other systems can use, which improves reusability and extensibility.
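How a message queue decouples modules can be shown with a toy in-process broker. MessageBroker, the topic name order.created, and the two subscribers are all invented for this sketch; a real deployment would use a dedicated message queue product running as its own service.

```python
from collections import defaultdict, deque


class MessageBroker:
    """Toy publish/subscribe broker; producers never call consumers directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers
        self._pending = deque()                # queued (topic, message) pairs

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Enqueue only; the publisher returns before any handler runs."""
        self._pending.append((topic, message))

    def dispatch(self):
        """Deliver all queued messages and return the number of deliveries."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in list(self._subscribers.get(topic, ())):
                handler(message)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = MessageBroker()
    # Two independent modules react to the same event.
    broker.subscribe("order.created", lambda m: print("send mail for order", m["order_id"]))
    broker.subscribe("order.created", lambda m: print("add points:", m["amount"] // 10))
    # The order module knows only the topic name, not who is listening.
    broker.publish("order.created", {"order_id": 42, "amount": 250})
    broker.dispatch()
```

Adding a third consumer, for example an inventory module, takes one more subscribe call and no change to the order module. That is the extensibility benefit described above.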
VIII. Security architecture
Have effective solutions for known problems, and establish detection and defense mechanisms for unknown or potential ones. Security starts with raising security awareness and establishing an effective security mechanism, backed at the policy and organizational levels: for example, server passwords must not be leaked, passwords are changed monthly and must not repeat any of the last three, and security scans run weekly. The security system should be strengthened in this institutionalized way. At the same time, every security-related link needs attention and none can be overlooked, including infrastructure security, application security, and data security.
Infrastructure security: the security of hardware procurement, the operating system, and the network environment. In general, buy quality products through official channels, choose a secure operating system, patch vulnerabilities promptly, and install antivirus software and a firewall to guard against viruses and backdoors. Set up firewall policies, build a DDoS defense system, use an attack detection system, and isolate subnets.
Application security: during development, use the correct approach to known common problems at the code level. Guard against cross-site scripting (XSS), injection attacks, cross-site request forgery (CSRF), information leaks through error messages and HTML comments, file upload attacks, path traversal, and so on. A web application firewall (for example, ModSecurity), security vulnerability scanning, and similar measures can further strengthen application-level security.
Data security: storage security (keep data on reliable devices, with real-time and scheduled backups), custody security (encrypt important information before storing it, and assign suitable personnel to be responsible for safekeeping and auditing), and transmission security (prevent data theft and data tampering).
Commonly used encryption and decryption algorithms: one-way hashes [MD5, SHA], symmetric encryption [DES, 3DES, RC], asymmetric encryption [RSA], and so on.
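As a sketch of two of these points, the code below stores a password as a salted one-way hash and signs a message so that tampering in transit can be detected. The function names and the iteration count are assumptions for the example. It uses PBKDF2 with SHA-256 from the standard library rather than plain MD5, because a bare, unsalted MD5 or SHA digest is generally considered too weak for passwords today.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """One-way, salted hash for storage; returns (salt, digest)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=200_000):
    """Re-hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


def sign_message(message, key):
    """HMAC-SHA256 tag that lets the receiver detect tampering in transit."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_message(message, key, signature):
    return hmac.compare_digest(sign_message(message, key), signature)


if __name__ == "__main__":
    salt, stored = hash_password("s3cret")
    print("correct password accepted:", verify_password("s3cret", salt, stored))
    print("wrong password accepted:", verify_password("guess", salt, stored))

    key = b"shared-secret-key"  # hypothetical key known to both sides
    tag = sign_message(b"amount=100", key)
    print("tampered message accepted:", verify_message(b"amount=900", key, tag))
```

Only the salt and the digest are stored, never the password itself. The HMAC protects integrity but not confidentiality, so keeping the content secret still requires symmetric or asymmetric encryption.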
IX. Agility
Website architecture design and operations management must adapt to change and provide high scalability and high extensibility, so that rapid business growth and sudden traffic spikes can be handled easily.
Beyond the architectural elements described above, the ideas of agile management and agile development also need to be introduced, so that business, product, technology, and operations work as one, adapt on demand, and respond quickly.
X. Example of a large website architecture
This example uses a seven-layer logical architecture: the first layer is the client layer, the second the front-end optimization layer, the third the application layer, the fourth the service layer, the fifth the data storage layer, the sixth the big data storage layer, and the seventh the big data processing layer.
Client layer: supports PC browsers and mobile apps. The difference is that a mobile app can reach the reverse proxy server directly by IP address.
Front-end layer: uses DNS load balancing, CDN acceleration close to the user, and reverse proxy services.
Application layer: a cluster of web applications, split vertically by business, such as the product application, the member center, and so on.
Service layer: provides common services, such as the user service, order service, and payment service.
Data layer: supports a relational database cluster (with read-write separation), a NoSQL cluster, a distributed file system cluster, and a distributed cache.
Big data storage layer: collects log data from the application and service layers, and structured and semi-structured data from the relational and NoSQL databases.
Big data processing layer: analyzes data offline with MapReduce or in real time with Storm, and stores the processed data in a relational database. (In practice, offline and real-time data are classified by business requirement and stored in different databases for use by the application layer or the service layer.)
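To give a feel for the offline analysis step, here is the MapReduce idea reduced to a single process: map each log line to a key-value pair, group the pairs by key, then reduce each group. The log format ("METHOD PATH STATUS") and all function names are assumptions for this sketch; an actual job would run on a cluster framework, not in plain Python.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (path, 1) for each well-formed access-log line."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single count."""
    return key, sum(values)


def page_views(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    logs = ["GET /home 200", "GET /cart 200", "POST /order 201", "GET /home 304"]
    print(page_views(logs))
```

Because the map and reduce steps work on independent lines and independent keys, a framework can spread them across many machines. The resulting counts are the kind of processed data that would then be written to a relational database.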
Sharing is a joy and also a process of personal growth. These articles are mostly summaries of my own study and work experience, so shortcomings are unavoidable; corrections are welcome so that we can improve together. I have set up an architecture-focused KK group, 466097527, and you are welcome to join. It focuses on large-scale distributed website architecture, big data, architectural patterns, and design patterns.