Objective
There are many articles on the Internet similar to what I want to share today: some are written by architects, some by operations engineers, some by developers, each with a different emphasis. Today I will walk through it from our operations perspective.
A mature website architecture is not designed from day one for high availability, high scalability, and high performance; the infrastructure gradually hardens as the number of users and lines of business grows. In the early stage, going from 0 to 1, nobody starts with the full heavyweight architecture, and few teams can afford to be that extravagant.
Description
Applicable business: e-commerce / portal / recruitment websites
Development languages: PHP and Java
Web services: Nginx / Tomcat 8
Database: MySQL
Operating system: CentOS
Physical servers: Dell R730 / R430
1. Single-server deployment
The project has just been developed and launched and user visits are few; the web application and the database run on a single server.
2. Web and database deployed independently
There is now a certain amount of user traffic and a single server is starting to struggle. To improve concurrency, add a second server and spread the HTTP request load and the SQL load across different machines.
3. Static and dynamic separation: initial stage
What is static and dynamic separation? Static pages and static resources are deployed and served separately from dynamically generated pages.
4. Database master-slave replication and query cache
Redis cache
Use Redis to cache database query results, putting hot data in memory to speed up queries and reduce the number of database requests.
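A minimal cache-aside sketch in Java, assuming the Jedis client and a hypothetical loadFromDb() query (key names and the TTL are illustrative): on a miss the value is read from MySQL and written back to Redis with an expiration time.

```java
import redis.clients.jedis.Jedis;

public class ProductCache {
    private final Jedis jedis = new Jedis("127.0.0.1", 6379); // Redis cache node

    // Cache-aside read: try Redis first, fall back to MySQL on a miss.
    public String getProduct(String productId) {
        String key = "product:" + productId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                    // hot data served from memory
        }
        String value = loadFromDb(productId); // hypothetical MySQL query
        jedis.setex(key, 300, value);         // write back with a 5-minute TTL
        return value;
    }

    private String loadFromDb(String productId) {
        // Placeholder for a SELECT against the product table.
        return "{\"id\":\"" + productId + "\"}";
    }
}
```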
MySQL master-slave
Asynchronous replication based on the binlog.
HA
MySQL: Keepalived
How do we keep the Redis cache fresh when the database changes? Some options (a sketch follows this list):
a) Add middleware: during the master-slave replication delay window, the middleware also routes SQL read operations to the master.
b) Once the master-slave replication delay has passed, evict the stale cache entries asynchronously.
c) Add a message queue and a cache-refresh program: database writes are also published to the message queue; the cache refresher subscribes to the queue and re-caches the data whenever it is updated.
d) Every item placed in the cache must have an expiration time set.
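A minimal sketch of options c) and d) combined, assuming the Jedis client (key names, TTL, and the onOrderUpdated() callback are illustrative): when the refresher receives an update message it evicts the stale entry and re-caches the latest value with a TTL as a safety net.

```java
import redis.clients.jedis.Jedis;

public class CacheRefresher {
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    // Called by the cache-refresh program when it receives an "order updated"
    // message from the queue (option c).
    public void onOrderUpdated(String orderId, String latestJson) {
        String key = "order:" + orderId;
        jedis.del(key);                    // evict the stale entry first
        jedis.setex(key, 600, latestJson); // re-cache with a TTL as a safety net (option d)
    }
}
```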
5. Layer-7 load balancing, shared storage, and Redis high availability
Traffic keeps growing and a single web server can no longer keep up, so introduce load balancing, scale the web nodes horizontally, and adjust the static and dynamic separation at the same time.
Layer-7 load balancing
Forwards requests to different upstreams based on the domain name or URL suffix.
NFS network file system
Shared storage holds the website code and static resources.
Redis master-slave
Static and dynamic separation: mid-term
HA
LB: Keepalived
NFS: DRBD + Heartbeat
Redis: Sentinel / Keepalived
How is session persistence handled? (A sketch of option b follows this list.)
a) Source IP hash
b) Session sharing
c) Session sticky (sticky sessions)
d) Session replication
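A minimal sketch of option b), session sharing, assuming the Jedis client: web nodes keep no local session state and read and write session attributes in Redis, keyed by the session ID from the user's cookie (key names and the 30-minute TTL are illustrative).

```java
import redis.clients.jedis.Jedis;

public class SharedSessionStore {
    private static final int SESSION_TTL_SECONDS = 1800; // 30-minute idle timeout (illustrative)
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    // Any web node behind the load balancer sees the same session data.
    public String getAttribute(String sessionId, String name) {
        return jedis.hget("session:" + sessionId, name);
    }

    public void setAttribute(String sessionId, String name, String value) {
        String key = "session:" + sessionId;
        jedis.hset(key, name, value);
        jedis.expire(key, SESSION_TTL_SECONDS); // refresh the idle timeout on every write
    }
}
```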
6. Database architecture extension
Traffic is up and SQL operations naturally multiply; a single database has hit its read-performance bottleneck and responses are slow. The business reads far more than it writes, so the priority is read performance, which means extending the database architecture.
One master, multiple slaves
Based on asynchronous binlog replication, multiple slave libraries synchronize from the master.
Read/write separation
a) Distinguish between the read and write libraries in the code's logic layer (sketched below).
b) Use a middleware proxy that parses SQL and routes it accordingly; mainstream open-source options: Atlas, MyCat, and so on.
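A minimal sketch of option a), routing at the code logic layer, assuming two JDBC DataSource instances (masterDs and slaveDs stand in for your own connection pools): writes go to the master, plain reads go to a slave.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class ReadWriteRouter {
    private final DataSource masterDs; // points at the MySQL master
    private final DataSource slaveDs;  // points at a read-only slave (or an LVS VIP over several)

    public ReadWriteRouter(DataSource masterDs, DataSource slaveDs) {
        this.masterDs = masterDs;
        this.slaveDs = slaveDs;
    }

    // INSERT/UPDATE/DELETE must hit the master.
    public Connection writeConnection() throws SQLException {
        return masterDs.getConnection();
    }

    // Plain SELECTs can be served by a slave to offload the master.
    public Connection readConnection() throws SQLException {
        return slaveDs.getConnection();
    }
}
```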
Sub-database, sub-table, and partitioning
Sub-database: split related tables into different databases by business type, for example web, BBS, blog, and so on.
Sub-table: when a single table holds a huge number of records, queries take a long time; split it vertically and horizontally and store the data across several smaller tables.
Partitioning: divide a table into chunks based on a table field; the chunks can be placed on different disks.
All of the above mainly spreads disk I/O pressure and improves processing performance (a minimal sketch of horizontal table splitting follows).
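A minimal sketch of horizontal table splitting, assuming orders are spread over a fixed number of tables (order_0 ... order_7 and the column names are illustrative) by hashing the user ID, so that each physical table stays small and all of one user's rows land in the same table.

```java
public class OrderTableRouter {
    private static final int TABLE_COUNT = 8; // order_0 ... order_7 (illustrative)

    // Pick the physical table for a given user; per-user queries only touch one small table.
    public String tableFor(long userId) {
        return "order_" + Math.floorMod(userId, TABLE_COUNT);
    }

    // In real code, bind values with a PreparedStatement; the table name itself
    // cannot be a bind parameter, which is why it is resolved here.
    public String selectSql(long userId) {
        return "SELECT * FROM " + tableFor(userId) + " WHERE user_id = ?";
    }
}
```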
Layer-4 load balancing for the slave libraries
With multiple slave libraries, use LVS to load-balance them; the application connects to a VIP, so access is transparent.
HA
Master library and slave-library LB: Keepalived
7. SOA service-oriented architecture
SOA
A service-oriented architectural design philosophy: split the bloated application architecture, break out the core business units, and make them service-oriented, modular, and deployed in a distributed way.
Service governance
Use the Dubbo distributed framework to govern the SOA services; Dubbo provides a high-performance, transparent RPC remote invocation solution.
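A minimal sketch of exposing one business unit as a Dubbo service, assuming the Apache Dubbo API and a ZooKeeper registry at 127.0.0.1:2181 (the OrderService interface and its implementation are illustrative):

```java
import org.apache.dubbo.config.ApplicationConfig;
import org.apache.dubbo.config.RegistryConfig;
import org.apache.dubbo.config.ServiceConfig;

public class OrderProvider {
    // The service contract shared between provider and consumers.
    public interface OrderService {
        String createOrder(String userId, String productId);
    }

    public static void main(String[] args) throws Exception {
        ServiceConfig<OrderService> service = new ServiceConfig<>();
        service.setApplication(new ApplicationConfig("order-provider"));
        service.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181")); // registry / config center
        service.setInterface(OrderService.class);
        service.setRef((userId, productId) -> "order for " + userId + "/" + productId);
        service.export();              // publish the service; consumers call it via transparent RPC

        Thread.currentThread().join(); // keep the provider process alive
    }
}
```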
Configuration Center
Use ZooKeeper to store service connection information.
Message queue
Use RabbitMQ to decouple services, replacing direct service-to-service calls with asynchronous messaging.
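A minimal sketch of publishing an event with the RabbitMQ Java client, assuming a queue named order.created (queue name and payload are illustrative); downstream services consume the message asynchronously instead of being called directly.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class OrderEventPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("127.0.0.1"); // RabbitMQ broker

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // Durable queue so messages survive a broker restart.
            channel.queueDeclare("order.created", true, false, false, null);
            String payload = "{\"orderId\":\"1001\",\"userId\":\"42\"}";
            // Publish via the default exchange, routed by queue name; the inventory
            // and notification services consume this event on their own schedule.
            channel.basicPublish("", "order.created", null,
                    payload.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```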
8. DNS round robin and a full-text search engine
DNS round robin
The principle of DNS load balancing is to configure multiple IP addresses for one host name on the DNS server; when users resolve the name, the records are returned in rotation, which spreads the load.
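A small Java sketch that shows the effect from the client side, assuming www.example.com (a placeholder) has several A records configured: resolving the name returns all of them, and the DNS server rotates the order between resolutions.

```java
import java.net.InetAddress;

public class DnsRoundRobinDemo {
    public static void main(String[] args) throws Exception {
        // A host name with several A records resolves to multiple addresses;
        // rotating their order spreads clients across the web entrances.
        InetAddress[] records = InetAddress.getAllByName("www.example.com");
        for (InetAddress record : records) {
            System.out.println(record.getHostAddress());
        }
    }
}
```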
Full-text search engine
An e-commerce home page, for example, has a search form; once there are many products and categories, the relational database becomes huge, and quickly and accurately retrieving the product the user wants from it is clearly difficult.
Introduce a full-text search engine: build an index, query massive data quickly, and relieve the pressure on the database. Mainstream open-source options: Elasticsearch, Sphinx.
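A minimal sketch of querying an Elasticsearch index over its HTTP API from Java 11+, assuming an index named products on 127.0.0.1:9200 (index, field, and address are illustrative); the full-text match is answered from the search index rather than by scanning MySQL.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductSearch {
    public static void main(String[] args) throws Exception {
        String query = "{ \"query\": { \"match\": { \"title\": \"wireless keyboard\" } } }";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:9200/products/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
        // The search engine answers the full-text query; the database is not touched.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```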
9. Static cache server
Every request for a static resource currently lands on a web node and NFS storage, yet these resources change very little. Cache them one layer up: when a request arrives, first check whether the resource is already cached and, if so, return it directly. This cuts back-end HTTP requests and makes responses much faster. (A minimal sketch of the idea follows.)
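In production this layer is usually an Nginx, Varnish, or Squid cache in front of the web nodes; the following is only a minimal JDK-based sketch (Java 11+) of the "check the cache first, otherwise go back to the back end" idea, with URLs as illustrative keys.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StaticResourceCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final HttpClient client = HttpClient.newHttpClient();

    public byte[] fetch(String url) throws Exception {
        byte[] hit = cache.get(url);
        if (hit != null) {
            return hit;                                // cache hit: no back-end request
        }
        HttpResponse<byte[]> response = client.send(   // cache miss: go back to the web node
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofByteArray());
        cache.put(url, response.body());               // keep it for subsequent requests
        return response.body();
    }
}
```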
10. Distributed file system and CDN
Distributed file system
When there are many pictures and videos, NFS is limited in both processing efficiency and storage capacity, so a distributed file system (DFS) is more appropriate. A DFS is a networked storage architecture in client/server mode: multiple inexpensive servers form a storage cluster that provides high performance, high availability, high scalability, and so on. The client mounts it locally and accesses files on the remote servers as if they were on the local file system.
CDN
Every request for a static resource still lands on the web nodes and storage, yet these resources change very little. If these resources could be served right at the site entrance, wouldn't that eliminate a large number of back-end HTTP requests? How can that be done?
Use CDN technology: it distributes frequently accessed (mainly static) resources through caching to edge servers across the country. The user first reaches the CDN, which uses intelligent DNS to return the cache server nearest to the user's network. If that cache server already holds the requested static resource, it returns it directly; otherwise it fetches it from the origin site and then returns it. This improves website access speed and reduces the pressure on the back-end servers.
11. Layer-4 load balancing and NoSQL databases
Layer-4 load balancing
Layer-7 load balancing has to parse the application-layer protocol, so it is not as efficient as layer 4. In scenarios that do not need application-layer parsing and only need traffic forwarded, layer-4 load balancing is the better choice.
Of course, a layer-4 load balancer can also be placed in front of the layer-7 load balancers as a proxy, scaling out the layer-7 tier.
NoSQL databases
Because individual SQL queries are so numerous that they cannot be optimized much further, consider a NoSQL (non-relational) database, which is built for large scale, high concurrency, and huge data volumes. It is better suited to unstructured data storage, such as detail-page content, raw data, and so on.
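A minimal sketch of storing unstructured detail-page content in MongoDB, assuming the MongoDB Java driver and a local mongod (database, collection, and field names are illustrative): documents are schema-free, so each detail page can carry arbitrary fields.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class DetailPageStore {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://127.0.0.1:27017")) {
            MongoCollection<Document> details =
                    client.getDatabase("shop").getCollection("product_details");

            // Schema-free document: detail pages can carry whatever fields they need.
            details.insertOne(new Document("productId", "1001")
                    .append("title", "wireless keyboard")
                    .append("html", "<div>long description ...</div>"));

            Document found = details.find(new Document("productId", "1001")).first();
            System.out.println(found.toJson());
        }
    }
}
```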
12. Now
Elastic Scaling
Automatically scale capacity out and back in with load.
Microservices
Split applications into fine-grained services, with lightweight communication, automated deployment, and so on.
In-memory processing
Handle data in memory as much as possible rather than on disk.
Remote disaster recovery
If site unavailability cannot be tolerated, consider off-site backups or an active-active deployment across data centers.
Contingency plan
13. Summing up the evolution so far
Try to intercept requests as far forward as possible, reducing back-end HTTP requests and database load.
The database layer is the architectural bottleneck and needs careful design, e.g. architecture extension and SQL optimization (compression, indexing, etc.).
Avoid single points of failure.
Spread the pressure.
Design for scalability.
Find solutions aimed at the actual bottleneck.
14. Contingency plan
SRE: Site Reliability Engineer
It is their mission to make sure the site does not go down!
Preparing a contingency plan roughly involves the following steps:
1. System grading
Grade business systems by importance. For example, if the order service goes down, users cannot place orders, so more resources must be invested to protect it; if the admin back end goes down, users are not affected. Divide the business into different levels and apply different quality guarantees and cost investment accordingly.
2. Full-link analysis
Walk through every link from the site entrance to data storage, map the dependencies between services, and analyze hypothetically: if a given link fails, how large is the impact?
3. All-round monitoring
Implement comprehensive monitoring of the relevant links, including basic resource monitoring, service status monitoring, interface monitoring, log monitoring, and so on, so that problems can be traced.
4. Make contingency plans
Think hard about the feasibility of each plan, run contingency drills from time to time, verify that the plans are correct and controllable, and get a firm grasp of the recovery time.
15. Coping strategies
Network access layer:
a) Machine-room failure: remove the data center from the DNS rotation, or switch to another data center.
b) VIP network anomaly: switch to the backup VIP.
Proxy layer:
a) IP rate limiting: some IPs send so many requests that back-end load becomes too high; apply per-IP rate limiting.
b) Back-end application anomaly: for example a hardware or software failure; remove the abnormal node, or switch to another data center if the whole machine room is affected.
Application tier and service tier:
a) Service anomaly: a service times out or responds slowly; remove it from service or switch to a healthy instance.
b) Insufficient program thread pool: the thread pool is set too small and requests pile up; provide parameter switches, e.g. dynamically adjust the thread pool size (see the sketch after this list).
c) Excessive request volume: the volume exceeds the actual processing capacity; rate-limit requests, or set a request threshold beyond which nodes are automatically added.
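A minimal sketch of items b) and c) for the application tier, using only the JDK: a worker pool whose size can be adjusted at runtime through a parameter switch, plus a semaphore that fails fast once requests exceed a configured threshold (all sizes are illustrative).

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RequestGuard {
    // Worker pool for request handling; sizes are illustrative defaults.
    private final ThreadPoolExecutor pool =
            new ThreadPoolExecutor(8, 16, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1000));

    // Hard cap on in-flight requests (item c): beyond this, fail fast instead of piling up.
    private final Semaphore limiter = new Semaphore(200);

    // Parameter switch (item b): resize the pool at runtime without a restart (assumes core <= max).
    public void resizePool(int core, int max) {
        if (max >= pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(max); // growing: raise the ceiling first
            pool.setCorePoolSize(core);
        } else {
            pool.setCorePoolSize(core);   // shrinking: lower the core first
            pool.setMaximumPoolSize(max);
        }
    }

    public boolean submit(Runnable request) {
        if (!limiter.tryAcquire()) {
            return false; // over the threshold: reject (or trigger scale-out externally)
        }
        pool.execute(() -> {
            try {
                request.run();
            } finally {
                limiter.release();
            }
        });
        return true;
    }
}
```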
Cache layer and data layer:
a) Redis is down: master-slave switchover.
b) MySQL is down: master-slave switchover, with verification after the switch.
c) Machine-room failure: switch the cache/database to another data center.