On the evolution of large-scale website architecture from an operations perspective


Objective

There are many articles on the Internet similar to what I want to share today. Some are written by architects, some by operations engineers, some by developers, each with a different emphasis. Today I will walk through the topic from our operations perspective.

A mature website architecture is not designed up front for high availability, high scalability, and high performance; the infrastructure hardens gradually as the number of users and lines of business grow. In the early stages, going from 0 to 1, nobody starts with the full, elaborate architecture, and few teams are that extravagant.

Description

Applicable business: e-commerce / portal / recruitment websites

Development languages: PHP and Java

Web servers: Nginx / Tomcat 8

Database: MySQL

Operating system: CentOS

Physical servers: Dell R730 / R430

1. Single-server deployment

The project has just gone live and there are few user visits.

2. Separate web and database deployment

There is now a certain amount of user traffic, and a single server is starting to struggle. To improve concurrency, add a server and spread the HTTP request load and the SQL load across different machines.

3. Static/dynamic separation (initial)

What is static/dynamic separation? Static pages are deployed separately from dynamic pages.
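A minimal Nginx sketch of this split; the pool addresses, domain, and file-extension list are illustrative:

```nginx
# Route requests for static assets to a dedicated static pool,
# and everything else to the dynamic (PHP/Tomcat) pool.
upstream static_pool  { server 10.0.0.11:80; }
upstream dynamic_pool { server 10.0.0.21:8080; }

server {
    listen 80;
    server_name www.example.com;

    # Static pages and assets
    location ~* \.(html|css|js|png|jpg|gif)$ {
        proxy_pass http://static_pool;
        expires 7d;                  # let browsers cache static files
    }

    # Dynamic pages
    location / {
        proxy_pass http://dynamic_pool;
    }
}
```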

4. Database master-slave replication and query cache

Redis cache

Use Redis to cache database query results, keeping hot data in memory to speed up queries and reduce the number of database requests.
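The cache-aside pattern described above can be sketched as follows; a plain dict stands in for Redis, and `query_db` and the TTL value are placeholders:

```python
import time

cache = {}  # stands in for Redis: key -> (value, expires_at)
TTL = 60    # seconds to keep a cached result

def query_db(sql):
    # Placeholder for a real MySQL query.
    return f"result of {sql}"

def cached_query(sql):
    """Cache-aside: try the cache first, fall back to the database."""
    entry = cache.get(sql)
    if entry and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = query_db(sql)                    # cache miss: ask the DB
    cache[sql] = (value, time.time() + TTL)  # populate with a TTL
    return value
```

Only the first call for a given query hits the database; repeats within the TTL are served from memory.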

MySQL master-slave

Asynchronous replication based on the binlog.
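On the slave side this is configured roughly as follows; the host, credentials, and binlog coordinates are placeholders:

```sql
-- On the slave: point at the master's binlog and start replication.
CHANGE MASTER TO
    MASTER_HOST='10.0.0.31',
    MASTER_USER='repl',
    MASTER_PASSWORD='***',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;
-- Verify: Slave_IO_Running and Slave_SQL_Running should both be Yes.
SHOW SLAVE STATUS\G
```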

HA

MySQL: Keepalived

How do we guarantee the freshness of the Redis cache?

a) Add a middleware layer: during the master-slave replication delay window, the middleware also routes SQL read operations to the master.

b) Once the master-slave replication delay has elapsed, asynchronously evict the affected cache entries.

c) Add a message queue and a cache-cleaning program: writes to the store are also published to the message queue; the cache cleaner subscribes to the queue and re-caches whenever data is updated.

d) Every item in the cache must have an expiration time.
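Option (c) can be sketched with Python's in-process queue standing in for a real message queue (RabbitMQ etc.); the key names are illustrative:

```python
import queue

mq = queue.Queue()     # stands in for a real message queue
cache = {"user:1": "stale name"}

def write_user(user_id, name):
    """Write path: update the store, then publish a change event."""
    # ... write to MySQL here ...
    mq.put({"key": f"user:{user_id}", "value": name})

def cache_cleaner():
    """Subscriber: consume change events and refresh the cache."""
    while not mq.empty():
        event = mq.get()
        cache[event["key"]] = event["value"]   # re-cache fresh data

write_user(1, "alice")
cache_cleaner()
```

After the cleaner drains the queue, the stale entry has been replaced with the freshly written value.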

5. Layer-7 load balancing, shared storage, and Redis high availability

Traffic keeps growing and a single server can no longer cope, so add load balancing, scale the web nodes horizontally, and adjust the static/dynamic separation.

Layer-7 load balancing

Forwards to different upstreams depending on the domain name or URL suffix.
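Domain-based routing on the layer-7 LB might look like this in Nginx; hostnames and pool addresses are illustrative:

```nginx
# Two virtual hosts on one LB, each proxying to its own upstream pool.
upstream www_pool { server 10.0.1.11:8080; server 10.0.1.12:8080; }
upstream bbs_pool { server 10.0.2.11:8080; }

server {
    listen 80;
    server_name www.example.com;
    location / { proxy_pass http://www_pool; }
}

server {
    listen 80;
    server_name bbs.example.com;
    location / { proxy_pass http://bbs_pool; }
}
```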

NFS (Network File System)

Shared storage holds the website programs and static resources.

Redis master-slave

Static/dynamic separation (mid-term)

HA

LB: Keepalived

NFS: DRBD + Heartbeat

Redis: Sentinel / Keepalived

How is session persistence handled?

a) Source-IP hash

b) Session sharing

c) Session sticky (sticky sessions)

d) Session replication
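Option (a), the source-IP hash, can be sketched like this; the backend names are illustrative:

```python
import hashlib

backends = ["web1", "web2", "web3"]  # illustrative web nodes

def pick_backend(client_ip):
    """Source-IP hash: the same client IP always maps to the same
    web node, so its session stays local to that node."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```

The trade-off: if a node is removed, the hash space shifts and some clients lose their sessions, which is why session sharing (e.g. in Redis) is often preferred at scale.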

6. Database architecture extension

Traffic is up and SQL operations naturally multiply; a single database hits its read-performance bottleneck and responses become very slow. The workload reads far more than it writes, so read performance must be improved by extending the database architecture.

One master, multiple slaves

Based on asynchronous binlog replication, multiple slave libraries synchronize from the master library.

Read/write separation

a) The code logic layer distinguishes between the read and write libraries.

b) Use a middleware proxy that parses the SQL and routes it accordingly; mainstream open-source options include Atlas, Mycat, and so on.
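A minimal sketch of code-level read/write routing, assuming hypothetical connection names; real middleware such as Atlas or Mycat does this by fully parsing the SQL:

```python
import itertools

MASTER = "master-db"
SLAVES = itertools.cycle(["slave-db-1", "slave-db-2"])  # round-robin reads

def route(sql):
    """Send writes to the master, spread reads across the slaves."""
    verb = sql.lstrip().split()[0].upper()
    if verb in ("SELECT", "SHOW"):
        return next(SLAVES)
    return MASTER
```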

Sub-library, sub-table, and partitioning

Sub-library: move related tables into different databases by business type, such as web, BBS, and blog.

Sub-table: when a single table grows to tens of millions of records, queries take a long time; use vertical and horizontal splits to spread the data across smaller tables.

Partitioning: divide a table into chunks by a table field; the chunks can be distributed on different disks.

The main goal of the above is to distribute disk I/O pressure and improve processing performance.
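A horizontal split can be as simple as hashing a record's key to a database and a table; the counts and naming scheme below are illustrative:

```python
def shard_for(user_id, n_dbs=2, n_tables=4):
    """Hash-style horizontal split: derive a database (sub-library)
    and a table inside it from the user id."""
    db = user_id % n_dbs                   # which database
    table = (user_id // n_dbs) % n_tables  # which table within it
    return f"db_{db}.user_{table}"
```

Because the mapping is deterministic, any application node can locate a user's row without a lookup service; the cost is that resharding later requires data migration.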

Layer-4 load balancing for the slave libraries

With multiple slave libraries, use LVS for load balancing; the program connects to a VIP, making access transparent.

HA

Master library and slave-library LB: Keepalived

7. SOA: service-oriented architecture

SOA

A service-oriented design philosophy: break up the bloated program architecture, decompose it into core business units, and turn them into services, modularized and deployed in a distributed fashion.

Service governance

Use the Dubbo distributed framework to govern the SOA services; Dubbo provides high-performance, transparent RPC remote invocation.

Configuration center

Use ZooKeeper to store service connection information.

Message queue

Use RabbitMQ to decouple services so they no longer have to communicate with each other directly.

8. DNS round-robin and a full-text search engine

DNS round-robin

DNS load balancing works by configuring multiple IP addresses for one hostname on the DNS server; when users resolve the name, the records are returned in rotation, achieving load balancing.
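The rotation can be sketched in a few lines; the hostname and IPs are illustrative:

```python
import itertools

# Multiple A records for one hostname; the DNS server hands them out in turn.
a_records = {"www.example.com": ["1.1.1.10", "1.1.1.11", "1.1.1.12"]}
_cursors = {name: itertools.cycle(ips) for name, ips in a_records.items()}

def resolve(name):
    """Round-robin resolution: successive lookups rotate through the IPs."""
    return next(_cursors[name])
```

Note that real resolvers cache answers for the record's TTL, so the spread is only approximate and a dead IP keeps receiving traffic until caches expire.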

Full-text search engine

An e-commerce home page, for example, carries a search form. With many products and varieties, the relational database grows huge, and quickly and accurately retrieving the product a user wants from it becomes difficult.

Introduce a full-text search engine: build an index cache to query massive data quickly and relieve pressure on the database. Mainstream open-source options: ElasticSearch, Sphinx.

9. Static cache server

Every request for a static resource lands on the web nodes and NFS storage, yet these resources rarely change. Cache them one level up: when a request arrives, first check whether a cached copy exists and, if so, return it directly. This reduces back-end HTTP requests and makes responses much faster.
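With Nginx this caching tier is typically built on `proxy_cache`; the paths, sizes, and upstream below are illustrative:

```nginx
# Cache static responses at the proxy so repeated requests never
# reach the web nodes or NFS.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:10m
                 max_size=1g inactive=7d;

upstream web_pool { server 10.0.1.11:8080; }

server {
    listen 80;
    location ~* \.(css|js|png|jpg|gif)$ {
        proxy_cache static;
        proxy_cache_valid 200 7d;                   # keep good responses 7 days
        add_header X-Cache $upstream_cache_status;  # HIT / MISS for debugging
        proxy_pass http://web_pool;
    }
}
```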

10. Distributed file system and CDN

Distributed file system

When there are many pictures and videos, NFS runs into limits in processing efficiency and storage capacity; a distributed file system (DFS) is a better fit. A DFS is a NAS-style storage architecture in client/server mode: multiple inexpensive servers form a storage cluster that provides high performance, high availability, and high scalability. Clients mount it locally and access remote files as if they were on the local file system.

CDN

Every request for static resources lands on the web nodes and storage, yet these resources rarely change. If we could serve them right at the site's entry points, wouldn't that eliminate a large share of back-end HTTP requests? How?

Use CDN technology: it distributes frequently accessed (mainly static) resources via caching to edge servers across the country. The user first resolves through the CDN, whose intelligent DNS returns the cache server nearest on the network. If that cache server holds the requested static resource, it returns it directly; otherwise it fetches from the origin and returns the result. This improves site access speed and reduces back-end server pressure.

11. Layer-4 load balancing and NoSQL databases

Layer-4 load balancing

Layer-7 load balancing has to parse the application-layer protocol, so it is less efficient than layer 4. Some scenarios do not need application-layer parsing and only want plain forwarding; there, layer-4 load balancing is the better choice.

Of course, a layer-4 proxy can also sit in front of layer-7 load balancers to scale out the layer-7 tier.

NoSQL databases

When heavy SQL query volume cannot be optimized any further, consider a NoSQL (non-relational) database; it is built for large scale, high concurrency, and large data volumes, and is better suited to unstructured data such as detail-page content and raw data.

12. The present

Elastic scaling

Automatic scale-out and scale-in of nodes.

Microservices

Fine-grained splitting of applications into services; lightweight, automated deployment, and more.

In-memory processing

Handle disk data in memory as much as possible.

Remote disaster recovery

If site downtime is intolerable, consider off-site backups or an off-site active-active deployment.

Contingency plans

13. Lessons from the evolution so far

Intercept requests as far forward as possible, reducing database and HTTP load.

The database layer is the architectural bottleneck and needs careful design, e.g. schema expansion and SQL optimization (compression, indexing, etc.).

Avoid single points of failure.

Spread the pressure.

Design for scalability.

Find solutions to the bottlenecks.

14. Contingency plans

SRE: Site Reliability Engineer

Their mission: make sure the site does not go down!

Preparing a contingency plan breaks down into the following steps:

1. System grading

Grade systems by business importance. If the order service goes down, users cannot place orders, so it warrants more protection resources; if the admin back office goes down, users are unaffected. Divide the business into levels and apply different quality guarantees and cost investments accordingly.

2. Full-link analysis

Trace every link from the site entry point to data storage, map the dependencies between services, and analyze hypothetically: if a given link fails, how far does the impact spread?

3. All-round monitoring

Implement comprehensive monitoring of the related links, including basic resource monitoring, service status monitoring, interface monitoring, and log monitoring, to ensure problems are traceable.

4. Draw up contingency plans

Think hard about each plan's feasibility, and run drills regularly to verify that the plans are correct and controllable and to learn the recovery time.

15. Response strategies

Network access layer:

a) Machine-room failure: remove the affected room from the DNS rotation or switch to another room.

b) VIP network anomaly: switch to the backup VIP.

Proxy layer:

a) IP rate limiting: some IPs send so much traffic that back-end load spikes; apply per-IP rate limits.

b) Back-end application anomaly: e.g. a hardware or software failure; remove the abnormal node, or switch to another machine room if it is a room-wide problem.
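The per-IP limiting in (a) is commonly done with Nginx's `limit_req`; the zone size, rate, and upstream are illustrative:

```nginx
# Per-client-IP rate limiting at the proxy layer.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

upstream web_pool { server 10.0.1.11:8080; }

server {
    listen 80;
    location / {
        limit_req zone=per_ip burst=20 nodelay;  # allow short bursts
        proxy_pass http://web_pool;
    }
}
```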

Application and service layers:

a) Service anomaly: a service's calls time out or respond slowly; remove the service or switch to a healthy instance.

b) Insufficient thread pool: the thread pool is sized too small and requests pile up; provide parameter switches, e.g. to adjust the thread-pool size dynamically.

c) Excessive request volume: requests exceed actual processing capacity; rate-limit them, or set a request threshold that automatically scales out nodes.

Cache and data layers:

a) Redis goes down: master-slave switchover.

b) MySQL goes down: master-slave switchover, with verification after the switch.

c) Machine-room failure: switch the cache / database to another room.

