Abstract: This article analyzes, from a technical perspective, how to build a large-scale distributed system in the cloud, following the growth of a social app through platform scalability, network-level expansion, and scaling of the data and business layers.
Features that a social app needs to implement
A social app must provide a plethora of features users care about: the general social functions, activities, geolocation, discovery, novelty feeds, video and photo sharing, and so on. From a technical point of view, the problems developers must solve are therefore equally complex.
When a social app is first released and the number of users is small, a single server can support all of the access pressure and data storage. But Internet applications spread virally: an app may become an overnight hit, with traffic and data volume growing explosively in a short time. It then faces hundreds of millions of PV per day, millions of new and active users, and traffic soaring to hundreds of megabits per second. An application with only a simple backend deployment cannot support this: servers respond slowly or even time out, and when the service is paralyzed at peak times, the backend becomes completely unusable and the user experience degrades drastically. This article shares a real-world example of how a social application can build a highly scalable backend system.
The social app's back-end architecture at initial deployment
At first, the social app's backend architecture was relatively simple, deployed directly on the basic network. An nginx server bound to a public IP address sits in front for load balancing, three application servers behind it handle all business requests, and a MySQL database sits at the end.
Building a private network
As the product continues to iterate, the number of users grows, and data accumulates, the app needs to improve its backend architecture, starting with a private network. Using private networks, users can build their own network topology: create routers and private networks, place subsequently added hosts on the private network, and achieve 100% layer-2 isolation from other users' hosts on the cloud platform. The hosts expose only port 80, adding one more layer of protection to the system.
In the above architecture diagram, the firewall comes first, followed by the load balancer, then the router and the private network. Many Internet applications read far more than they write; the ratio can sometimes reach 8:2, so we start by introducing a cache to absorb the database's read pressure. Second, a load balancer replaces the nginx proxy in the original schema. The balancer distributes requests across the backend application servers, and when one application server goes down, the balancer automatically isolates it.
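To illustrate how the cache absorbs read pressure, here is a minimal cache-aside sketch in Python using redis-py. The host address, key format, and `load_user_from_db` helper are hypothetical placeholders, not part of the original architecture description.

```python
import json
import redis  # pip install redis

# Hypothetical cache endpoint; in the architecture above this is the
# cache tier sitting in front of MySQL.
cache = redis.Redis(host="10.0.1.20", port=6379)

def get_user(user_id, db_conn):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                # cache hit: no DB load
    user = load_user_from_db(db_conn, user_id)   # hypothetical DB helper
    cache.setex(key, 300, json.dumps(user))      # cache miss: fill with 5 min TTL
    return user
```

With an 8:2 read/write ratio, most of these lookups never reach MySQL at all.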
Business Partitioning and Scaling
As the volume of concurrent visits and data grows, the first step is to scale the Web services horizontally. The prerequisite for horizontally scaling the business servers is that each server be stateless: session information is moved out to a cache or database, so that a request routed to any server can be handled normally.
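A minimal sketch of externalizing session state so that the application servers stay stateless, again using Redis as the shared store; the host, key scheme, and TTL are illustrative assumptions.

```python
import json
import uuid
import redis

# Shared session store (hypothetical intranet host); no app server keeps
# session state locally, so the balancer may route requests anywhere.
sessions = redis.Redis(host="10.0.1.21", port=6379)
SESSION_TTL = 3600  # one hour, an illustrative choice

def create_session(user_id):
    """Any app server can create a session; state lives in Redis."""
    sid = uuid.uuid4().hex
    sessions.setex(f"session:{sid}", SESSION_TTL, json.dumps({"user_id": user_id}))
    return sid

def load_session(sid):
    """Any other app server can load it on the next request."""
    raw = sessions.get(f"session:{sid}")
    return json.loads(raw) if raw else None
```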
As in the previous step, "Building a private network", a new private network is added to expand the network layer. Here you can use the custom-image feature to turn the original application server into a template, from which new hosts can later be launched quickly. You can also use the auto-scaling (auto scale-out) feature to adjust the number of servers dynamically according to the request load on the backend servers.
The backend of a social application exposes many service interfaces, such as adding friends, refreshing the feed, and browsing pages. By analyzing each interface's latency in the logs, time-consuming but non-critical requests can be moved off to separate Web servers, leaving more resources on the main Web servers for critical business requests.
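A sketch of the kind of log analysis described above: aggregating per-endpoint latency from an nginx-style access log. The log format (with `$request_time` as the last field) is an assumption; adjust the parsing to your actual format.

```python
from collections import defaultdict

def endpoint_latency(log_path):
    """Average request time per endpoint from an access log.

    Assumes lines ending like: "GET /friends/add HTTP/1.1" 200 0.042
    where the last field is the request time in seconds.
    """
    totals = defaultdict(lambda: [0.0, 0])
    with open(log_path) as f:
        for line in f:
            parts = line.split()
            try:
                path = parts[-4].split("?")[0]   # request path, query string dropped
                seconds = float(parts[-1])       # request time
            except (IndexError, ValueError):
                continue                         # skip malformed lines
            totals[path][0] += seconds
            totals[path][1] += 1
    return {p: s / n for p, (s, n) in totals.items()}
```

Sorting the result by average latency makes the slow, non-critical endpoints that deserve their own Web servers stand out.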
Service-Oriented Architecture
As the product's functionality continues to iterate, the business code becomes more complex, the likelihood of failure increases, and a problem in one local function affects the availability of the entire service. At this point you can move to a service-oriented architecture, splitting one complete, large service into sub-services that interact through interfaces, as shown in the following:
The social app's services are split into four sub-services, news feed, user profile, advertising (ads), and discovery (explore), with the services interacting through a message-passing framework such as ZeroMQ (a minimal sketch of this interaction follows the list below). The benefits of splitting one large service into several small sub-services are self-evident, mainly:
- Fault isolation: a failure in one sub-service does not drag down the whole; for example, a problem in the advertising service does not make the entire app unusable, and users can still browse the feed;
- Independent scaling: each sub-service faces different access pressure; for example, feed calls are far more frequent than visits to some second-level user-profile pages, so the former gets more Web servers;
- Independent deployment: the configuration of one large service becomes very complex as features pile up; once split, configuration items can be customized to each sub-service's requirements, improving manageability;
- Team collaboration: each developer can own one clear direction, improving development efficiency;
- Abstracted data access: when the data layer (database, cache) is later extended, only each sub-service's data service needs modification, keeping the underlying change transparent to the upper layers.
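A minimal pyzmq request/reply sketch of the inter-service interaction the split relies on. The service names, port, hostname, and message schema are illustrative assumptions; a production system would add timeouts, retries, and error handling.

```python
import zmq  # pip install pyzmq

def profile_service():
    """User-profile sub-service: answers profile lookups over ZeroMQ."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:5555")                    # illustrative port
    while True:
        req = sock.recv_json()                   # e.g. {"op": "get_profile", "user_id": 42}
        sock.send_json({"user_id": req["user_id"], "name": "..."})

def feed_service_call(user_id):
    """News-feed sub-service asking the profile service through its interface."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("tcp://profile-service:5555")   # hypothetical service hostname
    sock.send_json({"op": "get_profile", "user_id": user_id})
    return sock.recv_json()
```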
Database replication
Business growth also brings many problems to the database. When the initial schema's single database (serving both reads and writes) can no longer support the app's access pressure, the first step is data replication. The common databases on the market, such as MySQL and MongoDB, all provide replication. Taking MySQL as an example, at a high level replication can be divided into three steps:
- The master records changes in its binary log (these records are called binary log events);
- The slave copies the master's binary log events to its relay log;
- The slave replays the events in the relay log, applying the changes to its own data.
The first part of the process is the master recording the binary log. Before each transaction that updates data completes, the master records the changes in its binary log. MySQL writes transactions to the binary log serially, even if their statements were interleaved during execution. After the events are written to the binary log, the master tells the storage engine to commit the transaction.
The next step is for the slave to copy the master's binary log to its own relay log. First, the slave starts a worker thread, the I/O thread, which opens an ordinary connection to the master and starts a binlog dump process. The binlog dump process reads events from the master's binary log; once it has caught up with the master, it sleeps and waits for the master to produce new events. The I/O thread writes these events to the relay log.
The slave's SQL thread handles the final step of the process: it reads events from the relay log and replays them, bringing the slave's data into line with the master's. As long as this thread keeps up with the I/O thread, the relay log usually stays in the OS cache, so the relay log's overhead is minimal.
In addition, there is a worker thread on the master: as with any other MySQL connection, a slave opening a connection causes the master to start a thread for it. The replication process has one important limitation: replication is serialized on the slave, meaning that update operations running in parallel on the master cannot run concurrently on the slave.
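A sketch of checking replication health from the slave side with PyMySQL. `SHOW SLAVE STATUS` is standard MySQL; the host and credentials are placeholders.

```python
import pymysql  # pip install pymysql

def replication_lag(slave_host):
    """Return the slave's Seconds_Behind_Master, or None if replication is broken."""
    conn = pymysql.connect(host=slave_host, user="monitor", password="...",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
        if not status or status["Slave_IO_Running"] != "Yes":
            return None  # I/O thread not running: replication is broken
        return status["Seconds_Behind_Master"]
    finally:
        conn.close()
```

Because replication is serialized on the slave, this lag figure can grow under heavy parallel writes on the master even when the network is fine.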
For cloud users, knowing the database's IP and port is enough to use it. The concrete setup is as follows:
The first step is to add slaves, expanding the standalone master into a master-plus-three-slaves architecture, and to build an intranet load balancer in front of the slaves. The data services above then only need one MySQL master endpoint and one LB endpoint configured; adding or removing slaves later as the business changes is completely transparent to the upper layers.
This approach brings two benefits. The first is improved availability: if the master fails, a slave can be promoted to master to keep the service running and the data available. The second is shared read pressure: for a social app, read/write separation is the first step of data-layer optimization, and with the schema above, read queries can easily be spread across the MySQL slaves while writes go to the master. Read/write separation does raise a consistency question: data written to the master reaches the slaves with a lag. For social applications this is acceptable, as long as eventual consistency of the data is guaranteed.
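A minimal read/write-splitting sketch matching the master-plus-LB setup above: writes go to the master, reads to the slave load-balancer endpoint. The intranet IPs and credentials are hypothetical configuration values.

```python
import pymysql

MASTER_HOST = "10.0.2.10"    # MySQL master (hypothetical intranet IP)
READ_LB_HOST = "10.0.2.100"  # intranet LB in front of the three slaves

def _connect(host):
    return pymysql.connect(host=host, user="app", password="...",
                           database="social_app", autocommit=True)

def execute_write(sql, args=None):
    """All writes go to the master, so the binlog stays the single source of truth."""
    conn = _connect(MASTER_HOST)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
    finally:
        conn.close()

def execute_read(sql, args=None):
    """Reads hit the LB, which spreads them across slaves; results may lag
    the master slightly, which eventual consistency tolerates."""
    conn = _connect(READ_LB_HOST)
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()
```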
At the bottom there is a snapshot, a regular cold backup of the data. It differs from a slave that simply replicates the MySQL master: if an online bug or misoperation deletes data on the master, the deletion is immediately synchronized to the slaves and the data is lost, whereas the cold-backup snapshot still protects the data.
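A sketch of the cold-backup snapshot as a nightly job, dumping with mysqldump from a dedicated backup host. The paths, host, and credentials are assumptions; a cloud platform's own snapshot feature can serve the same purpose.

```python
import datetime
import subprocess

def nightly_snapshot():
    """Cold backup: a point-in-time dump that a dropped table on the master
    cannot silently propagate into, unlike a replicating slave."""
    stamp = datetime.date.today().isoformat()
    out_path = f"/backups/social_app_{stamp}.sql"
    # --single-transaction takes a consistent InnoDB snapshot without locking.
    with open(out_path, "wb") as out:
        subprocess.run(
            ["mysqldump", "--single-transaction",
             "-h", "10.0.2.10", "-u", "backup", "-pSECRET", "social_app"],
            stdout=out, check=True)
```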
Monitoring is required during operation. You can use Linux tools such as top/iotop/df/free/netstat to monitor the running state of each service and component in the system, and analyze the log information (HTTP access log / application log / database slow log) to find the performance bottlenecks of each service.
Data Partitioning and Scaling
The next adjustment is to partition and scale the database. First, build a cache cluster: the memcached cache introduced in the earlier schema was a standalone database cache. When the data volume grows, the data must be spread across multiple cache servers, typically with a consistent-hashing (hash ring) algorithm, whose advantage is that only a small portion of cached data is invalidated when nodes are added or removed. NoSQL databases are also worth considering: data with weak relational requirements but high query-efficiency demands can be moved from MySQL to Redis. Redis is especially well suited to list-like data, such as friend lists and leaderboards.
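A minimal consistent-hashing (hash ring) sketch for spreading cache keys across servers, so that adding or removing a node remaps only a small share of keys. The node names and virtual-node count are illustrative choices.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each cache server is hashed to many points on the ring;
        # more virtual nodes gives a more even key distribution.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        """Pick the first ring position clockwise from the key's hash."""
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"))  # the same key always maps to the same server
```

Removing "cache-2" from the node list only remaps the keys that hashed to its ring positions; keys on the other servers stay put.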
Beyond that, partitioning can be considered for MySQL. The first step is vertical splitting: the original single database is split by functional module into news feed, user data, advertising data, and exploration data. The same applies to Redis: the original single Redis instance is split by function into four, holding leaderboard data, friends data, advertising data, and exploration data respectively.
The next bottleneck is single tables growing too large. At this point a horizontal split is needed: one table is split into multiple tables, and a partition key must be chosen; when splitting the user table, the user ID is the usual choice. The partition key is chosen mainly by looking at which field the frequent queries filter on, picking that field so that most queries land on a single table. The few queries that do not carry the partition key may have to traverse all of the split tables.
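A sketch of routing by partition key: the user ID is taken modulo the shard count to pick one of the split tables, so any query carrying the key touches a single table. The table naming and shard count are illustrative.

```python
N_SHARDS = 16  # illustrative; pick a count you can re-split later

def user_table(user_id):
    """Partition-key routing: every query that knows user_id hits one table."""
    return f"user_{user_id % N_SHARDS}"

def fetch_user_sql(user_id):
    # Lands on exactly one of the split tables.
    return f"SELECT * FROM {user_table(user_id)} WHERE id = %s", (user_id,)

def fetch_by_name_sql(name):
    # Non-partition-key query: must fan out across all split tables.
    return [(f"SELECT * FROM user_{i} WHERE name = %s", (name,))
            for i in range(N_SHARDS)]
```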
Build a complete test environment
Building a full test server requires creating a new router and private network, separate network and bandwidth resources, an intranet GRE tunnel through the router, a VPN dial-in network, and SSH key management.
This process lets you create an all-in-one environment containing all of the system's services and turn it into a custom image. When a new person joins the team and needs a separate, complete development environment, a host can be quickly created from that image, or the user-data feature can initialize the environment by executing a script you upload at host startup. The two can be combined: bake all the required services into the image, and pull the latest code from the repository with the user-data script. Because the code changes far more often than the environment, building a new image on every code update is impractical. In this way a complete test server is built, so that every engineer can have an independent test environment.
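A sketch of the user-data idea: the image carries the installed services, and this script, run once at first boot, fast-forwards the code and restarts the app. The repository URL, checkout path, and service name are hypothetical, and systemd is assumed.

```python
#!/usr/bin/env python
# User-data script: executed at host startup to freshen the baked image.
import subprocess

CODE_DIR = "/srv/social-app"                      # checkout path baked into the image
REPO = "git@git.example.com:team/social-app.git"  # hypothetical repository

def bootstrap():
    # The image already contains a checkout; just fast-forward it.
    subprocess.run(["git", "-C", CODE_DIR, "pull", "--ff-only"], check=True)
    # Restart the app so the new code is picked up (systemd assumed).
    subprocess.run(["systemctl", "restart", "social-app"], check=True)

if __name__ == "__main__":
    bootstrap()
```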
How do we connect to the online environment when the app is released? The two networks are themselves 100% isolated, but the GRE tunnel feature can be used between the two routers to fully connect the test-environment network and the online production network.
Multi-engine room deployment and hybrid networking
To make the backend architecture more reliable and the business more stable, multi-datacenter deployment and hybrid networking need to be implemented, for the following three reasons:
- Disaster recovery: in a complex network environment, a datacenter may run into network trouble that reduces the availability of critical business; a backup datacenter guarantees the service will not suffer a significant, prolonged interruption;
- Load sharing: a single datacenter may not be able to carry all the requests, in which case part of the request pressure can be shifted to another datacenter;
- Accelerated regional access: in the domestic network environment, access between the south and the north suffers high latency, so deploying datacenters in multiple regions accelerates access for users in each region.
As shown above, there are three datacenters. In the middle is QingCloud's Beijing Zone 1, which carries the main business. On the left is the Asia-Pacific Zone 1 datacenter, mainly serving Asia-Pacific and overseas customers. Both datacenters use QingCloud private-network deployments, with their routers communicating through GRE tunnels or IPsec encrypted tunnels. If the security requirements on data in transit are high, IPsec can be used to link the two datacenters, in which case they can only be reached through intranet IPs. On the right is the office machine room, the environment in which the engineers develop.
When implementing hybrid networking, as long as the datacenter's routers or network devices support the standard GRE tunneling protocol or IP tunneling protocol, a traditional physical datacenter can be connected, router to router, with the public cloud environment. Common scenarios for multi-datacenter deployment include the following:
The first is rebuilding the main datacenter's full set of business off-site without it providing online service: traffic switches to the standby datacenter only when the main one fails, so the deployment is relatively simple. But there are two shortcomings. One is cost: double the expense is spent on what is only a cold backup and normally goes completely unused. The other is that when the main datacenter suddenly goes down and the standby starts serving, the data needs to warm up, a very slow process during which the service may respond slowly or even be unavailable.
From easy to difficult there are three stages. First, reverse proxying: user requests reach the second datacenter but are forwarded, without any processing, to the first; this places certain requirements on the latency between the two sites. Second, deploying application servers and caches in the second datacenter: most data requests can then be read from the cache without crossing datacenters, but when the cache misses, the query still falls through to the database in the first datacenter, so this is not a thorough approach. Third, deploying the full stack, including HTTP servers, business servers, caches, and database slaves: requests entering the second datacenter can be handled entirely within it, which is faster, but this raises data-consistency and cache-consistency problems, for which solutions exist. Besides inconsistency during data synchronization, cache inconsistency must also be faced.
A good system architecture is not designed, but evolved.
Building a stable and reliable business system requires attention to the following:
- Analyze user behavior and understand your business, be it social, e-commerce, or video;
Different businesses have different industry attributes and characteristics. For social, the typical features are a large data volume and many query dimensions, for example: "among my friends, find everyone who took photos at xx Café between June 11 and July 15", where the query conditions span the friend dimension, the photo dimension, the location dimension, the privacy-status dimension, and so on. This calls for reasonable scaling at the data level.
E-commerce is characterized by regular big promotions, which need large amounts of computing resources and application servers to absorb the traffic spike. Here the elasticity of the cloud platform allows rapid expansion: before the promotion puts pressure on the business, the service can temporarily call API interfaces and use auto-scaling to extend the backend computing resources (see the sketch after this list). The video business has very obvious traffic peaks and troughs: peak traffic usually falls in the daytime or in the evening hours after people get home from work, while traffic between 2 a.m. and 6 a.m. is very low, so cloud elasticity can again be used, calling APIs to adjust business bandwidth resources and thereby save costs.
- Plan the system reasonably and estimate its capacity, e.g. 100k/1M/10M PV (DAU): different system capacities may call for different deployment architectures; find the one that fits your own case;
- Make the system scalable, i.e. able to scale horizontally;
- Spare no effort to eliminate single points of failure;
- Design for failure: the app's backend architecture should prepare in development for the various problems that may arise, for example with off-site backups;
- Adopt a service-oriented architecture: split into subsystems that interact through APIs and process work asynchronously;
- Build ubiquitous caches: page cache, interface cache, object cache, database cache;
- Avoid over-design: a good system architecture is not designed, but evolved.
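As promised above, a sketch of driving that elasticity through an API call before a promotion. The endpoint, token, and parameters are entirely hypothetical placeholders for whatever your cloud provider's API (for example QingCloud's) actually exposes.

```python
import requests  # pip install requests

API = "https://api.example-cloud.com/v1"  # hypothetical endpoint
TOKEN = "..."                             # credential placeholder

def scale_web_tier(count):
    """Ask the platform to grow the web tier to `count` instances
    ahead of a promotion, and shrink it back afterwards."""
    resp = requests.post(
        f"{API}/instance-groups/web/resize",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"desired_count": count},
        timeout=10)
    resp.raise_for_status()
    return resp.json()

# Before the big promotion:  scale_web_tier(40)
# After traffic normalizes:  scale_web_tier(8)
```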