App Server-Side Architecture Changes

Last Update: 2016-07-12 Source: Internet

Author: User

Tags app service zookeeper

With the arrival of the mobile Internet era, mobile technology has developed rapidly, and today apps are the core channel through which most Internet companies acquire users. Lines of business that used to run mainly on the PC have been moving onto mobile one after another, and the original monolithic, product-centric, rapid-iteration development model can no longer cope with this explosion of business integration and fast growth. At the same time, as user numbers rise and traffic keeps surging, the system architecture faces a series of challenges and transformations. How to build a highly reliable, highly scalable, low-cost system architecture that delivers more, faster, better, and cheaper has become a favorite and long-running topic in the industry.

Development

2010-2012

Mobile had only just emerged in 2010. The company formed a mobile team (just a few people at first), which led an outsourcing company in building the first version. The business was very simple at that time: content such as PC-side forum posts and articles was shown directly in the mobile app. The server was an all-in-one structure with no real concept of architecture (Figure 1). The system was tightly coupled, but with low traffic this caused no obvious problems.

Figure 1 Service architecture

2013-2014

In 2013 the company went public, the business expanded, and mobile traffic began to grow, especially in 2014.

By the end of 2014, traffic had grown 2.5 times over its earlier level. The shortcomings of the original all-in-one structure became increasingly prominent: the server side often went down under peak access pressure, and because the structure was so tightly coupled, one interface exceeding its traffic limit often left other interfaces inaccessible.

As the business kept expanding and the application kept iterating, the code base grew, the project became more and more bloated, redundancy increased, maintenance costs ... The old architecture was overwhelmed.

Facing ever heavier pressure on the service, the company began to build its own mobile R&D team (C# + SQL Server), and the server-side program was refactored for the first time.

Figure 2 Service Architecture

The server-side program was layered (interface layer API, business logic layer service, data access layer DAO) and split (the original all-in-one structure was separated into different services according to the modules of the app). We also separated static from dynamic content with CDN acceleration, separated reads from writes, and load-balanced across clusters. At the same time, the company's operations department developed its own CDN service and a second-level cache service, SCS, to suit our business characteristics and further accelerate the application. This transformation temporarily relieved the pressure on the server side.

2014-present

I joined Autohome in early 2015, when the mobile side faced the following problems:

More requests, in what can fairly be called an era of traffic explosion: in September 2015, the average daily total visits by unique users on Autohome mobile were up about 73% compared with September 2014. By early 2016, mobile PV had reached the billions.
More dependent resources: the service relies on Redis, DB, RPC, HTTP, MongoDB, MQ, and dozens of business systems.
Severe vertical business coupling: with vertically coupled businesses such as posts, article final pages, and the comment system, a failure of the comment system often leaves posts or article final pages inaccessible.
Operational promotion activities: to increase user stickiness and activity, the various business parties and our own team ran many more operational promotions.
Fast releases: two fixed releases a month, with multiple versions coexisting.
Microsoft technology stack: Microsoft's paid services cost a great deal of money, high-quality .NET engineers are increasingly hard to recruit, and the ecosystem is not thriving.

To cope with surging traffic and the need for high service availability, and because a larger team with every developer working on the same system produced serious conflicts, the app side was converted to a plug-in architecture. At the beginning of 2015, a second refactoring plan was launched, together with a technology change that fully embraced Java. The main technical changes were:

Windows→Linux, SQL Server→MySQL, IIS→Tomcat, C#→Java.

Refactoring

The main requirements were: vertical businesses such as articles and comments must not go down; bursts of business must be handled quickly; dependencies (multiple business parties, multiple resources) must be decoupled; and failures must never be allowed to affect one another.

The solution is divided into the following steps:

Decomposition. First, the team: the server-side developers were grouped according to the plug-in division of the app, each team responsible only for its own modules, with two fixed iterations a month and each team's service going online independently without affecting the others. Next, the service structure. Horizontal scaling: multi-cluster, multi-data-center deployment to improve concurrency and disaster tolerance. Vertical split: vertical businesses are split further and their dependencies decoupled. Business sharding: services are deployed separately according to functional characteristics, with activities, flash sales, push, and the like physically isolated. Horizontal split: services are layered, functional and non-functional concerns are separated, and basic core services are separated from non-core services.
Service-oriented business. Services are independent of one another, such as news, forums, and advertising.
Stateless design. No single point in the call chain, stateless services, and linear scaling (avoid saving state data on the local machine wherever possible, and make interface calls idempotent).
Reusability. The unit of reuse is the abstract service of the business logic, not the details of its implementation; a service reference depends only on the service abstraction.
Resource layering. Redis, DB, and MQ use a master-slave design and multi-data-center deployment to guarantee high availability.
Loose coupling, self-protection, and avalanche prevention. Cross-service calls are decoupled asynchronously wherever possible; synchronous calls must set timeouts, queue sizes, and thread pool sizes. Relatively stable underlying services are layered separately from volatile processes. Thread pools are protected: requests beyond the server's maximum threads are dropped (Nginx, Tomcat). When a back-end resource has a problem, calls to Redis, DB, MQ, Turbo (RPC), HttpClient, and so on are downgraded automatically and an exception alarm is sent.
Service isolation and autonomy. Services can be degraded, rate-limited, switched on and off, and monitored, with a whitelist mechanism.
Each service is deployed independently without affecting the others, abnormal services are circuit-broken automatically, and each service follows a downgrade strategy suited to its characteristics. Basic services are pushed down into a reusable layer; they are autonomous and independent of one another, kept as thin as possible so that they scale horizontally, and physically isolated to ensure stability (such as the user center and the product library).
Distinguish the core business. Core business services are kept as streamlined and stable as possible (at run time, priority goes to completing the main flow smoothly, while secondary flows such as log reporting run asynchronously).
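
The self-protection rules in the list above (timeouts, bounded queue and thread pool, automatic circuit breaking with a downgrade value) can be illustrated with a small Java sketch. It is not the component Autohome built: the class name GuardedCall, its parameters, and the half-open behavior are assumptions made for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical guard around one downstream dependency (illustration only).
class GuardedCall<T> {
    private final ExecutorService pool;
    private final long timeoutMillis;     // per-call timeout
    private final int failureThreshold;   // consecutive failures before the circuit opens
    private final long openMillis;        // how long the circuit stays open
    private int consecutiveFailures = 0;
    private long openedAt = -1L;          // -1 means the circuit is closed

    GuardedCall(int poolSize, int queueSize, long timeoutMillis,
                int failureThreshold, long openMillis) {
        // Bounded pool and bounded queue: excess work is rejected instead of piling up.
        this.pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                r -> { Thread t = new Thread(r); t.setDaemon(true); return t; },
                new ThreadPoolExecutor.AbortPolicy());
        this.timeoutMillis = timeoutMillis;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean isOpen() {
        if (openedAt < 0) {
            return false;
        }
        if (System.currentTimeMillis() - openedAt >= openMillis) {
            // Half-open: allow a probe call; a single failure reopens the circuit.
            openedAt = -1L;
            consecutiveFailures = failureThreshold - 1;
            return false;
        }
        return true;
    }

    // Runs the remote call with a timeout; returns the fallback when degraded.
    T call(Callable<T> remote, T fallback) {
        if (isOpen()) {
            return fallback; // downgrade immediately, do not touch the dependency
        }
        Future<T> future = null;
        try {
            future = pool.submit(remote); // rejected when pool and queue are full
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            recordSuccess();
            return result;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            if (future != null) {
                future.cancel(true);
            }
            recordFailure();
            return fallback;
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = System.currentTimeMillis();
        }
    }
}
```

A caller would typically keep one such guard per dependency, for example one for the comment system and another for the article source, so that a failing comment system only degrades comments. A production version would also send the exception alarm the list mentions.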

Figure 3 Single Service structure

Implementation

The architecture of a single service

An app request passes through the access layer (CDN, LVS, Nginx/SCS) and the configuration checks of the interface control layer (CDN configuration, anti-hijacking configuration, log service configuration, security checks ...), and then invokes the REST interface published by the API layer. After parameter validation, the API layer invokes the business implementation in the business logic layer. The business logic layer in turn calls the resource services of the resource layer through the data interface layer (SourceInterface, the source interface service; dbutils, the database and table sharding component; ais4j, the asynchronous request component; Turbo, the RPC service) to assemble the data for the request and complete the call.

Supporting services: Configuration, a ZooKeeper-based configuration service that adjusts the system's various switches in real time (for example, the rate-limit and circuit-breaker thresholds of a source interface); Monitor, a monitoring service for viewing system anomalies and traffic in real time; Trace, system call tracking; and Log, the log service.
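
To make the idea of real-time switches concrete, here is a minimal Java sketch of a rate limiter whose threshold can change while the service is running. The names SwitchBoard, SourceLimiter, and the key "source.article.limit" are hypothetical, and the ZooKeeper watcher that would deliver the updates in a real deployment is assumed rather than shown; onConfigChanged stands in for that callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process view of the configuration service (illustration only).
class SwitchBoard {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Assumption: a ZooKeeper watcher would call this when a config node changes.
    void onConfigChanged(String key, String value) {
        values.put(key, value);
    }

    boolean isOn(String key, boolean defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v.trim());
    }

    int intValue(String key, int defaultValue) {
        String v = values.get(key);
        if (v == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(v.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // ignore malformed values, keep the default
        }
    }
}

// Limits concurrent calls to one source interface; the limit is read on every call.
class SourceLimiter {
    private final SwitchBoard config;
    private final String limitKey;
    private final int defaultLimit;
    private final AtomicInteger inFlight = new AtomicInteger();

    SourceLimiter(SwitchBoard config, String limitKey, int defaultLimit) {
        this.config = config;
        this.limitKey = limitKey;
        this.defaultLimit = defaultLimit;
    }

    // Returns false when the current threshold is exceeded; the caller should degrade.
    boolean tryAcquire() {
        int limit = config.intValue(limitKey, defaultLimit);
        if (inFlight.incrementAndGet() > limit) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    void release() {
        inFlight.decrementAndGet();
    }
}
```

Because the limiter reads the threshold on every call, an operator can tighten or relax it during a traffic peak without redeploying, which is the point of keeping such values in a configuration service.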

RPC: Turbo architecture

Figure 4 Turbo (RPC framework) architecture

To make our services service-oriented and autonomous, we fully adopted Turbo, the company's RPC service, at the end of 2015.
The main features of the framework are: multi-language server publishing with support for C# and Java; efficient, transparent RPC invocation; multi-protocol support; a ZooKeeper-based registry for service discovery; client-side soft load balancing and fault tolerance; service monitoring and trace analysis integrated with Spark; and integration with Locker (a Docker scheduling platform) for automated service deployment.

Stability and fault tolerance of service discovery and RPC rest mainly on the following. The ZooKeeper cluster is deployed across two data centers, with 5 nodes in the main data center (a leader/follower cluster) and 2 nodes in the other (observer nodes), to ensure performance and stability. The Turbo client and server add a daemon thread that periodically verifies that the local cache is consistent with ZooKeeper. The Turbo client persists the cached service information locally, so normal calls are unaffected even if ZooKeeper goes down or restarts. An embedded trace client reports and collects distributed trace logs.
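
The local-cache behavior described here can be sketched in plain Java. This is not Turbo's code: ProviderCache, its snapshot file format, and the Supplier that stands in for a ZooKeeper read are all assumptions chosen to keep the example self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical client-side provider cache with a local snapshot (illustration only).
class ProviderCache {
    private final Path snapshotFile;
    private volatile Map<String, List<String>> providers = Collections.emptyMap();

    ProviderCache(Path snapshotFile) {
        this.snapshotFile = snapshotFile;
        loadSnapshot(); // start from the last persisted view, if there is one
    }

    // Meant to be called periodically by a daemon thread.
    // Returns false when the registry is unreachable; the cached view is kept.
    boolean refresh(Supplier<Map<String, List<String>>> registry) {
        try {
            Map<String, List<String>> fresh = new HashMap<>(registry.get());
            if (!fresh.equals(providers)) {
                providers = fresh;   // swap in the new view atomically
                saveSnapshot(fresh);
            }
            return true;
        } catch (RuntimeException e) {
            return false;            // registry down: keep serving cached providers
        }
    }

    List<String> lookup(String service) {
        List<String> found = providers.get(service);
        return found == null ? Collections.<String>emptyList() : found;
    }

    private void saveSnapshot(Map<String, List<String>> view) {
        Properties p = new Properties();
        for (Map.Entry<String, List<String>> e : view.entrySet()) {
            p.setProperty(e.getKey(), String.join(",", e.getValue()));
        }
        try (OutputStream out = Files.newOutputStream(snapshotFile)) {
            p.store(out, "provider snapshot");
        } catch (IOException e) {
            // Best effort: a failed write must not break normal calls.
        }
    }

    private void loadSnapshot() {
        if (!Files.exists(snapshotFile)) {
            return;
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(snapshotFile)) {
            p.load(in);
        } catch (IOException e) {
            return;
        }
        Map<String, List<String>> loaded = new HashMap<>();
        for (String name : p.stringPropertyNames()) {
            String value = p.getProperty(name);
            if (!value.isEmpty()) {
                loaded.put(name, Arrays.asList(value.split(",")));
            }
        }
        providers = loaded;
    }
}
```

With this shape, a client that restarts while the registry is unavailable can still resolve providers from the snapshot, at the cost of possibly calling addresses that have since changed.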

Asynchronous request component ais4j

Figure 5 Asynchronous request component ais4j

To solve the problem of interface and resource dependencies (the high risk that our service becomes unavailable when a source station, Redis, DB, or another resource layer goes down), we built the asynchronous request component ais4j so that request response time no longer depends on the source station. We also embedded our circuit-breaking and rate-limiting components in it to decouple from the source station.

Introducing ais4j greatly reduces the dependence on external resources and improves service availability, but it also brings some problems. The company requires cached content to expire within 10 minutes. Originally that time was split evenly between the CDN and our second-level cache, SCS. With this component added, part of the 10 minutes has to be given to ais4j, which raises the back-to-source rate of the system's interfaces (by about 10%). This calls for a trade-off between request time and system pressure.
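
The article does not show how ais4j works internally. One common way to get the described effect, sketched here under that caveat, is to answer from a local copy and refresh it in the background, so that a slow or failing source station only makes the data staler rather than the response slower. AsyncSourceCache and all of its parameters are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical stale-while-refresh cache in front of a source station (illustration only).
class AsyncSourceCache {
    private static final class Entry {
        final String value;
        final long storedAt;
        Entry(String value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final long ttlMillis;              // share of the cache budget given to this layer
    private final long firstLoadTimeoutMillis; // only the very first load waits on the source

    AsyncSourceCache(long ttlMillis, long firstLoadTimeoutMillis) {
        this.ttlMillis = ttlMillis;
        this.firstLoadTimeoutMillis = firstLoadTimeoutMillis;
    }

    String get(String key, Supplier<String> source) {
        Entry e = cache.get(key);
        if (e == null) {
            return loadBlocking(key, source);
        }
        if (System.currentTimeMillis() - e.storedAt >= ttlMillis) {
            // Expired: refresh in the background. A real component would
            // deduplicate concurrent refreshes of the same key.
            refresh(key, source);
        }
        return e.value; // never waits on the source once a copy exists
    }

    // Fetches from the source asynchronously; a failure leaves the stale copy in place.
    CompletableFuture<Void> refresh(String key, Supplier<String> source) {
        return CompletableFuture.supplyAsync(source, pool)
                .thenAccept(v -> cache.put(key, new Entry(v, System.currentTimeMillis())))
                .exceptionally(ex -> null);
    }

    private String loadBlocking(String key, Supplier<String> source) {
        CompletableFuture<String> f = CompletableFuture.supplyAsync(source, pool);
        try {
            String v = f.get(firstLoadTimeoutMillis, TimeUnit.MILLISECONDS);
            cache.put(key, new Entry(v, System.currentTimeMillis()));
            return v;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException | TimeoutException ex) {
            return null; // no copy yet and the source is unavailable
        }
    }
}
```

The trade-off described above is visible in ttlMillis: a shorter value keeps content fresher but sends more requests back to the source, while a longer value reduces source load at the expense of the overall cache time budget.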

Service Monitoring

At this stage, with services decomposed and packaged this way, traffic growth should not cause much trouble for some time. But ensuring high availability, estimating system capacity, locating faults in real time ... all require a complete monitoring and alarm system.

Figure 6 Trace Schema

Figure 6 shows the system call tracking service: tracking points are instrumented into the programs through Java Instrumentation and the configuration system, and the tracking logs are then analyzed and displayed in real time through Spark.

Figure 7 Monitoring reports

Figure 8 Trace implementations

Trace implementation

The trace ID uniquely identifies a call tree, and the transaction ID uniquely identifies a single call. One trace call produces four logs. A trace call tree can be made up of Turbo calls, local calls, HTTP calls, or other invocations. The trace client is independent; Turbo depends on trace in one direction only.
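
The relationship between the two IDs can be shown with a small data structure. This is a sketch, not the actual trace client: TraceSpan and its fields are hypothetical, and the article does not name the four logs, so the event names in the usage note below are only an assumption borrowed from common distributed tracing practice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical record of one call inside a trace tree (illustration only).
class TraceSpan {
    final String traceId;             // shared by every call in the same call tree
    final String transactionId;       // unique to this single call
    final String parentTransactionId; // null for the root call

    private final List<String> logs;  // log lines shared by the whole tree

    private TraceSpan(String traceId, String transactionId,
                      String parentTransactionId, List<String> logs) {
        this.traceId = traceId;
        this.transactionId = transactionId;
        this.parentTransactionId = parentTransactionId;
        this.logs = logs;
    }

    // Starts a new call tree, for example when a request reaches the API layer.
    static TraceSpan newTrace() {
        return new TraceSpan(newId(), newId(), null,
                Collections.synchronizedList(new ArrayList<String>()));
    }

    // Starts a nested call (RPC, local call, HTTP) under this one.
    TraceSpan childCall() {
        return new TraceSpan(traceId, newId(), transactionId, logs);
    }

    // Appends one log line tagged with both IDs.
    void log(String event) {
        logs.add(traceId + " " + transactionId + " " + event);
    }

    List<String> logs() {
        synchronized (logs) {
            return new ArrayList<>(logs); // defensive copy
        }
    }

    private static String newId() {
        return UUID.randomUUID().toString();
    }
}
```

For example, a client could call log("client-send") and log("client-receive") on a child span while the server logs "server-receive" and "server-send" on the same span; every line then carries the same trace ID and transaction ID, and the parent transaction ID is what lets an analysis job rebuild the call tree.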

Figure 9 Service call chain diagram

At the same time, we embed a Req ID in the request header on the app side and, by linking the Req ID with the trace ID, record each step of the request process (CDN, SCS, backend service, RPC/HttpClient call).

Figure 10 Debug mode information display

To locate stability problems faster, we built a debug mode into the program. In the intranet environment, simply enabling debug mode on the requested URL quickly brings up the chain of resources the system called and the time consumed by each step of the call.

Alarm implementation

Spark analyzes the log records, and our alarm system raises real-time alarms on program anomalies. Alarms are sent by SMS and e-mail and contain a hyperlink with the request's trace ID that leads to the report system, where the anomaly can be viewed in real time.
