Sina Weibo platform architecture for billions of users

Source: Internet
Author: User

Preface

Sina Weibo announced in March 2014 that its monthly active users (MAU) had reached 143 million, and 808,298 posts were sent in the first minute of the 2014 New Year. A user base and business volume of this size need a powerful back-end system that offers high availability (HA), high-concurrency access, and low latency.

The first-generation architecture of the Weibo platform was a LAMP stack: the database used MyISAM, the back end was PHP, and the cache was memcache.

As the application grew, the second-generation architecture was modularized and service-oriented, and the back end moved from PHP to Java. This gradually formed an SOA architecture that supported the business development of the Weibo platform for a long time.

On this basis, after a long period of refactoring, production operation, reflection, and consolidation, the platform formed its third-generation architecture.

Let's look at a diagram of Weibo's core business (below). It looks complicated, yet it is already simplified as far as it can be. The third-generation technology system is designed to ensure that new product features in Weibo's core business can be released quickly, efficiently, and reliably.

Third generation technology system

The third-generation technology system of the Weibo platform is modeled by orthogonal decomposition. In the horizontal direction it adopts the typical three-tier model: interface layer, service layer, and resource layer. In the vertical direction it is further divided into business architecture, technology architecture, monitoring platform, and service governance platform. Here is the overall architecture diagram of the platform:

As shown, orthogonal decomposition divides the whole diagram into 3*4=12 regions. Each region is the intersection of one horizontal dimension and one vertical dimension, and that intersection defines the region's core function. Region 5, for example, covers the technical architecture of the service layer.

The following sections describe the horizontal and vertical design principles in detail, with particular emphasis on the technical components in regions 4, 5, and 6 and their role in the overall architecture.
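The region numbering can be illustrated with a small sketch. The architecture diagram is not reproduced here, so the numbering scheme below is an assumption: regions are counted down each vertical column in turn, which is consistent with the text (regions 4, 5, and 6 hold the technical components, and region 5 is the service layer's technical architecture). The function names are illustrative only.

```python
from typing import Tuple

# Horizontal dimension (rows) and vertical dimension (columns) from the text.
LAYERS = ["interface layer", "service layer", "resource layer"]
COLUMNS = [
    "business architecture",
    "technology architecture",
    "monitoring platform",
    "service governance platform",
]

def region_number(layer: str, column: str) -> int:
    """Return the 1-based region number, assuming column-by-column numbering."""
    return COLUMNS.index(column) * len(LAYERS) + LAYERS.index(layer) + 1

def describe(region: int) -> Tuple[str, str]:
    """Return the (layer, column) pair that a region number stands for."""
    column_index, layer_index = divmod(region - 1, len(LAYERS))
    return LAYERS[layer_index], COLUMNS[column_index]
```

Under this assumed numbering, `describe(5)` gives the service layer crossed with the technology architecture, matching the example in the text.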

Horizontal layering

Horizontal division is fundamental to the design of large and medium-sized Internet back-end systems, and it is reflected in every generation of the platform's technology system. It is introduced briefly here to pave the way for the explanation of the vertical dimension that follows:

    1. The interface layer mainly handles interaction with web pages and mobile clients and defines a unified interface specification. The platform's three core interface services are the content (Feed) service, the user relationship service, and the communication service (private messages, mass messages, group chat).
    2. The service layer modularizes the core business and turns it into services. It contains two kinds of services. Atomic services are service modules that do not depend on any other service, such as the commonly used short-link service and the number service; the diagram uses swimlane isolation to show their independence. Composite services complete their work by combining atomic services with business logic, such as the Feed service and the communication service, which depend on the short-link, user, and number services in addition to their own business logic.
    3. The resource layer mainly stores the data models. It includes the common cache resources Redis and memcached, persistent database storage in MySQL and HBase, and the distributed file system TFS and the Sina S3 service.

A key property of horizontal layering is that dependencies run from top to bottom: upper-layer services depend on lower-layer services, and lower-layer services never depend on upper-layer ones. This keeps the dependency relationships simple and direct.

Corresponding to the layered model, the servers in the Weibo system fall into three main types: front-end machines (which provide API services), queue machines (which process write-path business logic, mainly data writes), and storage (MC, MySQL, MCQ, Redis, HBase, etc.).
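The top-down dependency rule can be checked mechanically. The following is a minimal sketch, not the platform's tooling: the module names and the dependency list are hypothetical, and same-layer calls are allowed because composite services call atomic services inside the service layer.

```python
# Smaller rank means higher in the stack.
LAYER_RANK = {"interface": 0, "service": 1, "resource": 2}

def is_allowed(caller_layer, callee_layer):
    """A call is allowed when it stays in the same layer or goes downward."""
    return LAYER_RANK[caller_layer] <= LAYER_RANK[callee_layer]

def find_violations(module_layers, dependencies):
    """Return every (caller, callee) pair that points upward."""
    return [
        (caller, callee)
        for caller, callee in dependencies
        if not is_allowed(module_layers[caller], module_layers[callee])
    ]

# Hypothetical modules and dependencies, for illustration only.
MODULE_LAYERS = {
    "feed_api": "interface",
    "feed_service": "service",
    "short_link_service": "service",
    "mysql_dal": "resource",
}
DEPENDENCIES = [
    ("feed_api", "feed_service"),
    ("feed_service", "short_link_service"),
    ("feed_service", "mysql_dal"),
    ("mysql_dal", "feed_api"),  # upward dependency: breaks the rule
]
```

Calling `find_violations(MODULE_LAYERS, DEPENDENCIES)` reports only the last pair, the one in which a resource-layer module reaches back up to the interface layer.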

Vertical Extension Technology Architecture

As the business architecture developed and was optimized, the platform team built many excellent middleware products to support the core business. These products grew out of business needs, and as the technical components became richer they formed a complete platform technology framework, which has greatly improved the platform's product development efficiency and business stability.

Unlike the horizontal direction, where upper layers depend on lower layers, the vertical direction takes the technical framework as its foundation and supporting point, driving the business architecture, monitoring platform, and service governance platform on either side. Its core components are introduced below.

Interface Layer Web V4 framework

The interface framework simplifies and standardizes the development of business interfaces. It packages common interface-layer functionality into the framework and adopts Spring's aspect-oriented programming (AOP) design. The framework is built on top of Jersey, defines interfaces (URL, parameters) through annotations, and has built-in auth, rate limiting, access logging, and degradation. It supports the interface-layer monitoring platform and service governance, and it automatically serializes beans to JSON/XML.
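The idea of declaring an interface once and letting the framework wrap it with cross-cutting concerns can be sketched with a decorator. This is an analogy in Python, not the Web V4 framework itself (which is Java, built on Jersey); the `api` decorator, the request shape, the URL, and the fixed call limit are all assumptions made for illustration.

```python
import functools

ACCESS_LOG = []  # (url, user, status) tuples recorded for every call

def api(url, required_params=(), max_calls=5):
    """Declare an interface; auth, parameter checks, rate limiting and
    access logging are applied around the handler, not inside it."""
    def decorate(handler):
        calls = {}  # user -> calls so far (window reset omitted for brevity)

        @functools.wraps(handler)
        def wrapper(request):
            user = request.get("user")
            params = request.get("params", {})
            if user is None:
                response = {"status": 401, "error": "authentication required"}
            elif any(name not in params for name in required_params):
                response = {"status": 400, "error": "missing parameter"}
            elif calls.get(user, 0) >= max_calls:
                response = {"status": 429, "error": "rate limit exceeded"}
            else:
                calls[user] = calls.get(user, 0) + 1
                response = {"status": 200, "body": handler(**params)}
            ACCESS_LOG.append((url, user, response["status"]))
            return response

        wrapper.url = url
        return wrapper
    return decorate

# Hypothetical interface: the handler contains business logic only.
@api("/statuses/show", required_params=("id",), max_calls=2)
def show_status(id):
    return {"id": id, "text": "hello"}
```

The handler never mentions auth or logging, so those concerns can be changed in one place for every interface.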

Service Layer Framework

The service layer mainly involves the RPC remote call framework and the message queue framework, the two most widely used frameworks in the platform's service layer.

MCQ Message Queuing

A message queue provides a first-in, first-out communication mechanism. Within the platform, the most common scenario is to write data persistence operations asynchronously to a queue, from which queue processors read in batches and write to the DB. This has two benefits: first, the asynchronous mechanism shortens the response time of the front-end machines; second, batching the DB operations indirectly improves DB performance. In another scenario, the platform uses message queues to provide real-time data to search, big data, and business operations.

MCQ (SimpleQueue Service over Memcache) is the message queue service used extensively within the Weibo platform. It is based on the memcache protocol, persists message data to BerkeleyDB, and has only two commands, get and set. It is also easy to monitor (stats queue), has a rich set of client libraries, has run in production for many years, and performs much better than a general-purpose MQ.
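The asynchronous write path described above can be sketched as follows. This is a single-process illustration under stated assumptions: an in-memory `deque` stands in for the message queue, `FakeDB` stands in for the database, and the function names and batch size are hypothetical.

```python
from collections import deque

class FakeDB:
    """Stand-in for the database; counts how many batch writes it receives."""
    def __init__(self):
        self.rows = []
        self.batch_writes = 0

    def bulk_insert(self, rows):
        self.rows.extend(rows)
        self.batch_writes += 1

write_queue = deque()  # stand-in for the message queue

def handle_post(user_id, text):
    """Front-end path: enqueue the write and return without touching the DB."""
    write_queue.append({"user_id": user_id, "text": text})
    return "accepted"

def drain(db, batch_size=100):
    """Queue-processor path: read in FIFO order and write to the DB in batches."""
    while write_queue:
        count = min(batch_size, len(write_queue))
        batch = [write_queue.popleft() for _ in range(count)]
        db.bulk_insert(batch)
```

With 250 queued posts and a batch size of 100, the processor issues three batch writes instead of 250 single-row writes, which is where the indirect DB gain comes from.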
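A two-command queue interface can be modeled in a few lines. The sketch below is an in-memory model, not MCQ itself: it assumes that `set` appends a message to a named queue and `get` removes and returns the oldest one, and it omits persistence and the network protocol entirely. The class and method names are illustrative.

```python
from collections import defaultdict, deque

class InMemoryQueueServer:
    """Toy model of a queue service that exposes only set and get."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def set(self, queue_name, message):
        """Enqueue: append the message to the tail of the named queue."""
        self._queues[queue_name].append(message)
        return True

    def get(self, queue_name):
        """Dequeue: return the oldest message, or None if the queue is empty."""
        queue = self._queues.get(queue_name)
        return queue.popleft() if queue else None

    def stats(self):
        """Monitoring hook: current length of every queue."""
        return {name: len(queue) for name, queue in self._queues.items()}
```

Because producers and consumers need only these two calls, any client that can speak a key-value style protocol can act as a queue client, which is one reason such a design is easy to adopt and monitor.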

Motan RPC Framework

The Motan RPC service uses the Netty network framework as its underlying communication engine. Its serialization protocols include Hessian and Java serialization, and its communication protocols include Motan, HTTP, TCP, and MC. The Motan framework is used extensively inside the platform and has mature solutions for both system robustness and service governance. For robustness, it implements high availability and load balancing policies on top of the Config configuration management service (flexible failover and failfast HA policies, and round robin, LRU, consistent hash, and other load balancing policies). For service governance, it generates complete service call chain data, service request performance data such as response time and QPS, and standardized error and exception logs.
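The HA and load balancing policies named above can be illustrated with a short sketch. This is not Motan's implementation or API: the class names, the retry count, and the use of MD5 for the hash ring are assumptions chosen to keep the example small.

```python
import bisect
import hashlib
import itertools

class RoundRobin:
    """Hand out servers in rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def select(self, key=None):
        return next(self._cycle)

class ConsistentHash:
    """Map a key to a server on a hash ring with virtual nodes."""
    def __init__(self, servers, replicas=100):
        self._ring = sorted(
            (self._hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def select(self, key):
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]

def call_failfast(balancer, invoke, key=None):
    """Failfast: one attempt; any error goes straight back to the caller."""
    return invoke(balancer.select(key))

def call_failover(balancer, invoke, key=None, retries=2):
    """Failover: on a connection error, retry on the next selected server."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return invoke(balancer.select(key))
        except ConnectionError as error:
            last_error = error
    raise last_error
```

Failover pairs naturally with round robin, since each retry lands on a different server. With consistent hashing the same key always selects the same server, so a real failover policy would have to walk further along the ring, which this sketch does not do.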

Resource-level framework

There are many frameworks at the resource layer: key-list DAL middleware that encapsulates MySQL and HBase, a custom counting component, and a proxy that supports distributed MC and Redis. The industry has already shared a lot of experience on these, so here I share the platform's Object Library and SSD Cache components.

Object Library

The Object Library supports convenient serialization and persistence of object data in Weibo. On serialization, an object in JVM memory is serialized and stored in HBase, and a unique ObjectID is generated; when the object is needed, it is retrieved by its ObjectID. The Object Library supports objects of any type and the PB, JSON, and binary serialization protocols. Its largest application in Weibo is to define the videos, pictures, and articles referenced in posts as objects. Dozens of object types are defined in total, and a standard object metadata schema has been abstracted. The object content is uploaded to the object storage system (Sina S3), and the object metadata records its Sina S3 address.

SSD Cache
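The store-by-ID pattern can be sketched as follows. This is a minimal illustration, not the Object Library's API: a dictionary stands in for the backing store, JSON is the only serialization shown, and the ID format, class, and method names are assumptions.

```python
import json
import uuid

class ObjectLibrary:
    """Toy object store: serialize on put, look up by generated ID on get."""

    def __init__(self):
        self._store = {}  # stand-in for the persistent store

    def put(self, object_type, payload):
        """Serialize the payload, save it, and return a unique object ID."""
        object_id = "%s:%s" % (object_type, uuid.uuid4().hex)
        record = {"type": object_type, "payload": payload}
        self._store[object_id] = json.dumps(record).encode("utf-8")
        return object_id

    def get(self, object_id):
        """Deserialize and return the payload saved under the object ID."""
        record = json.loads(self._store[object_id].decode("utf-8"))
        return record["payload"]
```

A post would then carry only the returned ID, for example the ID of a video object, and any service holding that ID can load the object's metadata without knowing how it was stored.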

With the spread of SSDs, their superior I/O performance means they are increasingly used in place of traditional SATA and SAS disks. There are three common application scenarios: 1) replacing the hard disks of MySQL databases; although no MySQL version was optimized for SSDs, a direct upgrade to SSDs can still bring about an 8x IOPS improvement; 2) replacing the hard disks used by Redis to improve its performance; 3) use in the CDN to speed up static resource loading.

The Weibo platform applies SSDs to distributed caching, extending the traditional Redis/MC + MySQL model to Redis/MC + SSD Cache + MySQL. SSD Cache is used as an L2 cache. First, it eases the problem of MC/Redis being costly and small in capacity; second, it relieves the database access pressure caused by requests penetrating to the DB.
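The read path through the two cache levels can be sketched as follows. This is a minimal illustration under stated assumptions: dictionaries stand in for MC/Redis (L1), SSD Cache (L2), and MySQL, and eviction policy, expiry, and write handling are left out. The class and method names are hypothetical.

```python
class TieredCache:
    """Read-through lookup: L1 first, then L2, then the database."""

    def __init__(self, database):
        self.l1 = {}              # stand-in for MC/Redis: small and fast
        self.l2 = {}              # stand-in for SSD Cache: larger and cheaper
        self.database = database  # stand-in for MySQL
        self.db_reads = 0

    def get(self, key):
        if key in self.l1:
            return self.l1[key]
        if key in self.l2:
            # L2 hit: promote to L1 so the next read is served from memory.
            self.l1[key] = self.l2[key]
            return self.l1[key]
        # Miss in both levels: the request penetrates to the database.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.l2[key] = value
            self.l1[key] = value
        return value

    def evict_from_l1(self, key):
        """Simulate L1 dropping a key because its capacity is limited."""
        self.l1.pop(key, None)
```

When L1 evicts a key, the next read is served from L2 rather than from the database, so the DB read counter does not grow. That is the penetration relief the paragraph above describes.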

Vertical Monitoring and service governance

As services grow in scale and the business becomes more complex, even business architects can hardly describe the dependencies between services accurately, and managing and operating the services becomes harder and harder. Against this background, and with reference to Google's Dapper and Twitter's Zipkin, the platform built its own large-scale distributed tracing system, Watchman.

Watchman large-scale distributed tracing system

Like other large and medium-sized Internet applications, the Weibo platform consists of many distributed components. Every HTTP request that a user sends from a browser or mobile client to the application servers passes through many business systems or system components and leaves footprints along the way. Scattered as they are, however, these data are of limited help in troubleshooting or process optimization, so for such a typical cross-process, cross-thread scenario it is particularly important to aggregate and analyze these logs. In addition, collecting performance data for each footprint, and applying flow control or degradation to each subsystem according to policy, are important for keeping the Weibo platform highly available. The goals are to trace the full call chain of every request, collect performance data for every service on the chain, track every error and exception in the system, and feed the computed performance data, compared against performance targets (SLAs), back into flow control. Weibo's Watchman system was born from these goals.

One of the core principles of the system's design is low intrusion (non-invasiveness): as a non-business component, it should intrude on other business systems as little as possible, or not at all, and stay transparent to its users, which greatly reduces the burden on developers and the barrier to adoption. For this reason, all log collection points are placed in the technical framework middleware, including the interface framework, the RPC framework, and other resource middleware.

Watchman is a framework built by the technical team and applied in all business scenarios. The operations team improves the monitoring platform on top of it, and the business and operations teams use it together to carry out distributed service governance, including scaling services out and in, service degradation, traffic switching, service release, and grayscale rollout.

End

Today the technical framework plays an increasingly important role in the platform, driving the platform's technology upgrades, business development, and system operations. Because of space limitations this article does not cover it in detail; follow-up articles will continue to introduce the design principles and system architecture of the core middleware.

Source: InfoQ
