High-concurrency and high-availability architecture of Twitter

Solving Twitter's "problem" sounds like a toy exercise in scalability. Everyone assumes Twitter is simple, that a novice architect could sketch a scalable Twitter without much effort. In reality it is nothing of the sort. Raffi Krikorian, Vice President of Engineering at Twitter, describes in detail how Twitter has evolved for scalability. If you want to know how Twitter works, start here.

Twitter grew fast, and for a long time it was just getting by, but it has since grown up. It has changed from a small website struggling along on Ruby on Rails into a site driven by services, and that is a big change.

Twitter now has 150 million worldwide active users, handles 300K QPS, and pushes a 22 MB/s firehose of traffic. The system processes 400 million tweets a day, and it can take up to 5 minutes for a tweet to flow from Lady Gaga's fingertips to her 31 million followers.

Key points to be listed:

  • Twitter no longer wants to be just a web application; it wants to be a set of APIs that drive mobile clients worldwide, acting as one of the largest real-time interaction systems on the planet.
  • Twitter mainly distributes messages rather than producing them: 300K QPS are reads, while only about 6,000 tweets per second are written.
  • Extreme asymmetry, where one user has an enormous number of followers, is now common. A single user's tweet can be seen by a huge audience, and that fanout can be slow to deliver. Twitter tries to deliver within 5 seconds but cannot always manage it, and the situation is increasingly frequent, especially when celebrities tweet at or reply to one another. One possible consequence is that a reply is received before the original tweet is seen. Twitter is working to meet the challenge of reads for tweets written by highly followed users.
  • The data for your home timeline is stored in a cluster of more than 800 Redis nodes.
  • Twitter learns about you from whom you follow and which Twitter links you click; following does not have to be mutual.
  • Users care about tweet content, but that content has almost nothing to do with most of the infrastructure.
  • A very sophisticated monitoring and debugging system is needed to track down performance problems in such a complicated stack, and legacy decisions continue to haunt the system.

How does Twitter work? Find out through Raffi's wonderful speech...

Challenges
  • How do you reliably serve 150 million users and 300K QPS (home timeline and search) without slow responses?
  • The naive approach is one big SELECT over all tweets at read time, and at this scale it simply cannot respond fast enough.
  • The solution is fanout: when a new tweet comes in, figure out right then where it should go, so that reads are quick and easy and involve no logic at all. The cost is that writes become much slower than reads, at roughly 4,000 QPS (see the sketch after this list).
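To make the contrast concrete, here is a minimal Python sketch (not Twitter's code; all names and data structures are hypothetical) of the naive read-time query versus fanout-on-write:

    # Minimal sketch contrasting the naive read-time query with fanout-on-write.
    from collections import defaultdict

    tweets_by_author = defaultdict(list)   # author_id -> [tweet_id, ...]
    followees = defaultdict(set)           # user_id   -> {people this user follows}
    followers = defaultdict(set)           # author_id -> {people following this author}
    home_timeline = defaultdict(list)      # user_id   -> precomputed [tweet_id, ...]

    def naive_home_timeline(user_id, limit=20):
        """Read-time 'big SELECT': scan every followee's tweets on each request.
        Far too slow at 300K read QPS."""
        merged = []
        for author in followees[user_id]:
            merged.extend(tweets_by_author[author])
        return sorted(merged, reverse=True)[:limit]   # IDs are roughly time-ordered

    def post_tweet_with_fanout(author_id, tweet_id):
        """Write-time fanout: do the O(followers) work once, when the tweet arrives."""
        tweets_by_author[author_id].append(tweet_id)
        for follower in followers[author_id]:
            home_timeline[follower].insert(0, tweet_id)   # newest first

    def fast_home_timeline(user_id, limit=20):
        """Read is now trivial: return the precomputed list with no logic at all."""
        return home_timeline[user_id][:limit]

    followers[1] = {2, 3}
    post_tweet_with_fanout(author_id=1, tweet_id=100)
    print(fast_home_timeline(2))   # [100]
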
Internal components
  • The Platform Services group is responsible for the scalability of Twitter's core infrastructure.
    • They run the services behind timelines, tweets, users, and the social graph, all the machinery that supports the Twitter platform.
    • Internal and external clients use largely the same APIs.
    • Millions of third-party applications are registered against those APIs.
    • Product teams can focus on the product without worrying about systems concerns.
    • They work on capacity planning and on building scalable back-end systems, constantly replacing infrastructure as the site grows in unexpected ways.
  • Twitter also has an architecture group responsible for Twitter's overall architecture and for researching the technology roadmap (they want to stay ahead).
Push and pull Modes
  • Users post content to Twitter all the time; Twitter's job is to work out how to organize that content and send it out to their followers.
  • Real time is the real challenge; delivering messages to followers within 5 seconds is the current goal.
    • Delivery means getting content out onto the network and received as quickly as possible.
    • Delivery includes putting timeline data into the storage stack, sending push notifications, and triggering emails; iOS, BlackBerry, and Android phones are all notified, and there is SMS as well.
    • Twitter is the largest message sender in the world.
    • Recommendations are a huge driver of content being created and spreading quickly.
  • There are two main timelines: the user timeline and the home timeline.
    • A user timeline is everything a particular user has tweeted.
    • The home timeline is everything posted, over a period of time, by the people you follow.
    • There are business rules: @replies to people you do not follow are excluded, and replies to retweets can be filtered out.
    • Doing this at Twitter's scale is a challenge for the system.
  • Pull Mode
    • Targeted timelines: things like the twitter.com home page and the home_timeline API. You ask for data and it is returned; a huge amount of pull traffic reaches Twitter through REST API requests (both delivery modes are sketched in code after this list).
    • Queried timelines: the search API. A query comes in and all matching tweets are returned as quickly as possible.
  • Push mode
    • Twitter runs one of the largest real-time event systems anywhere, pushing tweets out at 22 MB/s of egress bandwidth.
      • Open a connection to Twitter and it will push every public tweet to you within 150 milliseconds.
      • At almost any given time there are about 1 million open connections against the push service cluster.
      • Messages go out to egress points such as search this way, and all public tweets leave through this path.
      • No, you cannot have this feed yourself (you could not actually handle that much).
    • User stream connections: this is what powers TweetDeck and Twitter for Mac. When you log in, Twitter looks at your social graph and pushes only tweets from the people you follow, recreating the home timeline experience over a persistent connection rather than by polling.
    • Query API: Twitter registers a standing query against the tweet stream, and only when a newly published tweet matches the query is it pushed down the corresponding connection.
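As a rough illustration of the two delivery modes above, the following Python sketch models pull as an on-demand timeline read and push as a persistent "connection" (a queue) that receives tweets as they are published. The classes and names are invented for illustration and are not Twitter's API:

    # Toy model of the pull and push delivery modes. Not Twitter's code; a
    # queue per subscriber stands in for a persistent streaming connection.
    import queue
    import threading

    class PushHub:
        """Fans every new public tweet out to each open 'connection' (a queue)."""
        def __init__(self):
            self._subscribers = []
            self._lock = threading.Lock()

        def connect(self):
            q = queue.Queue()
            with self._lock:
                self._subscribers.append(q)
            return q                      # the client blocks on q.get()

        def publish(self, tweet):
            with self._lock:
                for q in self._subscribers:
                    q.put(tweet)          # pushed moments after ingest

    class PullStore:
        """Clients poll this on demand (the home_timeline-style REST path)."""
        def __init__(self):
            self._timeline = []

        def append(self, tweet):
            self._timeline.insert(0, tweet)

        def home_timeline(self, count=20):
            return self._timeline[:count]

    # Usage: ingestion writes to both paths; readers choose pull or push.
    hub, store = PushHub(), PullStore()
    conn = hub.connect()
    for t in ("tweet-1", "tweet-2"):
        store.append(t)
        hub.publish(t)
    print(store.home_timeline())   # pull: ['tweet-2', 'tweet-1']
    print(conn.get())              # push: 'tweet-1' arrives on the open connection
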

At a high level, the pull-based timeline works as follows:

  • A tweet comes in through the write API, passing through a load balancer, a TFE (Twitter Front End), and some other facilities not covered here.
  • This is a very direct path: the home timeline is computed in advance, and all the business logic runs as the tweet comes in.
  • Then fanout (pushing the tweet outward) begins. Incoming tweets are placed into a massive Redis cluster, with each tweet replicated three times on three different machines, because plenty of machines fail at Twitter every day.
  • Fanout queries the Flock-based social graph service; Flock maintains the follower and followee lists.
  • Flock returns the recipient graph, and fanout then starts walking all of the target timelines stored in the Redis cluster.
  • The Redis cluster holds several terabytes of RAM.
  • Fanout connects to 4K destinations at a time.
  • The native list structure in Redis is used.
  • Suppose you tweet and have 20K followers. The fanout daemon looks up where all 20K users live within the Redis cluster, then inserts the tweet ID into each of their lists. So every tweet you write costs 20K writes across the Redis cluster.
  • What is stored is the tweet ID, the user ID of the author, and 4 bytes of flags indicating whether it is a retweet, a reply, or something else (see the sketch after this list).
  • Your home timeline sits in the Redis cluster and is 800 entries long; flip back through enough pages and you hit the limit. RAM is the resource constraint that determines how long your current tweet set can be.
  • Every active user's timeline is stored in RAM to keep latency down.
  • An active user is someone who has logged in within the last 30 days, and that threshold can change depending on cache capacity and Twitter's usage.
  • Only your home timeline is stored on disk.
  • If your timeline has fallen out of the Redis cluster, it goes through a rebuild process.
  • The rebuild queries the social graph service to find whom you follow, pulls each person's tweets from disk, and puts them back into Redis.
  • MySQL handles disk storage via Gizzard, which abstracts away SQL transactions and provides global replication.
  • Because data is replicated three times, if one machine has a problem, the timelines on that machine do not have to be rebuilt in each data center.
  • If a tweet is a retweet of another tweet, a pointer to the original tweet is stored.
  • When you request your home timeline, the Timeline Service is queried; it only has to work out which machine holds your timeline.
  • It effectively runs three different hash rings, because your timeline is stored in three places.
  • It takes whichever replica answers fastest and returns the result as quickly as possible.
  • The trade-off is that fanout takes longer, but reads are fast: about 2 seconds from a cold cache to the browser, and about 400 ms for an API call.
  • Because the timeline only contains tweet IDs, the tweets must be "hydrated", that is, their text must be looked up. Given a set of IDs, the tweets can be fetched from T-bird in parallel.
  • Gizmoduck is the user service and Tweetypie is the tweet object service; each has its own cache. The user cache is a memcache cluster holding basic information for every user, and Tweetypie keeps roughly the last month and a half of tweets in its memcache cluster. These are exposed to internal customers.
  • Some read-time filtering happens at the edge; for example, Nazi content is filtered out in France, so the content is stripped at read time before it is sent out.
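The fanout-into-Redis-lists path described above can be sketched roughly as follows. This assumes a locally running Redis server and the redis-py client; the key names, the exact entry packing, and the helper functions are assumptions for illustration, not Twitter's actual code:

    # Illustrative fanout-on-write into Redis lists: each home timeline is a
    # native Redis list of fixed-size entries (tweet id, author id, 4 bytes of
    # flags), capped at 800 entries.
    import struct
    import redis

    r = redis.Redis(host="localhost", port=6379)
    TIMELINE_CAP = 800                  # home timelines are truncated to ~800 entries

    def pack_entry(tweet_id, author_id, flags=0):
        """8-byte tweet id + 8-byte author id + 4 bytes of flags (retweet/reply bits)."""
        return struct.pack(">QQI", tweet_id, author_id, flags)

    def fanout(tweet_id, author_id, follower_ids, flags=0):
        """O(followers) write: prepend the entry to every follower's timeline list."""
        entry = pack_entry(tweet_id, author_id, flags)
        pipe = r.pipeline(transaction=False)
        for follower_id in follower_ids:            # the real cluster spans hundreds of nodes
            key = f"timeline:{follower_id}"
            pipe.lpush(key, entry)
            pipe.ltrim(key, 0, TIMELINE_CAP - 1)    # enforce the 800-entry cap
        pipe.execute()

    def read_home_timeline(user_id, count=20):
        """Cheap read: fetch the first entries and unpack them; the tweet text
        itself would then be hydrated from a separate tweet service/cache."""
        raw = r.lrange(f"timeline:{user_id}", 0, count - 1)
        return [struct.unpack(">QQI", e) for e in raw]

    # Example: fan a tweet out to three followers, then read one timeline back.
    fanout(tweet_id=1001, author_id=42, follower_ids=[7, 8, 9])
    print(read_home_timeline(7))
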
Search, at a high level
  • In contrast to the pull timeline, all the computation here is done on the read path, which makes the write path simple.
  • As a tweet comes in, the Ingester parses it, works out everything that needs to be indexed, and hands it to an Earlybird machine. Earlybird is a modified version of Lucene whose index lives entirely in RAM.
  • Unlike fanout, where a tweet may be written into many home timelines (depending on the author's follower count), a tweet is stored on only one Earlybird machine (replication aside).
  • Blender performs the distributed query across the data center: it asks every Earlybird for content matching the query. If you search for "New York Times", every shard is queried, and the results are returned, sorted, merged, and re-ranked. Ranking is based on social metrics such as the number of retweets, favorites, and replies (see the sketch after this list).
  • Engagement is handled on a write path as well: an engagement timeline is built, so when you favorite or reply to a tweet, the engagement timeline is updated. Like the home timeline, it is a series of activity IDs, such as favorite IDs and reply IDs.
  • All of this goes to Blender on the read path to be recomputed, merged, and sorted; what comes back is what you see as the search timeline.
  • Discovery is a customized search based on what Twitter knows about you, learned from whom you follow and which links you open; that knowledge is used both in the Discovery query and in ranking the results.
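A hedged sketch of the scatter/gather shape of this search path, with toy in-memory partitions standing in for Earlybird and a simple merge-and-rerank standing in for Blender (the class names and the scoring formula are illustrative only):

    # Scatter/gather search sketch: query every index partition, merge the hits,
    # then re-rank by a toy "social proof" score (retweets + favorites + replies).
    class EarlybirdPartition:
        def __init__(self):
            self.index = []                 # [(tweet_id, text, retweets, favorites, replies)]

        def ingest(self, tweet):
            self.index.append(tweet)        # the real Earlybird is a modified in-RAM Lucene

        def search(self, term):
            return [t for t in self.index if term.lower() in t[1].lower()]

    class Blender:
        def __init__(self, partitions):
            self.partitions = partitions

        def search(self, term, limit=10):
            hits = []
            for p in self.partitions:       # scatter: query every partition
                hits.extend(p.search(term))
            # gather: sort, merge, and re-rank by the social signals
            hits.sort(key=lambda t: t[2] + t[3] + t[4], reverse=True)
            return hits[:limit]

    partitions = [EarlybirdPartition() for _ in range(3)]
    partitions[0].ingest((1, "New York Times on scaling", 120, 30, 12))
    partitions[2].ingest((2, "new york pizza", 3, 1, 0))
    print(Blender(partitions).search("new york"))
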
Search and pull are opposites
  • Search and pull look very similar on the surface, but they have a property that is the inverse of each other.
  • For the home timeline:
    • Write operation: a tweet triggers an O(n) process that writes into the Redis cluster, where n is the number of followers. For Lady Gaga or Obama, with tens of millions of followers, a single tweet can take tens of seconds to fan out, which is unacceptable. The Redis cluster can persist to disk, but it generally operates entirely in RAM.
    • Read operation: finding the right Redis machine and reading the timeline is O(1), whether through the API or the site. Twitter has optimized the home timeline path heavily, and a read completes in about 10 milliseconds. As noted, Twitter is dominated by consumption rather than production: it handles 300K read requests per second against 6,000 writes per second (see the back-of-envelope numbers after this list).
  • For the search timeline:
    • Write operation: a tweet is received by the Ingester and written to a single Earlybird machine, so the write is O(1). A tweet is indexed within about 5 seconds, including queueing and routing time.
    • Read operation: a read triggers an O(n) scatter across the cluster. Most people do not search, so tweets can be stored very efficiently for search, at the cost of read time. A read takes about 100 milliseconds, and search never touches disk; the entire Lucene index is in RAM, which is far more efficient than going to disk.
  • Tweet content has almost nothing to do with most of the infrastructure. The T-bird store holds all the tweets, and most tweet content is handled in RAM; if it is not in memory, a SELECT query pulls it into memory. The content itself only matters to search, trends, and the what's-happening pipelines; the home timeline does not care about it at all.
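A quick back-of-envelope check of this asymmetry, using only the rough figures quoted elsewhere in this article (the numbers are approximate and the calculation is illustrative):

    # Back-of-envelope check using the approximate figures quoted in this article.
    reads_per_sec = 300_000                 # timeline read QPS
    writes_per_sec = 6_000                  # tweets written per second
    tweets_per_day = 400_000_000
    deliveries_per_day = 30_000_000_000     # timeline deliveries (fanout writes)

    avg_fanout = deliveries_per_day / tweets_per_day
    fanout_writes_per_sec = deliveries_per_day / 86_400

    print(f"reads per write:       {reads_per_sec / writes_per_sec:.0f}x")    # ~50x
    print(f"avg deliveries/tweet:  {avg_fanout:.0f}")                         # ~75
    print(f"fanout writes/sec:     {fanout_writes_per_sec:,.0f}")             # ~347,000

    # A single celebrity tweet is far above that average:
    lady_gaga_followers = 31_000_000
    print(f"one @ladygaga tweet is {lady_gaga_followers / avg_fanout:,.0f} "
          f"'average' tweets worth of fanout work")
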
Outlook:
  • How can the pipeline be made faster and more efficient?
  • Fanout can be slow. Twitter tries to keep it under 5 seconds, but it does not always work, and it is especially hard when celebrities tweet, which happens more and more often.
  • The Twitter follow graph is also very asymmetric. Tweets are only delivered to the people following you at that moment. Twitter may know a lot about you because you follow Lance Armstrong, but he does not follow you back; since following is not mutual, the social graph is more about interests and suggestions than relationships.
  • The problem is the sheer size of these audiences: @ladygaga has 31 million followers, @katyperry 28 million, and @barackobama 23 million.
  • When any of these people tweets, the data center has to write that tweet into tens of millions of timelines, one per follower. When they start replying to each other, it becomes a big challenge, and it happens all the time.
  • These high-fanout users are Twitter's biggest challenge. Replies to a celebrity's tweet keep becoming visible before the original tweet does, which causes confusion across the whole site. Fanning a Lady Gaga tweet out to all her followers takes minutes, so her followers see the tweet at different points in time; some people may see it roughly five minutes later than others. An early recipient may reply immediately, and because fanout of the original is still in progress, the reply is fanned out too and arrives before late recipients have received the original tweet at all, confusing users a great deal. Tweets are sorted by ID before delivery, and IDs increase monotonically, but that does not solve the problem at this scale: the queues for high-fanout users keep backing up.
  • Twitter is trying to solve this by merging the read and write paths. Tweets from highly followed users are not fanned out at write time: for someone like Taylor Swift, nothing extra is written, and her tweets are simply merged into followers' timelines at read time. Balancing the read and write paths this way can save tens of percent of computing resources (a sketch of the merge follows this list).
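A minimal sketch of that read-time merge idea, assuming tweet IDs are roughly time-ordered; the data structures, threshold, and names are hypothetical:

    # Hybrid timeline read: ordinary users' tweets were fanned out at write time,
    # while tweets from very highly followed users are merged in at read time.
    import heapq

    HIGH_FANOUT_THRESHOLD = 1_000_000        # illustrative cutoff, not Twitter's number

    precomputed = {"alice": [907, 903, 899]}            # tweet IDs fanned out at write time
    celebrity_tweets = {"ladygaga": [910, 905, 808]}    # kept only on the author's own list
    follows_celebrities = {"alice": ["ladygaga"]}

    def read_home_timeline(user, limit=5):
        """Merge the precomputed list with each followed celebrity's recent tweets.
        IDs are roughly time-ordered, so a descending merge approximates recency."""
        sources = [precomputed.get(user, [])]
        for celeb in follows_celebrities.get(user, []):
            sources.append(celebrity_tweets.get(celeb, []))
        merged = heapq.merge(*sources, reverse=True)    # each source is already descending
        return list(merged)[:limit]

    print(read_home_timeline("alice"))   # [910, 907, 905, 903, 899]
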
Decoupling
  • The tweet pipeline is decoupled in many ways, mainly by separating concerns: search, push, interest emails, and the home timeline can all work independently of one another.
  • The system was also decoupled for performance reasons. Twitter used to be fully synchronous; that ended two years ago. It used to take 145 ms to get a tweet through the tweet API before the client connections were released, and this was for legacy reasons: the write path was a Ruby program running on MRI, a single-threaded runtime, and processor capacity was eaten up each time a worker was allocated. They want to release client connections as quickly as possible: a tweet comes in, Ruby ingests it, sticks it on a queue, and disconnects. They only run around 45-48 processes per box, so they can only ingest that many tweets in parallel per box, which is why they want to disconnect as fast as they can (see the sketch after this list).
  • Tweet ingestion has moved to a fully asynchronous path, and everything discussed here hangs off that path.
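A minimal sketch of the "accept, enqueue, disconnect" pattern described above, using a plain in-process queue; the function names and the queue are stand-ins for Twitter's actual ingestion pipeline:

    # Accept the tweet, put it on a queue, and release the client immediately;
    # background workers do the expensive fanout asynchronously.
    import queue
    import threading

    ingest_queue = queue.Queue()

    def write_api(author_id, text):
        """The client-facing path: do as little as possible, then return."""
        ingest_queue.put((author_id, text))
        return "202 Accepted"            # client connection is released immediately

    def fanout_worker():
        """Runs independently of client connections; search, push, email and the
        home timeline all hang off this asynchronous path."""
        while True:
            author_id, text = ingest_queue.get()
            print(f"fanning out tweet from {author_id}: {text!r}")
            ingest_queue.task_done()

    threading.Thread(target=fanout_worker, daemon=True).start()
    print(write_api(42, "hello"))        # returns right away
    ingest_queue.join()                  # wait for the background fanout in this demo
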
Monitoring
    • Dashboards around the office show the state of the system at any given time.
    • If you have 1 million followers, it can take a few minutes for a tweet to reach all of them.
    • Tweet ingestion statistics: 400 million tweets per day, averaging about 5,000 per second, with daily peaks around 7,000 per second and spikes of more than 12,000 per second during major events.
    • Timeline delivery statistics: 30 billion deliveries per day (about 21 million per minute); 3.5 seconds at p50 to deliver to 1 million recipients; 300K deliveries per second; at p99 a delivery can take up to about 5 minutes.
    • A monitoring system named Viz watches every cluster. The median time for the Timeline Service to fetch data from the data cluster is 5 ms; at p99 it takes 100 ms, and at p99.9 requests hit disk and take several hundred milliseconds.
    • Zipkin is a system modeled on Google's Dapper. It can trace a request, showing every service it touches and how long each took, so the performance profile of every request can be broken down in detail; you can also drill into individual requests over different time windows. Most debugging time is spent working out where request time is going, and Zipkin also shows aggregate statistics along different dimensions, for example how long fanout or delivery is taking. Over a couple of projects it helped get the active-user timeline fetch down to 2 ms; most of that time went into overcoming GC pauses, understanding memcache queries, understanding the data center topology, and finally building out these clusters (a small percentile-reporting sketch follows this list).
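A small sketch of the kind of percentile reporting (p50, p99, p99.9) these dashboards rely on; the latency samples below are simulated, not Twitter data:

    # Report delivery latency percentiles in the same style as the Viz dashboards.
    import random
    import statistics

    random.seed(1)
    # Simulated delivery latencies in seconds: mostly fast, with a long fanout tail.
    samples = [random.lognormvariate(1.0, 0.6) for _ in range(100_000)]

    def percentile(data, p):
        ordered = sorted(data)
        idx = min(len(ordered) - 1, int(round(p / 100 * (len(ordered) - 1))))
        return ordered[idx]

    print(f"mean  : {statistics.mean(samples):6.2f} s")
    print(f"p50   : {percentile(samples, 50):6.2f} s")
    print(f"p99   : {percentile(samples, 99):6.2f} s")
    print(f"p99.9 : {percentile(samples, 99.9):6.2f} s")
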
