The road to high scalability (Twitter) ---- The architecture Twitter uses to handle 150 million active users, 300K QPS, a 22 MB/s firehose, and tweet delivery within 5 seconds

Tags: flock, ruby on rails, redis, cluster

Original link: http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html

The original was written on July 8, 2013. The translation follows:

Toy solutions to "the Twitter problem" are a favorite scalability trope. Everyone thinks Twitter is easy to build: with a little knowledge of system architecture, surely we could whip one up ourselves, and that would be that. But it is not that simple, according to Raffi Krikorian, VP of Engineering at Twitter, in his very detailed talk Timelines at Scale. If you want to know how Twitter really works, start here:

Twitter's growth has been gradual, so it is easy to miss how far it has come. It started out as a three-tier Ruby on Rails site and has slowly become the kind of reliable, service-oriented core that people reach for when they want to check whether the network itself is down. Quite a change!

Twitter now has 150 million active users worldwide, handles 300,000 requests per second to generate timelines, and has a firehose that spews out 22 MB per second. 400 million tweets flow through the system every day, and it can take up to 5 minutes for a tweet from Lady Gaga to reach all of her 31 million followers.

Some highlights:

1. Twitter no longer wants to be just a web site. It wants to be the API that powers mobile clients worldwide and one of the largest real-time event systems on the planet.

2. Twitter is primarily a consumption mechanism, not a production mechanism: only 6,000 write requests per second against 300,000 timeline read requests per second.

3. Outliers, people with huge numbers of followers, are becoming more common. A tweet from such a person has to fan out to every one of those followers, which can be slow. Twitter tries to keep it under 5 seconds, but it does not always succeed, especially as celebrities tweet at each other more and more. One consequence is that a reply can arrive before the original tweet it answers. For these high-value users, Twitter is moving some of the work from the write path to the read path.

4. Your home timeline lives in a Redis cluster and holds at most 800 entries.

5. Twitter can learn a lot from who you follow and which links you click. Even when two users do not follow each other, a great deal can be inferred from this implicit social contract.

6. Users care about tweets, but the text of a tweet is almost irrelevant to Twitter's infrastructure.

7. A stack this complex needs an equally sophisticated monitoring and debugging system to track down the root causes of performance problems. And decisions made long ago, for reasons that no longer apply, keep haunting the system.

So how does Twitter work? You can find out from the outline of Raffi's excellent talk below.

Challenges:

1. A naive implementation is far too slow with 150 million users and 300,000 QPS against timelines (home and search).

2. Twitter actually tried the naive implementation, essentially one big SELECT statement, and it did not scale.

3. The solution is a write-time fan-out process. When a tweet arrives, a lot of processing decides where copies of it should go, so that reads become easy and fast and require no computation. The trade-off is that writes are slower than reads, with write throughput around 4,000 QPS. A sketch of the contrast with the naive approach follows this list.
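
To make the contrast concrete, here is a minimal sketch in Python of the naive read-time "big SELECT" that write-time fan-out replaces. The two-table schema and the function are illustrative assumptions, not Twitter's real tables:

```python
import sqlite3

# Hypothetical schema, for illustration only -- not Twitter's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets  (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
""")

def naive_home_timeline(conn, user_id, limit=800):
    """The 'big SELECT' approach: join at read time.

    Every page view walks the social graph and scans the tweets of
    everyone the user follows.  At 300,000 timeline reads per second
    this does not scale, which is why Twitter fans out on write instead.
    """
    return conn.execute(
        """
        SELECT t.id, t.user_id, t.body
        FROM tweets t
        JOIN follows f ON f.followee_id = t.user_id
        WHERE f.follower_id = ?
        ORDER BY t.id DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
```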

Organizational structure:

1. The Platform Services group is responsible for the core scalable infrastructure of Twitter:

A. They run the timeline service, the message (tweet) service, the user service, the social graph service, and all the other services that power the Twitter platform.

B. Internal and external clients use largely the same API.

C. More than one million apps are registered against the third-party APIs.

D. Part of the job is to shield the product teams from having to worry about the scale of the site.

E. They do capacity planning and design scalable back-end systems, replacing infrastructure in time as the site grows.

2. There is also a dedicated architecture group responsible for the overall architecture of Twitter; it maintains the technical debt list (the technology they want to get rid of).

Push or pull?

1. People are tweeting around the clock. Twitter's job is to figure out how to syndicate that content out and get it in front of your followers.

2. The real challenge is real time. The goal is to get a tweet in front of a user within 5 seconds.

A. Delivery means collecting the content as fast as possible, pushing it out onto the Internet, and pulling it back down again.

B. Delivery includes push notifications for iOS, BlackBerry and Android, emails and SMS, as well as writes into timeline clusters that run in memory.

C. Measured per active user, Twitter delivers more messages than any other service in the world.

D. Elections in the United States are among the events that generate the most activity.

3. There are two main timelines: the user timeline and the home timeline.

A. The user timeline is all the tweets a particular user has sent.

B. The home timeline is a temporal merge of the user timelines of everyone you follow.

C. Business rules are applied along the way. For example, replies to people you do not follow are filtered out, and retweets from a given user can be filtered out too.

D. Doing this at Twitter's scale is challenging.

4. Pull-based delivery:

A. Targeted timelines, such as twitter.com and the home_timeline API. Tweets reach you because you asked for them: with pull-based delivery you request the data from Twitter via a REST API call.

B. Queried timelines, via the search API. A query runs against the corpus and returns all matching tweets as quickly as possible.

5. Push-based delivery:

A. Twitter runs one of the largest real-time event systems in the world, pushing tweets through the Firehose at 22 MB per second:

- Open a socket to Twitter and they will push all public tweets to you within 150 milliseconds.

- At any given time, about one million such sockets are connected to the push cluster.

- Firehose clients are mostly services like search engines; all public tweets go out over these sockets.

- No, you can't have it. (You can't handle the truth.)

B. User stream connections. This is what powers TweetDeck and Twitter for Mac. When you log in, they look at your social graph and push you only the tweets of the people you follow, recreating the home timeline experience. Unlike pull-based delivery, you get the same timeline over a persistent connection.

C. Query API. A client registers a standing query over all tweets; whenever a newly published tweet matches the query, it is pushed down the socket registered for that query. (A sketch of a streaming consumer follows this list.)
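
As an illustration of the push model, here is a minimal sketch of what a client of a firehose-style streaming endpoint might look like. The URL and the line-delimited JSON format are assumptions made for the example; the real firehose requires negotiated access and authentication.

```python
import json
import requests  # third-party HTTP client

# Hypothetical endpoint -- only illustrates the shape of a persistent,
# push-based connection as described above.
STREAM_URL = "https://stream.example.com/firehose.json"

def handle(tweet):
    print(tweet.get("id"), tweet.get("text"))

def consume_stream(url=STREAM_URL):
    # One long-lived HTTP connection; the server pushes one JSON-encoded
    # tweet per line as soon as it is published.
    with requests.get(url, stream=True, timeout=90) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines():
            if not raw:          # skip keep-alive newlines
                continue
            handle(json.loads(raw))
```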

How the pull-based timeline works, at a high level:

1. A tweet enters Twitter through the write API. It passes through load balancers, TFE (Twitter Front End), and a few other internal services.

2. This is a very straightforward path: home timelines are fully precomputed, and all the business logic runs as the tweet comes in.

3. Then the fan-out happens. The tweet comes in and is placed in a massive Redis cluster, replicated three times on three different machines; at Twitter's scale, machines fail every day.

4. The fan-out process queries Flock, the social graph service. Flock maintains the follower and following lists.

A. Flock returns the social graph for a user, and fan-out then iterates over all the relevant home timelines stored in the Redis cluster.

B. The Redis cluster has terabytes of RAM.

C. Writes are pipelined to 4,000 destinations at a time.

D. Native Redis list structures are used.

E. Say you have 20,000 followers and you tweet. Fan-out looks up the locations of all 20,000 followers in the Redis cluster and inserts the tweet ID into each of their lists, so one tweet from you turns into 20,000 inserts across the Redis cluster. (A fan-out sketch follows this sub-list.)

F. What gets inserted is the tweet ID, the ID of the user who wrote it, and 4 bytes of flags indicating whether it is a retweet, a reply, or something else.

G. Your home timeline also sits in the Redis cluster and is capped at 800 entries; if you page back far enough you will hit that limit. RAM is the main limit on how many entries a timeline can keep.

H. Every active user's timeline is kept in RAM to keep latency down.

I. An active user is someone who has logged in to Twitter within the last 30 days; that window can change with site load or cache capacity.

J. If you are not an active user, the tweets you send do not go into the cache.

K. Only your home timeline ever needs to read from and write to disk.

L. If your timeline has been evicted from the Redis cluster, it goes through a process called reconstruction when it comes back:

- Query the social graph service to find out whom you follow, then load each of their tweets from disk back into Redis.

- Disk storage is MySQL managed with Gizzard, which hides the SQL transactions and provides global replication.

M. Because everything is replicated three times within a data center, losing one machine does not mean regenerating all the timelines that were on it.

N. If a tweet is a retweet, a pointer to the original tweet is stored.
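
The fan-out step described in E and F above can be sketched roughly as follows, using the redis-py client against a single node. The key names, entry encoding, and the Flock lookup helper are assumptions for illustration, not Twitter's actual implementation:

```python
import redis  # redis-py client

r = redis.Redis()  # one node standing in for the sharded timeline cluster

HOME_TIMELINE_MAX = 800   # per the talk: ~800 entries kept per home timeline
FANOUT_BATCH      = 4000  # destinations written per pipelined batch, per the talk

def get_follower_ids(author_id):
    """Placeholder for the call to Flock, the social graph service."""
    raise NotImplementedError

def fan_out(tweet_id, author_id, flags, follower_ids):
    """Write-time fan-out: push (tweet_id, author_id, flags) onto every
    follower's home-timeline list and trim it to 800 entries."""
    entry = f"{tweet_id}:{author_id}:{flags}"
    for start in range(0, len(follower_ids), FANOUT_BATCH):
        pipe = r.pipeline(transaction=False)
        for fid in follower_ids[start:start + FANOUT_BATCH]:
            key = f"home_timeline:{fid}"
            pipe.lpush(key, entry)
            pipe.ltrim(key, 0, HOME_TIMELINE_MAX - 1)
        pipe.execute()

# Usage (follower ids would come from Flock):
#   fan_out(tweet_id=123, author_id=42, flags="T", follower_ids=get_follower_ids(42))
```

Pipelining thousands of destinations per round trip is what keeps a 20,000-follower fan-out down to a handful of Redis calls instead of 20,000 of them.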

5. When you look at your home timeline, the timeline service is used. It only has to find one machine that holds your home timeline.

A. It runs three different hash rings, because your timeline lives in three different places.

B. It takes the first machine it can reach the fastest and returns as quickly as possible.

C. The trade-off is that fan-out takes a little longer, but reads are fast: about 2 seconds from a cold cache to the browser, and only about 400 milliseconds for an API request.

6. Because the timeline only contains tweet IDs, the tweets themselves have to be hydrated: given a batch of IDs, the content is multigetted from T-bird in parallel. (A read-path sketch follows this list.)

7. Gizmoduck is the user service and Tweetypie is the tweet object service; each has its own cache. The user cache is a memcache cluster holding the entire user base, and Tweetypie keeps roughly the last month and a half of tweets in its own memcache cluster. Both serve internal customers.

8. Some filtering happens on the way out, at read time. For example, Nazi-related content is filtered in France, so it is removed before being served there.
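
A rough sketch of the read path just described: pick a reachable replica, fetch the list of IDs, and hydrate them in parallel. The host names, key format, and hydration helper are placeholders, not Twitter's actual services:

```python
import redis
from concurrent.futures import ThreadPoolExecutor

# Three replicas of every home timeline live on three different machines
# (hypothetical host names).
replicas = [redis.Redis(host=h) for h in ("tl-a", "tl-b", "tl-c")]

def fetch_tweet(entry):
    """Placeholder for the parallel multiget against Tweetypie/memcache."""
    tweet_id, author_id, flags = entry.decode().split(":")
    return {"id": tweet_id, "author": author_id, "flags": flags}

def rebuild_from_disk(user_id):
    """Placeholder for the Gizzard/MySQL reconstruction path."""
    return []

def read_home_timeline(user_id, count=20):
    key = f"home_timeline:{user_id}"
    # The timeline service races the replicas and uses the first answer;
    # trying them in order is a simple stand-in for that.
    for node in replicas:
        try:
            ids = node.lrange(key, 0, count - 1)
            break
        except redis.RedisError:
            continue
    else:
        ids = rebuild_from_disk(user_id)

    # The list only holds ids, so tweet bodies and user objects are
    # hydrated from their caches in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fetch_tweet, ids))
```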

How search works, at a high level:

1. Search is the opposite of the pull-based timeline: all the computation happens at read time, which makes the write path simple.

2. When a tweet arrives, the Ingester tokenizes it, figures out everything that should be indexed, and ships it to an Earlybird machine. Earlybird is a modified version of Lucene, and its index lives entirely in RAM.

3. During fan-out, a tweet may be stored in the home timelines of N people, depending on how many followers you have. In Earlybird, a tweet lives on a single machine (replication aside).

4. Blender builds the search timeline. It scatter-gathers across the data center, querying every Earlybird to ask whether it has content matching the query. A search for "New York Times" hits every machine; all the results come back and are sorted, merged, and re-ranked. The re-ranking uses social signals such as retweet, favorite, and reply counts. (A scatter-gather sketch follows this list.)

5. There is also an activity timeline, computed on the write path: when you favorite or reply to a tweet, your activity timeline is updated. Like the home timeline, it is a list of IDs of activity events: favorite IDs, reply IDs, and so on.

6. All of this is fed into Blender, which recomputes, merges, and sorts it at read time to produce the search timeline you see.

7. Discover is a search built on what Twitter knows about you: who you follow, which links you click, and so on. That information is also used to re-rank Discover results.
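
The Blender scatter-gather can be sketched as below. The shard list, the per-shard query function, and the ranking key are placeholders; only the overall shape (query every in-memory index in parallel, then merge and re-rank on social signals) follows the description above:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the Earlybird index partitions.
EARLYBIRD_SHARDS = ["eb-01", "eb-02", "eb-03"]

def query_shard(shard, query):
    """Placeholder: each Earlybird searches its in-RAM Lucene index and
    returns (tweet_id, text_score, social_score) tuples."""
    raise NotImplementedError

def blend(query, limit=50):
    # Scatter: the query fans out to every index partition in the data center.
    with ThreadPoolExecutor(max_workers=len(EARLYBIRD_SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, query), EARLYBIRD_SHARDS)

    # Gather: merge the partial hit lists, then re-rank using social
    # signals (retweets, favorites, replies) as described above.
    hits = [hit for partial in partials for hit in partial]
    hits.sort(key=lambda h: (h[2], h[1]), reverse=True)
    return hits[:limit]
```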

Search and pull-based timelines are opposites:

1. Search and pull-based timelines look very similar, but their performance characteristics are exactly reversed.

2. Home timeline:

A. Write: when a tweet arrives, writing it into the Redis cluster is an O(n) operation, where n is your follower count. For users like Lady Gaga or Barack Obama, that means tens of millions of inserts. All the Redis clusters are backed by disk, and the Flock cluster also keeps the user timelines on disk, but in general timelines are served from the Redis cluster's RAM. (A back-of-the-envelope calculation follows this list.)

B. Read: via the API or the web site, finding the right Redis machine is O(1). Twitter has heavily optimized the read path for the home timeline, and a read completes in a little over 10 milliseconds. Twitter is primarily a consumption mechanism, not a production mechanism: 300,000 reads per second against 6,000 writes per second.

3. Search timeline:

A. Write: when a tweet reaches the Ingester, only the one Earlybird machine that will index it is involved, so the write is O(1). A tweet is queued and written to its Earlybird machine within about 5 seconds.

B. Read: a query has to do an O(n) read across the whole cluster. Most people do not search, so tweets can be stored cheaply on the write side and the cost is paid at read time: a search takes hundreds of milliseconds. Search never reads from disk; the entire Lucene index is in RAM, so the distributed gather is still efficient.

4. The content of a tweet barely matters to most of the infrastructure. T-bird stores all tweet content, and most of it sits in RAM; anything missing is fetched from T-bird with a select query. The text only matters to search, trends, and the streaming-query pipeline; the home timeline does not care about it at all.
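
A rough back-of-the-envelope calculation, using the follower count and delivery rate quoted elsewhere in the talk, shows why celebrity fan-out cannot meet a 5-second target on the write path alone:

```python
# Approximate numbers from the talk.
followers          = 31_000_000   # Lady Gaga's follower count
deliveries_per_sec = 300_000      # aggregate timeline deliveries per second

# Even if one tweet had the whole delivery pipeline to itself, fanning it
# out to every follower's Redis list would still take:
seconds = followers / deliveries_per_sec
print(f"{seconds:.0f} s (~{seconds / 60:.1f} minutes)")   # ~103 s, ~1.7 minutes
```

With the pipeline shared among all tweets, that is consistent with the P99 delivery times of up to 5 minutes reported in the monitoring section below.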

Future:

1. How can the tweet pipeline be made faster and more efficient?

2. Fan-out can be slow. Twitter tries to keep it under 5 seconds but does not always manage it, particularly when celebrities tweet, and that is happening more and more often.

3. Twitter's follow graph is asymmetric. Tweets are delivered only to the people following you at that specific moment. You can follow Lance Armstrong without him following you back, and even without a mutual follow a great deal can be inferred from this implicit social contract.

4. Social graphs with huge fan-out are a challenge. @ladygaga has 31 million followers, @katyperry 28 million, @justinbieber 28 million, and @barackobama 23 million.

5. When these accounts tweet, there is far more to write across the data center. It gets even harder when they interact with each other, which happens all the time.

6. These high-fan-out users are Twitter's biggest challenge. Replies to celebrity tweets are constantly being seen before the original tweets themselves, which creates race conditions. If a Lady Gaga tweet takes minutes to fan out to all of her followers, people see it at different points in time; someone who followed her recently might see the tweet five minutes earlier than someone who followed her long ago. Suppose an early recipient replies while the original is still fanning out: the reply can land in timelines ahead of the tweet it answers, reaching people who have not yet received the original, which is very confusing. Twitter sorts by ID, because IDs are almost monotonically increasing, but that does not solve the problem at this scale: the fan-out queues for high-value users are constantly backed up.

7. Twitter is trying to find a way to merge the read and write paths: stop fanning out high-value users. For someone like Taylor Swift, skip the fan-out and merge her tweets into timelines at read time instead. That balances the read and write paths and saves 10 percent or more of the computing resources. (A sketch of this hybrid idea follows.)
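
A minimal sketch of that hybrid idea. The threshold and every helper here are purely illustrative stubs, not numbers or services from the talk:

```python
# Purely illustrative threshold and stubs.
FANOUT_THRESHOLD = 1_000_000

def fan_out_to_followers(tweet): ...               # the write-time fan-out shown earlier
def append_to_user_timeline(tweet): ...            # O(1): store only on the author's own timeline
def precomputed_home_timeline(user_id): return []  # fanned-out entries from Redis
def followed_high_value_accounts(user_id): return []
def recent_user_timeline(author_id): return []

def publish(tweet, follower_count):
    """Hybrid write path: ordinary users are fanned out on write, very
    highly followed users are not fanned out at all."""
    if follower_count < FANOUT_THRESHOLD:
        fan_out_to_followers(tweet)
    else:
        append_to_user_timeline(tweet)

def home_timeline(user_id, count=20):
    """Hybrid read path: merge the precomputed timeline with the recent
    tweets of the few high-value accounts this user follows."""
    merged = list(precomputed_home_timeline(user_id))
    for author_id in followed_high_value_accounts(user_id):
        merged.extend(recent_user_timeline(author_id))
    # Tweet ids are roughly monotonically increasing, so sorting by id
    # approximates sorting by time.
    merged.sort(key=lambda t: t["id"], reverse=True)
    return merged[:count]
```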

Decoupling:

1. Tweets are copied many times over, mostly into groups that are decoupled from one another: search, push, interest messaging, and the home timelines can all work independently.

2. The system was also decoupled for performance reasons. Twitter used to be fully synchronous, but that stopped two years ago: it took 145 milliseconds to ingest a tweet through the API before the client connection could be released. Part of this is historical: the write path is built on Ruby and MRI, a single-threaded runtime, so capacity drops every time another Unicorn worker is allocated. They want to release the client connection as quickly as possible: a tweet comes in, Ruby handles it, puts it on a queue, and the connection is dropped. Only 45-48 processes fit on one machine, which caps how many tweets a box can handle at once, so freeing the connection fast matters.

3. Tweets now go into an asynchronous pipeline, and all of the services described above pull from it. (A minimal sketch of this hand-off follows.)
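
A minimal sketch of this decoupling, using an in-process queue as a stand-in for Twitter's asynchronous channel: the write API enqueues and returns immediately, and independent workers pull from the queue.

```python
import queue
import threading

incoming = queue.Queue()  # stand-in for the asynchronous channel

def write_api(tweet):
    """The write API only validates and enqueues the tweet, so the client
    connection can be released well before any fan-out work happens."""
    incoming.put(tweet)
    return {"status": "accepted", "id": tweet["id"]}

def process(tweet):
    print("fanning out", tweet["id"])  # e.g. the fan_out() sketch above

def fanout_worker():
    """Each downstream group (timelines, search, push, ...) pulls from the
    channel independently of the others."""
    while True:
        tweet = incoming.get()
        try:
            process(tweet)
        finally:
            incoming.task_done()

for _ in range(4):
    threading.Thread(target=fanout_worker, daemon=True).start()
```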

Monitoring:

1. There are dashboards all over the office showing how the system is running in real time.

2. If you have one million followers, it takes only a few seconds for all of them to receive your tweet.

3. Input numbers: 400 million tweets a day; 5,000 per second on average, 7,000 per second at peak, and more than 12,000 per second during big events.

4. Timeline delivery numbers: 30 billion deliveries per day (about 21 million per minute); delivering to 1 million followers takes 3.5 seconds at P50, at roughly 300,000 deliveries per second; at P99 it can take up to 5 minutes.

5. A system called Viz monitors every cluster. The median request time for the timeline service to fetch data from the Scala cluster is 5 milliseconds, P99 is 100 milliseconds, and P99.9 is hundreds of milliseconds because those requests have to hit disk.

6. Zipkin is based on Google's Dapper. With it they can trace a request, see every service it touched and how long each hop took, and get a very detailed performance profile for each request. You can drill into a single request and understand where all the time went. A lot of time is spent debugging the system this way, looking at where the time for a request is spent. They can also aggregate the data by phase and see how long fan-out or a given service takes. It took two years of this work to get active-user timeline reads down to 2 milliseconds, much of it spent fighting GC pauses, investigating memcache lookups, understanding the data center topology, and actually setting up the clusters to reach that goal. (A small sketch of percentile reporting follows.)
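
As a small illustration of the kind of numbers quoted above, here is a minimal nearest-rank percentile calculation. It is not Twitter's Viz or Zipkin tooling, just a sketch of how P50/P99/P99.9 figures are derived from latency samples:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value below which roughly p percent
    of the observed latencies fall."""
    ordered = sorted(samples)
    rank = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

# e.g. delivery latencies in seconds collected by a monitoring system
latencies = [0.9, 1.2, 3.5, 2.8, 4.1, 0.7, 290.0, 1.1, 2.2, 3.0]
print("P50   =", percentile(latencies, 50))
print("P99   =", percentile(latencies, 99))
print("P99.9 =", percentile(latencies, 99.9))
```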
