Twitter, an Evolving Architecture

Contents
  • Cache
  • Message Queue
  • Memcached Client

 


Posted by Abel Avram on Jun 26, 2009

Evan Weaver, Lead Engineer in the Services Team at Twitter, whose primary job is optimization and scalability, talked at QCon London 2009 about Twitter's architecture and especially the optimizations performed over the last year to improve the web site.

Most of the tools used by Twitter are open source. The stack is made up of Rails for the front end, C, Scala and Java for the middle business layer, and MySQL for storing data. Everything is kept in RAM and the database is just a backup. The Rails front end handles rendering, cache composition, DB querying and synchronous inserts. This front end mostly glues together several client services, mostly written in C: a MySQL client, a memcached client, a JSON one, and others.

The middleware uses memcached, Varnish for page caching, and Kestrel, an MQ written in Scala; a Comet server is in the works, also written in Scala, for clients that want to track a large number of tweets.

Twitter started as a "content management platform, not a messaging platform", so many optimizations were needed to move from the initial model based on aggregated reads to the current messaging model, where all users need to be updated with the latest tweets. The changes were done in three areas: cache, MQ and memcached client.

Cache

Each tweet is tracked on average by 126 users, so there is clearly a need for caching. In the original configuration, only the API had a page cache, which was invalidated each time a tweet came in from a user; the rest of the application was cacheless:

The first architectural change was to create a write-through Vector Cache containing arrays of tweet IDs, serialized as 64-bit integers. This cache has a 99% hit rate.
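As a rough illustration (not Twitter's actual code), here is what a write-through vector cache of packed 64-bit tweet IDs might look like in Ruby. The cache client interface, the key scheme and the timeline_ids_from_db helper are assumptions made for this sketch.

```ruby
# Sketch of a write-through vector cache: the cache always holds the current
# array of tweet IDs, serialized as 64-bit integers, and is updated on every
# write instead of being invalidated.
class VectorCache
  def initialize(cache)
    @cache = cache                 # any memcached-like client exposing get/set
  end

  def key(user_id)
    "timeline:vector:#{user_id}"
  end

  # Read path: unpack the serialized 64-bit integers, falling back to the
  # database only on a miss (the ~1% case).
  def timeline_ids(user_id)
    if (packed = @cache.get(key(user_id)))
      packed.unpack("Q*")
    else
      ids = timeline_ids_from_db(user_id)      # hypothetical DB helper
      @cache.set(key(user_id), ids.pack("Q*"))
      ids
    end
  end

  # Write path: a new tweet ID is prepended to the cached vector directly,
  # so the cache is repaired online rather than destroyed.
  def push(user_id, tweet_id)
    ids = timeline_ids(user_id)
    @cache.set(key(user_id), ([tweet_id] + ids).pack("Q*"))
  end

  def timeline_ids_from_db(user_id)
    []                              # placeholder for the real query
  end
end

# A trivial in-memory stand-in for memcached, just to exercise the sketch.
class HashCache
  def initialize; @data = {}; end
  def get(key); @data[key]; end
  def set(key, value); @data[key] = value; end
end

vc = VectorCache.new(HashCache.new)
vc.push(42, 1001)
vc.push(42, 1002)
p vc.timeline_ids(42)   # => [1002, 1001]
```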

The second change was adding another write-through Row Cache containing database records: users and tweets. This one has a 95% hit rate and uses Nick Kallen's Rails plug-in called Cache Money. Nick is a Systems Architect at Twitter.

The third change was introducing a read-through Fragment Cache containing serialized versions of the tweets accessed through API clients, which could be packaged in JSON, XML or Atom, with the same 95% hit rate. The fragment cache "consumes the vectors directly, and if a serialized fragment is currently cached it doesn't load the actual row for the tweet you are trying to see, so it short-circuits the database the vast majority of times", said Evan.
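In the same illustrative spirit, a read-through fragment cache might look like the sketch below: it walks the vector of tweet IDs, serves cached serialized fragments where they exist, and only loads and renders the rows that miss. The key scheme, renderer and row loader are assumptions; it pairs with the HashCache stub from the previous sketch.

```ruby
require "json"

# Sketch of a read-through fragment cache keyed by tweet ID and format.
class FragmentCache
  def initialize(cache)
    @cache = cache
  end

  def key(tweet_id, format)
    "fragment:#{format}:#{tweet_id}"
  end

  # tweet_ids comes straight from the vector cache; rows are loaded and
  # rendered only for the IDs that miss, so the database is short-circuited
  # most of the time.
  def fragments(tweet_ids, format = :json)
    tweet_ids.map do |id|
      @cache.get(key(id, format)) || begin
        rendered = render(load_tweet_row(id), format)
        @cache.set(key(id, format), rendered)
        rendered
      end
    end
  end

  def render(row, format)
    format == :json ? row.to_json : row.to_s   # stand-in for real JSON/XML/Atom rendering
  end

  def load_tweet_row(id)
    { id: id, text: "tweet #{id}" }            # placeholder for the real row lookup
  end
end

# fc = FragmentCache.new(HashCache.new)        # reuses the stub from the vector cache sketch
# p fc.fragments([1002, 1001])
```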

Yet another change was creating a separate cache pool for the page cache. According to Evan, the page cache pool uses a generational key scheme rather than direct invalidation, because clients can

send HTTP if-modified-since and put any timestamp they want in the request path, and we need to slice the array and present them with only the tweets they want to see, but we don't want to track all the possible keys that the clients have used. There was a big problem with this generational scheme because it didn't delete all the invalid keys. Each page that was added, corresponding to the number of tweets people were receiving, would push out valid data in the cache, and it turned out that our cache only had a 5 hour valid lifetime because of all these page caches flowing through.

When the page cache was moved into its own pool, the cache misses dropped about 50%.
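A minimal sketch of the generational key scheme described above, assuming a memcached-like client with get/set (the HashCache stub from the earlier sketch works). The key names and the render_page helper are made up for illustration; the point is that one counter bump invalidates every page variant at once, and that these short-lived pages live in their own pool so they cannot evict the long-lived vectors and rows.

```ruby
# Sketch of a generational key scheme for the page cache: instead of deleting
# every page variant a client may have requested (if-modified-since
# timestamps, counts, pagination...), bump a per-user generation counter.
# Old keys simply become unreachable and age out of the (separate) page pool.
class PageCache
  def initialize(cache)
    @cache = cache               # ideally a pool separate from the vector/row caches
  end

  def generation(user_id)
    @cache.get("page:gen:#{user_id}") || 0
  end

  # One tiny write when a new tweet arrives makes every previously cached
  # page for this user unreachable; no need to enumerate and delete keys.
  def invalidate(user_id)
    @cache.set("page:gen:#{user_id}", generation(user_id) + 1)
  end

  # params captures whatever the client asked for (since_id, count, ...).
  def fetch_page(user_id, params)
    key = "page:#{user_id}:gen#{generation(user_id)}:#{params.sort.join(':')}"
    @cache.get(key) || begin
      page = render_page(user_id, params)      # hypothetical page renderer
      @cache.set(key, page)
      page
    end
  end

  def render_page(user_id, params)
    "<timeline page for user #{user_id}, #{params.inspect}>"
  end
end

# pc = PageCache.new(HashCache.new)
# pc.fetch_page(42, count: 20, since_id: 990)   # rendered and cached
# pc.invalidate(42)                             # new tweet: bump the generation
# pc.fetch_page(42, count: 20, since_id: 990)   # re-rendered under the new generation
```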

This is the current cache scheme employed by Twitter:

Since 80% of the Twitter traffic comes through the API, there are two additional levels of cache, each serving up to 95% of the requests coming from the preceding layer. The overall cache changes, in total between 20 and 30 optimizations, brought

a 10x capacity improvement, and it would have been more but we hit another bottleneck at that point... Our strategy was to add the read-through cache first, make sure it invalidates OK, and then move to a write-through cache and repair it online rather than destroying it every time a new tweet ID comes in.

Message Queue

Since, on average, each user has 126 followers, there are 126 messages placed in the queue for each tweet. Besides that, there are times when the traffic peaks, as during Obama's inauguration, when it reached several hundred tweets per second, or tens of thousands of messages going into the queue, 3 times the normal traffic at that time. The MQ is meant to absorb the peak and disperse it over time so they would not have to add lots of extra hardware. Twitter's MQ is simple: based on the memcached protocol, no ordering of jobs, no shared state between servers, everything is kept in RAM, and it is transactional.
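Because the queue speaks the memcached text protocol, any memcached client can produce and consume jobs. Below is a minimal sketch using a raw socket: SET on a queue name enqueues a job, GET dequeues the next one. The host, port and queue name are assumptions (22133 is Kestrel's default memcached port), not a description of Twitter's deployment.

```ruby
require "socket"

# Sketch of talking to a memcached-protocol queue (Starling/Kestrel style):
# a queue is just a key; SET enqueues, GET dequeues.
class MemcacheQueue
  def initialize(host = "localhost", port = 22133)   # 22133 is Kestrel's default; adjust as needed
    @sock = TCPSocket.new(host, port)
  end

  def enqueue(queue, payload)
    @sock.write("set #{queue} 0 0 #{payload.bytesize}\r\n#{payload}\r\n")
    @sock.gets                               # "STORED\r\n" on success
  end

  def dequeue(queue)
    @sock.write("get #{queue}\r\n")
    header = @sock.gets                      # "VALUE <queue> <flags> <bytes>\r\n" or "END\r\n"
    return nil if header.nil? || header.start_with?("END")
    bytes = header.split[3].to_i
    payload = @sock.read(bytes)
    @sock.read(2)                            # trailing "\r\n" after the data block
    @sock.gets                               # final "END\r\n"
    payload
  end
end

# Usage (assuming a Kestrel/Starling-style server is running locally):
# q = MemcacheQueue.new
# q.enqueue("follower_fanout", '{"tweet_id":1001,"user_id":42}')
# p q.dequeue("follower_fanout")
```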

The first implementation of the MQ used Starling, written in Ruby, and did not scale well, especially because of Ruby's GC, which is not generational. That led to MQ crashes, because at some point the entire queue processing stopped while the GC finished its job. The decision was made to port the MQ to Scala, which uses the more mature JVM GC. The current MQ is only 1,200 lines of code and runs on 3 servers.

Memcached Client

The memcached client optimization was intended to optimize cluster load. The current client used is libmemcached, with Twitter being its most important user and a contributor to the code base. Based on it, the fragment cache optimization over one year led to a 50x increase in page requests served per second.
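The talk does not spell out what the client does to balance cluster load, but the standard technique in memcached clients (libmemcached offers it as its "ketama" consistent-distribution mode) is consistent hashing of keys onto servers. The sketch below illustrates the idea only; it is not libmemcached's actual implementation.

```ruby
require "digest/md5"

# Sketch of consistent hashing: keys are mapped onto a ring of points owned
# by servers, so load spreads evenly and adding or removing a server only
# remaps a small fraction of the keys.
class ConsistentHash
  POINTS_PER_SERVER = 160      # virtual nodes smooth out the distribution

  def initialize(servers)
    @ring = {}
    servers.each do |server|
      POINTS_PER_SERVER.times do |i|
        @ring[hash_point("#{server}-#{i}")] = server
      end
    end
    @sorted_points = @ring.keys.sort
  end

  def server_for(key)
    point = hash_point(key)
    # first ring point at or after the key's hash, wrapping around the ring
    found = @sorted_points.bsearch { |p| p >= point } || @sorted_points.first
    @ring[found]
  end

  private

  def hash_point(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

ring = ConsistentHash.new(%w[cache1:11211 cache2:11211 cache3:11211])
p ring.server_for("fragment:json:1001")   # => one of the three servers
```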

Because of poor request locality, the fastest way to deal with requests is to precompute data and store it in network RAM, rather than recompute it on each server when necessary. This approach is used by the majority of Web 2.0 sites, which run almost completely directly from memory. The next step is "scaling writes, after scaling reads for one year. Then comes the multi co-location issue", according to Evan.
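Tying the earlier sketches together, the precompute-and-store idea could look like a worker that drains the queue and pushes each new tweet into its followers' cached timeline vectors, so reads become plain cache fetches instead of recomputation. MemcacheQueue and VectorCache are the earlier sketches and follower_ids_for is a hypothetical lookup; this is not Twitter's actual code.

```ruby
require "json"

def follower_ids_for(user_id)
  []   # placeholder for the real follower lookup
end

# Sketch of precomputing at write time: fan each queued tweet out into every
# follower's cached timeline vector.
def fanout_worker(queue, vector_cache)
  while (job = queue.dequeue("follower_fanout"))
    tweet = JSON.parse(job)
    follower_ids_for(tweet["user_id"]).each do |follower_id|
      vector_cache.push(follower_id, tweet["tweet_id"])
    end
  end
end

# fanout_worker(MemcacheQueue.new, VectorCache.new(HashCache.new))
```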

The slides of the QCon presentation have been published on Evan's site.

 

 

 

Scaling Twitter

 

 

Twitter is the largest Ruby on Rails application so far. In the past few months, page views have grown from zero to millions, and Twitter is now 10,000% faster than it was.

Platform
Ruby on Rails
Erlang
MySQL
Mongrel
Munin
Nagios
Google Analytics
AWStats
Memcached

Status
Tens of thousands of users; the real numbers are confidential
600 requests per second
An average of 200-300 connections per second, spiking to 800 connections per second
MySQL processes 2,400 requests per second
180 Rails instances, using Mongrel as the web server
One MySQL server (one big 8-core box) and one slave for read-only statistics and reporting
30+ processes for handling other tasks
8 Sun X4100s
Rails processes a request in 200 milliseconds
The average time spent in the database is 50 milliseconds
Over 16 GB of memcached

Architecture
1. Twitter ran into very common scaling problems.
2. At first, Twitter had no monitoring, no graphs, and no statistics, which made it very difficult to track down problems. Later, Munin and Nagios were added. It was a little difficult to get the tools working on Solaris. Google Analytics was in place, but it is useless when the pages aren't loading.
3. Memcached is used extensively for caching.
-For example, if getting a count is slow, the count is thrown into memcached and comes back within a millisecond.
-Getting the status of friends is complicated, and there are other problems such as security. So instead of running a query, a friend's status is updated and pushed into the cache. The database is never touched.
-ActiveRecord objects are large, so they are not cached. Twitter stores the critical attributes in a hash and lazily loads the other attributes when they are accessed (see the sketch after this list).
-90% of requests are API requests, so no page or fragment caching is done on the front end. Pages are so time-sensitive that it would be inefficient, but Twitter does cache API requests.
4. Messages
-Messaging is used heavily. Producers put messages in a queue, which are then delivered to consumers. Twitter acts as a message bridge between different formats (SMS, web, IM, etc.).
-DRb (distributed Ruby) was used first: a library that lets you send and receive messages to and from remote Ruby objects over TCP/IP, but it is a little fragile.
-Moved to Rinda, a shared queue using the tuple space model, but the queues are not persistent, so messages are lost on failure.
-Tried Erlang.
-Moved to Starling, a distributed queue written in Ruby.
-The distributed queue is written to disk so it survives system crashes. Other large websites use this simple approach as well.
5. SMS is handled through a third-party gateway's API, which is very expensive.
6. Deployment
-Twitter does a review and then rolls out new Mongrel servers. There is no graceful way to do it yet.
-An internal error is shown to the user while the Mongrel servers are being replaced.
-All servers are killed at once. A rolling restart is not used, because the message queue state is kept in the Mongrels and a rolling approach would cause the remaining Mongrels to back up.
7. Misuse
-The system often went down because people would add anyone and everyone as friends; 9,000 friends added within 24 hours could crash the site.
-Tools were built to detect these problems, so Twitter can figure out when and where these errors occur.
-Be ruthless about deleting these users.
8. Partitioning
-Partitioning is planned for the future; it is not needed yet, as the current changes are sufficient.
-The plan is to partition by time rather than by user, because most requests are very local in time.
-Partitioning is made difficult by memoization: Twitter cannot guarantee that read-only operations are really read-only, and writing to a read-only slave would be very bad.
9. Twitter's API traffic is 10 times that of the Twitter site.
-The API is the most important thing Twitter has done.
-Keeping the service simple lets developers build on top of Twitter's infrastructure and come up with ideas better than Twitter itself could. For example, Twitterrific is a beautiful way to use Twitter.
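As a rough illustration of the "critical attributes in a hash" item above, here is a sketch of a tweet wrapper that answers the common attributes from a small cached hash and lazily loads the full record only when an uncached attribute is touched. The attribute names and the loader are assumptions, not Twitter's code.

```ruby
# Sketch: cache only the critical attributes of a tweet as a small hash
# instead of the full ActiveRecord object; everything else is lazily loaded.
class CachedTweet
  CRITICAL = [:id, :user_id, :text, :created_at]

  def initialize(attrs)
    @attrs = attrs              # small hash pulled from the cache
  end

  # Critical attributes come straight from the cached hash.
  CRITICAL.each do |name|
    define_method(name) { @attrs[name] }
  end

  # Anything else falls through to the full record, loaded on first use.
  def method_missing(name, *args)
    full_record.send(name, *args)
  end

  def respond_to_missing?(name, include_private = false)
    true
  end

  def full_record
    @full_record ||= load_full_record(@attrs[:id])
  end

  def load_full_record(id)
    # placeholder for the real Tweet.find(id)
    Struct.new(:source, :geo).new("web", nil)
  end
end

t = CachedTweet.new(id: 1001, user_id: 42, text: "hello", created_at: Time.now)
puts t.text      # served from the cached hash
puts t.source    # triggers the lazy load of the full record
```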

What I learned
1. Talk to the community. Don't hide and try to solve all problems by yourself. Many smart people are willing to help if you ask.
2. Treat your scaling plan like a business plan, and gather a group of advisers to help you.
3. Build it yourself. Twitter spent a lot of time trying other people's solutions that seemed like they should work but didn't. It is better to build some things yourself, so that you at least control them and can build in the features you need.
4. Build in user limits. People will try to abuse your system. Put in reasonable limits and detection mechanisms to protect your system from being killed.
5. Don't make the database the primary bottleneck. Not everything requires a huge join; cache data and consider other creative ways to get the same result. A good example is described in Twitter, Rails, Hammers, and 11,000 Nails per Second.
6. Make your application easily partitionable from the beginning. That way you will always have a way to scale your system.
7. Recognize immediately that your site is slow. Add reporting right away to track problems.
8. Optimize the database
-Index everything; Rails won't do it for you.
-Use EXPLAIN to see how your queries are executed; the indexes may not be used as you expect.
-Denormalize a lot. For example, Twitter stores a user's ID together with all of their friend IDs, which avoids a large number of expensive joins (see the sketch after this list).
9. Cache everything. Individual ActiveRecord objects are not cached yet; the queries are fast enough for now.
10. Test everything.
-You want to know that it works properly when you deploy.
-Twitter now has a complete test suite, so when caching broke, they were able to find the problem before going live.
11. Use exception notifiers and exception logs to be alerted to errors immediately, so that you can address them right away.
12. Don't be stupid.
-Scale changes what counts as stupid.
-Trying to load 3,000 friends at once into memory can bring a server down, but it worked just fine when there were only 4 friends.
13. Most of the performance is not from the language, but from the application design.
14. Create an API to open your site up as a service. Twitter's API is a major reason for its success: it lets users build extensions and an ecosystem. You can never do everything your users can do, and you will never be as creative as them. So open up your system and make it easy for others to integrate their applications with yours.
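To make the denormalization lesson (item 8) concrete, here is a sketch of keeping a user's friend IDs packed in a single column next to the user row, so timeline queries need no join. The schema, column and class names are illustrative only, not Twitter's actual layout.

```ruby
# Normalized (every timeline build needs a join against a friendships table):
#   SELECT tweets.* FROM tweets
#     JOIN friendships ON friendships.friend_id = tweets.user_id
#    WHERE friendships.user_id = 42;
#
# Denormalized (one indexed lookup, then a simple IN query):
#   SELECT friend_ids FROM users WHERE id = 42;
#   SELECT * FROM tweets WHERE user_id IN (...unpacked friend_ids...);

class User
  attr_reader :id, :friend_ids

  def initialize(id, friend_ids)
    @id = id
    @friend_ids = friend_ids
  end

  # Friend IDs are stored packed in one column, much like the vector cache:
  # cheap to read back, and no join is required.
  def packed_friend_ids
    friend_ids.pack("Q*")
  end

  def self.from_row(id, packed)
    new(id, packed.unpack("Q*"))
  end
end

u = User.new(42, [7, 19, 23])
p User.from_row(42, u.packed_friend_ids).friend_ids   # => [7, 19, 23]
```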

 
