Anatomy of Twitter, Part 3: Cache == Cash






Cache == Cash: caching equals cash income. Although this is a bit of an exaggeration, using caches correctly is vital when building a large website. How quickly a site responds to a user request is a major factor in the user experience. Many things affect that speed, and one of the most important is hard disk reads and writes (disk I/O).





Table 1 compares the read and write speeds of memory (RAM), hard disk (Disk), and the newer flash memory (Flash). Hard disk reads and writes are millions of times slower than memory. An important way to speed up a website is therefore to cache as much data in memory as possible. Of course, a copy must also be kept on the hard disk so that the data in memory is not lost in a power outage.





Source: Anatomy of Twitter, Part 3: Cache == Cash, Deng Kan, Sina blog (http://blog.sina.com.cn/s/blog_46d0a3930100fc2v.html)

Table 1. Storage media comparison of Disk, Flash and RAM [13]


Courtesy http://farm3.static.flickr.com/2736/4060534279_f575212c12_o.png





A Twitter engineer believes that a website with a good user experience should, on average, complete its response within 500ms of a user request arriving. Twitter's ideal is a response time of 200ms-300ms [17]. In its site architecture, Twitter therefore uses caching on a large scale, at multiple levels, and in multiple ways. Twitter's caching practice, and the lessons learned from it, are a big part of the Twitter web architecture.











Figure 2. Twitter Architecture with Cache


Courtesy http://farm3.static.flickr.com/2783/4065827637_bb2ccc8e3f_o.png





Where is caching needed? The more frequent the disk I/O, the more caching is needed.





As mentioned earlier, Twitter's business has two cores: users and short messages (tweets). Around these two cores the database holds several tables, the three most important of which are listed below. The layout of these three tables is an outsider's guess and does not necessarily match Twitter's actual schema. I believe, however, that even if they differ, the difference is not essential.





1. User table: user ID, name, login name and password, status (online or not).


2. Short message table: message ID, author ID, body (fixed length, 140 characters), timestamp.


3. User relation table, recording who follows whom: user ID, the IDs of the users he follows (following), the IDs of the users who follow him (followed by).
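
The three tables above can be sketched as plain Python records. This is only an illustration of the guessed layout: the class and field names (`User`, `Message`, `UserRelation`, `password_hash`, and so on) are assumptions made for this sketch, not Twitter's schema.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: int
    name: str
    login: str
    password_hash: str      # assumption: a hash is stored, not the raw password
    online: bool = False    # status: online or not


@dataclass
class Message:
    message_id: int
    author_id: int
    body: str               # at most 140 characters
    timestamp: float

    def __post_init__(self):
        # Enforce the 140-character limit described in the table above.
        if len(self.body) > 140:
            raise ValueError("body exceeds 140 characters")


@dataclass
class UserRelation:
    user_id: int
    following: list[int] = field(default_factory=list)  # users he follows
    followers: list[int] = field(default_factory=list)  # users who follow him
```

Note that most of the columns here are IDs, which is why the question of which columns to cache matters.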




Is it necessary to store all of these core database tables in the cache? Twitter's approach is to split the tables apart and put only the most frequently read columns in the cache.





1. Vector Cache and Row Cache





Specifically, the Twitter engineers consider IDs to be the most important columns: the IDs of newly published messages, the IDs of popular messages that are read frequently, the IDs of their authors, and the IDs of the readers who follow those authors. These IDs are stored in a cache (it "stores arrays of tweet pkeys" [14]). In the Twitter literature, the cache space that holds these IDs is called the vector cache [14].





The Twitter engineers believe that these IDs are the most frequently read content, with message bodies second. So they decided that, once the vector cache had been given the resources it needs, the next most important task was to set up a row cache to store message bodies.
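
A minimal sketch of how the two caches might cooperate when a timeline is read: the vector cache returns a list of IDs, and the row cache returns the body for each ID. Plain dicts stand in for memcached here, and the names `DATABASE`, `TIMELINES`, `load_ids`, `load_body` and `render_timeline` are hypothetical, chosen only for this illustration.

```python
# Plain dicts stand in for two separate memcached pools.
vector_cache: dict[str, list[int]] = {}   # timeline key -> list of message IDs
row_cache: dict[int, str] = {}            # message ID -> message body

# Hypothetical stand-ins for the tables on disk.
DATABASE = {101: "first tweet", 102: "second tweet", 103: "third tweet"}
TIMELINES = {"timeline:alice": [103, 102, 101]}


def load_ids(key):
    """Return the ID list for a timeline, filling the vector cache on a miss."""
    if key not in vector_cache:
        vector_cache[key] = list(TIMELINES[key])   # slow path: disk read
    return vector_cache[key]


def load_body(message_id):
    """Return one message body, filling the row cache on a miss."""
    if message_id not in row_cache:
        row_cache[message_id] = DATABASE[message_id]   # slow path: disk read
    return row_cache[message_id]


def render_timeline(key):
    """IDs come from the vector cache first, bodies from the row cache second."""
    return [load_body(message_id) for message_id in load_ids(key)]
```

After the first call to `render_timeline("timeline:alice")`, later calls are served entirely from the two caches without touching the stand-in database.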





The hit rate (or hit ratio) is the most important metric for measuring how effective a cache is. If one or more users read 100 items and 99 of them are served from the cache, the cache hit rate is 99%. The higher the hit rate, the greater the contribution of the cache.
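
The hit rate can be measured by counting hits and misses on every read. The sketch below is a toy illustration, not a memcached client: `CountingCache` and its `loader` argument are names invented for this example.

```python
class CountingCache:
    """Toy in-memory cache that counts hits and misses."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, calling loader(key) only on a miss."""
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = loader(key)
        return self._store[key]

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache()
cache.get("item", str.upper)          # the first read misses
for _ in range(99):
    cache.get("item", str.upper)      # the next 99 reads hit
# 99 hits out of 100 reads gives a hit rate of 0.99, i.e. 99%.
```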





With the vector cache and row cache in place, the results observed in production showed a 99% hit rate for the vector cache and 95% for the row cache, confirming the Twitter engineers' earlier bet that IDs come first and message bodies second.





Both the vector cache and the row cache are built with the open-source tool memcached [15].





2. Fragment Cache and Page Cache





As mentioned earlier, the Twitter site is visited not only from browsers but also from mobile phones, from desktop tools such as QQ, and from a variety of web plug-ins that link other sites to twitter.com [12]. These two types of users are served by two channels: the Web channel, whose entry point is the Apache web server, and a channel called the API. The API channel takes 80%-90% of the total traffic [16].





So, after the vector cache and row cache, the Twitter engineers focused on building further caches to improve the response speed of the API channel.




The main body of a reader's page shows one message after another. The whole page can be divided into several parts, each corresponding to one message. A fragment is one such part of the page. Besides messages, other content such as the Twitter logo is also a fragment. If an author has many readers, caching the rendered layout (fragment) of that author's messages improves the overall reading efficiency of the site. This is the mission of the fragment cache.
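
The idea of a fragment cache can be sketched as follows: each message is rendered to markup once, and later pages reuse the stored fragment. A dict stands in for memcached, and the function names and the HTML layout are assumptions made for this illustration only.

```python
import html

# A dict stands in for the memcached pool that holds rendered fragments.
fragment_cache: dict[int, str] = {}


def render_fragment(message_id, author, body):
    """Render one message as an HTML fragment (the expensive step)."""
    return (f'<li id="tweet-{message_id}">'
            f"<b>{html.escape(author)}</b>: {html.escape(body)}</li>")


def cached_fragment(message_id, author, body):
    """Return the stored fragment, rendering it only on a cache miss."""
    if message_id not in fragment_cache:
        fragment_cache[message_id] = render_fragment(message_id, author, body)
    return fragment_cache[message_id]


def render_page(messages):
    """Assemble a page from fragments; messages are (id, author, body) tuples."""
    items = "".join(cached_fragment(*message) for message in messages)
    return f"<ul>{items}</ul>"
```

If an author has many readers, every reader's page reuses the same stored fragment for each of that author's messages.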





For some popular authors, readers not only read their messages but also visit their home pages, so it is necessary to cache the personal home pages of these popular authors. This is the mission of the page cache.





The fragment cache and the page cache are also built with memcached.





In production, the fragment cache reached a hit rate of 95%, while the page cache reached only 40%. Although the page cache has a low hit rate, its entries are entire personal home pages, so it takes up a large amount of space. To keep the page cache from competing with the fragment cache for space, the Twitter engineers deployed the page cache on separate physical machines.
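
The competition for space can be illustrated with a small size-bounded cache. The sketch assumes least-recently-used (LRU) eviction, which is how memcached behaves by default; the class `LRUCache` and the sizes used are invented for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded cache that evicts the least recently used entries."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.used -= len(self.items.pop(key))
        self.items[key] = value
        self.used += len(value)
        while self.used > self.capacity and self.items:
            _, evicted = self.items.popitem(last=False)   # drop the oldest
            self.used -= len(evicted)


# Shared pool: one large page pushes out several small fragments.
shared = LRUCache(1000)
for i in range(10):
    shared.put(f"fragment:{i}", b"x" * 50)
shared.put("page:alice", b"x" * 800)

# Separate pools: the same page cannot touch the fragments.
fragment_pool, page_pool = LRUCache(500), LRUCache(1000)
for i in range(10):
    fragment_pool.put(f"fragment:{i}", b"x" * 50)
page_pool.put("page:alice", b"x" * 800)
```

In the shared pool only four of the ten fragments survive the arrival of the page, while the separate fragment pool keeps all ten.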





3. HTTP Accelerator





Having solved caching for the API channel, the Twitter engineers turned to caching for the Web channel. Their analysis concluded that the pressure on the Web channel comes mainly from search. Especially during breaking events, readers search for relevant messages regardless of whether the authors are ones they follow.





To reduce the search pressure, search keywords and their corresponding search results can be cached. The caching tool the Twitter engineers use is the open-source project Varnish [18].
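
Caching results by keyword can be sketched as below. This is plain Python, not Varnish configuration. The expiry time (TTL) is an assumption added for this sketch, on the reasoning that search results go stale as new messages arrive; the article does not say how Twitter expires cached searches.

```python
import time


class TTLCache:
    """Cache of keyword -> results whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._store = {}            # keyword -> (expiry time, results)

    def get_or_compute(self, keyword, search):
        """Return cached results, running search(keyword) on a miss or expiry."""
        now = self.clock()
        entry = self._store.get(keyword)
        if entry is not None and entry[0] > now:
            return entry[1]
        results = search(keyword)
        self._store[keyword] = (now + self.ttl, results)
        return results
```

During a breaking event many readers search for the same keyword, so even a short expiry lets one real search answer many requests.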




Interestingly, Varnish is typically deployed outside the web server, facing the Internet. That way, when a user visits a website, the request actually reaches Varnish first and reads the desired content from it. A user request is forwarded to the web server only if Varnish has not cached the corresponding content. Twitter's deployment, however, puts Varnish behind the Apache web server [19]. The reason is that the Twitter engineers found Varnish complicated to operate, and they took this unusual and conservative approach to reduce the chance that a Varnish crash would cripple the entire site.





The primary task of the Apache web server is to parse HTTP and distribute tasks. Different Mongrel Rails servers are responsible for different tasks, but most of them have to contact the vector cache and row cache to read data. How does a Rails server contact memcached? The Twitter engineers developed their own Rails plug-in (gem) called Cache Money.
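
The reference list describes Cache Money as a write-through caching library. The sketch below shows the general write-through idea in Python. It is not Cache Money's API (that gem is a Ruby library); the class `WriteThroughStore` and the two dicts standing in for the database and memcached are assumptions made for illustration.

```python
class WriteThroughStore:
    """Every write goes to the database and to the cache in the same step."""

    def __init__(self, database, cache):
        self.database = database    # dict standing in for the database
        self.cache = cache          # dict standing in for memcached

    def write(self, key, value):
        self.database[key] = value  # durable copy first
        self.cache[key] = value     # then keep the cache in step

    def read(self, key):
        if key in self.cache:       # fast path: served from memory
            return self.cache[key]
        value = self.database[key]  # slow path; raises KeyError if absent
        self.cache[key] = value     # refill the cache for the next reader
        return value
```

Because the cache is updated on every write, a read that follows a write finds fresh data in memory, which helps keep hit rates high.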





Although Twitter has not disclosed the Varnish hit rate, [17] claims that after Varnish was introduced, the load on the entire twitter.com site dropped by 50%; see Figure 3.








Figure 3. Cache decreases twitter.com load by 50% [17]


Courtesy http://farm3.static.flickr.com/2537/4061273900_2d91c94374_o.png








References





[12] Alphabetical list of Twitter services and applications. (http://en.wikipedia.org/wiki/List_of_Twitter_services_and_applications)


[13] How flash changes the DBMS world. (http://hansolav.net/blog/content/binary/HowFlashMemory.pdf)


[14] Improving running components at Twitter. (http://qconlondon.com/london-2009/file?path=/qcon-london-2009/slides/EvanWeaver_ImprovingRunningComponentsAtTwitter.pdf)


[15] memcached, a high-performance, general-purpose, distributed memory object caching system. (http://www.danga.com/memcached/)


[16] Upgrading Twitter without service disruptions. (http://gojko.net/2009/03/16/qcon-london-2009-upgrading-twitter-without-service-disruptions/)


[17] Fixing Twitter. (http://assets.en.oreilly.com/1/event/29/Fixing_Twitter_Improving_the_Performance_and_Scalability_of_the_World _s_most_popular_micro-blogging_site_presentation%20presentation.pdf)


[18] Varnish, a high-performance HTTP accelerator. (http://varnish.projects.linpro.no/)


[19] How do you use Varnish in twitter.com? (http://projects.linpro.no/pipermail/varnish-dev/2009-February/000968.html)


[20] Cache Money gem, an open-source write-through caching library. (http://github.com/nkallen/cache-money)




