Tips
Twitter engineers have distilled their experience building efficient, scalable systems into three tips:
Partitioning, indexing, and replication.
Partitioning tips
Tweets on Twitter are queried in two main ways:
By ID and by author.
Sharding on a single key, whether the tweet ID or the author, cannot serve both query patterns.
Twitter's engineers handle this by keeping two replicas of the tweet data: one is sharded by ID, the other by author. Queries by ID go to the replica sharded by ID, and queries by author go to the replica sharded by author. Each query is therefore naturally fast and does not need to cross replicas.
The two replicas use different sharding schemes to suit the two query patterns, which is a neat idea.
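The idea above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not Twitter's implementation: the `Tweet` and `DualShardedTweetStore` names, the shard count, and the modulo placement rule are all hypothetical and chosen for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    text: str


class DualShardedTweetStore:
    """Keeps two full copies of every tweet, each sharded by a different key."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # Replica 1: shard picked by tweet ID; each shard maps tweet_id -> Tweet.
        self.by_id = [dict() for _ in range(num_shards)]
        # Replica 2: shard picked by author ID; each shard maps author_id -> [Tweet].
        self.by_author = [defaultdict(list) for _ in range(num_shards)]

    def _shard(self, key: int) -> int:
        # Assumed placement rule: plain modulo on the integer key.
        return key % self.num_shards

    def put(self, tweet: Tweet) -> None:
        # Every write goes to both replicas.
        self.by_id[self._shard(tweet.tweet_id)][tweet.tweet_id] = tweet
        self.by_author[self._shard(tweet.author_id)][tweet.author_id].append(tweet)

    def get_by_id(self, tweet_id: int):
        # Touches exactly one shard of the ID-sharded replica.
        return self.by_id[self._shard(tweet_id)].get(tweet_id)

    def get_by_author(self, author_id: int):
        # Touches exactly one shard of the author-sharded replica.
        return list(self.by_author[self._shard(author_id)].get(author_id, []))
```

In this sketch each read touches a single shard of a single replica. The cost is that every write must be applied twice, once per replica.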
Brief introduction to Twitter's server architecture:
Unicorn: Ruby HTTP server.
Kestrel: A message queue written in Scala at Twitter.
Flapp: FlockDB, the graph store built by Twitter.
Gizzard: A general-purpose sharding framework written in Scala at Twitter.
Crane: Migrates data from MySQL to HBase/HDFS.
Scribe: Collects and aggregates logs from each server.