Anatomy of Twitter, Part 7: Incompleteness as Progress

Last Update: 2015-03-13 Source: Internet

Author: User


Part 7: Incompleteness as Progress

An incomplete way of working is not a flaw; in architecture design it is a form of progress.

When a request from a user's browser reaches the Twitter back end, the first component to greet it is the Apache Web Server. The second stop is the Mongrel Rails Server. Mongrel handles both upload requests and download requests. Its business logic for uploads and downloads is very concise, but beneath that simple surface lies an unconventional design. This design is not the result of negligence; in fact, it is the most noteworthy highlight of the Twitter architecture.

Figure 9. Twitter internal flows

Courtesy of http://farm3.static.flickr.com/2766/4095392354_66bd4bcc30_o.png

An upload is when a user writes a new tweet and sends it to Twitter for publication. A download is when Twitter updates a reader's home page to add the latest tweets. Downloads do not happen because readers actively request new content; instead, the Twitter server proactively pushes new content to readers. Consider uploads first. Mongrel's upload logic is very concise and takes two steps.

1. When Mongrel receives a new tweet, it assigns the tweet a new ID. The new tweet's ID, together with the author's ID, is cached in the vector memcached server. Next, the tweet ID and text are cached in the row memcached server. At an appropriate time, vector memcached and row memcached automatically store these two cached contents in the MySQL database.

2. In the Kestrel message queuing server, Mongrel looks up the message queue of each reader and of the author, creating a new queue if one does not exist. Next, Mongrel puts the ID of the new tweet into the queues of all of the author's online readers, and of the author himself.
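A minimal Python sketch of the two upload steps follows. Plain dictionaries and deques are hypothetical in-memory stand-ins for vector memcached, row memcached and the Kestrel queues, the sequential ID counter is an assumption, and keeping a per-author list of tweet IDs in vector memcached is only a guess at what "vector" means. It illustrates the division of labor described above, not Twitter's actual code.

```python
import itertools
from collections import deque

# Hypothetical in-memory stand-ins for the servers named in the text.
vector_cache = {}  # "vector memcached": author ID -> list of that author's tweet IDs (assumed layout)
row_cache = {}     # "row memcached": tweet ID -> tweet text
queues = {}        # "Kestrel": user ID -> FIFO queue of tweet IDs

_tweet_ids = itertools.count(1)  # hypothetical sequential ID generator


def upload(author_id, text, online_readers):
    """Sketch of Mongrel's two upload steps; returns the new tweet ID."""
    # Step 1: assign a new ID, then cache (author, ID) and (ID, text).
    tweet_id = next(_tweet_ids)
    vector_cache.setdefault(author_id, []).append(tweet_id)
    row_cache[tweet_id] = text
    # Persisting to MySQL is left to the caches; Mongrel does not do it here.

    # Step 2: push only the ID into the queue of every online reader and of
    # the author, creating a queue the first time a user is seen.
    for user_id in [*online_readers, author_id]:
        queues.setdefault(user_id, deque()).append(tweet_id)

    # Nothing is reported back to the author: the upload task simply ends.
    return tweet_id
```

For example, upload("alice", "hello", ["bob", "carol"]) leaves the text in row_cache and only the new ID in the queues of bob, carol and alice.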

Savor these two steps and Mongrel's work feels incomplete. First, once the tweet and its associated IDs are cached in vector memcached and row memcached, Mongrel is done; it is not directly responsible for storing the content in the MySQL database. Second, throwing the tweet ID into the Kestrel message queues marks the end of the upload task. Mongrel has no way to tell the author that the tweet has been uploaded, nor whether readers can read the new tweet.

Why does Twitter adopt this unconventional, incomplete way of working? Before answering, it helps to look at the logic Mongrel uses to process downloads; comparing the upload logic with the download logic makes the design easier to understand. Mongrel's download logic is also very simple and also takes two steps.

1. Fetch the IDs of new tweets from the Kestrel message queues of the author and of each reader.

2. Get the tweet text from the row memcached cache, get the readers' and the author's home pages from page memcached, and update those home pages to add the text of the new tweet. Then push the pages to the readers and the author through Apache.
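The two download steps can be sketched the same way. The dictionaries are hypothetical in-memory stand-ins for row memcached, page memcached and the Kestrel queues, pre-filled as if an upload had just happened; push_to_browser is a hypothetical placeholder for the push through Apache, and modeling a home page as a list of tweet texts, newest first, is an assumption.

```python
from collections import deque

# Hypothetical stand-ins, pre-filled as if tweet 42 had just been uploaded.
row_cache = {42: "hello, world"}       # "row memcached": tweet ID -> text
page_cache = {"bob": ["older tweet"]}  # "page memcached": user ID -> home page, newest first
queues = {"bob": deque([42])}          # "Kestrel": user ID -> queue of tweet IDs


def push_to_browser(user_id, page):
    """Hypothetical placeholder for pushing the page out through Apache."""
    print(f"push to {user_id}: {page}")


def download(user_id):
    """Sketch of Mongrel's two download steps for one user."""
    queue = queues.get(user_id, deque())
    page = page_cache.setdefault(user_id, [])
    # Step 1: drain the tweet IDs waiting in this user's queue.
    while queue:
        tweet_id = queue.popleft()
        # Step 2: look up the text by ID and add it to the cached home page.
        page.insert(0, row_cache[tweet_id])
    push_to_browser(user_id, page)
    return page
```

Calling download("bob") empties bob's queue and returns ["hello, world", "older tweet"].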

Comparing Mongrel's upload and download logic, it is easy to see that each is "incomplete" on its own, and only together do they form a complete process. This incomplete working style reflects two "separation" concepts in Twitter's architecture design. First, a complete business process is split into several relatively independent pieces of work, each handled by a different process on the same machine, or even by different machines. Second, collaboration among multiple machines is reduced to the transfer of data and control commands, with emphasis on separating the data flow from the control flow.

Splitting business processes is not Twitter's invention. In fact, the purpose of the classic three-tier architecture is precisely to split the process: the web server handles HTTP parsing, the application server handles business logic, and the database handles data storage. Following the same principle, the business logic in the application server can be split further.

In 1996, John Ousterhout, a former Berkeley professor and the inventor of the Tcl language, gave a talk at the USENIX conference titled "Why Threads Are a Bad Idea (for most purposes)" [36]. In 2003, Berkeley professor Eric Brewer and his students published a paper titled "Why Events Are a Bad Idea (for high-concurrency servers)" [37]. These two Berkeley colleagues took opposite sides. What were they arguing about?

Multithreading, in this sense, means that a single thread is responsible for a complete business process from beginning to end, like a mechanic in a repair shop who fixes a car single-handedly. Event-driven design means that a complete business process is split into several independent pieces of work, each handled by one or more threads, like an assembly line in a car factory with multiple workstations, each staffed by one or more workers.

Clearly, Twitter's approach belongs to the event-driven camp. The benefit of event-driven design is that resources can be allocated dynamically: when one task in the process becomes a bottleneck, an event-driven architecture can easily assign more resources to that task to relieve the pressure. On a single machine, the performance difference between multithreaded and event-driven designs is not obvious, but in a distributed system the advantage of event-driven design shows much more clearly.
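The mechanic-versus-assembly-line contrast, and the idea of giving a bottleneck stage more resources, can be illustrated on a single machine. In this sketch the steps parse, process and store are hypothetical placeholders, and threads connected by queue.Queue stand in for the processes and machines Twitter distributes work across; it shows the two styles, not Twitter's code.

```python
import queue
import threading


# Three hypothetical steps of one business process.
def parse(req):
    return req.strip()


def process(text):
    return text.upper()


def store(text):
    return "stored:" + text


STEPS = [parse, process, store]
STOP = object()  # sentinel telling a stage to shut down


def mechanic(req):
    """Multithreading style: one thread runs the whole process end to end."""
    for step in STEPS:
        req = step(req)
    return req


def _worker(step, inbox, outbox):
    while True:
        item = inbox.get()
        if item is STOP:
            inbox.put(STOP)  # pass the sentinel on to sibling workers
            return
        outbox.put(step(item))


def assembly_line(requests, workers_per_stage=(1, 1, 1)):
    """Event-driven style: each step is a stage with its own worker threads,
    and stages hand work to one another through queues. Giving a bottleneck
    stage more workers only means raising its count."""
    qs = [queue.Queue() for _ in range(len(STEPS) + 1)]
    stages = []
    for i, (step, n) in enumerate(zip(STEPS, workers_per_stage)):
        threads = [threading.Thread(target=_worker, args=(step, qs[i], qs[i + 1]))
                   for _ in range(n)]
        for t in threads:
            t.start()
        stages.append(threads)

    for req in requests:
        qs[0].put(req)
    qs[0].put(STOP)

    # Shut stages down in order: once every worker of stage i has exited,
    # all of its output is already in qs[i + 1], so the sentinel goes last.
    for i, threads in enumerate(stages):
        for t in threads:
            t.join()
        if i + 1 < len(stages):
            qs[i + 1].put(STOP)

    results = []
    while not qs[-1].empty():
        results.append(qs[-1].get())
    return results  # order may differ from input when a stage has several workers
```

For instance, assembly_line([" a ", " b "], workers_per_stage=(1, 4, 1)) puts four workers on the middle stage while the other stages keep one each; the mechanic style has no such knob.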

Twitter splits the business process in two ways. First, Mongrel is separated from the MySQL database: Mongrel does not operate on MySQL directly but delegates that job entirely to memcached. Second, the upload and download logic are separated, and control instructions are passed between them through Kestrel queues.
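The first split, in which Mongrel writes only to the cache and the cache takes care of the database, resembles a write-behind cache. Stock memcached does not persist anything to MySQL by itself, so this sketch models the behavior the article describes with a hypothetical WriteBehindCache class, uses Python's built-in sqlite3 module as a stand-in for MySQL, and treats the flush_every threshold as an assumed trigger for "the appropriate time".

```python
import sqlite3


class WriteBehindCache:
    """Hypothetical cache that owns persistence: callers only write to the
    cache, and dirty entries are flushed to the database later in a batch."""

    def __init__(self, db, flush_every=100):
        self.db = db
        self.flush_every = flush_every  # assumed flush trigger
        self.data = {}
        self.dirty = set()
        db.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)")

    def set(self, key, value):
        # The caller (Mongrel, in the article) returns right after this.
        self.data[key] = value
        self.dirty.add(key)
        if len(self.dirty) >= self.flush_every:
            self.flush()

    def get(self, key):
        return self.data.get(key)

    def flush(self):
        # Write all dirty entries to the database in one batch.
        rows = [(k, self.data[k]) for k in self.dirty]
        self.db.executemany("INSERT OR REPLACE INTO tweets (id, text) VALUES (?, ?)", rows)
        self.db.commit()
        self.dirty.clear()
```

With flush_every=2, the first set() touches only memory and the second triggers a batch write of both rows, so the writer never waits on the database for individual tweets.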

In the debate between Professors John Ousterhout and Eric Brewer, separating the data flow from the control flow was not explicitly raised. An event, as they used the term, includes both the control signal and the data itself. Data can be large and costly to transmit, whereas control signals are small and simple to transmit, so separating the data flow from the control flow can further improve system efficiency.

In the Twitter system, Kestrel message queues are dedicated to transmitting control signals, and these control signals are in fact just IDs. The data, meaning the tweet text, is stored in row memcached. Kestrel tells each component which tweet it should handle.
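A back-of-the-envelope sketch shows why passing IDs instead of data pays off when one tweet fans out to many queues. It assumes 8-byte (64-bit) IDs and that the text is stored once in row memcached; the function name queue_bytes is hypothetical.

```python
def queue_bytes(text, n_recipients, id_size=8):
    """Return a pair of byte counts for one tweet fanned out to n_recipients:
    (every queue carries the full text,
     queues carry only IDs and the text is stored once).
    id_size=8 assumes 64-bit tweet IDs."""
    text_size = len(text.encode("utf-8"))
    data_in_queues = text_size * n_recipients
    ids_in_queues = id_size * n_recipients + text_size
    return data_in_queues, ids_in_queues


print(queue_bytes("x" * 140, 1000))  # (140000, 8140)
```

For a 140-byte tweet sent to 1,000 online readers, carrying the text in every queue moves 140,000 bytes, while carrying IDs moves 8,000 bytes of IDs plus one 140-byte copy of the text.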

Twitter completes this business process in 500 ms on average, and sometimes in as little as 200 to 300 ms, which indicates that the event-driven design works well in Twitter's distributed system.

Twitter developed the Kestrel message queue itself. Many open source message queue implementations exist, so why did Twitter go to the trouble of building its own instead of using off-the-shelf free tools?

References

[36] Why Threads Are a Bad Idea (for most purposes), 1996. (http://www.stanford.edu/class/cs240/readings/threads-bad-usenix96.pdf)

[37] Why Events Are a Bad Idea (for high-concurrency servers), 2003. (http://www.cs.berkeley.edu/~brewer/papers/threads-hotos-2003.pdf)

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
