Twitter has released an open-source system designed to reduce the trade-off between batch processing and stream processing by combining the two into a hybrid system. Twitter uses Hadoop for batch processing and Storm for stream processing, and the hybrid system is called Summingbird. Summingbird is not a tool for every job, but it is very convenient for the tasks it is designed to handle.
The Twitter blog post announcing Summingbird is heavy on technical detail, but the nature of the problem is easy to understand if you think about how Twitter works. Services such as Trending Topics and search need real-time data processing to be useful. Eventually, though, they also need to be accurate, and the data probably needs some deeper analysis. Storm is a bit like a hospital's triage unit, while Hadoop is like long-term patient care.
The following description from the Summingbird project wiki explains how Summingbird works at a high level, although the implementation is certainly a little more complicated:
Summingbird's hybrid mode lets Hadoop process most of the data, which can then be served from a read-only store such as Manhattan. Storm handles only the data that Hadoop has not yet processed, that is, the data that falls within the latency window. This recent data is served from a datastore that is populated in real time. The error in the real-time layer is bounded, because Hadoop will eventually process the same data and smooth out any errors that were introduced.
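The split described above can be illustrated with a small Python sketch. This is not Summingbird's API (Summingbird itself is a Scala library); the `Event` type, the `batch_view`, `realtime_view` and `merged_count` functions, and the `HORIZON` cutoff are all hypothetical names chosen for illustration. The sketch assumes each event carries a timestamp, that the batch layer has processed everything before the cutoff, and that a query is answered by adding the two counts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # hypothetical unit, e.g. seconds since some epoch
    topic: str


def batch_view(events, horizon):
    """Counts for everything the batch layer has already processed."""
    return Counter(e.topic for e in events if e.timestamp < horizon)


def realtime_view(events, horizon):
    """Counts for recent events the batch layer has not reached yet."""
    return Counter(e.topic for e in events if e.timestamp >= horizon)


def merged_count(topic, batch, realtime):
    """Answer a query by adding the batch count and the realtime count."""
    return batch[topic] + realtime[topic]


events = [
    Event(10, "hadoop"),
    Event(20, "storm"),
    Event(30, "storm"),
    Event(40, "hadoop"),
    Event(50, "storm"),
]

HORIZON = 35  # assumed cutoff: the batch layer has processed events before this
batch = batch_view(events, HORIZON)
realtime = realtime_view(events, HORIZON)
print(merged_count("storm", batch, realtime))  # 3
```

As the cutoff advances, more events move from the realtime view into the batch view while the merged answer stays the same, which is why any error in the realtime layer only affects the most recent window.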
Such hybrid systems are becoming more and more common as companies realize that they cannot live in a real-time world with Hadoop alone. We've covered a lot of companies, including Gravity, LinkedIn and Netflix, that have done something similar. Summingbird may be a bit different because the data it processes runs through both Hadoop and Storm, rather than through a processing pipeline. Either way, internet companies need ways to make sure they don't trade accuracy for speed or vice versa.
We don't have Twitter discussing Summingbird specifically, but our lineup of data speakers is compelling and may be a great way to understand why this kind of thing is important. They come from PayPal, MailChimp and LinkedIn, and include entrepreneurs who came out of places like Yahoo and the NSA.
A little more on Summingbird: Twitter actually describes it as "streaming MapReduce," because Summingbird focuses on aggregation jobs. A talk given in June by Twitter's Sam Ritchie is available online. Yahoo's open-source project Storm-YARN, which runs Storm inside a Hadoop cluster and gives Storm access to Hadoop-based data storage, is also worth a look.
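One way to read the "streaming MapReduce" description is that the same map and merge logic can run either over a whole dataset at once or one record at a time, as long as the merge step is associative. The Python sketch below illustrates that idea with a word count. It is a stand-in written for this article, not Summingbird code; `map_tweet`, `merge` and the sample tweets are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_tweet(text):
    """Map step: turn one tweet into a partial result (word -> count)."""
    return Counter(text.lower().split())


def merge(left, right):
    """Reduce step: an associative merge of two partial results."""
    return left + right


tweets = ["Storm is fast", "Hadoop is thorough", "storm and hadoop"]

# Batch style: map every tweet, then fold the partial results together.
batch_total = reduce(merge, (map_tweet(t) for t in tweets), Counter())

# Streaming style: keep a running total and merge each tweet as it arrives.
running_total = Counter()
for tweet in tweets:
    running_total = merge(running_total, map_tweet(tweet))

print(batch_total == running_total)  # True
print(running_total["storm"])  # 2
```

Because the merge is associative, the grouping of the partial results does not change the final total, so a batch job and a streaming job built from the same two functions agree on the answer.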