Tools for managing real-time data flows between systems are about to emerge
Keywords: Cloud Computing, Big Data, Twitter, Hadoop, Storm, Wormhole, Storm-YARN
According to Gigaom, Facebook and Yahoo! released details last week of their in-house tools for managing real-time data flows across multiple systems. Yahoo! announced and open-sourced Storm-YARN, which builds on YARN (Hadoop 2.0) and Storm to integrate Storm and Hadoop clusters much more tightly, to the point that Storm can borrow capacity from Hadoop batch clusters when it needs to. Facebook's Wormhole includes an integrated monitoring system and supports capacity planning, automated repair, automated configuration, and more, but it is not open source at this time.
The following is the translation:
On June 11, Yahoo! open-sourced Storm-YARN, its internally customized version of Storm that runs on Hadoop clusters. Three days later, on June 14, Facebook published details of its Wormhole system. Wormhole serves multiple applications that depend on the same data: when data changes in one system, Wormhole automatically propagates the change to the related systems, so that every copy of the data stays up to date.
Yahoo!: Storm-YARN
Storm, the real-time stream-processing framework, is widely loved by big data analysts, and its value is hard to dispute: Twitter's success proves it. Twitter uses Storm to process tweets so that users' timelines stay up to date, and it runs similar real-time analytics on Storm to spot emerging trends. By acquiring BackType, the startup where Storm was created, Twitter gained both the technology and the talent behind it.
[Figure: Submitting and running a Storm topology]
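To make the trend-detection use case above more concrete, here is a minimal sketch of a Storm-style pipeline in plain Python: a "spout" emits tweets, a "bolt" extracts hashtags, and a sliding-window counter reports the current top hashtags. This is not Storm's API and not Twitter's implementation; the function names, the 60-second window, and the assumption that tweets arrive in timestamp order are all hypothetical choices for illustration.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List, Tuple

Tweet = Tuple[int, str]  # (timestamp in seconds, tweet text)


def tweet_spout(tweets: Iterable[Tweet]) -> Iterator[Tweet]:
    """Hypothetical 'spout': the source stage that emits raw tweets in timestamp order."""
    for ts, text in tweets:
        yield ts, text


def hashtag_bolt(stream: Iterable[Tweet]) -> Iterator[Tweet]:
    """Hypothetical 'bolt': extracts lower-cased hashtags from each tweet."""
    for ts, text in stream:
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                yield ts, word.lower()


class SlidingWindowCounter:
    """Counts hashtags seen in the last `window` seconds, i.e. in (now - window, now]."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.events: deque = deque()  # (ts, tag) pairs in arrival order
        self.counts: Counter = Counter()

    def add(self, ts: int, tag: str) -> None:
        self.events.append((ts, tag))
        self.counts[tag] += 1
        self._evict(now=ts)

    def _evict(self, now: int) -> None:
        # Drop events that have fallen out of the window; assumes timestamps never decrease.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def top(self, n: int) -> List[Tuple[str, int]]:
        return self.counts.most_common(n)


def trending(tweets: Iterable[Tweet], window: int = 60, n: int = 3) -> List[Tuple[str, int]]:
    """Wire spout -> bolt -> counter together and return the top-n hashtags."""
    counter = SlidingWindowCounter(window)
    for ts, tag in hashtag_bolt(tweet_spout(tweets)):
        counter.add(ts, tag)
    return counter.top(n)


if __name__ == "__main__":
    sample = [(0, "hello #Storm"), (10, "#hadoop and #storm"), (70, "#yarn rocks"), (75, "#yarn #storm")]
    print(trending(sample, window=60))  # [('#yarn', 2), ('#storm', 1)]
```

In a real topology each stage would run as many parallel tasks across a cluster; chaining generators here only shows how data flows from stage to stage.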
Since its release in 2011, Storm has become popular with web companies as the stream-processing counterpart to Hadoop. Yahoo! now brings Storm and Hadoop even closer together, to the point where Storm can borrow capacity from batch-processing nodes when needed. This is a very valuable feature: at a presentation at Facebook's Analytics @ Web Scale event last week, Twitter engineer Krishna Gade lamented the limits of Storm's automatic scaling.
[Figure: Deploying a Storm cluster on Hadoop YARN]
Storm-YARN also benefits from YARN, a key feature and the major update in Hadoop 2.0, which allows Hadoop to run multiple processing frameworks at the same time. Twitter had used the open-source cluster manager Mesos to do the same thing, but Gade's colleague Dmitriy Ryaboy has said that once Twitter upgrades to Hadoop 2.0, the company will move its big data workloads to YARN and put more community effort into improving it and building more applications on it.
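The idea of one cluster shared by several frameworks, with the streaming side borrowing capacity the batch side is not using, can be illustrated with a toy model. The sketch below is a hypothetical simplification and not the YARN or Storm-YARN API: `SharedCluster`, `rescale_streaming`, and the backlog-based scaling policy are all assumptions made for illustration, and the model never preempts a framework's containers.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedCluster:
    """Toy model of one cluster whose containers are shared by several frameworks.

    Hypothetical: NOT the YARN API, only a single resource pool that batch and
    streaming frameworks draw from and return to.
    """
    total: int
    allocated: Dict[str, int] = field(default_factory=dict)  # framework -> containers held

    def free(self) -> int:
        return self.total - sum(self.allocated.values())

    def request(self, framework: str, n: int) -> int:
        """Grant up to n containers from the free pool and return how many were granted."""
        granted = min(n, self.free())
        self.allocated[framework] = self.allocated.get(framework, 0) + granted
        return granted

    def release(self, framework: str, n: int) -> int:
        """Return up to n containers held by `framework` to the pool."""
        held = self.allocated.get(framework, 0)
        returned = min(n, held)
        self.allocated[framework] = held - returned
        return returned


def rescale_streaming(cluster: SharedCluster, framework: str, backlog: int, per_worker_rate: int) -> int:
    """Hypothetical scaling policy: hold just enough workers to drain `backlog`
    messages per tick, borrowing idle containers or handing surplus ones back."""
    current = cluster.allocated.get(framework, 0)
    needed = max(1, -(-backlog // per_worker_rate))  # ceiling division, at least one worker
    if needed > current:
        cluster.request(framework, needed - current)
    elif needed < current:
        cluster.release(framework, current - needed)
    return cluster.allocated[framework]


if __name__ == "__main__":
    cluster = SharedCluster(total=10)
    cluster.request("mapreduce", 6)  # a batch job holds 6 containers
    cluster.request("storm", 2)      # the streaming job starts with 2
    print(rescale_streaming(cluster, "storm", backlog=500, per_worker_rate=100))  # 4: only 2 were free
    cluster.release("mapreduce", 6)  # the batch job finishes and frees its containers
    print(rescale_streaming(cluster, "storm", backlog=500, per_worker_rate=100))  # 5
    print(rescale_streaming(cluster, "storm", backlog=100, per_worker_rate=100))  # 1
```

The design point is that both frameworks draw from one pool, so capacity released by a finished batch job becomes immediately available to the streaming job, and vice versa.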
Facebook: Wormhole
Unfortunately, Facebook has not open-sourced Wormhole so far, but its experience is still worth learning from (LinkedIn has already open-sourced similar technologies, such as Kafka and Databus). Wormhole is a publish-subscribe system. At Facebook, Wormhole pushes new content from the main user database to Graph Search so that results are as fresh as possible, and it can also feed data into Facebook's Hadoop environment so that analytics jobs run on up-to-date data.
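The publish-subscribe pattern described above can be sketched in a few lines: changes are appended to a log, and each subscriber (for example, a search index and an analytics sink) reads the log at its own pace by keeping its own offset. This is a minimal, single-process sketch of the general pattern under those assumptions, not Wormhole's design; the class names, the in-memory log, and the example subscribers are hypothetical.

```python
from typing import Callable, Dict, List


class ChangeLog:
    """Append-only list of change events; a stand-in for a database's change stream."""

    def __init__(self) -> None:
        self.entries: List[dict] = []

    def append(self, event: dict) -> None:
        self.entries.append(event)


class Subscriber:
    def __init__(self, name: str, handler: Callable[[dict], None]) -> None:
        self.name = name
        self.handler = handler
        self.offset = 0  # position of the next log entry this subscriber has not processed


class Publisher:
    """Fans each change out to every subscriber; each subscriber tracks its own offset."""

    def __init__(self, log: ChangeLog) -> None:
        self.log = log
        self.subscribers: List[Subscriber] = []

    def subscribe(self, sub: Subscriber) -> None:
        self.subscribers.append(sub)

    def pump(self) -> None:
        for sub in self.subscribers:
            while sub.offset < len(self.log.entries):
                sub.handler(self.log.entries[sub.offset])
                # Advance only after the handler succeeds, so if it raises, this
                # entry is delivered again on the next pump (at-least-once delivery).
                sub.offset += 1


if __name__ == "__main__":
    log = ChangeLog()
    search_index: Dict[int, str] = {}  # hypothetical search index: user id -> name
    warehouse: List[dict] = []         # hypothetical analytics sink

    def update_index(event: dict) -> None:
        search_index[event["user"]] = event["name"]

    pub = Publisher(log)
    pub.subscribe(Subscriber("search", update_index))
    pub.subscribe(Subscriber("analytics", warehouse.append))

    log.append({"user": 1, "name": "Alice"})
    log.append({"user": 2, "name": "Bob"})
    pub.pump()
    log.append({"user": 1, "name": "Alicia"})  # a later update to an existing user
    pub.pump()
    print(search_index)    # {1: 'Alicia', 2: 'Bob'}
    print(len(warehouse))  # 3
```

Keeping a per-subscriber offset means a slow or newly added subscriber can catch up from where it left off without affecting the others.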
Like Facebook's earlier projects (such as Presto, its new interactive query engine), Wormhole scales very well. Laurent Demailly tweeted that its latency is kept at the millisecond level:
Wormhole handles more than 1 trillion messages per day, or more than 10 million messages per second. It is designed to tolerate failures of individual components through several features: an integrated monitoring system, automatic repair, capacity-planning support, automated configuration, and support for handling data changes.
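The two throughput figures agree with each other. Spreading one trillion messages evenly over the 86,400 seconds in a day gives an average rate of

$$
\frac{10^{12}\ \text{messages/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{7}\ \text{messages/s},
$$

which is just over 10 million messages per second.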
Conclusion
Although Storm-YARN and Wormhole come from different companies, they are clearly set to shake up the Hadoop and Storm space. As web companies' businesses expand, their applications have grown into a mix of applications and services, and a push to rebuild infrastructure is in full swing. Because their data-tier systems have different needs, these companies are having to give up their original architectures and build systems like Storm-YARN and Wormhole to manage the flow of data between systems.