Tools for managing real-time data streams between systems are emerging

Source: Internet
Author: User
Keywords: Cloud Computing, Big Data, Twitter, Hadoop, Storm, Wormhole, Storm-YARN

According to Gigaom, Facebook and Yahoo! last week released some details of their tools for managing real-time data flows across multiple systems. Of the two, Storm-YARN, which Yahoo! announced and open-sourced, is built on YARN (Hadoop 2.0) and Storm. It brings Storm and Hadoop clusters much closer together, to the point where Storm can borrow capacity from Hadoop batch clusters when needed. Facebook's Wormhole includes an integrated monitoring system and supports features such as capacity planning, automated repair, and automated configuration, but it is not open source at this time.

The following is the translation:

On June 11, Yahoo! open-sourced Storm-YARN, an internally customized version of the open source Storm that runs on Hadoop clusters. Three days later, on the 14th, Facebook announced the details of its Wormhole system. Where multiple applications communicate with each other, Wormhole automatically propagates a data change in one system to the other related systems, so that their data is updated live.

Yahoo!: Storm-YARN

Real-time stream processing frameworks are widely loved by big data analysts, and their value is beyond doubt: Twitter's success with Storm is one example. Twitter uses Storm to process tweets so that users' timelines stay up to date, and it also uses Storm for real-time analytics and for discovering new trends. By acquiring BackType, the company that created Storm, Twitter gained both the technology and the talent.

Figure: Submitting and executing a Storm topology

Since its release in 2011, Storm has been popular with web companies as a stream processing complement to Hadoop. Yahoo! now brings Storm and Hadoop closer together, to the point where Storm can borrow the capacity of batch nodes when needed. This is a very valuable feature: at a presentation at Facebook's Analytics @ Web Scale meeting last week, Twitter engineer Krishna Gade lamented the limitations of Storm's auto-scaling.

Figure: Storm cluster and Hadoop YARN

The Storm-YARN implementation also benefits from a key feature of YARN, the major update in Hadoop 2.0: it allows Hadoop to run multiple processing frameworks simultaneously. Twitter had used the open source resource manager Mesos to do the same thing, but Gade's colleague Dmitriy Ryaboy stated that once Hadoop is updated to version 2.0, the company would move its big data operations to YARN, where more community effort is expected to go into continuous improvement and into building more applications.
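The idea of several frameworks sharing one cluster, with the streaming side picking up nodes that batch jobs are not using, can be illustrated with a toy model. This is only a sketch under stated assumptions: the `SharedCluster` class, its methods, and the framework names are hypothetical and are not part of YARN, Storm, or Storm-YARN, whose real scheduling is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedCluster:
    """Toy model of one node pool shared by several frameworks (hypothetical)."""

    total_nodes: int
    allocated: Dict[str, int] = field(default_factory=dict)

    def free_nodes(self) -> int:
        return self.total_nodes - sum(self.allocated.values())

    def request(self, framework: str, nodes: int) -> int:
        """Grant up to `nodes` free nodes to a framework; return how many were granted."""
        granted = max(0, min(nodes, self.free_nodes()))
        self.allocated[framework] = self.allocated.get(framework, 0) + granted
        return granted

    def release(self, framework: str, nodes: int) -> None:
        """Return nodes to the pool, never releasing more than the framework holds."""
        held = self.allocated.get(framework, 0)
        self.allocated[framework] = held - min(nodes, held)


cluster = SharedCluster(total_nodes=10)
cluster.request("batch", 6)
cluster.request("stream", 2)

# A traffic spike: the streaming side asks for 5 more nodes but only 2 are free.
granted_during_spike = cluster.request("stream", 5)

# Batch work finishes and frees 4 nodes, which streaming can then pick up.
cluster.release("batch", 4)
granted_after_release = cluster.request("stream", 3)

print(granted_during_spike, granted_after_release, cluster.allocated)
```

The point of the sketch is only that a single shared pool lets idle batch capacity serve streaming load, which two separately provisioned clusters cannot do.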

Facebook: Wormhole

Unfortunately, Facebook has not open-sourced Wormhole so far, but its experience is still worth learning from (and LinkedIn has already open-sourced similar technologies such as Kafka and Databus). Wormhole is a publish-subscribe system. At Facebook, Wormhole sends new content from the main user database to Graph Search so that results are as current as possible, and it can also send data to the Hadoop environment so that analysis jobs work on up-to-date data.
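The publish-subscribe pattern described above can be sketched in a few lines. This is a minimal in-process illustration, not Wormhole's design or API: the `ChangeBus` class, the `user_db` topic name, and the two subscribers (a search index and an analytics log) are all hypothetical stand-ins for the systems named in the text.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class ChangeBus:
    """Toy in-process publish-subscribe bus (hypothetical, not Wormhole's API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, change: dict) -> int:
        """Deliver one change to every subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(change)
        return len(handlers)


# Two downstream systems that want to see every change to the user database.
search_index: Dict[int, str] = {}
analytics_log: List[dict] = []


def update_search_index(change: dict) -> None:
    search_index[change["user_id"]] = change["name"]


bus = ChangeBus()
bus.subscribe("user_db", update_search_index)
bus.subscribe("user_db", analytics_log.append)

# The publisher does not know who is listening; it only announces the change.
delivered = bus.publish("user_db", {"user_id": 42, "name": "Ada"})
print(delivered, search_index, len(analytics_log))
```

The publisher never references the consumers directly, so a new downstream system can be added by subscribing, without changing the code that writes to the database.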

Like Facebook's previous work (such as the new interactive query engine Presto), Wormhole scales very well. Laurent Demailly tweeted that latency is kept at the millisecond level.

Wormhole handles more than 1 trillion messages per day, which is more than 10 million messages per second. It is designed to tolerate the failure of individual components, and offers several features to that end: an integrated monitoring system, automatic repair, capacity planning support, automated configuration, and mutation handling support.
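The two throughput figures are consistent with each other, as a quick calculation shows. The sketch below uses exactly 1 trillion messages as the daily total, which is an assumption, since the text only says "more than".

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumed daily volume: exactly 1 trillion (the text says "more than 1 trillion").
messages_per_day = 1_000_000_000_000
average_per_second = messages_per_day / SECONDS_PER_DAY

print(f"{average_per_second:,.0f} messages per second on average")
```

This gives an average of roughly 11.6 million messages per second, in line with the "more than 10 million" figure. It is an average over the whole day, so peak rates would be higher.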

In closing

Although Storm-YARN and Wormhole were developed by different companies, it is clear that they will make waves in the Hadoop and Storm space. As web companies' businesses expand, their applications have grown into a mix of application and service types, so a race to build infrastructure is in full swing. Because their data-tier systems have differing needs, these companies are having to give up their original architectures and build systems like Storm-YARN and Wormhole to manage the flow of data between different systems.
