Nine-Headed Snake Fights Elephant: Hydra May Replace Hadoop

Source: Internet
Author: User

Editor's note: Hadoop is widely considered the best big data analysis platform. It performs well and has the support of an active open source community. Hadoop founder Doug Cutting has also predicted that Hadoop will not only handle big data processing in the future, but will become the core of the data platform and be used for online transaction processing ... Hadoop's prospects seem bright, but a competitor has emerged largely unnoticed. In some respects Hydra even outperforms Hadoop, and since it was open sourced it has received more and more support. Hydra may well become a strong competitor to Hadoop. Alex Woodie, editor-in-chief of Datanami, brings us a detailed analysis.

The following is the translation:

Hydra (the nine-headed snake) is a distributed task-processing system developed six years ago by the social bookmarking service provider AddThis. Like Hadoop, it is now available under the Apache open source license, but it lacks Hadoop's popularity and momentum. Hydra's creators say the multi-headed platform is very good at certain big data tasks, namely processing very large datasets in real time, a job that can give the elephant (Hadoop) a headache.

Hadoop is still a great platform for storing large amounts of data, but many companies face another problem: how to extract value by analyzing the data once it is stored in Hadoop. Tools such as Hive and Pig both need easy access to the data in Hadoop. As we can see, Hadoop is not well suited to real-time analysis.

Hydra is a big data storage and processing platform developed by Matt Abrams and his colleagues at AddThis. AddThis, formerly Clearspring, develops widgets for websites that allow visitors to easily share content through Twitter, Facebook, Pinterest, Google+ or Instagram.

As AddThis expanded its business, it struggled to cope with its growing volume of user data. The company needed a scalable distributed system for real-time analysis of the data shared by its users. Hadoop could not meet AddThis's needs at the time, so the company developed Hydra.

So, what exactly is Hydra? In short, it is a distributed task-processing system that supports both streaming and batch processing. It uses a tree-based data structure to store and process data across clusters of thousands of nodes. It has a Linux-based file system, which makes it compatible with ext3, ext4, or even ZFS. It also has a job/cluster management component that automatically assigns new jobs to the cluster and rebalances existing jobs, and it automatically backs up data and handles node failures.
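The job/cluster management behavior described above can be illustrated with a small sketch. This is not Hydra's code or API: the `Cluster` class, its methods, and the "fewest jobs wins" placement rule are all assumptions made for illustration. It only shows the general idea of assigning new jobs to the least-loaded node and moving jobs off a node that fails.

```python
class Cluster:
    """Toy job scheduler (hypothetical, not Hydra's actual implementation)."""

    def __init__(self, nodes):
        # Map each node name to the set of jobs currently placed on it.
        self.jobs_by_node = {node: set() for node in nodes}

    def _least_loaded(self):
        # Pick the node with the fewest jobs; ties are broken by node name.
        return min(self.jobs_by_node,
                   key=lambda n: (len(self.jobs_by_node[n]), n))

    def assign(self, job):
        # Place a new job on the least-loaded node and report where it went.
        node = self._least_loaded()
        self.jobs_by_node[node].add(job)
        return node

    def fail(self, node):
        # Drop the failed node, then reassign its jobs to the surviving nodes.
        orphaned = sorted(self.jobs_by_node.pop(node))
        return {job: self.assign(job) for job in orphaned}


cluster = Cluster(["node-a", "node-b", "node-c"])
for job in ["job-1", "job-2", "job-3", "job-4"]:
    cluster.assign(job)

# Simulate a node failure: the jobs on node-a move to the remaining nodes.
moved = cluster.fail("node-a")
print(moved)
```

A real scheduler would also weigh disk space, data locality, and replica placement. The sketch ignores all of that and balances on job count alone.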

Hydra includes several components: a distributed job execution system that runs tasks across heterogeneous clusters, a network-accessible file service, and local and remote backups that guard against node failures.

Because it is built on a tree structure, Hydra can process streaming data and batch jobs at the same time. Chris Burroughs, a member of the AddThis engineering team, first announced that Hydra was open source in a January 23 blog post and gave a clear description of it: "It takes streaming data (such as log files) and generates aggregation trees, summary trees, or data transformation trees. These trees can be used to explore (small queries), as part of machine learning (large queries), or to support real-time consoles on a website (large numbers of queries)."

Hydra was originally designed for internal use, to help AddThis solve its own problems and provide services to website operators. Typical questions include "How many users visited the site last month?" and "How much traffic does the site receive from different countries and browsers?"
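Questions like these map naturally onto an aggregation tree built from a stream of log records. The following is a minimal sketch of that idea, not Hydra's actual data structure or query language: the `TreeNode` class, the month/country/browser path layout, and the sample events are all hypothetical.

```python
class TreeNode:
    """One node of a toy aggregation tree (illustrative only)."""

    def __init__(self):
        self.hits = 0          # number of records that passed through this node
        self.users = set()     # distinct user ids seen at this node
        self.children = {}     # child nodes keyed by the next path element

    def add(self, path, user):
        # Update this node's aggregates, then descend along the path.
        self.hits += 1
        self.users.add(user)
        if path:
            child = self.children.setdefault(path[0], TreeNode())
            child.add(path[1:], user)

    def find(self, path):
        # Walk down the tree; return None if the branch does not exist.
        node = self
        for key in path:
            node = node.children.get(key)
            if node is None:
                return None
        return node


# Hypothetical log records: (month, country, browser, user id).
events = [
    ("2014-01", "US", "Chrome", "u1"),
    ("2014-01", "US", "Firefox", "u2"),
    ("2014-01", "DE", "Chrome", "u1"),
    ("2014-02", "US", "Chrome", "u3"),
]

root = TreeNode()
for month, country, browser, user in events:
    # Each record is folded into the tree as it arrives, one at a time.
    root.add((month, country, browser), user)

january = root.find(("2014-01",))
print("unique users in January:", len(january.users))
print("January hits from the US:", january.find(("US",)).hits)
```

Because every node keeps its own running totals, a query only has to walk down one branch instead of rescanning the raw logs. Keeping exact user sets at every node is a simplification that would not scale to billions of users.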

AddThis continues to use Hydra to handle its massive data traffic and to analyze trends on its customers' websites. It gives AddThis insight into what people share online and which topics are popular. The social bookmarking service is used by more than 13 million websites and reaches 1.3 billion users a month, with an average of 3 billion page views per day producing 10 TB of data. Hydra now runs on thousands of nodes at AddThis.

"We've been working on large datasets for a long time, and Hydra has been very useful to us. We feel that it solves the problem of distributed data processing in a unique way," Abrams told Datanami via email.

Traditional Hadoop is oriented toward batch processing, while Hydra supports both batch processing and real-time streaming. Abrams said: "Hydra supports batch processing but focuses mainly on streaming analysis and incremental data processing. It describes data using tree data structures, which lend themselves naturally to compression and to efficient querying and access. Hydra can read data from and write data to HDFS, but it performs its operations on the native file system, which gives it the flexibility to run other services on top of Hydra."

Now that Hydra is open source, Abrams hopes the software will be used more widely and developed further. "It will take some time, but we believe that we will build a thriving Hydra open source community, so that both AddThis and the open source (OS) community can benefit from Hydra's future development. Some other companies in Washington, D.C. are already using Hydra, and we look forward to the further growth of the Hydra community."

In the autumn of 2013, Doug Cutting, Hadoop's founder and Cloudera's chief architect, lamented that Hadoop lacked an alternative. Cutting said: "How I wish more systems like Hadoop would appear ..." Although Hadoop now dominates the big data world, who can say that it will remain the only distributed computing platform for big data? I believe that Hydra's future development will not disappoint him. As for that future, I would like to quote another of Cutting's remarks: "The sky is the limit."

Source link: Hadoop Alternative Hydra Re-Spawns as Open Source (compiled by Mao Mengqi, revised by Wei)

(Responsible editor: The good of the Legacy)
