Facebook solves the Achilles heel of Hadoop

The Hadoop wave is gradually sweeping across America's vertical industries, including finance, media, retail, energy, and pharmaceuticals. Beyond popularizing the concept of big data, Hadoop enables analysis of massive data sets, surfacing trends that help enterprises improve profitability.

As open-source data management software, Apache Hadoop is primarily used to analyze large volumes of structured and unstructured data in a distributed environment. Hadoop is used by many popular websites, including Yahoo, Facebook, LinkedIn, and eBay.

Facebook's engineers believe they run the largest Hadoop-based data platform in the world. But the platform has an unavoidable weakness: all of its work is coordinated through a single server. Facebook's engineers have now come up with a way around this.

Facebook engineer Andrew Ryan described the problem at the Hadoop Summit. Facebook now stores its data on the world's largest HDFS (Hadoop Distributed File System) deployment, with more than 100 PB of valuable data spread across 100 clusters in different data centers.

As large-scale data analysis has become popular, Hadoop's single point of failure has come under scrutiny. A Hadoop deployment can span hundreds or even thousands of servers, yet the node responsible for coordinating all of HDFS, known as the NameNode, is a single server that directs the DataNodes performing the underlying I/O tasks.
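
To make this division of labor concrete, the sketch below uses the standard Hadoop FileSystem API to read a file: the client asks the NameNode where the data lives, then streams the bytes from DataNodes. The cluster address and file path are illustrative, not taken from the article.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client contacts the NameNode named here for all metadata;
        // the address is illustrative.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open(new Path("/data/events.log")),
                             StandardCharsets.UTF_8))) {
            // open() asks the NameNode where the file's blocks live;
            // the bytes themselves are then streamed from DataNodes.
            System.out.println(reader.readLine());
        }
    }
}
```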

The NameNode tracks how files are split into blocks, which nodes store each block, and whether the distributed file system as a whole is healthy. If this single node stops running, the DataNodes have nothing to communicate with, and in effect the entire system stops working.
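
This block-level bookkeeping is visible through the same public API. The hedged sketch below asks the NameNode which DataNodes hold each block of a file; the path is again illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Illustrative path; any HDFS file would do.
        FileStatus status = fs.getFileStatus(new Path("/data/events.log"));

        // The NameNode answers this query from its metadata: for each
        // block of the file, which DataNodes hold a replica.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset %d, length %d, hosts %s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```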

Facebook estimates that fixing this flaw would halve the downtime of its data warehouse. To address the problem, its engineers developed software called AvatarNode, which can switch over to a backup NameNode when a failure occurs. Once configured, each DataNode periodically reports to both the primary node and the backup node.
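
The article does not show AvatarNode's own configuration. As a rough illustration of the same primary/backup idea, the sketch below uses the HA configuration keys that stock Apache Hadoop later shipped for running two NameNodes; this is not AvatarNode itself, and the nameservice and host names are made up.

```java
import org.apache.hadoop.conf.Configuration;

public class HaConfigSketch {
    // Minimal sketch of a two-NameNode setup using stock Apache Hadoop
    // HDFS HA keys, shown for comparison with AvatarNode's design.
    public static Configuration haConf() {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "warehouse");
        conf.set("dfs.ha.namenodes.warehouse", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.warehouse.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.warehouse.nn2", "nn2.example.com:8020");
        // Clients fail over between nn1 and nn2 through this proxy provider.
        conf.set("dfs.client.failover.proxy.provider.warehouse",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        conf.set("fs.defaultFS", "hdfs://warehouse");
        return conf;
    }
}
```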

Facebook has also open-sourced AvatarNode so that other Hadoop administrators can benefit from the work. Facebook has been running AvatarNode internally, where it has greatly improved the reliability of its HDFS clusters. The company is now working to improve AvatarNode further and to integrate it with highly available frameworks, with the goal of unattended, automated, and safe failover.
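
The article does not say which high-availability framework Facebook targeted. For comparison only, stock Apache Hadoop enables unattended, ZooKeeper-coordinated failover with the settings sketched below; the quorum hosts are illustrative.

```java
import org.apache.hadoop.conf.Configuration;

public class AutoFailoverSketch {
    // Hedged sketch: stock Apache Hadoop HA keys for ZooKeeper-based
    // automatic failover, shown to illustrate the "unattended failover"
    // goal described above; hostnames are illustrative.
    public static void enableAutoFailover(Configuration conf) {
        conf.setBoolean("dfs.ha.automatic-failover.enabled", true);
        conf.set("ha.zookeeper.quorum",
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
    }
}
```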

Of course, Facebook is not the only one working on this Hadoop pitfall; products from MapR and Cloudera offer similar standby capabilities.
