The Hadoop wave is gradually sweeping across America's vertical industries, including finance, media, retail, energy, and pharmaceuticals. Beyond popularizing the concept of big data, Hadoop enables analysis of massive data sets, surfacing trends that help enterprises improve profitability.
As open source data management software, Apache Hadoop is primarily used to analyze large volumes of structured and unstructured data in a distributed environment. Hadoop is used by many popular websites, including Yahoo, Facebook, LinkedIn, and eBay.
Facebook's engineers believe they run the largest Hadoop-based data platform in the world. But the platform has an unavoidable weakness, relying on a single server to coordinate all the work, and Facebook's engineers have come up with a way to address it.
Andrew Ryan, a Facebook engineer, discussed the issue at the Hadoop Summit. Facebook now stores data on the world's largest HDFS (Hadoop Distributed File System) deployment, with more than 100 PB of valuable data distributed across 100 clusters in different data centers.
As large-scale data analysis becomes popular, Hadoop's single point of failure draws attention. A Hadoop deployment spans hundreds or even thousands of servers, and the node responsible for coordinating the entire HDFS is known as the NameNode; the NameNode directs the DataNodes, which perform the underlying I/O tasks.
The NameNode tracks how files are split into blocks, which nodes store each block, and whether the distributed file system as a whole is healthy. But if this single node stops running, the DataNodes lose their coordinator, and in effect the entire system stops working.
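To make the NameNode's role concrete, here is a minimal sketch of an HDFS client write using the standard Hadoop FileSystem API; the hostname, port, and path are placeholders, not Facebook's actual setup. The client contacts the NameNode for metadata, while the data itself flows to DataNodes, so if the NameNode is down even this simple write fails.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client asks the NameNode (metadata server) where blocks live.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // placeholder address
        FileSystem fs = FileSystem.get(conf);

        // The bytes themselves are streamed to DataNodes, but the metadata
        // step above always goes through the NameNode -- the single point
        // of failure described in this article.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            out.writeUTF("hello hdfs");
        }
        fs.close();
    }
}
```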
Facebook estimates that fixing this flaw would cut data warehouse downtime in half. To address the problem, Facebook engineers developed software called AvatarNode, which can switch to a standby NameNode when a failure occurs. Once set up, each DataNode periodically reports to both the primary node and the standby node, as illustrated in the sketch below.
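The following is a hypothetical sketch of the dual-reporting idea behind AvatarNode-style failover; it is not Facebook's actual implementation, and the class and method names are invented for illustration only.

```java
import java.util.List;

class DualReportingDataNode {
    // Addresses of both NameNodes, e.g. a primary and a standby (placeholders).
    private final List<String> nameNodes;

    DualReportingDataNode(List<String> nameNodes) {
        this.nameNodes = nameNodes;
    }

    // Because every heartbeat and block report goes to both nodes, the
    // standby's view of the namespace stays warm, so a failover only has
    // to redirect clients rather than rebuild state from scratch.
    void sendHeartbeats() {
        for (String nn : nameNodes) {
            System.out.println("heartbeat + block report -> " + nn);
        }
    }

    public static void main(String[] args) {
        new DualReportingDataNode(List.of("primary-nn:8020", "standby-nn:8020"))
                .sendHeartbeats();
    }
}
```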
Facebook has also open-sourced AvatarNode so that other Hadoop administrators can benefit from it in practice. Facebook has been running AvatarNode internally, which has greatly improved the reliability of its HDFS clusters. Facebook is currently working to further improve AvatarNode and integrate it with high-availability frameworks, allowing unattended, automated, and safe failover.
Of course, Facebook is not the only one tackling this Hadoop shortcoming; products from MapR and Cloudera offer similar standby capabilities.