Hadoop on Linux: an open-source implementation for running applications

Last Update: 2013-12-23 Source: Internet

Author: User

Hadoop is quite common on Linux, so I studied it and am sharing what I learned here. I hope it is useful to you. What is Hadoop?

Hadoop is a framework for running applications on large clusters built from inexpensive hardware. It gives applications a stable, reliable set of interfaces and moves data transparently on their behalf. Hadoop implements Google's MapReduce algorithm, which divides an application into many small units of work; each unit can be executed on any node in the cluster, and can be executed again if needed.

In addition, Hadoop provides a distributed file system that stores data on the compute nodes themselves and delivers high throughput for reads and writes. Because both MapReduce and the distributed file system are designed for fault tolerance, the framework handles failed nodes automatically. The Hadoop framework has been tested on clusters of 600 nodes.
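Why rerunning a failed unit of work is safe can be illustrated with a small single-process sketch. It assumes each unit of work is a self-contained function with no side effects, so running it again on a different node gives the same result. The node names, the simulated failure set and the simple "try the next node" policy are hypothetical; Hadoop's real scheduler is considerably more involved.

```python
# Hypothetical sketch: rerun a unit of work on another node when its node fails.
# Node names, the failure set and the "try the next node" policy are assumptions
# for illustration only, not Hadoop's actual scheduling logic.
from typing import Callable, Dict, List, Set, Tuple


def run_with_retries(
    tasks: Dict[str, Callable[[], int]],
    nodes: List[str],
    failed_nodes: Set[str],
) -> Dict[str, Tuple[str, int]]:
    """Run every task, skipping failed nodes; return task_id -> (node, result)."""
    results: Dict[str, Tuple[str, int]] = {}
    for i, task_id in enumerate(sorted(tasks)):
        # Start each task on a different node to spread the load.
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            if node in failed_nodes:
                continue  # simulated failure: hand the unit to the next node
            # The unit is a pure function, so running it anywhere gives the same result.
            results[task_id] = (node, tasks[task_id]())
            break
        else:
            raise RuntimeError(f"no healthy node available for {task_id}")
    return results
```

With nodes n1, n2 and n3 and n2 marked as failed, every task still completes, and none of the results come from n2.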

Google's data centers build clusters out of inexpensive Linux PCs and run a wide variety of applications on them. Even developers who are new to distributed programming can quickly put Google's infrastructure to use. It has three core components:

1. GFS (Google File System).

A distributed file system that hides lower-level details such as load balancing and redundant replication, and exposes a unified file system API to upper-layer programs. Google optimized it for its own needs: access to very large files, workloads in which reads far outnumber writes, and commodity PCs that fail often, taking nodes out of service.

GFS divides files into 64 MB blocks, which are distributed across the machines in the cluster and stored on their local Linux file systems. Each file is kept in at least three redundant copies. At the center is a Master node, which locates a file's blocks using the file index. For details, see the GFS paper published by Google engineers.
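The block arithmetic can be sketched as follows, assuming the 64 MB blocks (GFS calls them chunks) and three-way replication described above. The round-robin placement and the dictionary standing in for the Master's index are simplifications made up for illustration; the real GFS placement policy takes more factors into account.

```python
# Sketch of GFS-style chunking, assuming 64 MB chunks and 3 replicas per chunk.
# Round-robin placement and the dict standing in for the Master's index are
# illustrative simplifications, not GFS's real placement policy.
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB
REPLICAS = 3

Index = Dict[Tuple[str, int], List[str]]


def num_chunks(file_size: int) -> int:
    """Chunks needed for file_size bytes (ceiling division)."""
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE


def chunk_index(byte_offset: int) -> int:
    """The chunk (counting from 0) that contains a given byte offset."""
    return byte_offset // CHUNK_SIZE


def build_master_index(files: Dict[str, int], machines: List[str]) -> Index:
    """Map (file name, chunk index) -> machines holding a replica of that chunk."""
    if len(machines) < REPLICAS:
        raise ValueError("need at least as many machines as replicas")
    index: Index = {}
    slot = 0
    for name, size in sorted(files.items()):
        for c in range(num_chunks(size)):
            # Consecutive machines, so the replicas of one chunk never share a machine.
            index[(name, c)] = [machines[(slot + r) % len(machines)] for r in range(REPLICAS)]
            slot += 1
    return index


def locate(index: Index, name: str, byte_offset: int) -> List[str]:
    """A client's question to the Master: which machines hold this byte?"""
    return index[(name, chunk_index(byte_offset))]
```

For example, a 200 MB file needs four chunks (200 / 64 rounded up), and byte offset 130 MB falls in chunk 2, counting from 0.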

2. MapReduce.

Google found that most distributed computations can be expressed as MapReduce operations. Map turns the input into intermediate key/value pairs, and Reduce merges those key/value pairs into the final output. Programmers supply these two functions to the system; the underlying infrastructure distributes the Map and Reduce operations across the cluster and stores the results on GFS.
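The Map/Reduce split can be illustrated with the classic word-count example. This is a minimal single-process sketch: the function names are illustrative, and the grouping step that a real framework performs across machines (often called the shuffle) is done here with an in-memory dictionary.

```python
# Minimal single-process sketch of the MapReduce model (word count).
# Function names are illustrative; a real framework runs map and reduce
# tasks on many machines and groups the intermediate pairs over the network.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input record into intermediate (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share a key into one output pair."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each line could be processed on a different node.
    for line in lines:
        for key, value in map_fn(line):
            intermediate[key].append(value)  # shuffle: group values by key
    # Reduce phase: each key group could be reduced on a different node.
    return dict(reduce_fn(k, v) for k, v in sorted(intermediate.items()))
```

For the input lines "the quick fox", "The lazy dog" and "the fox", it returns a count of 3 for "the", 2 for "fox" and 1 for each of the other words.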

3. BigTable.

A large distributed database that is not a relational database. As its name suggests, it is a huge table used to store structured data.
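Google's BigTable paper describes the data model as a sparse, sorted map indexed by row key, column key and timestamp. The toy class below only illustrates that data model in memory; the class and method names are hypothetical, and none of BigTable's distribution, persistence or tablet management is modeled.

```python
# Toy, in-memory illustration of BigTable's data model:
# (row key, column key, timestamp) -> value. Class and method names are
# hypothetical; distribution, persistence and tablets are not modeled.
from typing import Dict, List, Optional, Tuple


class ToyTable:
    def __init__(self) -> None:
        # (row, column) -> [(timestamp, value), ...], newest first.
        # Only cells that were written exist, so the table is sparse.
        self._cells: Dict[Tuple[str, str], List[Tuple[int, str]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # newest timestamp first

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Newest value, or the newest one written at or before `as_of`."""
        for ts, value in self._cells.get((row, column), []):
            if as_of is None or ts <= as_of:
                return value
        return None

    def scan_rows(self, start: str, end: str) -> List[str]:
        """Row keys in [start, end) in lexicographic order.

        BigTable stores rows sorted by key; this toy sorts on demand instead.
        """
        return sorted({row for (row, _col) in self._cells if start <= row < end})
```

Keeping several timestamped versions per cell is what lets a read ask for the value "as of" an earlier time.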

Open-source implementation

This distributed framework is highly innovative and scalable, and it gives Google a strong competitive edge in system throughput. The Apache Foundation therefore implemented an open-source version in Java that supports Linux platforms such as Fedora. Hadoop is currently backed by Yahoo: some Yahoo employees have worked on the project for a long time, and Yahoo is also preparing to use Hadoop to replace its original FreeBSD-based system.

Hadoop implements the HDFS file system and MapReduce. The current version is 0.16; it is not yet mature, but it can already run on 2,000 nodes. You only need to extend MapReduceBase, provide one class for Map and one for Reduce, and register a Job; the distributed computation then runs automatically.
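The "extend a base class, supply a Map class and a Reduce class, register a Job" pattern can be mimicked in Python. This is only an analogy: ToyMapReduceBase, ToyJob and the "year,temperature" input format are hypothetical, and none of it is Hadoop's Java API. In Hadoop's older API, MapReduceBase supplies default no-op lifecycle methods (configure() and close()), which the toy base class imitates.

```python
# Python analogy of the Hadoop job pattern: extend a base class, supply one Map
# class and one Reduce class, then hand both to a job runner. ToyMapReduceBase,
# ToyJob and the "year,temperature" record format are hypothetical; this is
# NOT Hadoop's Java API, and everything runs in a single process.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class ToyMapReduceBase:
    """Default no-op lifecycle hooks that Map/Reduce classes may override."""

    def configure(self) -> None:
        pass

    def close(self) -> None:
        pass


class MaxTempMap(ToyMapReduceBase):
    def map(self, record: str) -> Iterator[Tuple[str, int]]:
        # Assumed input format: "year,temperature".
        year, temp = record.split(",")
        yield year, int(temp)


class MaxTempReduce(ToyMapReduceBase):
    def reduce(self, key: str, values: List[int]) -> Tuple[str, int]:
        return key, max(values)


class ToyJob:
    """'Registering a job': name the Map class, the Reduce class and the input."""

    def __init__(self, mapper_cls: type, reducer_cls: type, records: Iterable[str]) -> None:
        self.mapper_cls = mapper_cls
        self.reducer_cls = reducer_cls
        self.records = records

    def run(self) -> Dict[str, int]:
        mapper, reducer = self.mapper_cls(), self.reducer_cls()
        mapper.configure()
        reducer.configure()
        groups: Dict[str, List[int]] = defaultdict(list)
        for record in self.records:
            for key, value in mapper.map(record):
                groups[key].append(value)  # group intermediate values by key
        result = dict(reducer.reduce(k, vs) for k, vs in sorted(groups.items()))
        mapper.close()
        reducer.close()
        return result
```

Running it over the records "1950,0", "1950,22", "1949,111", "1949,78" and "1950,-11" yields the maximum per year: 111 for 1949 and 22 for 1950. The user code only describes the per-record and per-key logic; the runner owns grouping and lifecycle, which is the division of labor the paragraph describes.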

HDFS divides nodes into a NameNode and DataNodes. There is a single NameNode: a program first communicates with it and then reads the file data from the DataNodes. These operations are transparent, so using HDFS is no different from using an ordinary file system API. MapReduce is centered on a JobTracker node, which assigns work and communicates with user programs.
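The NameNode/DataNode split can be sketched as a toy read path: the client asks the NameNode only for metadata (which blocks make up the file and where their replicas live), then reads the bytes directly from DataNodes, moving on to another replica if one is down. All names are hypothetical and the block size is tiny for readability; this is not the HDFS client API.

```python
# Toy HDFS-style read path. DataNode, NameNode, store_file and read_file are
# hypothetical names for illustration; this is not the HDFS client API.
from typing import Dict, List, Tuple


class DataNode:
    """Stores block contents; can be marked down to simulate a failure."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.alive = True

    def read_block(self, block_id: str) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: file -> block ids, block id -> replica DataNodes."""

    def __init__(self) -> None:
        self.files: Dict[str, List[str]] = {}
        self.locations: Dict[str, List[DataNode]] = {}

    def get_block_locations(self, path: str) -> List[Tuple[str, List[DataNode]]]:
        return [(b, self.locations[b]) for b in self.files[path]]


def store_file(nn: NameNode, path: str, data: bytes, nodes: List[DataNode],
               block_size: int, replication: int) -> None:
    """Split data into blocks and put `replication` copies on consecutive nodes."""
    if replication > len(nodes):
        raise ValueError("not enough DataNodes for the replication factor")
    block_ids = []
    for n, start in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#{n}"
        replicas = [nodes[(n + r) % len(nodes)] for r in range(replication)]
        for node in replicas:
            node.blocks[block_id] = data[start:start + block_size]
        nn.locations[block_id] = replicas
        block_ids.append(block_id)
    nn.files[path] = block_ids


def read_file(nn: NameNode, path: str) -> bytes:
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    out = bytearray()
    for block_id, replicas in nn.get_block_locations(path):
        for node in replicas:
            try:
                out += node.read_block(block_id)
                break
            except ConnectionError:
                continue  # replica unavailable: try the next DataNode
        else:
            raise RuntimeError(f"all replicas of {block_id} are unavailable")
    return bytes(out)
```

Because the NameNode never carries file contents, it stays a lightweight metadata service while the bulk data flows between clients and DataNodes.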

The project is still in progress and has not yet reached version 1.0. It still lags far behind Google's system, but it is progressing quickly and is worth watching. It is also an early implementation of cloud computing and a bridge to the future.

Project homepage: http://hadoop.apache.org. Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets you develop distributed programs without understanding the details of the underlying distributed layer, making full use of a cluster's power for high-speed computation and storage.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
