Hadoop: an early realization of cloud computing and a bridge to the future

Project home: http://hadoop.apache.org


Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without understanding the low-level details of distribution, and can take full advantage of a cluster's power for high-speed computation and storage.

Origin: Google's cluster system

Google's data centers use inexpensive Linux PCs to form clusters that run a variety of applications. Even developers who are new to distributed programming can quickly make use of Google's infrastructure. It has three core components:

1. GFS (Google File System). A distributed file system that hides details such as load balancing and redundant replication, and presents a unified file system API to the programs above it. Google optimized it for its own needs: access to large files, reads that far outnumber writes, and PCs that fail easily and so cause node failures. GFS divides files into 64 MB chunks, distributes them across the machines of the cluster, and stores them using the Linux file system. Each file is also kept in at least 3 redundant copies. At the center is a master node that locates the chunks of a file based on the file index. See the GFS paper published by Google's engineers.

2. MapReduce. Google found that most distributed computations can be abstracted as MapReduce operations. Map decomposes the input into intermediate key/value pairs, and Reduce merges those key/value pairs into the final output. The programmer supplies these two functions, and the underlying infrastructure distributes the map and reduce operations across the cluster and stores the results on GFS.

3. BigTable. A large distributed database that is not a relational database. As its name suggests, it is a huge table used to store structured data.

All three of these facilities are described in papers published by Google.
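
The map and reduce roles described in item 2 can be illustrated with a word count. This is only a minimal single-process sketch in Python, not Google's or Hadoop's implementation: the names map_words, reduce_counts and run_mapreduce are hypothetical, and the grouping loop stands in for the work that the real infrastructure spreads across a cluster.

```python
from collections import defaultdict


def map_words(_doc_id, text):
    """Map: decompose one input record into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: merge every value that shares a key into one output pair."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical stand-in for the infrastructure, run in one process.

    A real system would run map_fn and reduce_fn on many machines and
    write the result to the distributed file system.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # group values by intermediate key
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

The programmer writes only the two small functions; everything inside run_mapreduce is what the framework is meant to take over.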

Open Source Implementation

This distributed framework is creative and highly scalable, and it gives Google a competitive edge in system throughput. The Apache Foundation therefore implemented an open source version in Java that supports Fedora and other Linux platforms. Hadoop is currently backed by Yahoo, which has people working on the project long term and is preparing to use Hadoop in place of its original FreeBSD-based system.
Hadoop implements the HDFS file system and MapReduce. The current version is 0.16. It is immature, but it can already run on 2000 nodes. The user only needs to extend MapReduceBase, provide two classes that implement map and reduce respectively, and register the job; distributed execution is then handled automatically.

HDFS divides nodes into two categories: NameNode and DataNode. There is only one NameNode; a program communicates with it and then accesses file data on the DataNodes. These operations are transparent and look no different from ordinary file system API calls.
On the MapReduce side, the central node is the JobTracker, which assigns work and communicates with user programs.
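
The division of labor between the NameNode and the DataNodes can be sketched as a small lookup table. This is a toy model in Python under stated assumptions, not HDFS code: ToyNameNode and its methods are hypothetical, the 64 MB block size and 3 copies are borrowed from the GFS description above, and the round-robin placement is an illustrative choice rather than the real placement policy.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # assumption: 64 MB, as in the GFS description
REPLICATION = 3                # assumption: 3 copies, as in the GFS description


class ToyNameNode:
    """Hypothetical model: holds only metadata (which DataNodes hold each block)."""

    def __init__(self, datanodes):
        if len(datanodes) < REPLICATION:
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.block_map = {}  # file name -> one list of DataNode names per block

    def add_file(self, name, size_bytes):
        """Split a file into fixed-size blocks and pick replicas round-robin."""
        n_blocks = max(1, math.ceil(size_bytes / BLOCK_SIZE))
        placements = []
        for i in range(n_blocks):
            replicas = [
                self.datanodes[(i + r) % len(self.datanodes)]
                for r in range(REPLICATION)
            ]
            placements.append(replicas)
        self.block_map[name] = placements
        return placements

    def locate(self, name, offset):
        """Answer a client: which DataNodes hold the block containing this offset?"""
        return self.block_map[name][offset // BLOCK_SIZE]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
layout = namenode.add_file("logs.txt", 200 * 1024 * 1024)  # a 200 MB file
print(len(layout), "blocks")
print(namenode.locate("logs.txt", 130 * 1024 * 1024))
```

The client asks the single NameNode where a block lives and would then read the bytes directly from one of the returned DataNodes, so file data never passes through the NameNode itself.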

Future

The project is still under development and has not reached version 1.0. The gap between it and Google's system is still very wide, but progress is fast and the project deserves attention.

In addition, this is the initial phase of cloud computing, a bridge to the future.
