Hadoop is an open source distributed computing platform owned by the Apache Software Foundation. Hadoop, the core of Hadoop Distributed File System (Hadoop distributed Files System,hdfs) and (open source implementations of Google MapReduce), provides users with a distributed infrastructure that is transparent to the underlying details of the system. HDFs's high fault tolerance, high scalability and other advantages allow users to deploy Hadoop on low-cost hardware to form a distributed system; the MapReduce distributed programming model allows users to concurrently parallel applications without knowing the underlying details of the distributed system. So users can easily organize the computer resources by using Hadoop in the cluster, build their own distributed computing platform, and make full use of the computing and storage capacity of the cluster to complete the processing of massive data.
Hadoop is a 一个开源框架
distributed application that can be written and run 处理大规模数据
. Distributed computing is a broad and ever-changing field.
The advantage of Hadoop is that:
1) convenience : Hadoop runs on a large cluster of general commercial machines, or on cloud computing services, such as EC2.
2) Robust : Hadoop is committed to running on general commodity hardware, and its architecture assumes that hardware is frequently invalidated, and Hadoop can handle most of these failures in a leisurely manner.
3) extensible: Hadoop can scale linearly to handle larger datasets by increasing cluster nodes.
4) Simple : Hadoop allows users to quickly write efficient parallel code.
Of the Hadoop framework 核心是HDFS和MapReduce
. Where HDFS is a distributed file system, MapReduce is a distributed data processing model and execution environment. Mastered these two parts, also mastered the core of Hadoop things,
Hadoop was born out of Doug Cutting's ongoing project--nutch based on Google's GFs and mapreduce ideas, and is now attributed to Apache.
What is Hadoop?