Many people cannot yet answer the question "What is Hadoop?". This article introduces Hadoop from the ground up, and by the end you should have a clear picture of what it is.
This section covers the concept and main features of Hadoop.
1. What is Hadoop?
Hadoop began as a sub-project of Apache Lucene: it was split out of the Nutch project as an effort dedicated to distributed storage and distributed computing. Put simply, Hadoop is a software platform that makes it easier to develop and run applications that process large-scale data.
2. The main features of Hadoop:
1. Scalable: reliably stores and processes petabytes (PB) of data.
2. Low cost: data can be distributed and processed by server clusters built from commodity machines, and such clusters can grow to thousands of nodes.
3. Efficient: by distributing the data, Hadoop can process it in parallel on the nodes where it resides, which makes processing very fast.
4. Reliable: Hadoop automatically maintains multiple copies of the data and automatically redeploys computing tasks when a task fails.
3. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).
HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).
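To make the streaming access concrete, here is a minimal sketch that writes a file to HDFS and reads it back as a stream through the org.apache.hadoop.fs API. The NameNode address (hdfs://localhost:9000) and the path /demo/hello.txt are assumptions for the example, not anything prescribed by the article:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write a file to HDFS, then read it back as a stream.
public class HdfsStreamingDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in practice this comes from the site config.
        conf.set("fs.default.name", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/demo/hello.txt"); // illustrative path

        // Write the file (overwriting if it exists).
        FSDataOutputStream out = fs.create(file, true);
        out.writeBytes("hello hdfs\n");
        out.close();

        // Read it back as a stream: the "streaming access" the text mentions.
        FSDataInputStream in = fs.open(file);
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
        fs.close();
    }
}
```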
4. Hadoop also implements the MapReduce distributed computing model.
MapReduce splits an application's work into many small blocks of work. HDFS creates multiple copies of the data blocks for reliability and places them on the compute nodes of the cluster, so MapReduce can process the data on the nodes where it resides.
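The canonical example of this model is word counting: each map call handles one small block of work (a line of input) and emits a count of 1 per word, and the reduce step sums the counts per word. Below is a minimal sketch against the classic org.apache.hadoop.mapred API listed later in this article; the class name WordCount and the command-line paths are illustrative:

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Emit (word, 1) for every word in this line of input.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        // The framework groups values by key; sum the counts for each word.
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

Submitting the job with JobClient.runJob blocks until it completes. The framework takes care of splitting the input, scheduling map tasks close to the data blocks, and re-running failed tasks, which is exactly the reliability behavior described in the feature list above.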
5. The Hadoop API is divided into the following main packages:
org.apache.hadoop.conf: defines the configuration-file processing API for system parameters (demonstrated, together with the io package, in the sketch after this list).
org.apache.hadoop.fs: defines the abstract file system API.
org.apache.hadoop.dfs: the implementation of the Hadoop Distributed File System (HDFS) module.
org.apache.hadoop.io: defines general I/O APIs for reading and writing data objects such as networks, databases, and files.
org.apache.hadoop.ipc: tools for network servers and clients; encapsulates the basic modules of asynchronous network I/O.
org.apache.hadoop.mapred: the implementation of the Hadoop distributed computing system (MapReduce) module, including task distribution and scheduling.
org.apache.hadoop.metrics: defines an API for performance statistics, used mainly by the mapred and dfs modules.
org.apache.hadoop.record: defines I/O API classes for records and a record description language translator, used to simplify serializing records in a language-neutral manner.
org.apache.hadoop.tools: defines some common tools.
org.apache.hadoop.util: defines some public utility APIs.
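As a small illustration of two of these packages, the sketch below uses org.apache.hadoop.conf to set and read a typed configuration parameter and org.apache.hadoop.io to serialize Writable objects through a byte stream. The class name ConfAndIoDemo and the concrete values are made up for the example; dfs.replication is a standard HDFS setting:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class ConfAndIoDemo {
    public static void main(String[] args) throws Exception {
        // conf: typed access to system parameters loaded from the config files.
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3); // illustrative override
        System.out.println("replication = " + conf.getInt("dfs.replication", 1));

        // io: Writable types implement their own compact serialization,
        // which MapReduce uses for keys and values.
        Text key = new Text("hadoop");
        IntWritable value = new IntWritable(42);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        key.write(out);
        value.write(out);

        // Deserialize into fresh objects from the same byte stream.
        Text key2 = new Text();
        IntWritable value2 = new IntWritable();
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        key2.readFields(in);
        value2.readFields(in);
        System.out.println(key2 + " = " + value2.get());
    }
}
```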