1. What is Hadoop?
Hadoop is a distributed system infrastructure developed by the Apache Foundation. As a distributed storage and computing platform for big data, it lets users develop distributed programs without knowing the underlying details of the distribution, harnessing the power of a cluster for high-speed computation and storage. Hadoop implements a distributed filesystem, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on inexpensive (low-cost) hardware; it provides high-throughput access to application data, making it well suited to applications with very large datasets. HDFS relaxes some POSIX requirements to allow streaming access to filesystem data.
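To make the storage model above concrete, here is an illustrative sketch (not the real HDFS implementation) of its two key ideas: a file is split into fixed-size blocks, and each block is replicated onto several nodes. The tiny block size and round-robin placement are simplifications for demonstration; real HDFS defaults to 128 MB blocks and uses rack-aware replica placement.

```python
# Illustrative sketch of HDFS-style block storage and replication.
# Block size and placement policy are simplified assumptions, not HDFS internals.

BLOCK_SIZE = 4    # bytes, tiny for demonstration (HDFS defaults to 128 MB)
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw bytes into fixed-size blocks, as HDFS does with files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin here;
    real HDFS uses rack-aware placement)."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello hadoop world")
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
```

Because every block lives on three distinct nodes, losing any single machine never loses data; this is the basis of the fault tolerance described later.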
2. The birth of Hadoop
Its author, Doug Cutting, was inspired by three papers from Google (on GFS, MapReduce, and BigTable).
3. Hadoop Core Projects
- HDFS (Hadoop Distributed File System): distributed filesystem
- MapReduce: parallel computing framework
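The MapReduce programming model can be sketched with the classic word-count example. This is a minimal in-memory illustration, not Hadoop's Java API: real Hadoop runs many map and reduce tasks in parallel across the cluster, while here the same three phases run sequentially in one process.

```python
# Minimal sketch of the MapReduce model (word count), run in a single process.
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in an input line."""
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
# counts == {"hadoop": 2, "stores": 1, "big": 2, "data": 2, "processes": 1}
```

The framework's value is that the map and reduce functions stay this simple while Hadoop handles partitioning the input, scheduling tasks across nodes, and shuffling intermediate data.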
5. Features of Hadoop
- High reliability. Hadoop's ability to store and process data bit by bit has earned wide trust.
- High scalability. Hadoop distributes data and computation across clusters of available computers and can easily scale out to thousands of nodes.
- Efficiency. Hadoop can move data dynamically between nodes and keep each node in dynamic balance, so processing is very fast.
- Fault tolerance. Hadoop automatically keeps multiple copies of data and automatically reassigns failed tasks.
- Low cost. Compared with all-in-one appliances, commercial data warehouses, and tools such as QlikView and Yonghong Z-Suite, Hadoop is open source, so the software cost of a project can be significantly reduced.
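The fault-tolerance point above can be sketched as follows. This is a hypothetical simplification of what HDFS's NameNode does: when a node fails, any block whose replica count drops below the target is re-replicated onto surviving nodes that do not already hold it.

```python
# Hypothetical sketch of automatic re-replication after a node failure.
# The placement structure and policy are illustrative, not HDFS internals.

REPLICATION = 3  # target number of copies per block

def handle_node_failure(placement, failed, live_nodes):
    """Drop the failed node from every block's replica list, then top each
    list back up to REPLICATION using live nodes that lack a copy."""
    for block, holders in placement.items():
        holders[:] = [n for n in holders if n != failed]
        for node in live_nodes:
            if len(holders) >= REPLICATION:
                break
            if node not in holders:
                holders.append(node)
    return placement

# Two blocks, each replicated on three of four nodes; then node2 fails.
placement = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node4"]}
handle_node_failure(placement, "node2", ["node1", "node3", "node4"])
```

After the failure, both blocks are back at three replicas without any manual intervention, which is why individual cheap machines can fail without data loss.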
6. Physical distribution of a Hadoop cluster
7. Single-node physical structure