This article is derived from the book Hadoop Technology Insider: In-Depth Analysis of the Architecture Design and Implementation Principles of Hadoop Common and HDFS.
First, the basic concept of Hadoop
- Hadoop is an open-source distributed computing platform under the Apache Foundation. Its core is the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework, which together provide users with a distributed infrastructure whose low-level details remain transparent to them.
- HDFS's high fault tolerance, high scalability, and other properties allow users to deploy Hadoop on inexpensive hardware and build a distributed system on top of it.
- The MapReduce distributed computing framework allows users to develop parallel, distributed applications without understanding the low-level details of a distributed system, so that large-scale computing resources can be applied to big data processing problems that a traditional high-performance single machine cannot solve (see the word-count sketch after this list).
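The classic word-count job illustrates this programming model: a map function tokenizes input lines and emits (word, 1) pairs, and a reduce function sums the counts for each word. The sketch below uses the standard org.apache.hadoop.mapreduce Java API; the input and output paths are placeholders taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // map-side pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles splitting the input, scheduling map and reduce tasks across the cluster, and re-running failed tasks; the user only writes the two functions above.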
Second, the advantages of Hadoop
- Convenience: Hadoop can run on large clusters of commodity machines or on cloud computing services
- Elasticity: Hadoop scales linearly to handle larger datasets by adding cluster nodes; when cluster load drops, nodes can be removed so that computing resources are used efficiently
- Robustness: Hadoop is designed to run on commodity hardware and can gracefully handle hardware failures
- Simplicity: Hadoop allows users to quickly write efficient, parallel, distributed code
Third, the Hadoop ecosystem
- Hadoop Common: provides common utilities for the other Hadoop projects, including the system configuration tool Configuration, the remote procedure call (RPC) mechanism, the serialization mechanism, and the Hadoop abstract file system FileSystem. These provide basic services for building a cloud computing environment on commodity hardware and supply the APIs needed to develop software that runs on the platform (see the first sketch after this list)
- Avro is a data serialization system. Like other serialization mechanisms, Avro converts data structures or objects into a format that is easy to store and transfer; it is designed to support data-intensive applications and the storage and exchange of large-scale data. Avro provides rich data structure types, a fast and compressible binary data format, a container file format for persistent data, remote procedure call (RPC), and simple integration with dynamic languages (see the second sketch after this list)
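As a minimal illustration of the FileSystem abstraction in Hadoop Common, the sketch below opens a path through the abstract org.apache.hadoop.fs.FileSystem API and prints its contents. The path comes from the command line; the concrete implementation (HDFS, local file system, and so on) is selected automatically from the URI scheme, so the application code never depends on a specific storage backend.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml if present
    Path path = new Path(args[0]);              // e.g. an hdfs:// or file:// URI

    // FileSystem.get() returns the concrete file system matching the URI scheme;
    // the rest of the code only uses the abstract FileSystem interface.
    try (FileSystem fs = FileSystem.get(path.toUri(), conf);
         FSDataInputStream in = fs.open(path);
         BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```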
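To make the Avro description concrete, the second sketch below defines a small, made-up "User" record schema and round-trips one record through Avro's generic binary encoding (org.apache.avro.generic). The schema and field names are illustrative only, not part of Hadoop or Avro.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroRoundTrip {
  // Hypothetical record schema with two fields, declared as JSON.
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}";

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

    // Build a record that conforms to the schema.
    GenericRecord user = new GenericData.Record(schema);
    user.put("name", "alice");
    user.put("age", 30);

    // Serialize to Avro's compact binary format.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
    encoder.flush();
    byte[] bytes = out.toByteArray();

    // Deserialize back using the same schema.
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    GenericRecord copy = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    System.out.println(copy);   // prints the record as JSON-like text
  }
}
```

Because the schema travels with (or is agreed upon alongside) the data, Avro can keep the binary format compact while still allowing readers and writers to evolve their schemas independently.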
Hadoop Learning Series, Note 1: Building a Hadoop source code reading environment