Hadoop Learning Series Note one: Building a Hadoop source reading environment


This article is derived from the book Hadoop Technology Insider: In-Depth Analysis of the Design and Implementation Principles of the Hadoop Common and HDFS Architecture.

First, the basic concept of Hadoop

    • Hadoop is an open-source distributed computing platform under the Apache Foundation. With the Hadoop Distributed File System (HDFS) and the MapReduce distributed computing framework at its core, it provides users with a distributed infrastructure whose low-level details remain transparent.
    • HDFS's high fault tolerance and high scalability allow users to deploy Hadoop on inexpensive hardware and build distributed systems.
    • The MapReduce distributed computing framework allows users to develop parallel, distributed applications without understanding the underlying details of distributed systems, harnessing large-scale computing resources to solve big-data processing problems that a traditional high-performance single machine cannot handle (a minimal word-count sketch follows this list).
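To make the MapReduce model concrete, below is a minimal word-count sketch against the org.apache.hadoop.mapreduce API. The class name WordCount and the command-line input/output paths are illustrative placeholders, not part of the original article.

    // WordCount.java - minimal MapReduce sketch (hypothetical class name)
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input line.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, such a job is typically submitted with hadoop jar wordcount.jar WordCount <input> <output>, where the jar name and paths are again placeholders.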

Second, the advantages of Hadoop

    • Convenience: Hadoop can run on large clusters of commodity machines or on cloud computing services.
    • Resiliency: Hadoop scales linearly to handle larger datasets by adding cluster nodes; when cluster load drops, nodes can be removed so that computing resources are used efficiently.
    • Robustness: Hadoop is designed to gracefully handle hardware failures on commodity computing platforms.
    • Simplicity: Hadoop allows users to quickly write efficient, parallel, distributed code.

Third, the Hadoop ecosystem

    • Hadoop Common: provides common tools for the other Hadoop projects, including system configuration tools, remote procedure call (RPC), the serialization mechanism, and the Hadoop abstract file system (FileSystem). These provide basic services for building a cloud computing environment on commodity hardware and supply the APIs required for developing software that runs on the platform (a short FileSystem read sketch is given after this list).
    • Avro: a data serialization system. Like other serialization mechanisms, Avro converts data structures or objects into a format that is easy to store and transfer, and it is designed to support data-intensive applications that store and exchange large-scale data. Avro provides rich data structure types, a fast and compressible binary data format, a container file for storing persistent data, remote procedure call (RPC), and simple dynamic-language integration (an Avro serialization sketch also follows this list).
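As a sketch of the Hadoop abstract file system mentioned under Hadoop Common, the snippet below opens and prints a text file through org.apache.hadoop.fs.FileSystem. The URI hdfs://namenode:9000/user/demo/input.txt is a hypothetical example path, not one from the article.

    // HdfsReadExample.java - reading a file through the abstract FileSystem API
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
            Path path = new Path("hdfs://namenode:9000/user/demo/input.txt"); // hypothetical file
            // FileSystem.get resolves the concrete implementation behind the URI (HDFS, local, ...)
            try (FileSystem fs = FileSystem.get(path.toUri(), conf);
                 FSDataInputStream in = fs.open(path);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }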
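And as a sketch of the Avro serialization described above, the snippet below writes a generic record to an Avro container file and reads it back. The User schema and the file name users.avro are made-up examples.

    // AvroExample.java - serializing and deserializing a generic record
    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroExample {
        // Hypothetical record schema with two fields.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\"}]}";

        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

            // Build a record and write it to an Avro container file (compact binary format).
            GenericRecord user = new GenericData.Record(schema);
            user.put("name", "alice");
            user.put("age", 30);

            File file = new File("users.avro");
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, file);
                writer.append(user);
            }

            // Read the record back; the schema is stored in the container file itself.
            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
                for (GenericRecord r : reader) {
                    System.out.println(r);
                }
            }
        }
    }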
