Big Data Learning Note 1 -- Hadoop Introduction and Getting Started

Introduction to Hadoop:

    • A reliable, scalable framework for distributed storage and distributed computing.

Components:

    • Common: common utilities that support the other modules

    • HDFS: distributed file system

    • YARN: resource management and job scheduling (the runtime environment for computation)

    • MapReduce: the MR computation model

Ecosystem:

    • Ambari: cluster provisioning, management, and monitoring web interface

    • Avro: universal, language-independent serialization mechanism

    • Cassandra: distributed NoSQL database

    • Chukwa: data collection system

    • HBase: distributed big-table database

    • Hive: SQL-based analysis system (data warehouse)

    • Mahout: machine learning algorithm library

    • Pig: data-flow scripting language for data analysis

    • Spark: a fast, general-purpose computing engine, well suited to iterative computation

    • Tez: data-flow (DAG) execution framework

    • Zookeeper: high-performance coordination service

Massive data analysis:

    • The traditional single-machine approach: limited storage space, limited performance, single point of failure, and many implementation details to handle by hand.
    • HDFS: a unified access interface, automatic splitting of large files, distributed storage, horizontal (parallel) scaling, and high reliability.

HDFS

    • The distributed file system of the Hadoop ecosystem, built to solve the big data storage problem.

    • HDFS is a file system abstracted on top of the nodes' local file systems. It exposes a unified access interface (a directory tree); the actual files, after being split into blocks and placed by a load-balancing algorithm, are stored on the local file systems of the DataNodes and are managed centrally by a master node (the NameNode).

    • To improve storage reliability, each block of a file is stored in multiple replicas (3 by default): the first on the local node, the second on another node in the same rack, and the third on a node in a different rack (see the sketch after this list).

    • File system: Provides a unified set of access interfaces that mask the underlying implementation details of the system.
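
To make the unified interface and the replica placement above concrete, here is a minimal shell sketch; the running cluster, the directory /user/alice, and the file data.txt are assumed for illustration:

    # Browse the HDFS directory tree through the unified interface
    hdfs dfs -ls /
    hdfs dfs -ls /user/alice

    # Show how a file was split into blocks and where each replica lives
    hdfs fsck /user/alice/data.txt -files -blocks -locations

    # Print and change the replication factor of a file
    hdfs dfs -stat %r /user/alice/data.txt
    hdfs dfs -setrep -w 3 /user/alice/data.txt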

Hadoop directory structure:

    • bin: executable scripts (the user-facing commands such as hadoop and hdfs)

    • etc: system configuration files

    • lib: native libraries

    • sbin: executable scripts for administering the system (starting/stopping daemons)

    • share: shared directory holding the Hadoop jar packages (a usage sketch follows the list)
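
A rough sketch of how these directories are used in practice; the installation root $HADOOP_HOME is assumed:

    cd $HADOOP_HOME
    bin/hdfs version               # user CLI tools live in bin/
    cat etc/hadoop/core-site.xml   # configuration files live in etc/hadoop/
    ls lib/native/                 # native libraries live in lib/native/
    sbin/start-dfs.sh              # admin scripts to start/stop daemons live in sbin/
    ls share/hadoop/common/        # the Hadoop jars live under share/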

HDFS file operations:

    • Operate with the hdfs dfs command (worked examples follow this list)
    • put: upload a file
    • get: download a file
    • ls: list files
    • cat: display file contents
    • tail: view the end of a file
    • count: count files and directories
    • cp: copy within HDFS
    • df: view disk capacity
    • du: view file sizes
    • mkdir: create a folder (-p creates parent folders)
    • rm: delete
    • mv: move
    • createSnapshot: create a snapshot
    • chown: change owner
    • chmod: change permissions
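
The following shell walkthrough exercises the commands above; all paths and names (localfile.txt, /user/alice, the group hadoop, the snapshot name s1) are made up for illustration:

    hdfs dfs -mkdir -p /user/alice/input           # create a folder, -p creates parents
    hdfs dfs -put localfile.txt /user/alice/input  # upload a local file
    hdfs dfs -ls /user/alice/input                 # list files
    hdfs dfs -cat /user/alice/input/localfile.txt  # display file contents
    hdfs dfs -tail /user/alice/input/localfile.txt # view the end of the file
    hdfs dfs -count /user/alice                    # count directories, files, bytes
    hdfs dfs -du -h /user/alice                    # file sizes
    hdfs dfs -df -h /                              # HDFS capacity
    hdfs dfs -cp /user/alice/input/localfile.txt /user/alice/copy.txt  # copy within HDFS
    hdfs dfs -mv /user/alice/copy.txt /user/alice/moved.txt            # move / rename
    hdfs dfs -chown alice:hadoop /user/alice/moved.txt                 # change owner
    hdfs dfs -chmod 644 /user/alice/moved.txt                          # change permissions
    hdfs dfs -get /user/alice/moved.txt ./downloaded.txt               # download to the local disk
    hdfs dfsadmin -allowSnapshot /user/alice       # snapshots must first be allowed on the directory
    hdfs dfs -createSnapshot /user/alice s1        # create a snapshot named s1
    hdfs dfs -rm -r /user/alice/input              # delete, -r for directories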

HDFS file storage

    • Block files are stored in sub-folders under the DataNode's data directory (tmp/data in this setup); large files are cut into blocks of 128 MB. The split is a plain byte-level segmentation with no transformation, so the blocks can be manually stitched back into the complete file, as shown below.
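
A sketch of the manual stitching; the data-directory layout and the block file names (blk_...) below are assumed, following the typical DataNode storage layout:

    # On a DataNode, block files live under the data directory, e.g.
    #   <data-dir>/current/BP-*/current/finalized/subdir*/subdir*/blk_<id>
    # Each block is a plain file; concatenating the blocks in order
    # reproduces the original uploaded file byte for byte.
    cat blk_1073741825 blk_1073741826 > restored_file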
