Introduction to Hadoop:
- A reliable, scalable, distributed computing framework.
Components:
Common: shared utilities that support the other modules
HDFS: distributed file system
YARN: resource management and job scheduling (the runtime environment)
MapReduce: the MR computation model
Ecosystem:
Ambari: web-based management and monitoring interface
Avro: general-purpose, language-independent serialization system
Cassandra: scalable distributed database
Chukwa: data collection system
HBase: distributed database for large tables
Hive: SQL-based data warehouse and analysis system
Mahout: machine learning algorithm library
Pig: high-level scripting language for data flows
Spark: a fast, general-purpose compute engine, well suited to iterative computation
Tez: data flow framework
ZooKeeper: high-performance coordination service
Massive data analysis:
- The traditional approach? Limited storage space | limited performance | single point of failure | low-level implementation details to handle yourself
- HDFS? Unified interface | large-file splitting | distributed storage | horizontal scaling | high reliability
HDFS
The distributed file system of the Hadoop ecosystem, designed to solve the big data storage problem.
HDFS is a file system abstracted on top of the local file systems of the cluster nodes. It exposes a unified access interface (a directory tree), while the actual files are split into blocks, distributed by a load-balancing algorithm, and stored in the nodes' local file systems. A master node (the NameNode) manages all of this centrally.
To improve the reliability of data storage, each block of a file is stored in multiple copies (3 by default). Under the default placement policy, the first replica is on the local machine (the node doing the write), the second is on a node in a different rack, and the third is on another node in the same rack as the second.
File system: provides a unified set of access interfaces that hide the underlying implementation details of the system.
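The replica placement for a single block can be sketched in a few lines of Python. This is only an illustrative model, not HDFS code: it follows the placement order given in the HDFS architecture documentation (writer's node, then a node in a remote rack, then a second node in that same remote rack), and the function name `place_replicas`, the `topology` dictionary, and the node and rack names are all hypothetical.

```python
import random
from typing import Dict, List


def place_replicas(writer: str,
                   topology: Dict[str, List[str]],
                   rng: random.Random) -> List[str]:
    """Choose three nodes for one block (simplified model, not HDFS code).

    topology maps a rack name to the node names in that rack.
    Assumes the writer is itself a DataNode listed in the topology.
    """
    # Locate the rack that contains the writing node.
    local_rack = next(rack for rack, nodes in topology.items()
                      if writer in nodes)

    # Candidate remote racks need at least two nodes,
    # because replicas 2 and 3 both go there.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")

    first = writer                                    # replica 1: local node
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])        # replica 2: remote rack
    third = rng.choice([n for n in topology[remote_rack]
                        if n != second])              # replica 3: same remote rack
    return [first, second, third]


# Hypothetical two-rack cluster.
topology = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"]}
print(place_replicas("node1", topology, random.Random(0)))
```

With this layout, losing a single node or even a whole rack still leaves at least one copy of the block available.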
Hadoop directory structure:
bin: executable scripts (user commands)
etc: system configuration files
lib: native libraries
sbin: system scripts (for example, starting and stopping services)
share: shared directory that holds the JAR packages
HDFS file operations:
- Performed with the hdfs dfs command
- put: upload files
- get: download files
- ls: list files
- cat: display file contents
- tail: view the end of a file
- count: count directories, files, and bytes
- cp: copy within HDFS
- df: view disk capacity
- du: view file sizes
- mkdir: create a folder; -p also creates parent folders
- rm: delete
- mv: move
- createSnapshot: create a snapshot
- chown: change the owner
- chmod: change permissions
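As a rough illustration of how the commands above are put together, the following Python sketch builds `hdfs dfs` command lines. The helper names `hdfs_dfs` and `run`, and all paths, are hypothetical. The script only prints the commands; actually executing them through `run` assumes a working Hadoop installation with `hdfs` on the PATH.

```python
import shlex
import subprocess
from typing import List


def hdfs_dfs(subcommand: str, *args: str) -> List[str]:
    """Build the argument list for `hdfs dfs -<subcommand> <args...>`."""
    return ["hdfs", "dfs", f"-{subcommand}", *args]


def run(cmd: List[str]) -> str:
    """Execute a command and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit code.
    Not called below, so the sketch works without a cluster.
    """
    return subprocess.run(cmd, check=True,
                          capture_output=True, text=True).stdout


# Hypothetical paths, for illustration only.
commands = [
    hdfs_dfs("mkdir", "-p", "/user/demo/input"),       # create folder and parents
    hdfs_dfs("put", "local.txt", "/user/demo/input"),  # upload a local file
    hdfs_dfs("ls", "/user/demo/input"),                # list the folder
    hdfs_dfs("cat", "/user/demo/input/local.txt"),     # show file contents
]
for cmd in commands:
    print(shlex.join(cmd))
```

On a machine with a running cluster, replacing `print(shlex.join(cmd))` with `print(run(cmd))` would execute each command in turn.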
HDFS file storage
- Files are stored in a subfolder under tmp/data/. Large files are cut into blocks of 128 MB. The file is simply split, with no other processing applied, so the blocks can be manually concatenated back into the complete file.
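The plain splitting described above can be modeled in Python as follows. This is a minimal in-memory sketch, not how HDFS itself is implemented; the function names are hypothetical, and a tiny block size is used in the demo so it runs instantly.

```python
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size mentioned above


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut data into consecutive blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def join_blocks(blocks: List[bytes]) -> bytes:
    """Concatenate blocks in order to rebuild the original content."""
    return b"".join(blocks)


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed for a file of file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


# Demo with a 4-byte block size so the example stays small.
data = bytes(range(10))
blocks = split_into_blocks(data, block_size=4)
print([len(b) for b in blocks])            # [4, 4, 2]
print(join_blocks(blocks) == data)         # True
print(block_count(300 * 1024 * 1024))      # 3
```

Because the blocks are untouched slices of the original bytes, joining them in order restores the file exactly. A 300 MB file, for example, becomes two full 128 MB blocks plus one 44 MB block.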
Big Data Learning Notes 1: Hadoop Introduction and Getting Started