1. Introduction to Hadoop 1.1.0
- Hadoop is a distributed storage and computing platform suitable for big data.
- The core of Hadoop consists of HDFS and MapReduce.
- HDFS has a master-slave architecture: there is only one master node, the NameNode, and there are many slave nodes, the DataNodes.
- Distributed file systems and HDFS (HDFS architecture and basic concepts)
  - Distributed file system
    - As data volumes grow, the data no longer fits on the disks managed by a single operating system, so it must be spread across disks on many machines, each managed by its own operating system. That is inconvenient to manage and maintain, so a system is needed that manages files across multiple machines: a distributed file system.
    - A distributed file system lets files be shared across multiple hosts over the network, so that multiple users on multiple hosts can share files and storage space.
    - Transparency: files are accessed over the network, but to programs and users this looks just like accessing a local disk.
    - Fault tolerance: even if some nodes in the system go offline, the system keeps operating without losing data.
  - HDFS is only one of many distributed file systems. It suits write-once, read-many workloads; it does not support concurrent writes, and it is not suited to large numbers of small files.
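The master-slave structure, transparency, and fault tolerance described above can be illustrated with a toy, in-memory model. This is a sketch, not HDFS code. The class names are hypothetical, and the model assumes, as HDFS does, that file contents are split into blocks and that each block is copied to several slave nodes. The 4-byte block size is chosen only so the demo produces several blocks; the replication factor of 3 matches the HDFS default (dfs.replication). The caller just asks the master for a path and never sees which slave served each block, which is the transparency property.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical slave node: stores block contents and may go offline."""
    name: str
    online: bool = True
    blocks: dict = field(default_factory=dict)  # block id -> bytes


class NameNode:
    """Hypothetical master: keeps metadata only (file -> blocks -> replica nodes)."""

    def __init__(self, datanodes, block_size=4, replication=3):
        # Round-robin placement below needs at least `replication` nodes
        # so that the replicas of one block land on distinct DataNodes.
        if replication > len(datanodes):
            raise ValueError("need at least `replication` DataNodes")
        self.datanodes = datanodes
        self.block_size = block_size  # tiny on purpose, to get several blocks
        self.replication = replication
        self.files = {}  # path -> list of (block_id, [DataNode, ...])
        self._next_block_id = 0

    def write(self, path, data: bytes):
        """Write a new file once; rewriting an existing path is refused."""
        if path in self.files:
            raise FileExistsError(path)
        placements = []
        n = len(self.datanodes)
        for offset in range(0, len(data), self.block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Copy this block to `replication` consecutive nodes (round-robin).
            targets = [self.datanodes[(block_id + k) % n]
                       for k in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = data[offset:offset + self.block_size]
            placements.append((block_id, targets))
        self.files[path] = placements

    def read(self, path) -> bytes:
        """Reassemble a file, reading each block from any online replica."""
        out = bytearray()
        for block_id, targets in self.files[path]:
            replica = next((node for node in targets if node.online), None)
            if replica is None:
                raise IOError(f"every replica of block {block_id} is offline")
            out += replica.blocks[block_id]
        return bytes(out)


if __name__ == "__main__":
    nodes = [DataNode(f"dn{i}") for i in range(4)]
    namenode = NameNode(nodes)
    namenode.write("/test/a.txt", b"hello distributed world")
    nodes[0].online = False  # two of the four slaves go offline...
    nodes[1].online = False
    print(namenode.read("/test/a.txt"))  # b'hello distributed world'
```

With 4 slaves and 3 replicas per block, any two slaves can fail and every block still has a live copy. Refusing to overwrite an existing path loosely mirrors the write-once model mentioned above.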
2. HDFS shell operations
- File system (FS) shell commands are invoked in the form bin/hadoop fs <args>.
- All FS shell commands take URI paths as arguments.
- The URI format is scheme://authority/path. For HDFS the scheme is hdfs; for the local file system the scheme is file. The scheme and authority are optional; if they are omitted, the defaults specified in the configuration are used.
- For example, an HDFS file or directory /parent/child can be written as hdfs://namenode:namenodeport/parent/child, or more simply as /parent/child (assuming the configuration points to namenode:namenodeport).
- The behavior of most FS shell commands is similar to that of the corresponding Unix shell commands.
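The rule for optional scheme and authority can be sketched as a small path-qualifying function. This is only an illustration of the rule described above, not Hadoop's implementation. The function name `qualify` is hypothetical, and `hdfs://namenode:9000` is a made-up placeholder for the configured default file system (the fs.default.name setting in Hadoop 1.x). Relative paths are deliberately left out of this sketch.

```python
from urllib.parse import urlparse

# Placeholder for the configured default file system; host and port are made up.
DEFAULT_FS = "hdfs://namenode:9000"


def qualify(path: str, default_fs: str = DEFAULT_FS) -> str:
    """Return a fully qualified scheme://authority/path URI.

    A path that already has a scheme is returned unchanged; an absolute
    path without one borrows the scheme and authority of default_fs.
    """
    parts = urlparse(path)
    if parts.scheme:  # e.g. hdfs://host:port/... or file:///...
        return path
    if not parts.path.startswith("/"):
        raise ValueError(f"relative path not handled in this sketch: {path!r}")
    default = urlparse(default_fs)
    return f"{default.scheme}://{default.netloc}{parts.path}"


if __name__ == "__main__":
    print(qualify("/parent/child"))          # hdfs://namenode:9000/parent/child
    print(qualify("file:///tmp/local.txt"))  # file:///tmp/local.txt
```

Both spellings from the example above, /parent/child and its fully qualified form, end up naming the same URI once the default is filled in.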
3. Common HDFS commands
-help [cmd] // display help for a command
-ls(r) <path> // list the contents of a directory; -lsr lists recursively
-du(s) <path> // display the size of each file in a directory; -dus shows only the total
-count [-q] <path> // count the directories, files, and bytes under a path; -q also shows quotas
-mv <src> <dst> // move files; with multiple sources, the destination must be a directory
-cp <src> <dst> // copy files; with multiple sources, the destination must be a directory
-rm(r) <path> // delete a file; -rmr deletes a directory recursively
-put <localsrc> <dst> // copy local files to HDFS
-copyFromLocal <localsrc> <dst> // same as -put
-moveFromLocal <localsrc> <dst> // like -put, but deletes the local source after copying
-get [-ignoreCrc] <src> <localdst> // copy files from HDFS to the local file system; -ignoreCrc skips the CRC check
-getmerge <src> <localdst> // sort the files in the source directory and concatenate them into one local file
-cat <src> // print the file contents to the terminal
-text <src> // print a file in text format; unlike -cat, it can also decode zip and SequenceFile input
-copyToLocal [-ignoreCrc] <src> <localdst> // same as -get
-moveToLocal <src> <localdst> // not implemented in Hadoop 1.x; only prints a "not implemented yet" message
-mkdir <path> // create a directory
-touchz <path> // create an empty (zero-length) file
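To make two of the less obvious commands concrete, here are local-file-system analogues of -count and -getmerge. They are sketches that only touch local directories and never talk to HDFS, and the function names are hypothetical. On a real cluster, hadoop fs -count prints the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE and FILE_NAME, and the directory being counted is itself included in DIR_COUNT. The sketch assumes, as in Hadoop 1.x, that -getmerge concatenates only the regular files directly inside the source directory and skips subdirectories.

```python
import os
import tempfile


def count(path):
    """Local analogue of `hadoop fs -count <path>`.

    Returns (dir_count, file_count, content_size); the directory passed in
    is itself included in dir_count.
    """
    if os.path.isfile(path):
        return 0, 1, os.path.getsize(path)
    dirs = files = size = 0
    for root, _subdirs, filenames in os.walk(path):
        dirs += 1  # os.walk visits each directory exactly once
        for name in filenames:
            files += 1
            size += os.path.getsize(os.path.join(root, name))
    return dirs, files, size


def getmerge(src_dir, local_dst):
    """Local analogue of `hadoop fs -getmerge <src> <localdst>`.

    Concatenates the regular files directly inside src_dir, in sorted name
    order, into local_dst; subdirectories are skipped. local_dst should be
    outside src_dir, or it would be merged into itself.
    """
    with open(local_dst, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            full = os.path.join(src_dir, name)
            if os.path.isfile(full):
                with open(full, "rb") as part:
                    out.write(part.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.mkdir(os.path.join(src, "sub"))
        for relpath, content in [(("part-00001",), b"world\n"),
                                 (("part-00000",), b"hello\n"),
                                 (("sub", "ignored"), b"xyz")]:
            with open(os.path.join(src, *relpath), "wb") as f:
                f.write(content)
        print(count(src))  # (2, 3, 15)
        merged = os.path.join(dst, "merged.txt")
        getmerge(src, merged)
        with open(merged, "rb") as f:
            print(f.read())  # b'hello\nworld\n'
```

The sorted order is why part-00000 lands before part-00001 in the merged file even though it was written second. This is the typical use of -getmerge: collecting the part files of a job's output directory into one local file.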
4. Shell exercises for HDFS
hadoop fs -ls / // view the HDFS root directory
hadoop fs -mkdir /test // create a directory named test under the root directory
hadoop fs -mkdir /test1 // create a directory named test1 under the root directory