Getting Started with HDFS (1)

Source: Internet
Author: User
Tags: hdfs dfs

2015.07.12 notes

1. HDFS

A distributed file system (DFS). In human-computer interaction, one of the most important functions is file management, which is handled by a file management system. The file management systems of Windows and Linux have much in common: users can create and delete files and folders, modify permissions, and modify metadata (creation, modification, and access times, etc.). The operating system manages files and data through the file system, which is the part of the operating system that implements file management, and it places the files on hardware (hard disks; a server can have more than 10,000 megabytes).

On a server, the storage of data is managed by the programs we write. When the amount of data grows too large, more hard disks can be added to the disk slots (for example, 6 slots), and beyond that the data can be spread across multiple systems. Users do not need to know which IP a file is stored on (much as the domain name www.Baidu.com maps to multiple servers). In a DFS the data is distributed over many file systems, each belonging to one operating system. With that much data, operating on each one directly is inconvenient for users, so the DFS sits on top of the operating systems' own file management systems.

As the volume of data grows beyond what a single operating system can hold, it is allocated to disks managed by more operating systems. That is hard to manage and maintain, so a system is needed to manage the files on multiple machines: a distributed file management system.

It is a file system that allows files to be shared across multiple hosts on a network, so that multiple users on multiple machines can share files and storage space.

Transparency. Files are actually accessed over the network, but to programs and users it looks just like accessing a local disk.

Fault tolerance. Even if some nodes in the system go offline, the system as a whole can continue to operate without any data loss.

There are many distributed file management systems, and HDFS is just one of them. It is not well suited to storing small files (one strategy is to merge small files into large ones).

Implementing file management:

The HDFS shell. HDFS stores big data. The shell is part of the Linux operating system, while HDFS is part of the Hadoop software; HDFS operations are invoked from the shell using specific commands. (In colored ls output, blue names are folders and green names are files.)

To call a file system (FS) shell command, use the form bin/hdfs dfs xxx.

All FS shell commands take URI paths as arguments.

The URI format is scheme://authority/path. For HDFS the scheme is hdfs; for the local file system the scheme is file. The scheme and authority are optional, and if they are not specified, the default scheme given in the configuration is used.

For example, /parent/child can be written as hdfs://namenode:namenodeport/parent/child, or more simply as /parent/child (assuming the configuration file points to namenode:namenodeport).
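The resolution rule described above can be sketched as a small shell function. This is only an illustration of how a bare path is completed from the configured default; the function name qualify_path and the sample default hdfs://namenode:9000 are assumptions made for the example and are not part of Hadoop.

```shell
# Hypothetical helper: complete a path the way the FS shell is described
# as doing, by prepending the configured default file system when the
# argument carries no scheme of its own.
qualify_path() {
  default_fs=$1   # stands in for the default set in the configuration
  arg=$2
  case $arg in
    *://*) printf '%s\n' "$arg" ;;                     # already a full URI
    /*)    printf '%s%s\n' "${default_fs%/}" "$arg" ;; # absolute path
    *)     echo "relative paths are not handled in this sketch" >&2
           return 1 ;;
  esac
}

qualify_path hdfs://namenode:9000 /parent/child
qualify_path hdfs://namenode:9000 file:///tmp/local.txt
```

The first call prints the full hdfs:// form of /parent/child; the second is left untouched because it already names the file scheme.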

The behavior of most FS shell commands is similar to the corresponding UNIX shell command.

Installation (the green files are the scripts written for starting Hadoop in batch)

2. Apache Hadoop Directory Structure

(Viewing the script files.) In Java threading, calling setDaemon(true) on a thread turns it into a daemon thread; a daemon process keeps running in the background.

Viewing the processes

bin and sbin are very important: the command scripts are stored in these two directories. bin holds the system operation commands (the .cmd files are the Windows versions), and sbin holds the system maintenance scripts.

share (jar packages, loaded via the path)

logs (learn to read and analyze the logs; the .log files are the log history and the .out files are the current output)

Operating through the hdfs command

hdfs usage: square brackets indicate optional arguments. After Hadoop is installed, modified configuration files can be kept in separate folders and selected with --config confdir, which suits environments that change often; a working environment normally uses the configuration files under the default directory. The options are followed by a space and then the command.
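As a sketch of the usage line described above, the two invocations below differ only in where the configuration is read from. The directory /opt/hadoop-conf/test is a made-up example, and the run helper is an assumption of this sketch: it only prints each command unless RUN=1 is set, so nothing here needs a live cluster.

```shell
# Dry-run helper (an assumption of this sketch): print the command
# unless RUN=1 is set, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Default: configuration is read from the default directory.
run bin/hdfs dfs -ls /

# Alternative configuration kept in a separate (hypothetical) folder.
# --config comes before the command name.
run bin/hdfs --config /opt/hadoop-conf/test dfs -ls /
```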

dfs runs a file system command on the file systems that Hadoop supports; the commands can be understood by analogy with their Linux counterparts.

3. The hdfs dfs Command

Viewing the documentation

Note: the user who started the process is the superuser and can do anything.

In version 2 a prompt is printed because the old script has not been changed; it can be ignored.

Hadoop 1 runs dfs through the hadoop command and Hadoop 2 runs it through hdfs; the results are the same.

Viewing the directory structure

bin/hdfs dfs -ls hdfs://192.168.80.100:9000/

-ls is followed by the HDFS access path, which is set in the configuration file. The address may be given as a hostname (run hostname to see it) or as an IP, and port 9000 is the one configured earlier in etc/hadoop/core-site.xml. The slash after 9000 denotes the root directory of HDFS. (The hdfs:// prefix names the protocol, just as Internet access must go through a protocol such as HTTP, HTTPS, or FTP.)

The path used by the ls command comes from the HDFS access path set in the configuration file etc/hadoop/core-site.xml.
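A minimal sketch of what that setting might look like, assuming the Hadoop 2 property name fs.defaultFS and the address used in these notes. The snippet writes the fragment to a temporary file rather than touching a real installation; the temporary file and the text-based read-back are illustrative only.

```shell
# Write an example core-site.xml fragment to a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.80.100:9000</value>
  </property>
</configuration>
EOF

# Read the value back: take the line after the fs.defaultFS name and
# strip the <value> tags (a rough text match, not a real XML parser).
default_fs=$(grep -A1 '<name>fs.defaultFS</name>' "$conf" |
  sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
printf 'default file system: %s\n' "$default_fs"
rm -f "$conf"
```

On a machine with Hadoop installed, hdfs getconf -confKey fs.defaultFS should report the value that is actually in effect.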

A file system needs an organizational structure. Linux uses a tree structure, and HDFS likewise has its own root directory (separate from the Linux one).

Installing 32-bit Hadoop on a 64-bit operating system produces a warning message. This does not matter for learning, but enterprises use 64-bit builds because 32-bit limits the memory size. The software is best compiled as 64-bit, and Hadoop can be built from its source code.

The structure is very much like the Linux file structure.

4. Common HDFS Commands

A - in the listing means there is no replica count; the date and time shown are the last modification time.

Sizes are in bytes (a directory itself does not contain data, so its size is 0), followed by the absolute path. (How is this information known?)

This entry refers to a folder.

Viewing one level at a time is tedious.

Note that the option is -R (recursion uses an uppercase R).

-d (a directory is listed as a plain entry)

-h (sizes are in bytes by default; with this option they are shown with a unit such as K or M, depending on size)
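The listing options above can be lined up side by side as follows. This is a sketch only: the run helper is an assumption of the example and prints each command instead of executing it unless RUN=1 is set.

```shell
# Dry-run helper (an assumption of this sketch): print the command
# unless RUN=1 is set, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

run bin/hdfs dfs -ls /      # one level only
run bin/hdfs dfs -ls -R /   # recursive; note the uppercase R
run bin/hdfs dfs -ls -d /   # the directory itself, as a plain entry
run bin/hdfs dfs -ls -h /   # sizes with units instead of raw bytes
```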

Shorthand: the hdfs://192.168.80.100:9000 prefix can be dropped. At run time the command looks in its environment for a local HDFS configuration file and, if a value is set for fs.defaultFS, prepends it.

Linux paths and HDFS paths are told apart by the command being executed.

Here the path was mistakenly assumed to be a Linux one (the destination file is missing).

-cat

A file system stores and returns data intact; MySQL, by comparison, is a file management system.

-put

-cp (copy from HDFS to HDFS)

-copyToLocal (copy from HDFS to Linux)

-chmod

chmod 777

Owner (user), group, others (common permissions: by default folders are created with 755 and files with 644)
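The commands listed above can be sketched together as follows, along with a small helper that spells out what an octal mode means. The file names are hypothetical, the run helper only prints each command unless RUN=1 is set, and mode_to_rwx is a name invented for this example rather than anything Hadoop provides.

```shell
# Dry-run helper (an assumption of this sketch): print the command
# unless RUN=1 is set, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

run bin/hdfs dfs -put notes.txt /notes.txt            # Linux -> HDFS
run bin/hdfs dfs -cat /notes.txt                      # print an HDFS file
run bin/hdfs dfs -cp /notes.txt /notes-copy.txt       # HDFS -> HDFS
run bin/hdfs dfs -copyToLocal /notes.txt ./copy.txt   # HDFS -> Linux
run bin/hdfs dfs -chmod 777 /notes.txt                # open to everyone

# Spell out an octal mode such as 755 as rwx triplets
# in the order owner, group, others.
mode_to_rwx() {
  out=
  for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
    case $digit in
      0) out="${out}---" ;; 1) out="${out}--x" ;;
      2) out="${out}-w-" ;; 3) out="${out}-wx" ;;
      4) out="${out}r--" ;; 5) out="${out}r-x" ;;
      6) out="${out}rw-" ;; 7) out="${out}rwx" ;;
      *) echo "not an octal digit: $digit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

mode_to_rwx 755   # rwxr-xr-x (default for folders)
mode_to_rwx 644   # rw-r--r-- (default for files)
mode_to_rwx 777   # rwxrwxrwx
```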

When -ls is followed by nothing, it defaults to /user/root.

That directory does not exist in HDFS.

It is the current user's directory.
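A sketch of the default-path behaviour described above, assuming the root account used in these notes. The run helper is an assumption of the example and only prints each command unless RUN=1 is set.

```shell
# Dry-run helper (an assumption of this sketch): print the command
# unless RUN=1 is set, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

user=root   # the account used in these notes

run bin/hdfs dfs -ls                      # no path: the home directory
run bin/hdfs dfs -ls "/user/$user"        # the same directory, spelled out
run bin/hdfs dfs -mkdir -p "/user/$user"  # create it if it is missing
```

Creating the directory first is one way to make the bare -ls form work on a fresh installation.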
