Getting Started with Hadoop learning notes---part3

Tags: hadoop fs

New Year's Day 2015: study hard and improve every day. A good beginning is half the battle, and learning must not be interrupted; only persistence brings results. So I continue learning Hadoop. Rome was not built in a day, and three feet of ice is not the result of one cold day!

Through building the pseudo-distributed cluster environment, I now have a basic understanding of Hadoop. But some of the theory needs to be repeated in order to be remembered thoroughly; personally I believe that repetition is the mother of memory. In brief:

    NameNode: manages the cluster and records the file (block) information held on the DataNodes;

    SecondaryNameNode: can serve as a cold backup, taking snapshot-style backups of the metadata within a certain range;

    DataNode: stores the data;

    JobTracker: manages jobs and assigns tasks to the TaskTrackers;

    TaskTracker: the side that actually executes the tasks.

I now know that HDFS is the Hadoop Distributed File System, but not yet much about its other aspects, such as its architecture. Therefore it is also essential to understand the architecture of the Hadoop Distributed File System and its underlying concepts. The key content of part 3 of these Getting Started with Hadoop learning notes is: the distributed file system and HDFS; the shell operations of HDFS; the NameNode architecture; and the DataNode architecture.

1. Distributed File System and HDFS:

DFS (Distributed File System) is a file system that allows files to be shared across multiple hosts over a network, letting multiple users on multiple machines share files and storage space.

HDFS is just one kind of distributed file system. It is suited to write-once, read-many access patterns; it does not support concurrent writes and is not well suited to storing large numbers of small files.

The following operations can be done in the pseudo-distributed Hadoop environment built earlier. First check whether the Hadoop processes have started; if they have not, start them before doing the following.

      # jps
      # start-all.sh    (if the daemons are not yet running)
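
If everything started correctly, jps in this pseudo-distributed setup should show the five daemons described at the beginning of this note plus jps itself. The process IDs below are only placeholders and will differ on your machine:

      # jps
      2786 NameNode
      2890 DataNode
      2999 SecondaryNameNode
      3102 JobTracker
      3210 TaskTracker
      3305 Jps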

2. The Shell Operations of HDFS:

In fact, the shell operations of HDFS are basically similar to file operations on Linux. I will only list some of the most commonly used commands and what each one does, so you know what is going on and how to use them.

      # hadoop fs -ls /                   view the contents of the HDFS root directory
      # hadoop fs -lsr /                  recursively view the contents of the root directory
      # hadoop fs -mkdir /hello           create a new hello folder under the HDFS root
      # hadoop fs -put /root/test /hello  upload the test file from the Linux root directory to the hello directory in HDFS; if the destination does not already exist as a folder, it is taken as the file name of the upload rather than as a folder
      # hadoop fs -get /hello/test .      download the file from HDFS to the local machine; note the dot at the very end of the command, which is the local (Linux) destination path and can be replaced with any other path
      # hadoop fs -text /hello/test       view the test file under the hello directory directly on HDFS
      # hadoop fs -rm /hello/test         delete the test file from the hello directory (works on files only)
      # hadoop fs -rmr /hello             recursively delete the hello directory on HDFS, including files and subfolders
      # hadoop fs -help <command>         view the help document for a command

Note that "hadoop fs -ls /" is simply shorthand for "hadoop fs -ls hdfs://hadoop:9000/"; both commands have the same effect. The "hadoop" inside the URI is the hostname of my machine and should be changed to match your own setup.

There are too many such commands to list them all. As long as you are used to Linux commands, it is easy to get started; just reason by analogy!
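
To tie these together, here is a minimal end-to-end session using the same example paths as above (/root/test on Linux and /hello on HDFS are just placeholders; substitute your own files):

      # hadoop fs -mkdir /hello                 create the target directory on HDFS
      # hadoop fs -put /root/test /hello        upload the local file into it
      # hadoop fs -lsr /hello                   confirm that /hello/test now exists
      # hadoop fs -text /hello/test             print its contents
      # hadoop fs -get /hello/test /tmp/test    copy it back to the local file system
      # hadoop fs -rmr /hello                   clean up when finished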

3. NameNode Architecture:

The two main cores of HDFS are the NameNode and the DataNode. The NameNode is the management node of the entire file system: it maintains the file system's directory tree, the metadata of every file and directory, and the list of data blocks that make up each file, and it receives the user's operation requests. I only give a brief summary here; for a detailed introduction please refer to the official documentation.

The NameNode's files include:

(1) fsimage: the file system image, a metadata image file that stores a snapshot of the NameNode's in-memory metadata accumulated over a period of time;

(2) edits: the operation log file, a transaction file recording changes to the metadata;

(3) fstime: records the time of the last checkpoint.

The files above are saved on the local Linux file system.

SecondaryNameNode:

It downloads the metadata (fsimage and edits) from the NameNode, merges the two to generate a new fsimage, saves it locally, pushes it back to the NameNode, and resets the NameNode's edits file. In effect it is a cold backup.

On Linux these files live under the NameNode's storage directory, where you can see the files described above.
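
As a sketch, assuming the default layout where dfs.name.dir falls under hadoop.tmp.dir (here I assume hadoop.tmp.dir points to /usr/local/hadoop/tmp; use whatever you configured in core-site.xml), the NameNode's files can be listed like this:

      # ls /usr/local/hadoop/tmp/dfs/name/current
      edits  fsimage  fstime  VERSION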

4. DataNode Architecture:

The DataNode is the storage service that holds the actual file data. A key term to understand is the block, the most basic storage unit. For a file's contents, suppose the file's length is size; then, starting from offset 0 of the file, the content is divided into pieces of a fixed size and numbered in order, and each such divided piece is called a block.

The default block size of HDFS is 64 MB. Taking a 256 MB file as an example: 256 MB / 64 MB = 4 blocks.

Unlike an ordinary file system, if a file in HDFS is smaller than one data block, it does not occupy the whole block's storage space. That is, when a DataNode stores data, a file larger than 64 MB is split into blocks of 64 MB, while a file (or final block) smaller than 64 MB is saved at its actual size.
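
One way to see how a file has actually been split into blocks (a sketch, reusing the hypothetical /hello/test file from the shell examples above) is the fsck tool, which lists each block of the file along with its length and the DataNodes holding its replicas:

      # hadoop fsck /hello/test -files -blocks -locations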

Replication: multiple copies of each block, 3 by default, stored on different machines.
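
The replication factor of an existing file can also be changed from the shell. A minimal sketch, again using the hypothetical /hello/test path:

      # hadoop fs -setrep 3 /hello/test

Keep in mind that a single-machine pseudo-distributed cluster has only one DataNode, so extra replicas cannot actually be placed anywhere.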

The actual storage on Linux looks as follows; alongside the block data you can also see the files holding its meta-information.
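
As a sketch of what this looks like on disk (again assuming a hypothetical hadoop.tmp.dir of /usr/local/hadoop/tmp), each block is stored as a blk_ file accompanied by a .meta file holding its checksum metadata; the numeric block IDs below are placeholders:

      # ls /usr/local/hadoop/tmp/dfs/data/current
      blk_2384765239847652398  blk_2384765239847652398_1001.meta  VERSION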

    

In part 4 of these notes, I will use Java to manipulate HDFS and see how a Java application operates on it.

itred    Email:    Blog: http://www.cnblogs.com/itred    Personal website: http://wangxingyu.jd-app.com
Copyright: the copyright of this article belongs to the author and the blog site. You are welcome to reprint it, but please mark the original article link in a conspicuous position. I reserve the right to hold anyone accountable for use of it without my written consent.
