Big Data Learning Note 2: HDFS Working Principles and Source Code Analysis


Configuring Hadoop under Windows

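    1. Decompress the Hadoop installation package to a path that contains no special characters.

    2. The lib and bin directories in the extracted package cannot be used on Windows as they are; you need to recompile them yourself to get the Windows native binaries.

    3. Configure the environment variables: set HADOOP_HOME to the installation directory and add its bin directory to PATH.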
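Step 3 can be checked from Java with a small sketch. It assumes Hadoop finds its home directory from the `hadoop.home.dir` system property first and from the `HADOOP_HOME` environment variable second. That lookup order is an assumption. `HadoopHomeCheck` and its methods are hypothetical helpers, not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that checks the HADOOP_HOME setup described in step 3.
class HadoopHomeCheck {

    // Assumed lookup order: the hadoop.home.dir system property, then the HADOOP_HOME variable.
    static String resolveHome(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty()) {
            return sysProp;
        }
        if (envVar != null && !envVar.isEmpty()) {
            return envVar;
        }
        return null;
    }

    // The directory that should be on PATH and hold the recompiled Windows binaries.
    static Path binDir(String home) {
        return Paths.get(home).resolve("bin");
    }

    public static void main(String[] args) {
        String home = resolveHome(System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
            return;
        }
        Path bin = binDir(home);
        System.out.println(bin + (Files.isDirectory(bin) ? " exists" : " is missing"));
    }
}
```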

NameNode

    • The management node of the entire file system. It maintains the directory tree of the file system, the metadata of each file and directory, and the list of data blocks that make up each file. It also receives and handles users' operation requests.
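The NameNode's namespace can be pictured as a tree of directories whose leaves carry file metadata. The toy model below assumes that picture. `ToyNamespace` and `Node` are hypothetical names; real HDFS uses its own inode classes. Block IDs are plain strings here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy NameNode namespace: directories form a tree, files carry replication and block IDs.
// ToyNamespace and Node are hypothetical names, not HDFS classes.
class ToyNamespace {

    static class Node {
        final String name;
        final boolean isDir;
        final Map<String, Node> children = new TreeMap<>(); // used by directories
        short replication;                                  // used by files
        final List<String> blockIds = new ArrayList<>();    // used by files

        Node(String name, boolean isDir) {
            this.name = name;
            this.isDir = isDir;
        }
    }

    private final Node root = new Node("", true);

    // Walks the path one component at a time; returns null if any component is missing.
    Node lookup(String path) {
        Node cur = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            if (!cur.isDir) {
                return null;
            }
            cur = cur.children.get(part);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    boolean exists(String path) {
        return lookup(path) != null;
    }

    // Creates missing parent directories, then adds the file entry.
    void addFile(String path, short replication, List<String> blockIds) {
        String[] parts = path.split("/");
        Node dir = root;
        for (int i = 0; i < parts.length - 1; i++) {
            if (parts[i].isEmpty()) {
                continue;
            }
            dir = dir.children.computeIfAbsent(parts[i], n -> new Node(n, true));
            if (!dir.isDir) {
                throw new IllegalArgumentException("not a directory: " + parts[i]);
            }
        }
        String fileName = parts[parts.length - 1];
        if (dir.children.containsKey(fileName)) {
            throw new IllegalStateException("already exists: " + path);
        }
        Node file = new Node(fileName, false);
        file.replication = replication;
        file.blockIds.addAll(blockIds);
        dir.children.put(fileName, file);
    }
}
```

This existence check is what the NameNode consults when a client asks to create a path.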

Handling a client request to upload a file:

    1. The client requests to upload a file. The NameNode checks its metadata to see whether the path the client requested already exists.

    2. The NameNode returns the DataNodes available for storing the data.

    3. The client connects directly to the first DataNode and uploads the first block. The DataNodes form a pipeline: the first DataNode copies the block to the next DataNode, which passes it further down the pipeline until the configured number of replicas is reached. Each DataNode reports the block information to the NameNode.
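The three upload steps can be simulated in memory. This sketch rests on several simplifications:

- `ToyUpload` and its nested `NameNode` and `DataNode` classes are hypothetical.
- The NameNode simply hands out the first N DataNodes instead of applying a placement policy.
- Each DataNode reports its replica as soon as it stores it, rather than after acknowledgements.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy simulation of a block upload through a replication pipeline (all classes hypothetical).
class ToyUpload {

    static class DataNode {
        final String host;
        final Map<String, byte[]> blocks = new HashMap<>();

        DataNode(String host) {
            this.host = host;
        }

        // Store the block locally, report it, then forward it to the rest of the pipeline.
        void receive(String blockId, byte[] data, List<DataNode> downstream, NameNode nn) {
            blocks.put(blockId, data.clone());
            nn.blockReceived(blockId, host);
            if (!downstream.isEmpty()) {
                downstream.get(0).receive(blockId, data, downstream.subList(1, downstream.size()), nn);
            }
        }
    }

    static class NameNode {
        final Set<String> paths = new HashSet<>();
        final List<DataNode> datanodes;
        final Map<String, List<String>> blockLocations = new HashMap<>();

        NameNode(List<DataNode> datanodes) {
            this.datanodes = datanodes;
        }

        // Steps 1-2: reject an existing path, then hand out target DataNodes.
        List<DataNode> startUpload(String path, int replication) {
            if (!paths.add(path)) {
                throw new IllegalStateException("file exists: " + path);
            }
            return new ArrayList<>(datanodes.subList(0, Math.min(replication, datanodes.size())));
        }

        void blockReceived(String blockId, String host) {
            blockLocations.computeIfAbsent(blockId, k -> new ArrayList<>()).add(host);
        }
    }

    // Step 3: the client talks only to the first DataNode; the pipeline makes the other copies.
    static void upload(NameNode nn, String path, String blockId, byte[] data, int replication) {
        List<DataNode> targets = nn.startUpload(path, replication);
        targets.get(0).receive(blockId, data, targets.subList(1, targets.size()), nn);
    }
}
```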

How the NameNode writes metadata

    • In memory: meta.data

    • On disk: fsimage and the edits log

    • Each change is first recorded in the edits log

    • The change is then applied to the in-memory meta.data
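Writing the edits log before memory is a write-ahead log. The minimal sketch below assumes one text record per operation. The record format and the `ToyMetaStore` class are hypothetical, and the `OP_` names are only borrowed from HDFS edit-log op codes for readability.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy write-ahead pattern: every change goes to the edits log before the in-memory metadata.
class ToyMetaStore {
    final List<String> edits = new ArrayList<>();         // stands in for the on-disk edits log
    final Map<String, Integer> metaData = new HashMap<>(); // in memory: path -> replication

    void create(String path, int replication) {
        edits.add("OP_ADD " + path + " " + replication); // 1. record in the log first
        metaData.put(path, replication);                  // 2. then update memory
    }

    void setReplication(String path, int replication) {
        if (!metaData.containsKey(path)) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        edits.add("OP_SET_REPLICATION " + path + " " + replication);
        metaData.put(path, replication);
    }

    // Rebuild the in-memory state by replaying the log (paths must not contain spaces here).
    static Map<String, Integer> replay(List<String> log) {
        Map<String, Integer> state = new HashMap<>();
        for (String entry : log) {
            String[] f = entry.split(" ");
            state.put(f[1], Integer.parseInt(f[2]));
        }
        return state;
    }
}
```

Every change reaches the log before memory, so replaying the log reproduces the in-memory state.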

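Fsimage structure:

    • File name, number of replicas, block IDs, and the machines that store each block (strictly, the block-to-machine locations are kept only in NameNode memory and rebuilt from DataNode block reports; fsimage itself does not persist them)

    • NameNode (FileName, replicas, block-ids, id2host ...)

    • /test/a.log, 3, {blk_1,blk_2},
      [{blk_1:[h0,h1,h3]},{blk_2:[h0,h2,h4]}]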
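The sample record can be produced from a small in-memory structure. The sketch assumes the note's text notation exactly, with block IDs written consistently in lowercase. `MetaRecord` is hypothetical, and real fsimage files use a binary format rather than this layout.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Renders one metadata record in the note's notation:
//   path, replicas, {block-ids}, [{block:[hosts]},...]
// MetaRecord is hypothetical; this text layout is only an illustration.
class MetaRecord {
    final String path;
    final int replicas;
    final Map<String, List<String>> id2host = new LinkedHashMap<>(); // preserves block order

    MetaRecord(String path, int replicas) {
        this.path = path;
        this.replicas = replicas;
    }

    MetaRecord addBlock(String blockId, List<String> hosts) {
        id2host.put(blockId, hosts);
        return this;
    }

    String format() {
        String ids = String.join(",", id2host.keySet());
        String locations = id2host.entrySet().stream()
                .map(e -> "{" + e.getKey() + ":[" + String.join(",", e.getValue()) + "]}")
                .collect(Collectors.joining(","));
        return path + ", " + replicas + ", {" + ids + "}, [" + locations + "]";
    }
}
```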

Secondary NameNode: merging edits into fsimage

    1. The Secondary NameNode notifies the NameNode to roll its edits: the NameNode starts a new edits file and stops writing to the previous one.
    2. The Secondary NameNode downloads the edits and fsimage files from the NameNode.
    3. The Secondary NameNode loads them into memory, merges them, and generates a new checkpoint image, fsimage.ckpt.
    4. It uploads the new fsimage file back to the NameNode.
    5. The NameNode replaces the old fsimage with the new one.
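The five checkpoint steps can be simulated with maps and lists. This sketch makes several assumptions:

- `ToyCheckpoint` is hypothetical.
- An image is a `path -> replication` map.
- Each edit is written as `path=replication`, or `-path` for a delete.

In the real system the image and edits are files copied between the two nodes over HTTP.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy checkpoint: the Secondary NameNode replays a closed edits segment over the last fsimage.
class ToyCheckpoint {

    static class NameNode {
        Map<String, Integer> fsimage = new HashMap<>();
        List<String> edits = new ArrayList<>(); // the edits file currently being written

        void log(String op) {
            edits.add(op);
        }

        // Step 1: stop using the current edits file and start a fresh one.
        List<String> rollEdits() {
            List<String> finished = edits;
            edits = new ArrayList<>();
            return finished;
        }

        // Step 5: replace the old fsimage with the uploaded checkpoint.
        void installImage(Map<String, Integer> checkpoint) {
            fsimage = checkpoint;
        }
    }

    // Steps 2-3 on the Secondary NameNode: load a copy of the image and replay the edits.
    static Map<String, Integer> merge(Map<String, Integer> image, List<String> edits) {
        Map<String, Integer> merged = new HashMap<>(image);
        for (String op : edits) {
            if (op.startsWith("-")) {
                merged.remove(op.substring(1));
            } else {
                int eq = op.lastIndexOf('=');
                merged.put(op.substring(0, eq), Integer.parseInt(op.substring(eq + 1)));
            }
        }
        return merged;
    }

    // Steps 1-5 end to end (step 4, the upload, is just passing the map back).
    static void checkpoint(NameNode nn) {
        List<String> rolled = nn.rollEdits();
        Map<String, Integer> ckpt = merge(nn.fsimage, rolled);
        nn.installImage(ckpt);
    }
}
```

Edits logged after the roll stay in the NameNode's new edits file and are merged at the next checkpoint.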

Checkpoint

    • fs.checkpoint.period specifies the maximum interval between two checkpoints; the default is 3,600 seconds (one hour).

    • fs.checkpoint.size specifies the maximum size of the edits file. Once it is exceeded, a checkpoint is forced even if the maximum interval has not been reached. The default is 64 MB.

    These are the Hadoop 1.x property names; Hadoop 2.x uses dfs.namenode.checkpoint.period and, instead of a size limit, a transaction count (dfs.namenode.checkpoint.txns).
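The two trigger conditions combine with a logical OR. The sketch uses the defaults quoted above. `CheckpointTrigger` is a hypothetical name, and treating "reached" as `>=` is an assumption.

```java
// Decides whether a checkpoint is due under the period and edits-size rules (hypothetical helper).
class CheckpointTrigger {
    static final long DEFAULT_PERIOD_SECONDS = 3600;            // fs.checkpoint.period default
    static final long DEFAULT_SIZE_BYTES = 64L * 1024 * 1024;   // fs.checkpoint.size default, 64 MB

    // Due when either the time limit or the edits-size limit has been reached.
    static boolean shouldCheckpoint(long secondsSinceLast, long editsSizeBytes,
                                    long periodSeconds, long maxEditsBytes) {
        return secondsSinceLast >= periodSeconds || editsSizeBytes >= maxEditsBytes;
    }

    static boolean shouldCheckpoint(long secondsSinceLast, long editsSizeBytes) {
        return shouldCheckpoint(secondsSinceLast, editsSizeBytes,
                DEFAULT_PERIOD_SECONDS, DEFAULT_SIZE_BYTES);
    }
}
```

For example, a 70 MB edits file forces a checkpoint only ten minutes after the previous one, because the size rule fires independently of the period.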

Inter-node communication:

    • Remote procedure calls (RPC)
    • Streaming transfer of large data volumes (block data)
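With RPC, a caller invokes methods on an interface as if the object were local while a proxy forwards each call. The sketch shows that idea with `java.lang.reflect.Proxy`, which is also how the client side of Hadoop RPC builds its proxies. `ToyRpc` and `ToyClientProtocol` are hypothetical. The "network" here is a direct call to a server object, whereas Hadoop serializes each request and sends it over TCP.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy RPC: a dynamic proxy turns each interface call into a request "message" and hands
// it to a server object. The transport is a direct call, not a network connection.
class ToyRpc {

    // Stand-in for a NameNode-facing protocol interface (hypothetical).
    interface ToyClientProtocol {
        boolean exists(String path);
        long getFileLength(String path);
    }

    // Records each request that "crossed the wire", so the call flow is visible.
    static final List<String> wireLog = new ArrayList<>();

    static <T> T getProxy(Class<T> protocol, T server) {
        InvocationHandler handler = (proxy, method, args) -> {
            wireLog.add(method.getName() + Arrays.toString(args)); // "serialize" the request
            return method.invoke(server, args);                     // the server executes it
        };
        Object p = Proxy.newProxyInstance(
                protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
        return protocol.cast(p);
    }
}
```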

FileSystem acquisition process

    1. FileSystem.get(new URI(HDFS_PATH), new Configuration()); // get the FileSystem object
    2. CACHE.get(uri, conf); // look it up in the cache map
    3. fs = createFileSystem(uri, conf); // create a new FS
    4. clazz = getFileSystemClass(uri.getScheme(), conf); // get the FS class
    5. ReflectionUtils.newInstance(clazz, conf); // instantiate the FS
    6. fs.initialize(uri, conf); // initialize the FS parameters
    7. dfs = new DFSClient(uri, conf, statistics); // create the DFS client
    8. proxyInfo =
      NameNodeProxies.createProxyWithLossyRetryHandler(conf, nameNodeUri,
      ClientProtocol.class, numResponseToDrop); // client proxy object for talking to the NameNode over RPC
    9. this.namenode = proxyInfo.getProxy(); // get the NameNode proxy object
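Steps 1 to 6 boil down to four moves:

1. Look in a cache.
2. On a miss, map the URI scheme to a class.
3. Instantiate that class by reflection.
4. Call `initialize`.

The sketch mimics that flow under simplifying assumptions, and every class in it is hypothetical. Hadoop's real cache key also includes the current user. Its scheme-to-class mapping comes from configuration or service loading rather than a fixed map.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy version of the FileSystem.get() flow: cache lookup, scheme -> class lookup,
// reflective instantiation, then initialize(). All classes here are hypothetical.
class ToyFileSystems {

    abstract static class ToyFs {
        URI uri;

        void initialize(URI uri) {
            this.uri = uri;
        }
    }

    // Plays the role of DistributedFileSystem: initialize() sets up the NameNode-facing client.
    static class ToyDistributedFs extends ToyFs {
        String namenodeProxy; // stands in for the DFSClient and its NameNode proxy

        @Override
        void initialize(URI uri) {
            super.initialize(uri);
            namenodeProxy = "proxy-to-" + uri.getAuthority();
        }
    }

    static class ToyLocalFs extends ToyFs {
    }

    // Scheme -> implementation class (a fixed map here, unlike Hadoop).
    static final Map<String, Class<? extends ToyFs>> SCHEMES = new HashMap<>();

    static {
        SCHEMES.put("hdfs", ToyDistributedFs.class);
        SCHEMES.put("file", ToyLocalFs.class);
    }

    // Cache keyed by scheme + authority.
    static final Map<String, ToyFs> CACHE = new HashMap<>();

    static ToyFs get(URI uri) {
        String key = uri.getScheme() + "://" + uri.getAuthority();
        return CACHE.computeIfAbsent(key, k -> createFileSystem(uri));
    }

    static ToyFs createFileSystem(URI uri) {
        Class<? extends ToyFs> clazz = SCHEMES.get(uri.getScheme());
        if (clazz == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + uri.getScheme());
        }
        try {
            ToyFs fs = clazz.getDeclaredConstructor().newInstance(); // reflective instantiation
            fs.initialize(uri);                                     // then initialize
            return fs;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

Calling `get` twice with the same scheme and authority returns the cached instance.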

In short, fs is a DistributedFileSystem that holds a DFSClient object, dfs, and the DFSClient in turn holds the NameNode proxy object.
