Configuring Hadoop under Windows
Unpack the Hadoop installation package to a path containing no spaces or special characters.
The lib and bin directories shipped in the package are not directly usable on Windows; the native binaries (typically winutils.exe and hadoop.dll) must be recompiled for Windows or obtained separately.
Configure environment variables: set HADOOP_HOME to the installation path and append its bin directory to PATH.
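As a quick sanity check, here is a minimal Java sketch; it assumes the standard package layout where winutils.exe sits under %HADOOP_HOME%\bin:

```java
import java.io.File;

public class HadoopEnvCheck {
    public static void main(String[] args) {
        // HADOOP_HOME must point at the unpacked installation directory.
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        System.out.println("HADOOP_HOME = " + home);
        // On Windows the native launcher winutils.exe must be present under bin.
        File winutils = new File(home, "bin/winutils.exe");
        System.out.println("winutils.exe present: " + winutils.exists());
    }
}
```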
NameNode
- The management node for the entire file system. It maintains the directory tree of the file system, the meta-information for each file and directory, and the list of data blocks belonging to each file. It receives and responds to users' operation requests.
Responding to a client request to upload a file:
- The client asks to upload a file; the NameNode checks its metadata to see whether the requested path already exists.
- The NameNode returns a list of available DataNodes.
- The client connects directly to the first DataNode and uploads the first block. The DataNodes form a pipeline: each one forwards the block to the next DataNode down the chain until the configured number of replicas is reached, and each DataNode reports its block information back to the NameNode. A client-side sketch of this flow follows below.
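A minimal sketch of the upload path from the client's side; the NameNode address and the file paths are illustrative assumptions:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3"); // the configured number of replicas
        // Obtain the FileSystem handle from the NameNode...
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        // ...then stream the file; HDFS splits it into blocks and pushes each
        // block down the DataNode pipeline until 3 replicas exist.
        fs.copyFromLocalFile(new Path("D:/data/a.log"), new Path("/test/a.log"));
        fs.close();
    }
}
```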
NameNode metadata writes
- In memory: meta.data
- On disk: fsimage and the edits log
- A modification is written to the edits log first,
- then synced into the in-memory meta.data (a simplified model follows this list).
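A hypothetical, heavily simplified model of that write ordering; the class and method names are illustrative, not Hadoop's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MiniNameNode {
    private final List<String> editsLog = new ArrayList<>();       // stands in for the on-disk edits file
    private final Map<String, Integer> metaData = new HashMap<>(); // in-memory meta.data

    synchronized void createFile(String path, int replicas) {
        editsLog.add("CREATE " + path + " replicas=" + replicas);  // 1. append to edits first
        metaData.put(path, replicas);                              // 2. then update in-memory state
    }
}
```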
fsimage structure:
File name, number of replicas, block IDs, and the machines storing each block:
NameNode(FileName, replicas, block-ids, id2host ...)
/test/a.log, 3, {blk_1, blk_2},
[{blk_1: [h0, h1, h3]}, {blk_2: [h0, h2, h4]}]
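Read as a data structure, one such entry looks roughly like the hypothetical sketch below; the field names are illustrative, not Hadoop's actual classes:

```java
import java.util.List;
import java.util.Map;

class FsImageEntry {
    String fileName;                   // e.g. /test/a.log
    int replicas;                      // e.g. 3
    List<String> blockIds;             // e.g. [blk_1, blk_2]
    Map<String, List<String>> id2host; // e.g. {blk_1: [h0, h1, h3], blk_2: [h0, h2, h4]}
}
```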
Secondary NameNode: merging edits into fsimage
- Notifies the NameNode to roll the edits log, so the NameNode stops writing to the previous edits file
- The Secondary NameNode downloads the edits and fsimage files from the NameNode
- It loads both into memory, merges them, and generates a new fsimage.chkpoint
- It uploads the new fsimage file back to the NameNode
- The NameNode replaces the old fsimage with the new one
Checkpoint triggers
- fs.checkpoint.period specifies the maximum interval between two checkpoints; the default is 3,600 seconds (one hour).
- fs.checkpoint.size specifies the maximum size of the edits file; once it is exceeded, a checkpoint is forced regardless of whether the time interval has been reached. The default is 64 MB.
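A small sketch of reading these two settings through Hadoop's Configuration API; the property names and defaults follow the note above (newer Hadoop versions rename them under dfs.namenode.checkpoint.*):

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Defaults as described above: 3,600 seconds and 64 MB.
        long periodSec = conf.getLong("fs.checkpoint.period", 3600);
        long sizeBytes = conf.getLong("fs.checkpoint.size", 64L * 1024 * 1024);
        System.out.println("checkpoint period = " + periodSec + " s");
        System.out.println("checkpoint size   = " + sizeBytes + " bytes");
    }
}
```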
Inter-node communication:
- Remote procedure calls (RPC) for control messages (a standalone sketch follows below)
- Streaming transfer for large data volumes
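A minimal, hypothetical echo service built on Hadoop's RPC framework, to illustrate the mechanism named above; this is not HDFS's real protocol, and the interface name, port, and versionID are assumptions:

```java
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class RpcDemo {
    // The protocol interface shared by client and server.
    public interface EchoProtocol {
        long versionID = 1L; // version constant required by Hadoop RPC
        String echo(String msg);
    }

    // Server-side implementation of the protocol.
    public static class EchoServer implements EchoProtocol {
        @Override
        public String echo(String msg) { return "echo: " + msg; }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Server side: publish the implementation on a port.
        RPC.Server server = new RPC.Builder(conf)
                .setProtocol(EchoProtocol.class)
                .setInstance(new EchoServer())
                .setBindAddress("localhost")
                .setPort(8888)
                .build();
        server.start();

        // Client side: obtain a proxy and invoke the remote method.
        EchoProtocol proxy = RPC.getProxy(EchoProtocol.class, EchoProtocol.versionID,
                new InetSocketAddress("localhost", 8888), conf);
        System.out.println(proxy.echo("hello"));
        RPC.stopProxy(proxy);
        server.stop();
    }
}
```

HDFS uses the same pattern: the client holds an RPC proxy for ClientProtocol and calls NameNode methods through it, as the acquisition chain below shows.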
FileSystem acquisition process
- FileSystem.get(new URI(HDFS_PATH), new Configuration()); // get the FileSystem object
- CACHE.get(uri, conf) // look it up in the cache map first
- fs = createFileSystem(uri, conf); // create a new FS on a cache miss
- clazz = getFileSystemClass(uri.getScheme(), conf); // resolve the FS implementation class for the URI scheme
- ReflectionUtils.newInstance(clazz, conf) // instantiate the FS
- fs.initialize(uri, conf); // initialize the FS parameters
- dfs = new DFSClient(uri, conf, statistics) // build the DFS client
- proxyInfo = NameNodeProxies.createProxyWithLossyRetryHandler(conf, nameNodeUri, ClientProtocol.class, numResponseToDrop) // create the client-side proxy that talks to the NameNode over RPC
- this.namenode = proxyInfo.getProxy() // obtain the NameNode proxy object
The returned FileSystem is a DistributedFileSystem (dfs); dfs holds a DFSClient (dfsc), and dfsc holds the NameNode proxy object, as the sketch below shows.
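A usage sketch tying the chain together; the NameNode address is an assumption:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsGetDemo {
    public static void main(String[] args) throws Exception {
        // Walks the chain above: cache lookup -> createFileSystem -> reflection
        // -> initialize -> DFSClient -> NameNode RPC proxy.
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration());
        // Prints org.apache.hadoop.hdfs.DistributedFileSystem for an hdfs:// URI.
        System.out.println(fs.getClass().getName());
        fs.close();
    }
}
```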