Various classes of Hadoop and their role

1. Basic packages (utility and security packages)

This group covers the utility and security packages. hdfs.util contains auxiliary data structures needed by the HDFS implementation, while hdfs.security.token.block and hdfs.security.token.delegation work with Hadoop's security framework to provide secure access to HDFS.

hdfs.util (auxiliary data structures needed by the HDFS implementation)

AtomicFileOutputStream.java----Inheritance implementation class: Atomic file output stream;
DataTransferThrottler.java----Independent memory class: Throttles the bandwidth used for data transfer; this class is thread-safe and can be shared by multiple threads;
LightWeightGSet.java----Inheritance implementation class: A set implementation with a low memory footprint;
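
The idea of a thread-safe throttle shared by several transfer threads can be illustrated with a minimal sketch. This is not Hadoop's DataTransferThrottler; the class name SimpleThrottler, its period-based budget, and the reserve method are assumptions made for this example.

```java
/** Hypothetical bandwidth throttle; a sketch, not Hadoop's DataTransferThrottler. */
final class SimpleThrottler {
    private final long periodMillis;    // length of one accounting period
    private final long bytesPerPeriod;  // budget granted per period
    private long periodStart;           // start time of the current period
    private long bytesLeft;             // remaining budget; negative means over budget

    SimpleThrottler(long bytesPerSecond, long periodMillis, long nowMillis) {
        this.periodMillis = periodMillis;
        this.bytesPerPeriod = bytesPerSecond * periodMillis / 1000;
        this.periodStart = nowMillis;
        this.bytesLeft = bytesPerPeriod;
    }

    /**
     * Records that numBytes are about to be sent at time nowMillis and returns
     * how many milliseconds the caller should wait first (0 if within budget).
     * The method is synchronized so that several threads can share one budget.
     */
    synchronized long reserve(long numBytes, long nowMillis) {
        long elapsedPeriods = (nowMillis - periodStart) / periodMillis;
        if (elapsedPeriods > 0) {
            // Start a new period; any deficit from the old one is carried over.
            periodStart += elapsedPeriods * periodMillis;
            bytesLeft = Math.min(bytesPerPeriod, bytesLeft + elapsedPeriods * bytesPerPeriod);
        }
        bytesLeft -= numBytes;
        return bytesLeft >= 0 ? 0 : periodStart + periodMillis - nowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleThrottler throttler = new SimpleThrottler(2000, 500, System.currentTimeMillis());
        for (int i = 0; i < 3; i++) {
            long waitMillis = throttler.reserve(800, System.currentTimeMillis());
            System.out.println("chunk " + i + ": wait " + waitMillis + " ms");
            Thread.sleep(waitMillis);
        }
    }
}
```

Passing the current time as a parameter keeps the budget arithmetic easy to check; a real throttle would more likely block the calling thread itself.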

hdfs.security.token.block (mechanism for secure access to HDFS)

BlockKey.java----Inheritance implementation class: Key used for generating and verifying block tokens;
BlockTokenIdentifier.java----Inheritance implementation class: The identifier of a block token;
BlockTokenSecretManager.java----Inheritance implementation class: Block token manager. A BlockTokenSecretManager can be instantiated in one of two modes, master mode and slave mode. A master can generate new block keys and export them to slaves. A slave can only import and use block keys received from the master. Both master and slave can generate and verify block tokens.
BlockTokenSelector.java----Inheritance implementation class: The block token selector for HDFS;
ExportedBlockKeys.java----Inheritance implementation class: Object used for passing block keys;
InvalidBlockTokenException.java----Inheritance implementation class: Exception thrown when access token verification fails;
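
The master/slave key handling described for BlockTokenSecretManager can be sketched as follows. This is only an illustration under stated assumptions: the class BlockTokenSketch, its methods, the use of HMAC-SHA1, and the choice to sign just the block ID are all hypothetical and are not taken from Hadoop's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical two-mode token manager; not Hadoop's BlockTokenSecretManager. */
final class BlockTokenSketch {
    private final boolean master;
    private byte[] key; // current block key; null until generated or imported

    BlockTokenSketch(boolean master) {
        this.master = master;
    }

    /** Only a master may create a new key; the returned copy is sent to slaves. */
    byte[] generateAndExportKey() {
        if (!master) {
            throw new IllegalStateException("only the master may generate block keys");
        }
        key = new byte[20];
        new SecureRandom().nextBytes(key);
        return key.clone();
    }

    /** A slave installs a key it received from the master. */
    void importKey(byte[] exported) {
        key = exported.clone();
    }

    /** Both modes can create a token, here an HMAC over the block ID. */
    byte[] createToken(long blockId) {
        if (key == null) {
            throw new IllegalStateException("no block key available");
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            return mac.doFinal(Long.toString(blockId).getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Both modes can verify a token by recomputing it with the shared key. */
    boolean verifyToken(long blockId, byte[] token) {
        return MessageDigest.isEqual(createToken(blockId), token);
    }

    public static void main(String[] args) {
        BlockTokenSketch master = new BlockTokenSketch(true);
        BlockTokenSketch slave = new BlockTokenSketch(false);
        slave.importKey(master.generateAndExportKey());
        System.out.println("slave accepts master token: "
                + slave.verifyToken(42L, master.createToken(42L)));
    }
}
```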

hdfs.security.token.delegation (mechanism for secure access to HDFS)

DelegationTokenIdentifier.java----Inheritance implementation class: The delegation token identifier specific to HDFS;
DelegationTokenRenewer.java----Inheritance implementation class: A daemon that waits for the next file system whose token needs renewing;
DelegationTokenSecretManager.java----Inheritance implementation class: Manages the HDFS-specific delegation tokens; it generates and accepts the password for each token;
DelegationTokenSelector.java----Inheritance implementation class: A delegation token selector specific to HDFS;

2. HDFS entity implementation packages

This is the focus of code analysis, and it consists of 8 packages:

hdfs.server.common contains functionality shared by name nodes and data nodes, such as system upgrades and storage information.

hdfs.protocol and hdfs.server.protocol provide the definition and implementation of the interfaces through which the various entities in HDFS interact via IPC.

hdfs.server.namenode, hdfs.server.datanode, and hdfs contain the implementations of the name node, the data node, and the client, respectively. This code is the focus of HDFS code analysis.

hdfs.server.namenode.metrics and hdfs.server.datanode.metrics implement the collection of metrics on name nodes and data nodes. The metrics include name node process information and event counts on the data node, such as the number of bytes written and the number of blocks replicated.

hdfs.server.common (functionality shared by name nodes and data nodes)

GenerationStamp.java----Inheritance implementation class: Generation stamp, with methods for reading and writing it;
HdfsConstants.java----Interface class: HDFS internal constants; defines the constant fields and their values;
InconsistentFSStateException.java----Inheritance implementation class: Exception for an inconsistent file system state; carries the error message of the file state check;
IncorrectVersionException.java----Inheritance implementation class: Exception for an incorrect version; reported when the version check fails;
Storage.java----Inheritance implementation class: Storage information files. Local storage information is kept in a separate VERSION file, which contains the node type, the storage layout version, the namespace ID, and the file system state creation time. In memory, it records the current NameNode index information and status information (whether the file is open) in an extended table record;
StorageInfo.java----Independent memory class: Common class for storage information; the basic in-memory table of file index information, whose basic function is to hold the file system meta-information;
Upgradeable.java----Interface class: Common interface for distributed upgrade objects; defines the set of upgrade methods;
UpgradeManager.java----Independent memory class, abstract: Generic upgrade manager;
UpgradeObject.java----Inheritance implementation class: Abstract upgrade object; implements the common methods of the Upgradeable interface;
UpgradeObjectCollection.java----Independent memory class: Collection of upgrade objects; an upgrade object must be registered here before it can be used;
UpgradeStatusReport.java----Inheritance implementation class: Base class for upgrade status; defines the status information of the upgrade process;
Util.java----Independent memory class: Gets the current system time;
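
To make the Storage entry more concrete, here is a minimal sketch of reading the four fields that the VERSION file is said to contain. The class VersionFileSketch, the property key names, and the sample values are assumptions for illustration; the sketch treats the file as plain Java properties text and does not use Hadoop's own Storage code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical reader for VERSION-style storage information. */
final class VersionFileSketch {
    final String storageType;  // node type
    final int layoutVersion;   // storage layout version
    final int namespaceId;     // namespace ID
    final long cTime;          // file system state creation time

    private VersionFileSketch(String storageType, int layoutVersion, int namespaceId, long cTime) {
        this.storageType = storageType;
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.cTime = cTime;
    }

    /** Parses key=value text; the key names are assumed for this example. */
    static VersionFileSketch parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable VERSION text", e);
        }
        return new VersionFileSketch(
                require(props, "storageType"),
                Integer.parseInt(require(props, "layoutVersion")),
                Integer.parseInt(require(props, "namespaceID")),
                Long.parseLong(require(props, "cTime")));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing field: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Sample values are made up for the demonstration.
        VersionFileSketch v = parse(
                "storageType=NAME_NODE\nlayoutVersion=-41\nnamespaceID=12345\ncTime=0\n");
        System.out.println(v.storageType + " layout=" + v.layoutVersion
                + " namespace=" + v.namespaceId + " cTime=" + v.cTime);
    }
}
```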

hdfs.protocol (interfaces for IPC interaction between the various entities in HDFS)

AlreadyBeingCreatedException.java----Inheritance implementation class: Exception thrown when the file is already being created;
Block.java----Inheritance implementation class: The abstraction of a data block in HDFS; defines the basic block structure and its read/write access. This class is the basis of a large number of block-related classes; on the client interface, such classes include LocatedBlock, LocatedBlocks, and BlockLocalPathInfo.
BlockListAsLongs.java----Independent memory class: Provides an interface for accessing a list of blocks; its purpose is to pack the data of a block array, unchanged, into a block list of type long[];
BlockLocalPathInfo.java----Inheritance implementation class: Used by the ClientDatanodeProtocol interface for the local-read optimization when HDFS reads files. When the client discovers that it is on the same host as the data block it is reading, it can read the local file directly to get the contents of the block, instead of reading the block through the data node. This greatly reduces the load on the corresponding data node.
ClientDatanodeProtocol.java----Interface class: The interface between the client and the data node. It is used relatively little, because the main interaction between the client and the data node is reading and writing file data through the streaming interface. The client uses this IPC interface when an error occurs and it needs the data node's cooperation for recovery, or when it needs some information for the local-read optimization.
ClientProtocol.java----Interface class: The interface between the client and the NameNode, and the HDFS client's gateway to the file system. Through this interface the client accesses the name node to manipulate the metadata of files and directories; to read or write a file, the client must first access the name node and then interact with the data nodes to manipulate the file data. The client can also obtain overall operational state information about the distributed file system from the name node through this interface. It covers the HDFS functionality from the file perspective. Like GFS, HDFS does not provide a POSIX-style interface but uses a private interface instead. In general, programmers work with HDFS through org.apache.hadoop.fs.FileSystem and do not need to use this interface directly.
DatanodeID.java----Inheritance implementation class: Identifies a data node in the HDFS cluster;
DatanodeInfo.java----Inheritance implementation class: Inherits from DatanodeID and adds some metrics for the data node; defines the data node state information and its read/write access;
DataTransferProtocol.java----Interface class: The streaming protocol used to transfer data between the client and the data node;
DirectoryListing.java----Inheritance implementation class: Used to return the attributes of multiple files/subdirectories in a directory in one call;
DSQuotaExceededException.java----Inheritance implementation class: Exception for exceeding the disk space quota;
FSConstants.java----Interface class: Some useful constants;
HdfsFileStatus.java----Inheritance implementation class: Holds the attributes of an HDFS file/directory;
LayoutVersion.java----Independent memory class: Tracks changes in the HDFS layout version;
LocatedBlock.java----Inheritance implementation class: A data block whose storage locations have been determined;
LocatedBlocks.java----Inheritance implementation class: A collection of located blocks together with the file length, so that multiple data blocks can be located at once; defines this information and its read/write access;
NSQuotaExceededException.java----Inheritance implementation class: Exception for exceeding the namespace quota;
QuotaExceededException.java----Inheritance implementation class: Exception for exceeding a quota;
UnregisteredDatanodeException.java----Inheritance implementation class: Exception for an unregistered data node;
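
The packing idea behind BlockListAsLongs can be sketched in a few lines. The layout used here, three longs per block (block ID, length in bytes, generation stamp), and the class and method names are assumptions for illustration, not a copy of the Hadoop class.

```java
/** Hypothetical packing of block descriptions into one long[] array. */
final class BlockPacking {
    static final int LONGS_PER_BLOCK = 3; // assumed layout: id, numBytes, generationStamp

    /** Copies each block row {id, numBytes, generationStamp} unchanged into one flat array. */
    static long[] pack(long[][] blocks) {
        long[] packed = new long[blocks.length * LONGS_PER_BLOCK];
        for (int i = 0; i < blocks.length; i++) {
            System.arraycopy(blocks[i], 0, packed, i * LONGS_PER_BLOCK, LONGS_PER_BLOCK);
        }
        return packed;
    }

    static int numberOfBlocks(long[] packed) {
        return packed.length / LONGS_PER_BLOCK;
    }

    static long blockId(long[] packed, int index) {
        return packed[index * LONGS_PER_BLOCK];
    }

    static long numBytes(long[] packed, int index) {
        return packed[index * LONGS_PER_BLOCK + 1];
    }

    static long generationStamp(long[] packed, int index) {
        return packed[index * LONGS_PER_BLOCK + 2];
    }

    public static void main(String[] args) {
        long[] packed = pack(new long[][] {{1L, 100L, 7L}, {2L, 200L, 8L}});
        for (int i = 0; i < numberOfBlocks(packed); i++) {
            System.out.println("block " + blockId(packed, i) + ": " + numBytes(packed, i)
                    + " bytes, generation stamp " + generationStamp(packed, i));
        }
    }
}
```

A flat long[] is cheap to send over IPC compared with an array of block objects, which is the usual motivation for this kind of encoding.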

hdfs.server.protocol (implementation of the interfaces between the various entities in HDFS)

BalancerBandwidthCommand.java----Inheritance implementation class: Lets administrators dynamically adjust the balancer bandwidth by calling "dfsadmin -setBalancerBandwidth newbandwidth";
BlockCommand.java----Inheritance implementation class: An instruction to a data node concerning some of the blocks under its control; defines and implements the data block command;
BlockMetaDataInfo.java----Inheritance implementation class: Metadata information of a block; defines and implements the block metadata;
BlockRecoveryInfo.java----Inheritance implementation class: Information for a block recovery operation;
BlocksWithLocations.java----Inheritance implementation class: Implements a sequence of blocks with their locations; reads and writes block information together with location;
DatanodeCommand.java----Abstract class: Data node command; defines and implements the basic command information;
DatanodeProtocol.java----Interface class: A server-to-server interface, between the data node and the name node. In the master-slave architecture of HDFS, the data node, as the slave, continually reports information to the name node (the master) through this interface, keeping the name node synchronized. The return values of some of the interface's methods carry instructions back from the name node; following these instructions, the data node moves, deletes, or recovers data blocks on its local disk, or performs other operations.
DatanodeRegistration.java----Inheritance implementation class: Contains all the information the name node needs to identify and verify a data node; defines and implements reading and writing of the registration information;
DisallowedDatanodeException.java----Inheritance implementation class: Exception for a data node that is not allowed;
InterDatanodeProtocol.java----Interface class: A server-to-server interface, between data nodes. Data nodes use it to communicate with each other and to recover data blocks, ensuring data consistency.
KeyUpdateCommand.java----Inheritance implementation class: Key update command;
NamenodeProtocol.java----Interface class: A server-to-server interface, between the name node and the secondary name node or the HDFS balancer. The secondary name node periodically fetches the namespace image and the edit log of the name node as of a point in time, merges them into a new image, and sends the result back to the name node; during this process, the name node and the secondary name node use this interface to complete the metadata merge. The interface also provides information that the HDFS balancer needs for normal operation.
NamespaceInfo.java----Inheritance implementation class: Returned by the name node in reply to a data node's handshake;
UpgradeCommand.java----Inheritance implementation class: A generic distributed upgrade command; implements the upgrade command for data blocks;
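
The DatanodeProtocol pattern, in which the reply to a report carries an instruction for the data node, can be sketched as follows. Everything here is hypothetical: the class HeartbeatSketch, the Action names, and the Command type are stand-ins invented for illustration and do not mirror Hadoop's BlockCommand or DatanodeCommand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a data node acting on commands returned by the name node. */
final class HeartbeatSketch {
    enum Action { TRANSFER, INVALIDATE, RECOVER }

    /** An instruction about some blocks, as it might come back in a heartbeat reply. */
    static final class Command {
        final Action action;
        final long[] blockIds;

        Command(Action action, long... blockIds) {
            this.action = action;
            this.blockIds = blockIds;
        }
    }

    /** Data node side: applies one command to the set of locally stored block IDs. */
    static List<String> apply(Command cmd, Set<Long> localBlocks) {
        List<String> log = new ArrayList<>();
        for (long id : cmd.blockIds) {
            switch (cmd.action) {
                case INVALIDATE:
                    if (localBlocks.remove(id)) {
                        log.add("deleted block " + id);
                    }
                    break;
                case TRANSFER:
                    if (localBlocks.contains(id)) {
                        log.add("would transfer block " + id);
                    }
                    break;
                case RECOVER:
                    log.add("would recover block " + id);
                    break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Set<Long> local = new TreeSet<>(Arrays.asList(1L, 2L, 3L));
        System.out.println(apply(new Command(Action.INVALIDATE, 2L), local));
        System.out.println("remaining blocks: " + local);
    }
}
```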

hdfs.server.namenode (implementation of the name node)

BlockPlacementPolicy.java----Abstract class: Used to choose the desired number of targets for placing the replicas of a block;
BlockPlacementPolicyDefault.java----Inheritance implementation class: Default implementation that chooses the desired number of targets for placing the replicas of a block;
BlockPlacementPolicyWithNodeGroup.java----Inheritance implementation class: Chooses the desired number of targets for placing the replicas of a block when the topology has a node-group layer;
BlockInfo.java----Independent memory class: Maintains the map from a block to its metadata;
CancelDelegationTokenServlet.java----Inheritance implementation class: Servlet for cancelling a delegation token;
CheckpointSignature.java----Inheritance implementation class: Checkpoint signature; defines the signature information for the storage information;
ContentSummaryServlet.java----Inheritance implementation class: Content summary servlet;
CorruptReplicasMap.java----Independent memory class: Stores information about all corrupt blocks in the file system;
DatanodeDescriptor.java----Inheritance implementation class: Tracks statistics for a given data node, such as available storage capacity and last update time; defines and implements the state information of the data node;
DecommissionManager.java----Independent memory class: Manages node decommissioning;
DfsServlet.java----Abstract class: The base class of the DFS servlets; web operations of the DFS proxy interface;
EditLogInputStream.java----Abstract class: A generic abstract class that supports reading edit log data from persistent storage; defines and implements the methods for reading log data;
EditLogOutputStream.java----Inheritance implementation class: A generic abstract class that supports journaling edit log data to persistent storage; defines and implements the methods for writing log data;
FileChecksumServlets.java----Independent memory class: Servlets for file checksums; proxy implementation of the file checksum web operation command;
FileDataServlet.java----Inheritance implementation class: Proxy implementation of the file data web operation command;
FsckServlet.java----Inheritance implementation class: The web servlet for fsck on the name node; proxy implementation of the file system check web operation command;
FSClusterStats.java----Interface class: Used to retrieve cluster-related statistics;
FSDirectory.java----Inheritance implementation class: Stores the directory state of the file system; defines and implements the file directory structure;
FSEditLog.java----Independent memory class: Maintains the log of namespace modifications; defines the file system log;
FSImage.java----Inheritance implementation class: Handles checkpointing and logging of the namespace edits; defines the file system directories, files, data index, and relationship information;
FSInodeInfo.java----Interface class: File system related information;
FSNamesystem.java----Inheritance implementation class: Does the actual bookkeeping for the data nodes and defines the corresponding name system information structures;
FSPermissionChecker.java----Independent memory class: Checks file system permissions;
GetDelegationTokenServlet.java----Inheritance implementation class: Servlet for getting a delegation token;
GetImageServlet.java----Inheritance implementation class: Used to retrieve files from the name system; typically used by the secondary name node to retrieve the image and edits files for periodic checkpointing;
Host2NodesMap.java----Independent memory class: Host-to-node mapping;
INode.java----Inheritance implementation class: Abstract class that contains the fields common to file and directory inodes; defines the basic node information structure;
INodeDirectory.java----Inheritance implementation class: Represents a directory inode;
INodeDirectoryWithQuota.java----Inheritance implementation class: Directory inode with a quota restriction;
INodeFile.java----Inheritance implementation class: File inode; defines the file node information structure;
INodeFileUnderConstruction.java----Inheritance implementation class: Inode of a file under construction; defines the information structure of a file node that is still being created;
JspHelper.java----Independent memory class: Helper class for the JSP pages;
LeaseExpiredException.java----Inheritance implementation class: Exception thrown when the lease on a file being created has expired;
LeaseManager.java----Independent memory class: Manages the leases for files being written; also provides useful static methods for lease recovery; defines and implements the lease information structure;
ListPathsServlet.java----Inheritance implementation class: Gets the meta-information of a file system;
MetaRecoveryContext.java----Independent memory class: Context data for a name node recovery process in progress;
NameCache.java----Independent memory class: Caches frequently used names for reuse;
NameNode.java----Inheritance implementation class: Name node function management and implementation; the core server class of the name node;
NamenodeFsck.java----Independent memory class: Provides rudimentary checking of DFS volumes; the system check class of the name node;
NameNodeMXBean.java----Interface class: The JMX management interface for name node information;
NotReplicatedYetException.java----Inheritance implementation class: Exception thrown when the file has not yet been replicated to enough data nodes;
PendingReplicationBlocks.java----Independent memory class: Keeps the records of all blocks that are currently being replicated; defines the information table for those blocks;
PermissionChecker.java----Independent memory class: Performs permission checks; defines and implements the permission check structure;
RenewDelegationTokenServlet.java----Inheritance implementation class: Servlet for renewing a delegation token;
SafeModeException.java----Inheritance implementation class: Thrown when the name node is in safe mode; the client cannot modify the namespace until safe mode is turned off;
SecondaryNameNode.java----Inheritance implementation class: Secondary name node function management and implementation;
SerialNumberManager.java----Independent memory class: Manages the mapping of user and group names to serial numbers;
StreamFile.java----Inheritance implementation class: Implementation of file streaming;
TransferFsImage.java----Inheritance implementation class: Fetches a specified file from the name node; obtains the image files over HTTP;
UnderReplicatedBlocks.java----Inheritance implementation class: Keeps track of under-replicated blocks, that is, blocks with fewer replicas than required;
UnsupportedActionException.java----Inheritance implementation class: Exception for an unsupported operation;
UpgradeManagerNamenode.java----Inheritance implementation class: Upgrade manager for the name node;
UpgradeObjectNamenode.java----Abstract class: Base class for name node upgrade objects; data node upgrades run in separate threads;

hdfs.server.datanode (implementation of the data node)

BlockAlreadyExistsException.java----Inheritance implementation class: Exception thrown when the target block already exists;
BlockMetadataHeader.java----Independent memory class: Defines and implements the data block metadata header structure;
BlockReceiver.java----Inheritance implementation class: Receives a data block and writes it to the local disk, and may also copy it to other nodes;
BlockSender.java----Inheritance implementation class: Reads a data block from disk and sends it to the corresponding recipient;
BlockTransferThrottler.java----Independent memory class: Throttles the transmission of data blocks; holds the throttling parameters used while a block is transmitted;
DataBlockScanner.java----Inheritance implementation class: Data block scanning tool. DataBlockScanner has its own thread and periodically verifies the data block files managed by the current DataNode. Its most important method is verifyBlock; the other helper methods of DataBlockScanner add, delete, and sort the block file information that it manages;
DataNode.java----Inheritance implementation class: Function management and implementation of the data node; the core manager of data blocks;
DatanodeBlockInfo.java----Independent memory class: Used by the data node to maintain the mapping from a data block to its metadata, including which FSVolume the block file belongs to;
DataNodeMXBean.java----Interface class: The JMX management interface for data node information;
DataStorage.java----Inheritance implementation class: Data storage information file;
DataXceiver.java----Inheritance implementation class: The thread used to process an incoming/outgoing data stream;
DataXceiverServer.java----Inheritance implementation class: The server used to receive and send data blocks;
FSDataset.java----Inheritance implementation class: Manages a set of data blocks;
FSDatasetAsyncDiskService.java----Independent memory class: A container of multiple thread pools, one per volume, so that asynchronous disk operations can be dispatched easily. A thread pool is created for each block directory, with its own thread group; the pool has a minimum of 1 thread and a maximum of 4. In the current version, only delete operations are scheduled through the thread pools for asynchronous processing; read and write operations perform their file operations synchronously;
FSDatasetInterface.java----Interface class: The interface for the underlying storage in which a data node stores blocks; FSDatasetInterface is the abstraction of DataNode storage;
SecureDataNodeStarter.java----Inheritance implementation class: Starts a data node in a secure cluster; it acquires the privileged resources before startup and hands them to the data node;
UpgradeManagerDatanode.java----Inheritance implementation class: Upgrade manager for the data node;
UpgradeObjectDatanode.java----Abstract class: Base class for data node upgrade objects; data node upgrades run in separate threads;
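
The per-volume thread pool arrangement described for FSDatasetAsyncDiskService can be sketched with the standard java.util.concurrent classes. The class PerVolumeAsyncService, its method names, and the 60-second keep-alive are assumptions for this example; only the pool sizes (minimum 1, maximum 4) come from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical container with one thread pool per volume for asynchronous deletes. */
final class PerVolumeAsyncService {
    private static final int CORE_THREADS = 1;
    private static final int MAX_THREADS = 4;
    private final Map<String, ThreadPoolExecutor> pools = new HashMap<>();

    PerVolumeAsyncService(String... volumes) {
        for (String volume : volumes) {
            // Note: with an unbounded queue, ThreadPoolExecutor queues extra tasks
            // instead of starting threads beyond the core size.
            pools.put(volume, new ThreadPoolExecutor(CORE_THREADS, MAX_THREADS,
                    60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>()));
        }
    }

    /** Schedules a delete task on the pool that belongs to the given volume. */
    void deleteAsync(String volume, Runnable deletion) {
        ThreadPoolExecutor pool = pools.get(volume);
        if (pool == null) {
            throw new IllegalArgumentException("unknown volume: " + volume);
        }
        pool.execute(deletion);
    }

    /** Stops accepting work and waits for queued tasks; returns false on timeout. */
    boolean shutdownAndWait(long timeoutMillis) {
        for (ThreadPoolExecutor pool : pools.values()) {
            pool.shutdown();
        }
        try {
            for (ThreadPoolExecutor pool : pools.values()) {
                if (!pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicInteger deleted = new AtomicInteger();
        PerVolumeAsyncService service = new PerVolumeAsyncService("/data1", "/data2");
        service.deleteAsync("/data1", () -> deleted.incrementAndGet());
        service.deleteAsync("/data2", () -> deleted.incrementAndGet());
        service.shutdownAndWait(5000);
        System.out.println("delete tasks run: " + deleted.get());
    }
}
```

Keeping one pool per volume means a slow disk only delays the deletes queued for that disk.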

hdfs (client-side implementation)

BlockReader.java----Interface class: The interface shared by local and remote block readers;
BlockReaderLocal.java----Inheritance implementation class: Local block reader;
ByteRangeInputStream.java----Abstract class: Supports HTTP byte streams; a new connection to the HTTP server needs to be established each time;
ChecksumDistributedFileSystem.java----Inheritance implementation class: Distributed file system with checksums;
DFSClient.java----Independent memory class: Connects to a Hadoop file system and performs basic file operations;
DFSConfigKeys.java----Inheritance implementation class: Contains the constants used in HDFS;
DFSUtil.java----Independent memory class: DFS utilities;
DistributedFileSystem.java----Inheritance implementation class: Implementation of the abstract file system for the DFS system; implements the distributed file system functions;
HftpFileSystem.java----Inheritance implementation class: Implementation of a protocol for accessing a file system over HTTP; accesses HDFS files over HTTP;
HsftpFileSystem.java----Inheritance implementation class: Implementation of a protocol for accessing a file system over HTTPS; accesses HDFS files over HTTPS;
LeaseRenewer.java----Independent memory class: Renews leases;

hdfs.server.namenode.metrics (collection of metrics on the name node)

FSNamesystemMBean.java----Interface class: Defines the methods for obtaining the state of a name node's FSNamesystem; exposes the file state information of the name node;
NameNodeInstrumentation.java----Inheritance implementation class: Name node instrumentation (metrics) class;

hdfs.server.datanode.metrics (collection of metrics on the data node)

DataNodeInstrumentation.java----Inheritance implementation class: Data node instrumentation (metrics) class;
FSDatasetMBean.java----Interface class: Defines the methods for obtaining the state of a data node's FSDataset; function definitions for the data set;

3. Application packages

This group includes hdfs.tools and hdfs.server.balancer. These two packages provide dfsadmin, the tool for querying HDFS status information; fsck, the file system check tool; and the HDFS balancer (started by start-balancer.sh).

hdfs.tools (implementation of dfsadmin, the tool for querying HDFS status information, and of fsck, the file system check tool)

DelegationTokenFetcher.java----Independent memory class: Fetches a DelegationToken from the current name node and stores it in the specified file;
DFSAdmin.java----Inheritance implementation class: Provides some DFS administrative access; implements the administrator commands;
DFSck.java----Inheritance implementation class: Provides rudimentary checking of DFS volumes; implements the file system check command;
HDFSConcat.java----Independent memory class: Concatenates HDFS files;

hdfs.server.balancer (implementation of the HDFS balancer)

Balancer.java----Inheritance implementation class: Load balancing process. The balancer is a tool that balances disk space usage across an HDFS cluster when some data nodes become full or when new nodes are added to the cluster.
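
One simple way to think about balancing disk space usage is to compare each node's utilization with the cluster average. The sketch below does only that classification step. The class BalancerSketch, the threshold parameter, and the three-way result are assumptions for illustration; the sketch does not describe how Hadoop's Balancer actually selects or moves blocks.

```java
/** Hypothetical classification of data nodes by disk utilization. */
final class BalancerSketch {
    /** Cluster-wide utilization in percent: total used space over total capacity. */
    static double clusterUtilization(long[] used, long[] capacity) {
        long totalUsed = 0;
        long totalCapacity = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCapacity += capacity[i];
        }
        return 100.0 * totalUsed / totalCapacity;
    }

    /**
     * Returns +1 if the node is over-utilized, -1 if under-utilized, and 0 if
     * its utilization is within thresholdPercent of the cluster average.
     */
    static int classify(long used, long capacity, double averagePercent, double thresholdPercent) {
        double utilization = 100.0 * used / capacity;
        if (utilization > averagePercent + thresholdPercent) {
            return 1;
        }
        if (utilization < averagePercent - thresholdPercent) {
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        long[] used = {90, 50, 10};        // made-up numbers for the demonstration
        long[] capacity = {100, 100, 100};
        double average = clusterUtilization(used, capacity);
        for (int i = 0; i < used.length; i++) {
            System.out.println("node " + i + ": " + classify(used[i], capacity[i], average, 10.0));
        }
    }
}
```

Over-utilized nodes would be candidates to give up blocks and under-utilized nodes candidates to receive them, for example right after new empty nodes join the cluster.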

4. WebHDFS-related packages

This group includes hdfs.web.resources, hdfs.server.namenode.web.resources, hdfs.server.datanode.web.resources, and hdfs.web, 4 packages in total.

WebHDFS is a new feature introduced in HDFS 1.0 that provides a complete mechanism for accessing HDFS over HTTP. In contrast to the read-only HFTP file system, WebHDFS supports both reading and writing HDFS over HTTP, and on top of it a C client and a Filesystem in Userspace (FUSE) implementation for accessing HDFS have been built.

Reference documents:

Hadoop Technology Insider--deep analysis of Hadoop Common and HDFS architecture design and implementation principles
