HDFS: The Hadoop Distributed File System

The most important class in Hadoop's file system layer is the abstract class FileSystem, together with its two subclasses LocalFileSystem and DistributedFileSystem. Let us look at FileSystem first.
The abstract class FileSystem provides a series of interfaces for file and directory operations, plus some auxiliary methods. Roughly:
1. open, create, delete, rename, etc.: non-abstract; some return an FSDataOutputStream for stream processing.
2. openRaw, createRaw, renameRaw, deleteRaw, etc.: abstract; some return an FSInputStream, which supports random access.
3. lock, release, copyFromLocalFile, moveFromLocalFile, copyToLocalFile and other abstract convenience methods, whose purpose is clear from their names.
Note: in the Hadoop file system, every file has a checksum stored in a companion CRC file. Some code in FileSystem therefore needs special handling, for example rename.
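To make that concrete, here is a small standalone sketch (plain JDK code, not the Hadoop implementation) of the checksum-sidecar pattern: every data file gets a companion .crc file, and a rename has to move both files, which is why FileSystem.rename needs the special handling mentioned above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

public class CrcSidecarDemo {
    static Path sidecar(Path f) { return f.resolveSibling("." + f.getFileName() + ".crc"); }

    // Write the data file plus a companion ".crc" file holding its CRC32 value.
    static void write(Path f, byte[] data) throws IOException {
        Files.write(f, data);
        CRC32 crc = new CRC32();
        crc.update(data);
        Files.write(sidecar(f), Long.toString(crc.getValue()).getBytes());
    }

    // A rename has to move the sidecar as well; this mirrors the special case above.
    static void rename(Path src, Path dst) throws IOException {
        Files.move(src, dst);
        Files.move(sidecar(src), sidecar(dst));
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("demo", ".dat");
        write(f, "hello".getBytes());
        rename(f, f.resolveSibling("renamed.dat"));
        System.out.println("renamed the data file together with its .crc sidecar");
    }
}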
LocalFileSystem and DistributedFileSystem are meant to be transparent to users, so they are not analyzed in depth here; they are best explained together with FSDataInputStream and FSInputStream.
Checking the getFileCacheHints method of the two subclasses, we can see that LocalFileSystem identifies its data location as 'localhost'. Presumably both file systems describe data locations the same way, one pointing at the local machine and one at hosts across the cluster network.
LocalFileSystem has two inner classes, LocalFSFileInputStream and LocalFSFileOutputStream, which operate on the file through a FileChannel. In addition, the lock and release methods use a TreeMap to keep track of each file and its corresponding lock.
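As a simplified illustration of those two details (again not the Hadoop source, just a sketch), the following class reads at an arbitrary offset through a FileChannel, which is what enables random access, and records acquired locks in a TreeMap keyed by file name so a later release() can find them:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.TreeMap;

public class LocalFsSketch {
    // Locks are remembered per file so a later release() can find and unlock them.
    private final TreeMap<String, FileLock> lockedFiles = new TreeMap<String, FileLock>();

    // Random access: read 'len' bytes starting at 'pos' without consuming a stream.
    byte[] readAt(String file, long pos, int len) throws IOException {
        try (FileChannel ch = new RandomAccessFile(file, "r").getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            ch.read(buf, pos);
            return buf.array();
        }
    }

    void lock(String file) throws IOException {
        FileChannel ch = new RandomAccessFile(file, "rw").getChannel();
        lockedFiles.put(file, ch.lock());
    }

    void release(String file) throws IOException {
        FileLock lock = lockedFiles.remove(file);
        if (lock != null) {
            lock.release();
            lock.channel().close();
        }
    }
}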
DistributedFileSystem has less code than LocalFileSystem but is more complex. It uses DFSClient to perform the distributed file system operations:
public DistributedFileSystem(InetSocketAddress namenode, Configuration conf) throws IOException {
    super(conf);
    this.dfs = new DFSClient(namenode, conf);
    this.name = namenode.getHostName() + ":" + namenode.getPort();
}
The DFSClient class takes an InetSocketAddress and a Configuration as input and encapsulates the network transmission details. Most methods in DistributedFileSystem simply delegate to DFSClient; it is little more than a wrapper. So let us look at DFSClient.
Inside DFSClient, network communication goes through the RPC layer rather than raw sockets. For the details of the transport, see the classes in the org.apache.hadoop.ipc package.
Paths inside DFSClient are mostly UTF8 objects rather than Strings; DistributedFileSystem converts between the two with getPath and getDFSPath. This keeps the path format consistent across data transmission.
Most methods in DFSClient also delegate directly to the namenode field, which is a ClientProtocol proxy. Here we mainly look at the remaining methods.
The inner class LeaseChecker is a daemon thread that periodically calls renewLease on the namenode. The source comment explains:
Client programs can cause stateful changes in the namenode that affect other clients. A client may obtain a file and neither abandon nor complete it. A client might hold a series of locks that prevent other clients from proceeding. Clearly, it would be bad if a client held a bunch of locks that it never gave up. This can happen easily if the client dies unexpectedly. So, the namenode will revoke the locks and live file-creates for clients that it thinks have died. A client tells the namenode that it is still alive by periodically calling renewLease(). If a certain amount of time passes since the last call to renewLease(), the namenode assumes the client has died.
In other words, it keeps the client's heartbeat alive; if the client dies, the namenode releases that client's locks.
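The mechanism is simply a daemon thread running a renew/sleep loop. A stripped-down sketch, where renewLease is a stand-in for the real ClientProtocol.renewLease() RPC call and the interval is an arbitrary placeholder:

// Minimal sketch of the LeaseChecker idea; 'renewLease' here stands in
// for the real namenode.renewLease(clientName) RPC call.
public class LeaseRenewerSketch implements Runnable {
    private final Runnable renewLease;       // what to do on each heartbeat
    private final long intervalMillis;       // placeholder renewal period

    public LeaseRenewerSketch(Runnable renewLease, long intervalMillis) {
        this.renewLease = renewLease;
        this.intervalMillis = intervalMillis;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            renewLease.run();                // tell the namenode "I am still alive"
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                return;                      // client is shutting down
            }
        }
    }

    public static void start(Runnable renewLease, long intervalMillis) {
        Thread t = new Thread(new LeaseRenewerSketch(renewLease, intervalMillis));
        t.setDaemon(true);                   // daemon: does not keep the JVM alive
        t.start();
    }
}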
DFSInputStream and DFSOutputStream are more complex than their counterparts in LocalFileSystem. They also operate through ClientProtocol and use the data structures in the org.apache.hadoop.dfs package, such as DataNode and Block, which are not analyzed here.

That concludes part one of the FileSystem analysis. Personally I feel its encapsulation is done well; after being split out of the Nutch project it is clearer than before. Next we analyze the second part, MapReduce, starting from how MapReduce becomes distributed.
 

################################################################################

The previous MapReduce demo could only run on one machine. Now it is time to run it in a distributed manner. After a brief look at the MapReduce execution flow and FileSystem, I will start from the configuration and see how to make Hadoop run MapReduce on two machines at the same time.
First, look back at this piece of code:
String tracker = conf.get("mapred.job.tracker", "local");
if ("local".equals(tracker)) {
    this.jobSubmitClient = new LocalJobRunner(conf);
} else {
    this.jobSubmitClient = (JobSubmissionProtocol)
        RPC.getProxy(JobSubmissionProtocol.class, JobTracker.getAddress(conf), conf);
}
If the tracker address is not "local", the job is submitted over RPC to a JobTracker running on a remote machine.
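So, to push jobs to a remote cluster instead of the LocalJobRunner, the client configuration only needs mapred.job.tracker set to a host:port pair. A minimal fragment in the same snippet style as the excerpts above (the host name and port are made-up examples):

Configuration conf = new Configuration();
// anything other than "local" selects the RPC proxy to a remote JobTracker
conf.set("mapred.job.tracker", "master:9001");   // example host:port, adjust to your cluster
JobConf job = new JobConf(conf);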
JobTracker has a main function. The comment says it is only used for debugging; normally the JobTracker runs as part of the DFS namenode process. But we can start from it anyway.
tracker = new JobTracker(conf);    // construct
The constructor first reads a bunch of constants, then cleans out 'systemDir', and then starts the RPC server:
InetSocketAddress addr = getAddress(conf);
this.localMachine = addr.getHostName();
this.port = addr.getPort();
this.interTrackerServer = RPC.getServer(this, addr.getPort(), 10, false, conf);
this.interTrackerServer.start();
Next it starts the JobTrackerInfoServer:
this.infoPort = conf.getInt("mapred.job.tracker.info.port", 50030);
this.infoServer = new JobTrackerInfoServer(this, infoPort);
this.infoServer.start();
The JobTrackerInfoServer exposes JobTracker information over HTTP, which makes it convenient to monitor the progress of jobs and tasks.
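Because it is plain HTTP, any client can poll it. A trivial standalone sketch (the host name is hypothetical; 50030 is the default info port from the code above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class InfoServerPoll {
    public static void main(String[] args) throws Exception {
        // Hypothetical JobTracker host; 50030 is the default mapred.job.tracker.info.port.
        URL url = new URL("http://master:50030/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);      // dump the status page for monitoring
            }
        }
    }
}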
Start three daemon threads:
new Thread(this.expireTrackers).start();   // used to expire TaskTrackers that have gone down
new Thread(this.retireJobs).start();       // used to remove old finished jobs that have been around for too long
new Thread(this.initJobs).start();         // used to init new jobs that have just been created
The purpose of each thread is described in its comment, so they are not analyzed here. Next we look at JobTracker.submitJob().
LocalJobRunner.submitJob() was analyzed before: it instantiates an inner class Job and carries out the whole MapReduce process inside it. JobTracker is more complex: it instantiates a JobInProgress and puts the job into a queue:
JobInProgress job = new JobInProgress(jobFile, this, this.conf);
synchronized (jobs) {
    synchronized (jobsByArrival) {
        synchronized (jobInitQueue) {
            jobs.put(job.getProfile().getJobId(), job);
            jobsByArrival.add(job);
            jobInitQueue.add(job);
            jobInitQueue.notifyAll();
        }
    }
}
From this point on, the retireJobs thread handles timed-out and failed jobs, and the JobInitThread initializes newly queued jobs by calling job.initTasks().
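The hand-off between submitJob() and the JobInitThread is the classic synchronized-queue pattern: the submitter adds a job and calls notifyAll(), the init thread waits until the queue is non-empty, takes a job, and calls initTasks() on it. A condensed sketch of that pattern (JobStub stands in for JobInProgress):

import java.util.LinkedList;
import java.util.List;

public class JobInitQueueSketch {
    // Stand-in for JobInProgress; only the part the init thread cares about.
    interface JobStub { void initTasks(); }

    private final List<JobStub> jobInitQueue = new LinkedList<JobStub>();

    // Called from submitJob(): enqueue the job and wake the init thread.
    public void submit(JobStub job) {
        synchronized (jobInitQueue) {
            jobInitQueue.add(job);
            jobInitQueue.notifyAll();
        }
    }

    // Body of the JobInitThread: wait for work, then initialize outside the lock.
    public void initLoop() throws InterruptedException {
        while (true) {
            JobStub job;
            synchronized (jobInitQueue) {
                while (jobInitQueue.isEmpty()) {
                    jobInitQueue.wait();
                }
                job = jobInitQueue.remove(0);
            }
            job.initTasks();
        }
    }
}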
Now let us analyze JobInProgress.
In the constructor, the tracker fetches the job files (the XML configuration and the jar) from DFS and saves them to a local directory:
JobConf default_job_conf = new JobConf(default_conf);
this.localJobFile = default_job_conf.getLocalFile(JobTracker.SUBDIR, jobid + ".xml");
this.localJarFile = default_job_conf.getLocalFile(JobTracker.SUBDIR, jobid + ".jar");
FileSystem fs = FileSystem.get(default_conf);
fs.copyToLocalFile(new File(jobFile), localJobFile);

conf = new JobConf(localJobFile);
this.profile = new JobProfile(conf.getUser(), jobid, jobFile, url, conf.getJobName());
String jarFile = conf.getJar();
if (jarFile != null) {
    fs.copyToLocalFile(new File(jarFile), localJarFile);
    conf.setJar(localJarFile.getCanonicalPath());
}

Pay attention to how the jar file gets set; look at this JobConf constructor:
public JobConf(Configuration conf, Class aClass) {
    this(conf);
    String jar = findContainingJar(aClass);
    if (jar != null) {
        setJar(jar);
    }
}
If aClass lives inside a jar, setJar(jar) is called and that jar is later copied to the working directory of the LocalJobRunner or JobTracker. Hence the rule: package all the classes of the MapReduce job into a single jar, and distributed MapReduce computation can then be executed.
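In practice that means the job driver should build its JobConf from one of the classes inside that jar, so findContainingJar() can locate it. A minimal fragment (WordCount is a hypothetical user class assumed to live in the job jar):

Configuration conf = new Configuration();
// findContainingJar(WordCount.class) resolves the jar holding WordCount,
// and setJar() records it so it can be shipped to the cluster.
JobConf job = new JobConf(conf, WordCount.class);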
Now look at JobInProgress.initTasks().
First it loads the InputFormat from the jar:
String ifClassName = jd.get("mapred.input.format.class");
InputFormat inputFormat;
if (ifClassName != null && localJarFile != null) {
    try {
        ClassLoader loader = new URLClassLoader(new URL[] { localJarFile.toURL() });
        Class inputFormatClass = loader.loadClass(ifClassName);
        inputFormat = (InputFormat) inputFormatClass.newInstance();
    } catch (Exception e) {
        throw new IOException(e.toString());
    }
} else {
    inputFormat = jd.getInputFormat();
}
Next, the input splits are sorted by size.
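The sort itself is a one-liner over the split array. A sketch with a stand-in Split type (the ordering direction, largest first, is an assumption here; the point is simply that splits are ranked by length):

import java.util.Arrays;
import java.util.Comparator;

public class SplitSortSketch {
    // Stand-in for FileSplit: only the length matters for the sort.
    static class Split {
        final long length;
        Split(long length) { this.length = length; }
    }

    public static void main(String[] args) {
        Split[] splits = { new Split(10), new Split(300), new Split(70) };
        // Assumed ordering: largest splits first, so the biggest work is scheduled earliest.
        Arrays.sort(splits, new Comparator<Split>() {
            public int compare(Split a, Split b) {
                return Long.compare(b.length, a.length);
            }
        });
        for (Split s : splits) {
            System.out.println(s.length);
        }
    }
}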
Then a map task is created for each split:
this.numMapTasks = splits.length;
// create a map task for each split
this.maps = new TaskInProgress[numMapTasks];
for (int i = 0; i < numMapTasks; i++) {
    maps[i] = new TaskInProgress(jobFile, splits[i], jobtracker, conf, this);
}
And the reduce tasks:
this.reduces = new TaskInProgress[numReduceTasks];
for (int i = 0; i < numReduceTasks; i++) {
    reduces[i] = new TaskInProgress(jobFile, maps, i, jobtracker, conf, this);
}
Finally, the location hints of each split are cached and a status object is created:
for (int i = 0; i < maps.length; i++) {
    String hints[][] = fs.getFileCacheHints(splits[i].getFile(), splits[i].getStart(), splits[i].getLength());
    cachedHints.put(maps[i].getTIPId(), hints);
}

this.status = new JobStatus(status.getJobId(), 0.0f, 0.0f, JobStatus.RUNNING);
Now it is TaskInProgress's turn. It encapsulates the map and reduce operations within the job, but JobInProgress.initTasks() only initializes the tasks; it does not execute them. After some tracking, it turns out the tasks are executed by TaskTracker.
TaskTracker implements the TaskUmbilicalProtocol interface. In the previous article, LocalJobRunner's inner class Job also implemented this interface. Compare the two pairs:
interface JobSubmissionProtocol: LocalJobRunner <---> JobTracker
interface TaskUmbilicalProtocol: LocalJobRunner.Job <---> TaskTracker
Now, on to TaskTracker itself, starting from its main entry point.
TaskTracker implements Runnable. main() instantiates a TaskTracker object and then calls its run() method.
The constructor mainly performs initialization:
this.mapOutputFile = new MapOutputFile();
this.mapOutputFile.setConf(conf);
initialize();
In initialize(), some member variables are set up and then the RPC servers are started:
while (true) {
    try {
        this.taskReportServer = RPC.getServer(this, this.taskReportPort, maxCurrentTasks, false, this.fConf);
        this.taskReportServer.start();
        break;
    } catch (BindException e) {
        LOG.info("Could not open report server at " + this.taskReportPort + ", trying new port");
        this.taskReportPort++;
    }
}
while (true) {
    try {
        this.mapOutputServer = new MapOutputServer(mapOutputPort, maxCurrentTasks);
        this.mapOutputServer.start();
        break;
    } catch (BindException e) {
        LOG.info("Could not open mapoutput server at " + this.mapOutputPort + ", trying new port");
        this.mapOutputPort++;
    }
}
Both the task report server and the MapOutputServer use a loop to retry binding, bumping the port number until a free port is found.
The last statement is:
this.jobClient = (InterTrackerProtocol) RPC.getProxy(InterTrackerProtocol.class, jobTrackAddr, this.fConf);
Here is a new interface, InterTrackerProtocol, used for communication between the TaskTracker and the central JobTracker. Through this interface the TaskTracker obtains and executes tasks from the JobTracker. Next we analyze the main flow of TaskTracker, the run() function.
run() contains two while loops. The inner loop calls the offerService() method, which is itself another loop: it first runs some code for heartbeating to the JobTracker, then asks the JobTracker through the protocol interface for a task and executes it:
if (mapTotal < maxCurrentTasks || reduceTotal < maxCurrentTasks) {
    Task t = jobClient.pollForNewTask(taskTrackerName);
    if (t != null) {
        TaskInProgress tip = new TaskInProgress(t, this.fConf);
        synchronized (this) {
            tasks.put(t.getTaskId(), tip);
            if (t.isMapTask()) {
                mapTotal++;
            } else {
                reduceTotal++;
            }
            runningTasks.put(t.getTaskId(), tip);
        }
        tip.launchTask();
    }
}
tip.launchTask() starts executing the task. Inside the method:
this.runner = task.createRunner(TaskTracker.this);
this.runner.start();
Task has two subclasses, MapTask and ReduceTask. Each createRunner() method creates a subclass of TaskRunner. TaskRunner extends Thread, and its run() method does the following:
String sep = System.getProperty("path.separator");
File workDir = new File(new File(t.getJobFile()).getParent(), "work");
workDir.mkdirs();

StringBuffer classPath = new StringBuffer();
// start with same classpath as parent process
classPath.append(System.getProperty("java.class.path"));
classPath.append(sep);
JobConf job = new JobConf(t.getJobFile());
String jar = job.getJar();
if (jar != null) {                         // if jar exists, unjar it into workDir
    unJar(new File(jar), workDir);
    File[] libs = new File(workDir, "lib").listFiles();
    if (libs != null) {
        for (int i = 0; i < libs.length; i++) {
            classPath.append(sep);         // add libs from jar to classpath
            classPath.append(libs[i]);
        }
    }
    classPath.append(sep);
    classPath.append(new File(workDir, "classes"));
    classPath.append(sep);
    classPath.append(workDir);
}

This obtains the working directory and the classpath, and unpacks the job's jar.
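The unJar() helper used above just expands the job jar into the work directory. A plain-JDK sketch of what such a helper does (not the Hadoop implementation itself):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class UnJarSketch {
    // Expand every entry of jarFile under toDir, creating parent directories as needed.
    static void unJar(File jarFile, File toDir) throws IOException {
        try (JarFile jar = new JarFile(jarFile)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                File out = new File(toDir, entry.getName());
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (InputStream in = jar.getInputStream(entry);
                     OutputStream os = new FileOutputStream(out)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        os.write(buf, 0, n);
                    }
                }
            }
        }
    }
}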
// Build exec child JVM args.
Vector vargs = new Vector(8);
File jvm =                                  // use same JVM as parent
    new File(new File(System.getProperty("java.home"), "bin"), "java");

vargs.add(jvm.toString());
String javaOpts = handleDeprecatedHeapSize(
    job.get("mapred.child.java.opts", "-Xmx200m"),
    job.get("mapred.child.heap.size"));
javaOpts = replaceAll(javaOpts, "@taskid@", t.getTaskId());
int port = job.getInt("mapred.task.tracker.report.port", 50050) + 1;
javaOpts = replaceAll(javaOpts, "@port@", Integer.toString(port));
String[] javaOptsSplit = javaOpts.split(" ");
for (int i = 0; i < javaOptsSplit.length; i++) {
    vargs.add(javaOptsSplit[i]);
}

// Add classpath.
vargs.add("-classpath");
vargs.add(classPath.toString());
// Add main class and its arguments
vargs.add(TaskTracker.Child.class.getName());  // main of Child
vargs.add(tracker.taskReportPort + "");         // pass umbilical port
vargs.add(t.getTaskId());                       // pass task identifier
// Run java
runChild((String[]) vargs.toArray(new String[0]), workDir);
Here the classpath and the other JVM arguments for a new Java process are assembled, and finally runChild() forks a child process to execute the task. It does feel rather involved.
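To see the shape of what runChild() does, here is a stripped-down standalone sketch of forking a child JVM with an assembled classpath and main class. The real code predates ProcessBuilder and redirects the child's output into log files, so treat this only as an illustration:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ChildJvmSketch {
    // Launch "java -Xmx200m -classpath <cp> <mainClass> <args...>" in workDir and wait for it.
    static int runChild(String classpath, String mainClass, List<String> args, File workDir)
            throws IOException, InterruptedException {
        File jvm = new File(new File(System.getProperty("java.home"), "bin"), "java");
        List<String> cmd = new ArrayList<String>();
        cmd.add(jvm.toString());
        cmd.add("-Xmx200m");                 // placeholder for mapred.child.java.opts
        cmd.add("-classpath");
        cmd.add(classpath);
        cmd.add(mainClass);                  // e.g. the TaskTracker.Child main class
        cmd.addAll(args);                    // umbilical port, task id, ...

        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir);               // run inside the task's work directory
        pb.inheritIO();                      // the real code captures the child's output instead
        return pb.start().waitFor();
    }
}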
Finally, the inner class Child of TaskTracker. This is the class the forked child process executes. In its main function:
TaskUmbilicalProtocol umbilical =
    (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
        new InetSocketAddress(port), conf);

Task task = umbilical.getTask(taskid);
JobConf job = new JobConf(task.getJobFile());

conf.addFinalResource(new File(task.getJobFile()));
So the child process also communicates with the TaskTracker through RPC.
startPinging(umbilical, taskid);        // start pinging parent
This starts a thread that periodically pings the parent TaskTracker, i.e. a heartbeat.
String workDir = job.getWorkingDirectory();
if (workDir != null) {
    FileSystem file_sys = FileSystem.get(job);
    file_sys.setWorkingDirectory(new File(workDir));
}
task.run(job, umbilical);                // run the task
The task is actually started here.
