A few days ago a DW user reported that inserting data into an RCFile table with "INSERT OVERWRITE TABLE ... PARTITION (xx) SELECT ..." produced duplicate files. Looking at the job log, we found that map task 000005 had two task attempts, the second one being a speculative execution, and both attempts renamed their temp files to final files in the task's close() method, rather than going through the MapReduce framework's two-phase commit protocol, in which a task commits only when the TaskTracker receives a CommitTaskAction, ensuring that only one attempt's result becomes the official result.
The output in the task log is as follows:
attempt_201304111550_268224_m_000005_0 renamed path hdfs://10.2.6.102/tmp/hive-deploy/hive_2013-05-30_10-13-59_124_8643833043783438119/_task_tmp.-ext-10000/hp_cal_month=2013-04/_tmp.000005_0 to hdfs://10.2.6.102/tmp/hive-deploy/hive_2013-05-30_10-13-59_124_8643833043783438119/_tmp.-ext-10000/hp_cal_month=2013-04/000005_0 . File size is 666922
attempt_201304111550_268234_m_000005_1 renamed path hdfs://10.2.6.102/tmp/hive-deploy/hive_2013-05-30_10-13-59_124_8643833043783438119/_task_tmp.-ext-10000/hp_cal_month=2013-04/_tmp.000005_1 to hdfs://10.2.6.102/tmp/hive-deploy/hive_2013-05-30_10-13-59_124_8643833043783438119/_tmp.-ext-10000/hp_cal_month=2013-04/000005_1 . File size is 666922
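For contrast, under the framework's two-phase commit a mapper writes into its task-attempt working directory and never renames the file itself; the OutputCommitter later promotes only the attempt the framework chooses to commit, so a speculative duplicate never reaches the final directory. A rough sketch of that pattern in the old mapred API (illustrative only, this is not what RCFileMergeMapper does):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class CommitFriendlyOutput {
  // Returns a per-attempt path under ${mapred.output.dir}/_temporary/...;
  // the framework's commitTask() later moves it into the job output directory
  // for exactly one attempt, so speculative duplicates are discarded.
  public static Path taskScopedOutput(JobConf conf, String name) throws IOException {
    return FileOutputFormat.getTaskOutputPath(conf, name);
  }
}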
In fact this Hive statement normally launches only one job ("Launching Job 1 out of 1"). When the first job finishes, a conditional task analyzes the average file size under each partition; if it is below hive.merge.smallfiles.avgsize (16MB by default), the first job was map-only, and hive.merge.mapfiles is enabled (true by default), a second merge job is launched, and RCFileMergeMapper merges the small files that the first job generated. There are two workarounds: one is to turn off speculative execution, at the risk of a slow task becoming a bottleneck; the other is to disable the merge job (set hive.merge.mapfiles=false), which avoids RCFileMergeMapper altogether but leaves a large number of small files unmerged.
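For reference, the two workarounds expressed as Hive session settings; the speculative-execution property name below is the standard MR1 one and is an assumption on my part, so verify it against your cluster version:

set mapred.map.tasks.speculative.execution=false;  -- workaround 1: no speculative map attempts, but a slow task can become a bottleneck
set hive.merge.mapfiles=false;                      -- workaround 2: skip the merge job entirely, leaving the small files unmerged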
If the plan decides to launch the merge job, a BlockMergeTask (which extends Task) is created and its execute method runs: it sets the relevant JobConf parameters such as mapred.mapper.class and hive.rcfile.merge.output.dir, then creates a JobClient and calls submitJob. The map-side logic lives in the RCFileMergeMapper class, which extends the abstract class MapReduceBase from the old mapred API and overrides the configure and close methods; the rename operation mentioned above happens in close. The run method of the MapRunner class loops over the input records, calling the map method that actually executes the mapper, and finally calls the mapper's close method (a simplified sketch of this run loop follows the close() code below):
public void close() throws IOException {
  // close writer
  if (outWriter == null) {
    return;
  }
  outWriter.close();
  outWriter = null;

  if (!exception) {
    FileStatus fss = fs.getFileStatus(outPath);
    LOG.info("renamed path " + outPath + " to " + finalPath
        + " . File size is " + fss.getLen());
    if (!fs.rename(outPath, finalPath)) {
      throw new IOException("Unable to rename output to " + finalPath);
    }
  } else {
    // an exception was recorded during writing; discard the partial output
    if (!autoDelete) {
      fs.delete(outPath, true);
    }
  }
}
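To make that call sequence concrete, here is a simplified sketch of the run loop in the old-API MapRunner (not the exact Hadoop source; counters, record reuse details and progress reporting are omitted):

// inside org.apache.hadoop.mapred.MapRunner<K1, V1, K2, V2>
public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
                Reporter reporter) throws IOException {
  try {
    K1 key = input.createKey();
    V1 value = input.createValue();
    while (input.next(key, value)) {
      // call the user's mapper once per input record
      mapper.map(key, value, output, reporter);
    }
  } finally {
    // called once per task attempt, including the speculative duplicate --
    // this is where RCFileMergeMapper renames its temp file to the final file
    mapper.close();
  }
}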
So after the job completes, different attempts of the same task may both have produced result files. Hive obviously takes this into account: the RCFileMergeMapper.jobClose method is called after the merge job finishes; it backs up the output directory, moves the data into the output directory, and calls Utilities.removeTempOrDuplicateFiles to delete duplicate files. The deletion logic extracts the task ID from each file name, and if the same task ID has two files, the smaller one is deleted. However, in version 0.9 RCFileMergeMapper does not support the case where the target table is a dynamically partitioned table, so duplicate files remain; the patch at https://issues.apache.org/jira/browse/HIVE-3149?attachmentOrder=asc fixes the problem.
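For intuition, here is a minimal illustration of that duplicate-removal idea. It is not the actual Utilities.removeTempOrDuplicateFiles code (the real method also handles empty buckets and dynamic partitions), and parseTaskId is a hypothetical helper that strips the attempt suffix from names like "000005_1":

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DedupSketch {
  // hypothetical helper: "000005_1" -> "000005"
  static String parseTaskId(Path p) {
    String name = p.getName();
    int idx = name.indexOf('_');
    return idx < 0 ? name : name.substring(0, idx);
  }

  // for each task ID, keep only the largest file and delete the rest
  static void removeDuplicates(FileSystem fs, Path dir) throws IOException {
    Map<String, FileStatus> best = new HashMap<String, FileStatus>();
    for (FileStatus file : fs.listStatus(dir)) {
      String taskId = parseTaskId(file.getPath());
      FileStatus kept = best.get(taskId);
      if (kept == null) {
        best.put(taskId, file);
      } else if (file.getLen() > kept.getLen()) {
        fs.delete(kept.getPath(), true);   // drop the smaller duplicate
        best.put(taskId, file);
      } else {
        fs.delete(file.getPath(), true);
      }
    }
  }
}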
Finally, here is the cleanup logic at the end of the merge task's execute method. The original source catches the exception and does nothing with it, so I added some stack-trace output and set a return value:
finally {
  try {
    if (ctxCreated) {
      ctx.clear();
    }
    if (rj != null) {
      if (returnVal != 0) {
        rj.killJob();
      }
      HadoopJobExecHelper.runningJobKillURIs.remove(rj.getJobID());
      jobID = rj.getID().toString();
    }
    RCFileMergeMapper.jobClose(outputPath, success, job, console, work.getDynPartCtx());
  } catch (Exception e) {
    // added: surface the failure instead of silently swallowing it
    console.printError("RCFile Merger Job Close Error", "\n"
        + org.apache.hadoop.util.StringUtils.stringifyException(e));
    e.printStackTrace(System.err);
    success = false;
    returnVal = -500;
  }
}