Because a disk on one server in the Hadoop cluster was damaged, the failure rate of TaskTracker tasks on that server rose sharply. (Cause of failure: whenever the temporary directory of a task assigned to the server was placed on the damaged disk, job initialization failed.) The plan was therefore to remove the bad disk from the mapred local directory list of the TaskTracker and then restart TaskTracker.
The procedure is as follows:
1) Modify the mapred-site.xml configuration file;
2) Restart TaskTracker;
3) Follow the TaskTracker log to make sure that TaskTracker starts properly.
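Step 1 can be sketched in Python as follows. This is only an illustration under assumptions: the post does not show the actual configuration, so the sample XML, the choice of /opt/data/hadoop1/mapred/mrlocal as the bad disk, and the helper name drop_local_dir are hypothetical. The sketch assumes the local directories are listed as a comma-separated value of the mapred.local.dir property.

```python
import xml.etree.ElementTree as ET


def drop_local_dir(xml_text, bad_dir, prop_name="mapred.local.dir"):
    """Return the configuration text with bad_dir removed from prop_name."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() != prop_name:
            continue
        value = prop.find("value")
        if value is None:
            continue
        # The property holds a comma-separated list of directories.
        dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
        kept = [d for d in dirs if d.rstrip("/") != bad_dir.rstrip("/")]
        value.text = ",".join(kept)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample configuration; the real file will contain more properties.
SAMPLE = """<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/opt/data/hadoop/mapred/mrlocal,/opt/data/hadoop1/mapred/mrlocal,/opt/data/hadoop2/mapred/mrlocal</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # Assume the disk behind /opt/data/hadoop1 is the damaged one.
    print(drop_local_dir(SAMPLE, "/opt/data/hadoop1/mapred/mrlocal"))
```

Note that ElementTree drops comments and the XML declaration when it re-serializes, so for a real mapred-site.xml it may be simpler to edit the value by hand and use a helper like this only to double-check the result.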
While following the log, it turned out that TaskTracker was in a "hung" state after startup. The specific symptoms were as follows:
1) After the TaskTracker log printed the following line, nothing more was output:
org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop
2) The server did not appear in the JobTracker's list of active TaskTrackers.
Restarting TaskTracker did not solve the problem, and no answer could be found online. The only option left was to read the source code and see what TaskTracker does after printing this log line.
Hadoop TaskTracker source code path: /opt/modules/hadoop/hadoop-0.20.203.0/src/mapred/org/apache/hadoop/mapred/TaskTracker.java
The code that TaskTracker runs after printing the startup message is as follows (note the part marked in yellow):
It turns out that TaskTracker clears the temporary files in its local running directories every time it starts. When TaskTracker has been running for a long time, a large number of files and folders accumulate in those directories, so deleting them at startup takes a long time and TaskTracker appears to hang. After the temporary files in the mapred local directories were cleared manually, TaskTracker started successfully.
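To get a feel for how much work that startup cleanup has to do, the number of entries under each mapred local directory can be counted before restarting. The following is a minimal sketch, not part of Hadoop; the helper name count_entries is hypothetical, and the directory list is an assumption that should be replaced with the directories configured on the server.

```python
import os


def count_entries(path):
    """Count the files and directories below path (0 if path is missing)."""
    total = 0
    # os.walk silently yields nothing for a missing or unreadable path.
    for _root, dirs, files in os.walk(path):
        total += len(dirs) + len(files)
    return total


if __name__ == "__main__":
    # Assumed example directories; adjust to the local directories in use.
    local_dirs = [
        "/opt/data/hadoop/mapred/mrlocal",
        "/opt/data/hadoop1/mapred/mrlocal",
    ]
    for d in local_dirs:
        print("%s: %d entries" % (d, count_entries(d)))
```

A very large count on one directory suggests that a restart will spend a long time deleting files before TaskTracker registers with JobTracker.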
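The following script deletes distcache entries that have not been modified for three days from each mapred local directory:

# -*- coding: utf-8 -*-
import os
import time

# mapred local directories on this server
local_dir = ["/opt/data/hadoop/mapred/mrlocal",
             "/opt/data/hadoop1/mapred/mrlocal",
             "/opt/data/hadoop2/mapred/mrlocal",
             "/opt/data/hadoop3/mapred/mrlocal",
             "/opt/data/hadoop4/mapred/mrlocal"]


def clean(path, tolerable_time):
    clean_dir = []
    # Hadoop names this subdirectory "taskTracker"; match the case on disk.
    real_path = path + '/taskTracker'
    # Shared distcache directly under taskTracker
    clean_dir.append(real_path + "/distcache")
    # One distcache per subdirectory of taskTracker
    for f in os.listdir(real_path):
        if f != "distcache":
            clean_dir.append(real_path + '/' + f + "/distcache")
    for c_dir in clean_dir:
        if not os.path.isdir(c_dir):
            continue  # this subdirectory has no distcache
        for f in os.listdir(c_dir):
            last_mtime = os.stat("%s/%s" % (c_dir, f)).st_mtime
            # Delete entries last modified before the cutoff
            if last_mtime < tolerable_time:
                print("%s %s" % (c_dir, f))
                os.system("rm -rf %s/%s" % (c_dir, f))


if __name__ == "__main__":
    cur_time = int(time.time())
    # Keep anything modified within the last three days
    tolerable_time = cur_time - 3 * 24 * 3600
    for path in local_dir:
        try:
            clean(path, tolerable_time)
        except Exception:
            # Skip directories that are missing or unreadable (e.g. the bad disk)
            continue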