Hadoop O&M record 2: tasktracker "false death" (hang) after startup

Source: Internet
Author: User

A disk on one server in the Hadoop cluster was damaged, so the failure rate of tasks run by the tasktracker on that server rose sharply. (Cause of failure: whenever the temporary directory for a task assigned to this server was placed on the damaged disk, job initialization failed.) We therefore decided to remove the bad disk from the mapred local directory list in the tasktracker configuration and then restart tasktracker.

The procedure is as follows:

1) Modify the mapred-site.xml configuration file (remove the bad disk's path from the mapred local directory list);

2) Restart tasktracker;

3) Follow the tasktracker log to confirm that tasktracker starts properly.
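Step 1 can be sketched in Python as follows. This is only an illustration, not the exact change made on the cluster: it assumes the local directories are set through the mapred.local.dir property as a comma-separated list, and the function name remove_local_dir and the sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_local_dir(xml_text, bad_dir, prop_name="mapred.local.dir"):
    """Return the configuration XML with bad_dir dropped from prop_name's value."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != prop_name:
            continue
        value = prop.find("value")
        if value is None:
            continue
        # The value is a comma-separated list of directories.
        dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
        kept = [d for d in dirs if d.rstrip("/") != bad_dir.rstrip("/")]
        value.text = ",".join(kept)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-disk configuration, used only to demonstrate the function.
sample = """<configuration>
  <property>
    <name>mapred.local.dir</name>
    <value>/opt/data/hadoop/mapred/mrlocal,/opt/data/hadoop1/mapred/mrlocal</value>
  </property>
</configuration>"""

print(remove_local_dir(sample, "/opt/data/hadoop1/mapred/mrlocal"))
```

Note that the standard ElementTree parser drops XML comments by default, so on a real mapred-site.xml it may be safer to edit the file by hand and use a script like this only to double-check the result.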

While following the log, we found that tasktracker appeared to hang after it was started. The specific symptoms were as follows:

1) The tasktracker log stops after the following line, and nothing further is written:

org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as hadoop

2) The server does not appear in the jobtracker's list of active tasktrackers.
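The first symptom can be checked automatically. The sketch below is an assumption-laden illustration rather than part of the original troubleshooting: the function name looks_stalled and the 300-second threshold are hypothetical, and the log path has to be supplied by the caller. It reports a stall when the last non-empty log line is the startup message and the log file has not been modified for a while.

```python
import os
import time

# Text of the last line seen in the log when the hang occurs.
MARKER = "Starting tasktracker with owner as"


def looks_stalled(log_path, quiet_seconds=300, now=None):
    """True if the log ends with the startup line and has been silent since."""
    last = ""
    with open(log_path) as fh:
        for line in fh:
            if line.strip():
                last = line.strip()
    if MARKER not in last:
        return False
    now = time.time() if now is None else now
    # Silence is measured from the file's last modification time.
    return now - os.path.getmtime(log_path) >= quiet_seconds
```

Calling looks_stalled with the path of the tasktracker log a few minutes after a restart gives a quick yes/no answer. The second symptom still has to be confirmed on the jobtracker side.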

Restarting tasktracker again did not solve the problem, and no answer could be found online. The remaining option was to read the source code and see what tasktracker does after printing this log line.

Hadoop tasktracker source code path: /opt/modules/hadoop/hadoop-0.20.203.0/src/mapred/org/apache/hadoop/mapred/TaskTracker.java

The code that tasktracker runs after printing the startup message was shown here as a screenshot in the original post, with the relevant lines marked in yellow.

It turns out that tasktracker clears the temporary files in its local working directories every time it starts. After tasktracker has been running for a long time, a large number of files and folders accumulate in those directories, so deleting them at startup takes a long time. After the temporary files in the mapred local directories were cleared manually, tasktracker started successfully.
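To judge whether a slow startup is likely to have this cause, it helps to see how many entries have piled up under each local directory. The following is a minimal sketch, not something taken from the Hadoop source. The helper name count_entries is hypothetical, and the directory list must be adapted to the cluster's own configuration.

```python
import os


def count_entries(root):
    """Count files and directories below root (0 if root does not exist)."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    # Example mapred local directories; adjust to the real configuration.
    local_dirs = [
        "/opt/data/hadoop/mapred/mrlocal",
        "/opt/data/hadoop1/mapred/mrlocal",
    ]
    for d in local_dirs:
        print("%s: %d entries" % (d, count_entries(d)))
```

A count in the hundreds of thousands or more suggests that the cleanup at startup will be slow. The walk itself can take a while on such directories.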

# -*- coding: utf-8 -*-
# Remove distcache entries older than three days from each mapred local directory.
import os
import time

local_dir = ["/opt/data/hadoop/mapred/mrlocal", "/opt/data/hadoop1/mapred/mrlocal",
             "/opt/data/hadoop2/mapred/mrlocal", "/opt/data/hadoop3/mapred/mrlocal",
             "/opt/data/hadoop4/mapred/mrlocal"]


def clean(path, tolerable_time):
    clean_dir = []
    # Hadoop 0.20.x names this subdirectory "taskTracker"; adjust if your layout differs.
    real_path = path + '/taskTracker'
    # Shared distcache plus one distcache directory per user.
    clean_dir.append(real_path + "/distcache")
    for f in os.listdir(real_path):
        if f != "distcache":
            clean_dir.append(real_path + '/' + f + "/distcache")
    for c_dir in clean_dir:
        if not os.path.isdir(c_dir):
            continue
        for f in os.listdir(c_dir):
            last_mtime = os.stat("%s/%s" % (c_dir, f)).st_mtime
            if last_mtime < tolerable_time:
                print("%s %s" % (c_dir, f))
                os.system("rm -rf %s/%s" % (c_dir, f))


if __name__ == "__main__":
    cur_time = int(time.time())
    # Keep anything modified within the last three days.
    tolerable_time = cur_time - 3 * 24 * 3600
    for path in local_dir:
        try:
            clean(path, tolerable_time)
        except OSError:
            # Skip local directories that are missing or unreadable.
            continue
