Killing Hadoop jobs whose name matches a specified string
As we all know, the command
hadoop job -list
lists the currently running Hadoop jobs.
The returned list does not include the job name, although the name can be viewed on the Hadoop management page.
In practice, however, we often need to kill a job by its name.
My solution is as follows:
1. Fetch the web page http://192.168.1.100:50030/jobtracker.jsp
2. Parse the page to obtain the list of jobs as job_id + job name
3. Filter out the jobs with the specified name
4. Finally, call hadoop job -kill <job_id> to kill each matching job.
The code is as follows:
parse.py uses the HTML parsing module that ships with Python:
from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # self.current_tag = None
        self.flag = False
        self.name_flag = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            for name, value in attrs:
                if name == 'id' and value.startswith("job_"):
                    self.flag = True
                    self.name_flag = False
                    break
                elif name == 'id' and value.startswith("name_"):
                    self.flag = True
                    self.name_flag = True
                    break

    def handle_endtag(self, tag):
        self.flag = False

    def handle_data(self, data):
        if self.flag:
            print data,
            if self.name_flag:
                print ' '

if __name__ == '__main__':
    fp = open("./jobtracker.jsp")
    data = fp.read()
    my = MyHTMLParser()
    my.feed(data)
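The script above targets Python 2 (the HTMLParser module and the print statement). For readers on Python 3, the same idea could be sketched roughly as follows. This is an illustration, not the original script: it collects (job_id, name) pairs instead of printing them, and it closes a cell only on the closing td tag rather than on any end tag. The names JobTableParser, parse_jobs and SAMPLE, as well as the HTML fragment itself, are made up for the example. The only thing taken from the script above is the assumption that the relevant td cells carry ids starting with job_ and name_.

```python
from html.parser import HTMLParser


class JobTableParser(HTMLParser):
    """Collects (job_id, job_name) pairs from jobtracker.jsp-style HTML."""

    def __init__(self):
        super().__init__()
        self.jobs = []       # completed (job_id, name) pairs
        self._field = None   # "id", "name", or None outside a cell of interest
        self._job_id = []    # text chunks of the current job id cell
        self._name = []      # text chunks of the current name cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        cell_id = dict(attrs).get("id") or ""
        if cell_id.startswith("job_"):
            self._field = "id"
            self._job_id = []
        elif cell_id.startswith("name_"):
            self._field = "name"
            self._name = []

    def handle_endtag(self, tag):
        if tag != "td":
            return
        # A name cell closes one job row: emit the pair gathered so far.
        if self._field == "name":
            self.jobs.append(("".join(self._job_id).strip(),
                              "".join(self._name).strip()))
        self._field = None

    def handle_data(self, data):
        if self._field == "id":
            self._job_id.append(data)
        elif self._field == "name":
            self._name.append(data)


def parse_jobs(html):
    """Return the list of (job_id, name) pairs found in html."""
    parser = JobTableParser()
    parser.feed(html)
    return parser.jobs


# Hypothetical fragment, shaped only by the assumption stated above.
SAMPLE = """
<table>
<tr><td id="job_0"><a href="#">job_201301010000_0001</a></td>
<td id="user_0">hadoop</td><td id="name_0">merge_sl_daily</td></tr>
<tr><td id="job_1"><a href="#">job_201301010000_0002</a></td>
<td id="name_1">wordcount</td></tr>
</table>
"""

if __name__ == "__main__":
    for job_id, name in parse_jobs(SAMPLE):
        print(job_id, name)
```

Returning pairs rather than printing makes the parser easier to reuse from another Python program, at the cost of no longer being a drop-in replacement for the grep pipeline used in the shell script.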
The main program, kill_job.sh, is implemented as a shell script:
# Keyword to filter on
keyword=$1
if [ -z "$keyword" ]; then
    echo "parameter cannot be blank"
    echo "Usage: bash kill_job.sh <keyword>"
    exit
fi
# Download the JobTracker page and list the matching jobs
curl -O http://192.168.1.100:50030/jobtracker.jsp
python parse.py | grep "$keyword" | sort | tee job.tmp
echo "---------------- start kill -----------------"
# Kill each matching job
cat ./job.tmp | sort | while read LINE
do
    # echo $LINE
    # The job id is the first whitespace-separated field of the line
    job_id=`echo $LINE | awk -F " " '{print $1}' | tr -d ' '`
    echo "kill job -- ${job_id}"
    hadoop job -kill "$job_id"
done
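The filter-and-kill part of the shell script could also be written in Python, which avoids the temporary file and the awk field splitting. The following is a minimal sketch, assuming the job list is already available as (job_id, name) pairs. The function names select_jobs, kill_command and kill_jobs, the dry_run flag and the sample job list are all invented for this example; only the hadoop job -kill <job_id> command comes from the steps above. Note one difference: the shell script greps the whole line, so the keyword can also match the job id, whereas this sketch matches the name only.

```python
import subprocess


def select_jobs(jobs, keyword):
    """Return the (job_id, name) pairs whose name contains keyword, sorted."""
    if not keyword:
        raise ValueError("keyword cannot be blank")
    return sorted((job_id, name) for job_id, name in jobs if keyword in name)


def kill_command(job_id):
    """Build the argument list for: hadoop job -kill <job_id>."""
    return ["hadoop", "job", "-kill", job_id]


def kill_jobs(jobs, keyword, dry_run=True):
    """Kill every matching job.

    With dry_run=True (the default in this sketch) nothing is executed;
    the commands are only built and returned.
    """
    commands = [kill_command(job_id) for job_id, _ in select_jobs(jobs, keyword)]
    if not dry_run:
        for cmd in commands:
            print("kill job --", cmd[-1])
            # Requires the hadoop CLI on PATH; failures are not raised here.
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    # Hypothetical job list, e.g. as produced by parsing jobtracker.jsp.
    example_jobs = [
        ("job_201301010000_0002", "merge_sl_hourly"),
        ("job_201301010000_0001", "merge_sl_daily"),
        ("job_201301010000_0003", "wordcount"),
    ]
    for cmd in kill_jobs(example_jobs, "merge_sl"):
        print(" ".join(cmd))
```

Defaulting to a dry run is a deliberate choice in this sketch: a keyword that is too short can match more jobs than intended, so it helps to look at the command list before passing dry_run=False.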
Execution method:
bash kill_job.sh merge_sl
Here merge_sl is the name of the job to kill.