Hadoop program printing and debugging

Document directory
  • 1. System.out and System.err
  • 2. Counters
  • 3. Set the status
  • 4. Use the output file to output debugging information
  • Reference
1. System.out and System.err

main

In the main function, System.out (standard output) and System.err (standard error) are directed to the terminal of the node that runs the program. In a fully distributed Hadoop deployment, this means the output appears on the terminal of the node where the job is started.

Mapper

For each input split, a Mapper object runs as a task in a map slot on some machine in the cluster. The Mapper object's map and configure functions run in a task process launched through RPC, not in the JVM of the main program. Therefore, the System.out and System.err output of a Mapper object is not directed to the terminal of the node that started the program. In this case, there are only two ways to view mapper output:

  1. On the jobtracker web page (bound to port 50030 by default), open the job details page by jobid, then the map details page, which lists the map tasks executed by the job. Open any task and view its task logs: the stdout and stderr logs of that map task contain the System.out and System.err output.

  2. Log in to the node that executed a given map task and look under the <mapred.local.dir>/userlogs/job_<jobid>/attempt_<taskid>/ directory; the stdout and stderr files there contain the System.out and System.err output. This method requires shell access to the node host, which is not very convenient.
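As an illustration of the kind of debug printing discussed above, here is a minimal pure-Java sketch; the "stationId,temperature" record format is a hypothetical example, and a real Mapper would do this inside its map() body:

```java
// Sketch: debug prints like these, placed inside a Mapper's map() body,
// end up in the task's stdout/stderr logs on the cluster node --
// not on the terminal of the node that submitted the job.
public class DebugPrintDemo {

    // Stand-in for a map() body; the "stationId,temperature" record
    // format here is a made-up example. Returns the parsed station id,
    // or null for a malformed record.
    static String processRecord(String record) {
        String[] fields = record.split(",");
        if (fields.length != 2) {
            // In a real task this line lands in the task's stderr log,
            // viewable via the jobtracker web UI.
            System.err.println("malformed record: " + record);
            return null;
        }
        // And this line lands in the task's stdout log.
        System.out.println("parsed: " + fields[0] + " -> " + fields[1]);
        return fields[0];
    }

    public static void main(String[] args) {
        processRecord("011990,27");
        processRecord("garbage");
    }
}
```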

Reducer

Reducer behaves the same way as Mapper in this respect.

Eclipse

When debugging the program locally in a pseudo-distributed environment built with eclipse, the System.out and System.err output is directed to the eclipse console. However, in the local pseudo-distributed environment, only the output of main appears there. Perhaps the eclipse Hadoop plug-in uses a mechanism that has not been analyzed yet.

2. Counters

For large distributed jobs, it is easier to use counters because:

  1. Obtaining counter values is more convenient than outputting logs.
  2. Counting the occurrence times of a specific event based on the counter value is much easier than analyzing a pile of log files.

Counters are maintained by their associated tasks and periodically sent to the tasktracker, then on to the jobtracker, where they can be globally aggregated. Each report from a task carries the complete current counter value rather than an increment over the previous report, which avoids errors caused by lost messages. In addition, when a task fails, the global counter is decreased by the failed task's local counter value; a counter value only becomes complete and reliable once its task has executed successfully.
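The aggregation scheme described above can be sketched in plain Java. This is a simplified model of the mechanism, not Hadoop's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of jobtracker-side counter aggregation: each task
// reports its *complete* local counter value every time, so the global
// value is just the sum of the latest report from each task, and a lost
// message only delays (never corrupts) the total.
public class CounterAggregation {
    // latest full report per task id
    private final Map<String, Long> latestReport = new HashMap<>();

    // A task reports its complete local counter value, not an increment.
    public void report(String taskId, long fullValue) {
        latestReport.put(taskId, fullValue);
    }

    // When a task fails, its contribution is removed from the total.
    public void taskFailed(String taskId) {
        latestReport.remove(taskId);
    }

    public long globalValue() {
        return latestReport.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

A report of 5 followed by a report of 7 from the same task contributes 7, not 12, to the global value; if that task then fails, its 7 is subtracted again.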

Hadoop maintains several built-in counters that describe metrics of the job.

User-defined counters

You can also define your own counters and increment them in the mapper and reducer. A group of counters can be defined by an enumeration type: the enumeration type's name is the group name, and its enumerated fields are the counters in the group, with each field name serving as a counter name.

For example, the following globally defines a group of counters: the Temperature group has two counters, MISSING and MALFORMED:

enum Temperature {
    MISSING,
    MALFORMED
}

Then, in the mapper or reducer, Reporter's

public void incrCounter(Enum<?> key, long amount)

method can be used to increment the counter:

reporter.incrCounter(Temperature.MISSING, 1);
reporter.incrCounter(Temperature.MALFORMED, 1);
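A minimal sketch of how the decision to bump one counter or the other might look in a map function. The record parsing and the "+9999" missing-value sentinel are hypothetical, and the Reporter call appears only as a comment because it needs a running task context:

```java
// Sketch: deciding which counter a record should increment. The field
// format and the "+9999" missing-value sentinel are made-up examples.
public class TemperatureQualityCheck {

    enum Temperature { MISSING, MALFORMED }

    // Returns the counter the record should increment, or null if the
    // record is well-formed. In a real map() body this would drive
    //     reporter.incrCounter(Temperature.MISSING, 1);   // etc.
    static Temperature classify(String temperatureField) {
        if (temperatureField == null || temperatureField.equals("+9999")) {
            return Temperature.MISSING;
        }
        try {
            Integer.parseInt(temperatureField);
            return null; // well-formed record, no counter to bump
        } catch (NumberFormatException e) {
            return Temperature.MALFORMED;
        }
    }
}
```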
Dynamic counter

Dynamic counters do not need to be defined in advance by an enumeration type; they are created on the fly during execution. Just call Reporter's

public void incrCounter(String group, String counter, long amount)

method with the desired group and counter names.
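The dynamic form can be sketched with a plain map standing in for Reporter; the "TemperatureQuality" group name and per-code counter names are made-up examples:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: dynamic counters are just (group, name) string pairs created
// at runtime. A HashMap stands in for Reporter here; a real map() body
// would instead call reporter.incrCounter(group, name, 1).
public class DynamicCounterDemo {
    private final Map<String, Long> counters = new HashMap<>();

    public void incrCounter(String group, String counter, long amount) {
        counters.merge(group + "::" + counter, amount, Long::sum);
    }

    public long get(String group, String counter) {
        return counters.getOrDefault(group + "::" + counter, 0L);
    }

    public static void main(String[] args) {
        DynamicCounterDemo reporter = new DynamicCounterDemo();
        // One counter per observed quality code, named at runtime:
        for (String code : new String[] {"0", "1", "1", "9"}) {
            reporter.incrCounter("TemperatureQuality", "code_" + code, 1);
        }
    }
}
```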

Counter value acquisition

While a Hadoop job is running, the mapper and reducer can obtain the current counters through Reporter:

public Counter getCounter(Enum<?> name);
public Counter getCounter(String group, String name);

After a Hadoop job finishes, the terminal prints the value of each of the program's counters. There are also two other ways to get counter values:

  1. View the counter values of each job and each of its tasks on the jobtracker web page.

  2. Run the following command to obtain the counter value:

    hadoop job -counter <job-id> <group-name> <counter-name>
3. Set the status

In map and reduce, reporter.setStatus("Error") (for example) can be used to set the status of each task, which can then be viewed on the jobtracker web page.

4. Use the output file to output debugging information

Another method is to have the Reducer use the key to differentiate normal output from debugging output, writing the debugging information into the output file. For details, see reference 3.
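One way to implement the key-based separation is to tag debugging records with a reserved key prefix so they sort apart from normal output. The "DEBUG\t" prefix here is an arbitrary convention chosen for this example, not taken from the referenced article:

```java
// Sketch: a reducer can mix debug records into its output file and keep
// them separable by tagging their keys. The "DEBUG\t" prefix is an
// arbitrary convention chosen for this example.
public class DebugKeyTagger {
    static final String DEBUG_PREFIX = "DEBUG\t";

    // Key for a debug record; in a reducer this would be used as, e.g.,
    //     output.collect(new Text(debugKey(key)), new Text(info));
    static String debugKey(String normalKey) {
        return DEBUG_PREFIX + normalKey;
    }

    // Lets a post-processing step filter the debug lines back out.
    static boolean isDebug(String key) {
        return key.startsWith(DEBUG_PREFIX);
    }
}
```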

Reference
  1. Hadoop: The Definitive Guide
  2. Hadoop MapReduce Process
  3. Hadoop debugging information output
