The MapReduce programs we write are not necessarily efficient, so we need a way to find out where their bottlenecks are. The Hadoop framework has built-in support for HPROF, the JVM's profiling agent. HPROF can track CPU usage, heap allocation, and thread lifecycles, which is a great help in locating program bottlenecks.
To use HPROF, we need to set a few options on the JobConf. The specific calls are as follows:
JobConf jobConf = new JobConf(conf);
jobConf.setProfileEnabled(true);  // enable HPROF profiling
jobConf.setProfileParams("-agentlib:hprof=depth=8,cpu=samples,heap=sites,force=n,"
        + "thread=y,verbose=n,file=%s");
// depth=8 is the stack sampling depth, which can be adjusted by the user.
jobConf.setProfileTaskRange(true, "0-5");   // IDs of the map task attempts to profile
jobConf.setProfileTaskRange(false, "0-5");  // IDs of the reduce task attempts to profile
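Equivalently, the same four settings can be written as raw configuration properties, which is handy when enabling profiling from the command line with -D options. The sketch below is a minimal example under the assumption of a Hadoop 1.x-era release, where the keys carry the old "mapred" prefix (on Hadoop 2+ they are named mapreduce.task.profile and so on); the class name ProfiledJobDriver is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

public class ProfiledJobDriver {
    public static JobConf buildConf() {
        Configuration conf = new Configuration();
        // What setProfileEnabled(true) sets under the hood.
        conf.setBoolean("mapred.task.profile", true);
        // What setProfileParams(...) sets.
        conf.set("mapred.task.profile.params",
                "-agentlib:hprof=depth=8,cpu=samples,heap=sites,"
                + "force=n,thread=y,verbose=n,file=%s");
        // Map and reduce task attempts to profile.
        conf.set("mapred.task.profile.maps", "0-5");
        conf.set("mapred.task.profile.reduces", "0-5");
        return new JobConf(conf);
    }
}

Hadoop substitutes the per-task profile output path for the %s placeholder in the file option, so it should be left as-is.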
With the above settings, HPROF is enabled. After the MapReduce job completes, several profile files appear in the directory from which the job was submitted, one per profiled task attempt. Opening any of them, we find the two most important statistics at the bottom of the file:
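The exact file names depend on the job; assuming standard Hadoop behavior, each file is named after the task attempt it profiles, roughly like this illustrative listing (job and attempt IDs will differ):

attempt_201101150123_0001_m_000000_0.profile
attempt_201101150123_0001_m_000001_0.profile
attempt_201101150123_0001_r_000000_0.profile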
Memory statistics: which objects occupy memory, together with the stack traces that allocated them (sorted from largest to smallest);
CPU statistics: the time consumed by each stack trace (likewise sorted from largest to smallest).
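Schematically, the tail of one profile file looks like the excerpt below. The layout follows HPROF's standard SITES and CPU SAMPLES sections, but all numbers and the MyMapper class are made up for illustration; they are not from a real run. Each row references a numbered TRACE entry elsewhere in the file, which holds the actual call stack:

SITES BEGIN (ordered by live bytes)
 rank   self  accum    live bytes  live objs  trace  class name
    1  12.00% 12.00%      1048576       4096  300042 char[]
    2   8.50% 20.50%       524288       2048  300117 java.util.HashMap$Entry
SITES END

CPU SAMPLES BEGIN (total = 1000)
 rank   self  accum   count  trace  method
    1  25.00% 25.00%    250  303703 java.lang.String.toCharArray
    2  10.00% 35.00%    100  301215 java.util.HashMap.get
CPU SAMPLES END

TRACE 303703:
        java.lang.String.toCharArray(String.java:Unknown line)
        MyMapper.map(MyMapper.java:Unknown line)
        org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:Unknown line)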
For the top-ranked CPU sample, we can then look up its trace number in the body of the file (TRACE 303703 in this example) to see the exact order of the function calls.
[Figure: call stack for TRACE 303703]