Recently a SQL job's run time exceeded two hours, so I set out to optimize it.
First, look at the counter data for the job generated by the Hive SQL.
The total CPU Time Spent is far too high: about 100.4319973 hours.
[Screenshot: job counters]
The CPU Time Spent of each map task:
[Screenshot: per-map CPU Time Spent]
The top map task alone consumed 2.0540889 hours.
The following optimizations are recommended:
1. Lower mapreduce.input.fileinputformat.split.maxsize from its current 256000000 to increase the number of maps. (I applied this change immediately: setting it to 32000000 produced far more map tasks, and the job that originally needed 2 hours finished in 47 minutes.)
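As a sketch, the change above is just a per-session setting issued before the query (32000000 is the value I used; tune it to your own cluster and input sizes):

```sql
-- Lower the maximum input split size so the same input is divided
-- into more splits, and therefore more map tasks.
-- The previous value here was 256000000 (~256 MB).
SET mapreduce.input.fileinputformat.split.maxsize=32000000;
```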
2. Optimize the UDFs Getpageid, Getsiteid, and Getpagevalue (these methods use a number of regular expressions for text matching).
2.1 For optimizing regular-expression processing, see:
http://www.fasterj.com/articles/regex1.shtml
http://www.fasterj.com/articles/regex2.shtml
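The core advice in those two articles is to compile a pattern once and reuse it, rather than recompiling it per row (which is what String.matches() does internally). A minimal sketch, using a hypothetical pageid-extracting pattern rather than the actual UDF code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexPrecompileDemo {
    // Compile once at class level; calling Pattern.compile() on every row
    // (or String.matches(), which compiles internally) is the slow path.
    private static final Pattern PAGE_ID = Pattern.compile("pageid=(\\d+)");

    // Reusing one Matcher via reset() also avoids a per-row allocation.
    private static final Matcher MATCHER = PAGE_ID.matcher("");

    static String extractPageId(String url) {
        MATCHER.reset(url);
        return MATCHER.find() ? MATCHER.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPageId("http://example.com/view?pageid=42")); // 42
        System.out.println(extractPageId("http://example.com/home"));           // null
    }
}
```

Note that a shared Matcher is not thread-safe; that is usually fine inside a single UDF instance, but keep only the precompiled Pattern (and create the Matcher per call) if the code can be invoked concurrently.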
2.2 For UDF optimization, see:
1. Use class-level private members to save on object instantiation and garbage collection. 2. You also get benefits by matching the arguments to what you would normally expect from upstream. Hive converts Text to String when needed, but if the data normally coming into the method is Text, you could try matching the argument type and see whether it is any faster. Example:

Before optimization:

import org.apache.hadoop.hive.ql.exec.UDF;
import java.net.URLDecoder;

public final class UrlDecode extends UDF {

    public String evaluate(final String s) {
        if (s == null) { return null; }
        return getString(s);
    }

    public static String getString(String s) {
        String a;
        try {
            a = URLDecoder.decode(s);
        } catch (Exception e) {
            a = "";
        }
        return a;
    }

    public static void main(String[] args) {
        String t = "%e5%a4%aa%e5%8e%9f-%e4%b8%89%e4%ba%9a";
        System.out.println(getString(t));
    }
}
After optimization:

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class UrlDecode extends UDF {

    private Text t = new Text();

    public Text evaluate(Text s) {
        if (s == null) { return null; }
        try {
            t.set(URLDecoder.decode(s.toString(), "UTF-8"));
            return t;
        } catch (Exception e) {
            return null;
        }
    }

    // public static void main(String[] args) {
    //     String t = "%e5%a4%aa%e5%8e%9f-%e4%b8%89%e4%ba%9a";
    //     System.out.println(getString(t));
    // }
}
2.3 Implement the UDF by extending GenericUDF instead of UDF.
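As a sketch of what that looks like, here is the same URL-decoding logic rewritten against the GenericUDF API (the class name and details are mine, not the original author's; compiling it requires the Hive and Hadoop jars). GenericUDF avoids the per-row reflection used to dispatch UDF.evaluate() and works directly with writable objects:

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;
import java.net.URLDecoder;

public final class UrlDecodeGeneric extends GenericUDF {

    private StringObjectInspector input;
    private final Text result = new Text();  // reused across rows

    @Override
    public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentException("url_decode takes exactly one argument");
        }
        input = (StringObjectInspector) args[0];
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        Object o = args[0].get();
        if (o == null) { return null; }
        try {
            result.set(URLDecoder.decode(input.getPrimitiveJavaObject(o), "UTF-8"));
            return result;
        } catch (Exception e) {
            return null;
        }
    }

    @Override
    public String getDisplayString(String[] children) {
        return "url_decode(" + children[0] + ")";
    }
}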
3. If you are on Hive 0.14+, you can turn on hive.cache.expr.evaluation to enable caching of UDF evaluation results.
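Enabling it is again just a session setting (a sketch; confirm your Hive version actually supports the property):

```sql
-- Hive 0.14+: cache evaluation results of deterministic expressions,
-- so repeated UDF calls on identical input are not recomputed.
SET hive.cache.expr.evaluation=true;
```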
Hive job SQL optimization: CPU time too high