The cleaned data can be loaded directly into Hive, or fed to further MapReduce programs, to compute statistics on the network traffic. The current implementation, for example, counts HTTP GET requests, which is a relatively simple statistic.
First MapReduce job: extract the timestamp and the hex packet data and put them on one line. Merging several input lines into one record needs special handling under MapReduce's key-value model, and that is the notable part of this job.
Two main problems were encountered along the way.
Each packet in the capture consists of a timestamp, a brief summary of the packet header, and the header details. The goal is to put one packet's timestamp and its hex details (which span many lines) onto a single row, in order. In plain Java, reading the file line by line, this is easy to implement.
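For comparison, the sequential line-by-line version might look like the sketch below. It assumes input resembling tcpdump -x text output (a timestamp at the start of each packet's first line, then hex lines with an offset label such as 0x0000:); the post does not name the capture tool, so the layout, the class name SequentialPacketJoiner, and both regular expressions are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialPacketJoiner {
    // Timestamp at the start of a packet's first line, e.g. 11:08:56.149361
    private static final Pattern TIME = Pattern.compile(
            "((?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\\.[0-9]{6})");
    // Hex dump line: offset label, then groups of hex digits (assumed layout)
    private static final Pattern HEX_LINE = Pattern.compile(
            "\\s*0x[0-9a-fA-F]{4}:\\s+((?:[0-9a-fA-F]{2,4}\\s*)+)");

    /** Returns one row per packet: the timestamp followed by its hex groups in file order. */
    static List<String> join(List<String> lines) {
        List<String> rows = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            Matcher t = TIME.matcher(line);
            if (t.lookingAt()) {               // a new packet starts here
                if (current != null) {
                    rows.add(current.toString());
                }
                current = new StringBuilder(t.group(1));
                continue;
            }
            Matcher h = HEX_LINE.matcher(line);
            if (current != null && h.matches()) {
                for (String group : h.group(1).trim().split("\\s+")) {
                    current.append(' ').append(group);
                }
            }
        }
        if (current != null) {
            rows.add(current.toString());      // flush the last packet
        }
        return rows;
    }
}
```

Because a single loop sees the lines in file order, no IDs or sorting are needed here; that ordering guarantee is exactly what is lost once the lines are spread across map and reduce tasks.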
Given MapReduce's key-value processing model, two approaches were tried first:
(1) Use the time as the key, so that every line of one packet shares the same key.
But a MapReduce map call processes only one line at a time, and reduce only groups the rows that share a key. Between the map phase and the reduce phase there is a shuffle and sort phase (or perhaps output from machines close to the reducer arrives first and is processed first come, first served), so the values for one key arrive in no particular order.
(2) Give every line its own incrementing key.
Then no two lines share a key, so they cannot be merged onto one line.
The final solution:
(3) Key each line by its packet's time, so that all lines of one packet share a key, and additionally tag every hex line with an incrementing ID. The lines of a packet are then merged into one row. Their order within the row is scrambled, but with the IDs it is easy to put them back in order.
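The idea in (3) can be tried without a cluster. The sketch below is a plain-Java simulation, not Hadoop code: tag plays the role of the map side, the caller may shuffle the records to mimic unordered delivery, and regroup plays the role of reduce plus the later sort. The names TaggedLineDemo, Tagged, tag, and regroup are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedLineDemo {
    /** One simulated map output record: packet key plus an ID-tagged hex line. */
    static final class Tagged {
        final int packetId;
        final int lineId;
        final String hex;

        Tagged(int packetId, int lineId, String hex) {
            this.packetId = packetId;
            this.lineId = lineId;
            this.hex = hex;
        }
    }

    /** "Map" side: each hex line gets its packet's key and a globally incrementing ID. */
    static List<Tagged> tag(List<List<String>> packets) {
        List<Tagged> out = new ArrayList<>();
        int lineId = 0;
        for (int p = 0; p < packets.size(); p++) {
            for (String hex : packets.get(p)) {
                out.add(new Tagged(p, ++lineId, hex));
            }
        }
        return out;
    }

    /** "Reduce" side: group by packet key, then restore line order using the IDs. */
    static Map<Integer, String> regroup(List<Tagged> records) {
        Map<Integer, List<Tagged>> groups = new TreeMap<>();
        for (Tagged t : records) {
            groups.computeIfAbsent(t.packetId, k -> new ArrayList<>()).add(t);
        }
        Map<Integer, String> rows = new TreeMap<>();
        for (Map.Entry<Integer, List<Tagged>> e : groups.entrySet()) {
            List<Tagged> lines = e.getValue();
            lines.sort(Comparator.comparingInt(t -> t.lineId));
            StringBuilder row = new StringBuilder();
            for (Tagged t : lines) {
                if (row.length() > 0) {
                    row.append(' ');
                }
                row.append(t.hex);
            }
            rows.put(e.getKey(), row.toString());
        }
        return rows;
    }
}
```

Shuffling the list returned by tag before calling regroup still yields each packet's lines in their original order, which is the property the incrementing ID is there to provide.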
Second MapReduce job: sort the hex lines of each packet. This complements the first job. At this point the cleaning work is complete, and the data can be analyzed by looking at the hex value at any position.
Third MapReduce job: count the number of HTTP GET requests.
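// First job: tag every line with its packet ID and merge a packet's lines into one row.
// The fields and nested classes below belong inside the job's driver class (not shown).
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Shared counters. The numbering is only consistent when a single map task
// reads the whole capture in file order.
static int id = 1;     // packet counter, used as the output key
static int hexid = 1;  // incrementing ID attached to every hex line

public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text> {
    private final static IntWritable one = new IntWritable();
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Match the timestamp, e.g. 11:08:56.149361
        String regexTime = "([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])\\.[0-9]{6}";
        Pattern patternTime = Pattern.compile(regexTime);
        Matcher matchTime = patternTime.matcher(value.toString());
        while (matchTime.find()) {
            String time = "Time:" + matchTime.group() + " ";
            id = id + 1;  // a time line starts a new packet
            word.set(time);
            one.set(id);
            context.write(one, word);
        }

        // Match a run of space-separated 4-character groups (one hex dump line)
        // String regexHex = "0x[0-9]{4}: ([a-zA-Z0-9]{4} )+";
        String regexHex = "( [a-zA-Z0-9]{4})+";
        Pattern patternHex = Pattern.compile(regexHex);
        Matcher matchHex = patternHex.matcher(value.toString());
        while (matchHex.find()) {
            hexid = hexid + 1;
            String hex = "ID:" + String.valueOf(hexid) + " " + matchHex.group();
            word.set(hex);
            one.set(id);  // same key as the packet's time line
            context.write(one, word);
        }
    }
}

public static class IntSumReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    private final Text result = new Text();

    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Concatenate all lines of one packet; their order is not guaranteed.
        StringBuilder sum = new StringBuilder();
        for (Text val : values) {
            sum.append(val.toString());
        }
        result.set(sum.toString());
        context.write(key, result);
    }
}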
// Second job: rebuild each packet row with its hex lines in ID order.
// Bar is the author's helper class (not shown). It holds a line ID and a hex string
// and must implement Comparable<Bar> for Collections.sort to work.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
    private final static Text one = new Text();  // output key: the packet's timestamp
    private final Text word = new Text();        // output value: the ordered hex string

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();

        // Extract the timestamp, e.g. 11:08:56.149361, without its "Time:" prefix
        String regexTime = "Time:\\s*((?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\\.[0-9]{6})";
        Matcher matchTime = Pattern.compile(regexTime).matcher(line);
        while (matchTime.find()) {
            one.set(matchTime.group(1));
        }

        // Collect every "ID:n <hex groups>" chunk written by the first job
        // String regexHex = "0x[0-9]{4}: ([a-zA-Z0-9]{4} )+";
        List<Bar> list = new ArrayList<Bar>();
        String regexHex = "ID:([0-9]+)\\s*((?: [a-zA-Z0-9]{4})+)";
        Matcher matchHex = Pattern.compile(regexHex).matcher(line);
        while (matchHex.find()) {
            Bar bar = new Bar();
            bar.setId(matchHex.group(1));        // sequence number of this hex line
            bar.setHexValue(matchHex.group(2));  // the hex groups of this line
            list.add(bar);
        }

        // Restore the original line order and join the hex lines
        Collections.sort(list);
        StringBuilder buffer = new StringBuilder();
        for (Bar bar : list) {
            buffer.append(bar.getHexValue());
        }
        word.set(buffer.toString());
        context.write(one, word);
    }
}

public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Pass every row through unchanged
        for (Text val : values) {
            context.write(key, val);
        }
    }
}
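The Bar class used by the second job is not shown in the post. A minimal sketch of what it might look like follows; the field names and the numeric comparison are assumptions. Comparing the IDs as numbers matters because they arrive as strings, and a plain string comparison would place "10" before "9".

```java
/** Hypothetical reconstruction of the helper class: one hex line plus its sequence ID. */
class Bar implements Comparable<Bar> {
    private String id;        // sequence ID as parsed from the "ID:n" tag
    private String hexValue;  // the hex groups of that line

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getHexValue() {
        return hexValue;
    }

    public void setHexValue(String hexValue) {
        this.hexValue = hexValue;
    }

    /** Orders by the numeric value of the ID, not by its text. */
    @Override
    public int compareTo(Bar other) {
        return Long.compare(Long.parseLong(id), Long.parseLong(other.id));
    }
}
```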
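// Third job: count HTTP GET requests.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text("Sumget");

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        int timeLen = 15;                   // length of a timestamp such as 11:08:56.149361
        int start = timeLen + 20 * 5;       // 20 groups of 5 characters come before the payload
        int end = timeLen + 21 * 5 - 1;     // end of the 4 hex digits of group 20
        String strLine = value.toString();
        if (strLine.length() >= end) {
            // Group 20 holds bytes 40-41: the first payload bytes after a 20-byte
            // IP header and a 20-byte TCP header without options.
            // || hexValue[20].equals("4854")
            String getPos = strLine.substring(start, end);
            if (getPos.equals("4745")) {    // 0x47 0x45 is "GE" in ASCII
                context.write(word, one);
            }
        }
    }
}

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}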
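The fixed offset in the third job only finds the payload when both headers are exactly 20 bytes long. As a hedged alternative, the sketch below reads the header lengths from the packet itself: the IPv4 header length is the low nibble of the first byte and the TCP data offset is the high nibble of byte 12 of the TCP header, both counted in 32-bit words. It assumes each row is a timestamp followed by whitespace-separated hex groups starting at the IPv4 header; the class name GetDetector and the method isHttpGet are hypothetical.

```java
class GetDetector {
    /** Returns true if the row's TCP payload starts with "GET " (hex 47 45 54 20). */
    static boolean isHttpGet(String row) {
        String[] tokens = row.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < tokens.length; i++) {  // tokens[0] is the timestamp
            sb.append(tokens[i]);
        }
        String h = sb.toString().toLowerCase();
        if (h.length() < 40) {
            return false;                          // shorter than a minimal IPv4 header
        }
        int ipHeaderLen = Character.digit(h.charAt(1), 16) * 4;  // IHL, in bytes
        if (ipHeaderLen < 20 || !h.startsWith("06", 18)) {
            return false;                          // malformed header, or protocol is not TCP
        }
        int dataOffsetPos = (ipHeaderLen + 12) * 2;  // hex position of TCP header byte 12
        if (h.length() <= dataOffsetPos) {
            return false;
        }
        int tcpHeaderLen = Character.digit(h.charAt(dataOffsetPos), 16) * 4;
        if (tcpHeaderLen < 20) {
            return false;
        }
        int payloadPos = (ipHeaderLen + tcpHeaderLen) * 2;
        return h.startsWith("47455420", payloadPos);
    }
}
```

Inside the mapper, the call to substring and the comparison with "4745" could be replaced by a call to isHttpGet(strLine), at the cost of a little more work per row.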
MapReduce implementation of network packet cleaning