MapReduce Implementation of Network Packet Data Cleaning

Source: Internet
Author: User
Tags iterable

The cleaned data can be loaded directly into Hive or fed to further MapReduce programs to compute statistics on the network data stream. The current implementation includes one relatively simple example: counting HTTP GET requests.

First MapReduce: extract the time and the hex packet data and put them on one line. (Handling records that span multiple lines within MapReduce's key-value model is the notable part here.)

There are two main problems encountered:

A packet in the capture consists of the time, a brief summary of the packet header, and the detailed hex dump of the packet. The goal is to put each packet's time and its hex details (which span many lines) into a single row, in order. In plain Java, reading line by line, this is very easy to implement.
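To make the target format concrete, here is a minimal plain-Java sketch of the line-by-line version. It assumes the input looks like `tcpdump -x` output (a line starting with an `hh:mm:ss.micros` time, followed by lines of the form `0x0000:  4500 003c ...`); the class name `PacketLineGrouper` and the exact row layout are illustrative assumptions, not taken from the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PacketLineGrouper {
    // Packet header line, e.g. "11:08:56.149361 IP 10.0.0.1.80 > ..."
    private static final Pattern TIME =
        Pattern.compile("^(\\d{2}:\\d{2}:\\d{2}\\.\\d{6})");
    // Hex dump line, e.g. "\t0x0000:  4500 003c 1c46 4000" (assumed layout)
    private static final Pattern HEX_LINE =
        Pattern.compile("^\\s*0x[0-9a-fA-F]{4}:\\s+(.*)$");

    /** Collapses each packet (time line plus its hex lines) into one row. */
    static List<String> group(List<String> lines) {
        List<String> rows = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            Matcher t = TIME.matcher(line);
            if (t.find()) {
                // A new time starts a new packet: flush the previous row.
                if (current != null) rows.add(current.toString());
                current = new StringBuilder(t.group(1));
                continue;
            }
            Matcher h = HEX_LINE.matcher(line);
            if (current != null && h.find()) {
                // Append this line's hex words to the current packet's row.
                for (String word : h.group(1).trim().split("\\s+")) {
                    if (!word.isEmpty()) current.append(' ').append(word);
                }
            }
        }
        if (current != null) rows.add(current.toString());
        return rows;
    }
}
```

This works only because a single loop sees the lines in file order and can keep the "current packet" in a local variable. A map function sees one line per call, which is why the MapReduce version needs a different approach.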

Given MapReduce's key-value processing model, two approaches were tried first:

(1) Use the time as the key, so that all the information of one packet shares the same key.

However, a map call processes only one line at a time, and a reduce call only handles records that share a key. Between the map phase and the reduce phase there is a shuffle and sort, which orders records by key but not the values within a key (perhaps map output from machines closer to the reducer arrives first and is processed first), so the values for one key arrive in no particular order.

(2) Give every line its own incrementing key.

Then no two lines share a key, so they cannot be merged into one row.

The final solution:

(3) Key on the time, so that all lines of one packet share a key, and prefix each hex line with an incrementing ID. The lines are still out of order when they are merged into one row, but with the IDs they can easily be put back in order.
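The reordering step in (3) can be sketched in isolation as follows. This is a standalone illustration, not the author's code: the `ID:<n> <hex words>` tag format and the names `FragmentReorder` and `reassemble` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FragmentReorder {
    /** One tagged hex line: its sequence ID and the hex words after it. */
    static final class Fragment {
        final long id;
        final String hex;
        Fragment(long id, String hex) { this.id = id; this.hex = hex; }
    }

    /** Parses "ID:<n> <hex words>" (the tag format is an assumption). */
    static Fragment parse(String tagged) {
        String body = tagged.trim().substring("ID:".length());
        int space = body.indexOf(' ');
        return new Fragment(Long.parseLong(body.substring(0, space)),
                            body.substring(space + 1).trim());
    }

    /** Sorts fragments by ID and joins their hex words into one string. */
    static String reassemble(Iterable<String> taggedFragments) {
        List<Fragment> list = new ArrayList<>();
        for (String s : taggedFragments) list.add(parse(s));
        // Compare IDs numerically: a string sort would put "10" before "9".
        list.sort(Comparator.comparingLong(f -> f.id));
        StringBuilder sb = new StringBuilder();
        for (Fragment f : list) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f.hex);
        }
        return sb.toString();
    }
}
```

For example, `reassemble(Arrays.asList("ID:10 c0a8 0002", "ID:9 4500 003c"))` puts the `ID:9` words first. Only the relative order of the IDs within one packet matters, so the counter does not need to restart for each packet.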

Second MapReduce: sort the hex lines by ID. This supplements the first MapReduce. At this point the cleaning is complete, and the hex value at any position can be examined to analyze the data.

Third MapReduce: count the HTTP GET requests.

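import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Members of the enclosing job class (driver not shown).
// Static counters are only consistent within one JVM, i.e. a single map task.
static int id = 1;     // packet number, used as the output key
static int hexid = 1;  // running ID that tags every hex fragment

public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text> {
  // hh:mm:ss.micros, e.g. 11:08:56.149361
  private static final Pattern TIME =
      Pattern.compile("([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])\\.[0-9]{6}");
  // One or more groups of four hex digits, each preceded by a space.
  // Stricter variant the author left commented out: "0x[0-9]{4}:( [a-zA-Z0-9]{4})+"
  private static final Pattern HEX = Pattern.compile("( [0-9a-fA-F]{4})+");
  private final IntWritable one = new IntWritable();
  private final Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    // Match the time: it starts a new packet, so advance the packet key.
    Matcher matchTime = TIME.matcher(line);
    while (matchTime.find()) {
      id = id + 1;
      word.set("Time:" + matchTime.group() + " ");
      one.set(id);
      context.write(one, word);
    }
    // Match the hex: emit it under the current packet key, tagged with its own ID.
    Matcher matchHex = HEX.matcher(line);
    while (matchHex.find()) {
      hexid = hexid + 1;
      word.set("ID:" + hexid + matchHex.group() + " ");
      one.set(id);
      context.write(one, word);
    }
  }
}

public static class IntSumReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
  private final Text result = new Text();

  @Override
  public void reduce(IntWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Concatenate all fragments of one packet into one row (arrival order is arbitrary).
    StringBuilder sum = new StringBuilder();
    for (Text val : values) {
      sum.append(val.toString());
    }
    result.set(sum.toString());
    context.write(key, result);
  }
}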

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Bar is a helper class that the original does not show. It holds one fragment's
// id and hexValue and must implement Comparable<Bar>, ordering by id
// (numerically, so that "10" sorts after "9").
public static class TokenizerMapper extends Mapper<Object, Text, Text, Text> {
  // The time written by the first job, e.g. "Time:11:08:56.149361"
  private static final Pattern TIME = Pattern.compile(
      "Time:\\s*(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\\.[0-9]{6})");
  // One tagged fragment: "ID:<n>" followed by its groups of four hex digits
  private static final Pattern FRAGMENT =
      Pattern.compile("ID:([0-9]+)(( [0-9a-fA-F]{4})+)");
  private final Text one = new Text();   // output key: the packet time
  private final Text word = new Text();  // output value: the ordered hex

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    // Match the time
    Matcher matchTime = TIME.matcher(line);
    while (matchTime.find()) {
      one.set(matchTime.group(1));
    }
    // Collect the tagged hex fragments
    List<Bar> list = new ArrayList<Bar>();
    Matcher matchHex = FRAGMENT.matcher(line);
    while (matchHex.find()) {
      Bar bar = new Bar();
      bar.setId(matchHex.group(1));        // sequence number of this hex line
      bar.setHexValue(matchHex.group(2));  // the hex of this line
      list.add(bar);
    }
    // Sort the hex lines back into their original order and join them
    Collections.sort(list);
    StringBuilder buffer = new StringBuilder();
    for (Bar bar : list) {
      buffer.append(bar.getHexValue());
    }
    word.set(buffer.toString());
    context.write(one, word);
  }
}

public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Pass every cleaned packet row through unchanged
    for (Text val : values) {
      context.write(key, val);
    }
  }
}

  

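import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
  private static final IntWritable one = new IntWritable(1);
  private final Text word = new Text("Sumget");

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // The offsets assume each row is the 15-character time followed by
    // 5-character hex groups; adjust them if the row layout differs.
    int timelen = 15;
    int getlen = 20 * 5 + timelen;  // start of hex group 20
    String strline = value.toString();
    // Require the whole 4-character group so that substring cannot run past the end
    if (strline.length() >= getlen + 4) {
      // || hexValue[20].equals("4854")
      String getpos = strline.substring(timelen + 20 * 5, timelen + 21 * 5 - 1);
      if (getpos.equals("4745")) {  // 0x47 0x45 is ASCII "GE"
        context.write(word, one);
      }
    }
  }
}

public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable result = new IntWritable();

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Add up the 1s emitted for every GET request
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}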
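The third job's test for hex group 20 can be read as follows: if the dump starts at the IP header and both the IPv4 and TCP headers are 20 bytes with no options (an assumption, since options make either header longer), the payload starts at byte 40, which is four-digit group 20. An HTTP GET request begins with the ASCII bytes `GET `, that is `47 45 54 20`. The sketch below shows the same check on whitespace-separated tokens instead of fixed character offsets; it is a stricter variant that also checks the second group, and the names `HttpGetCheck`, `isHttpGet` and `countGets` are hypothetical.

```java
final class HttpGetCheck {
    // 20-byte IPv4 header + 20-byte TCP header = 40 bytes = 20 four-digit groups.
    // This assumes neither header carries options.
    static final int PAYLOAD_GROUP = 20;

    /** Expects a row of the form "<time> <group0> <group1> ...". */
    static boolean isHttpGet(String row) {
        String[] tokens = row.trim().split("\\s+");
        int first = 1 + PAYLOAD_GROUP;  // tokens[0] is the time
        if (tokens.length <= first + 1) return false;
        // "GET " in ASCII is 47 45 54 20, i.e. the groups "4745" and "5420".
        return tokens[first].equalsIgnoreCase("4745")
            && tokens[first + 1].equalsIgnoreCase("5420");
    }

    /** Counts the rows that look like the start of an HTTP GET request. */
    static long countGets(Iterable<String> rows) {
        long count = 0;
        for (String row : rows) {
            if (isHttpGet(row)) count++;
        }
        return count;
    }
}
```

Splitting on whitespace avoids depending on the exact width of the time and the separators, at the cost of a little extra work per row.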

  
