How to perform data cleansing in log analysis

Source: Internet
Author: User
Tags: dateformat

When we analyze logs, the raw log data is often disorganized, or it contains more than what we want to see. So we need to clean it first; to put it bluntly, that means filtering the strings inside each line.

Here is the original data we need to filter:

    183.131.11.98 - - [01/Aug/2014:01:01:05 +0800] "GET /thread-5981-1-1.html HTTP/1.1" "http://www.baidu.com/s?wd=cocos2dx%203.2%20wp8%E6%94%AF%E6%8C%81&pn=30&oq=cocos2dx%203.2%20wp8%E6%94%AF%E6%8C%81&tn=28035039_2_pg&ie=utf-8&rsv_page=1" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 UBrowser/1.0.349.1252 Safari/537.36"

From each line we need to extract the following data:

1. IP address
2. Access time
3. URL
4. Browser used by the visitor

Task decomposition

1. IP address

The IP address is the easiest field to filter out. Splitting on the "- -" delimiter gives us the data we want:

    ipField = line.split("- -")[0].trim();

2. Access time

Getting the access time is easy, but getting it elegantly takes a little more effort. The field looks like [01/Aug/2014:01:01:05 +0800]. An ordinary programmer would cut out 01/Aug/2014:01:01:05 and use it directly. There is nothing wrong with that, but we can do a bit better. Here we take the whole value, 01/Aug/2014:01:01:05 +0800, including the time zone offset, and parse it:

    dt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US).parse(time);

This converts the text into a normal Date. What we really want, though, is a time that a reader can recognize at a glance, for example (for the 19:04:58 entry used in the code below):

    August 1, 2014 07:04:58 PM

A compact form such as 20140801070458 is not what an elegant programmer would produce, nor even an ordinary one. And if we build the readable form by hand, concatenating getYear() + getMonth() + ... and so on, we step straight into the ranks of the "2B" (clumsy) programmers.

Here is an easy way to do it:

    DateFormat df1 = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG);
    dateField = df1.format(dt);

This solves the problem neatly: no manual concatenation is needed, only the right arguments to getDateTimeInstance. The result is formatted for the JVM's default locale, so on a machine with a Chinese locale the date comes out in Chinese.

3. Browser and URL

The key here is getting the escape characters right, such as how to use a double quotation mark as the delimiter (written \" inside a Java string) and how to use a parenthesis as the delimiter (written \\) because split takes a regular expression).

The complete program:

    package www.fuyunnet.com;

    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;

    public class Test {

        public static void StringResolves(String line) throws ParseException {
            String ipField, dateField, urlField, browserField;

            // Get the IP address: everything before the "- -" delimiter
            ipField = line.split("- -")[0].trim();

            // Get the time between '[' and ']' and convert its format
            int getTimeFirst = line.indexOf("[");
            int getTimeLast = line.indexOf("]");
            String time = line.substring(getTimeFirst + 1, getTimeLast).trim();
            DateFormat df1 = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG);
            // HH is the 24-hour clock; Locale.US is needed for the English month name
            Date dt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US).parse(time);
            dateField = df1.format(dt);

            // Get the URLs: splitting on '"' puts the request at index 1
            // and the referrer at index 3
            String[] getUrl = line.split("\"");
            // substring(3) drops the leading "GET"
            String firstGetUrl = getUrl[1].substring(3).trim();
            String secondGetUrl = getUrl[3].trim();
            urlField = firstGetUrl + "delimiter" + secondGetUrl;

            // Get the browser: the user agent is at index 5 of the same split.
            // This assumes the user agent contains "(KHTML, like Gecko)";
            // otherwise indexOf returns -1 and substring throws.
            String[] getBrowse = line.split("\"");
            String strBrowse = getBrowse[5];
            String str = "(KHTML, like Gecko)";
            int i = strBrowse.indexOf(str);
            strBrowse = strBrowse.substring(i);
            // Keep the text before the first '/', e.g. "(KHTML, like Gecko) Chrome"
            String[] strBrowse1 = strBrowse.split("/");
            strBrowse = strBrowse1[0];
            // Keep the text after the ')', e.g. "Chrome"
            String[] strBrowse2 = strBrowse.split("\\)");
            browserField = strBrowse2[1].trim();

            System.out.println(ipField);
            System.out.println(dateField);
            System.out.println(urlField);
            System.out.println(browserField);
        }

        public static void main(String[] args) throws ParseException {
            String browser = "203.100.80.88 - - [01/Aug/2014:19:04:58 +0800] "
                    + "\"GET /uc_server/avatar.php?uid=3841&size=small HTTP/1.1\" 301 463 "
                    + "\"http://www.aboutyun.com/forum.php\" "
                    + "\"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 SE 2.X MetaSr 1.0\"";
            Test.StringResolves(browser);
        }
    }
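As an alternative to the chained split calls above, the same fields can be pulled out with a single regular expression. This is only a sketch and not the article's method. The class name LogEntry, its field names, and the "unknown" fallback are all made up for illustration. It also assumes every line has the layout ip, two placeholder fields, [time], "request", status, size, "referrer", "user agent", as in the second sample line. It uses java.time in place of SimpleDateFormat.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original program.
final class LogEntry {

    // Assumed layout: ip x x [time] "METHOD path protocol" status size "referrer" "user agent"
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"\\S+ (\\S+) [^\"]*\" \\d{3} \\S+ \"([^\"]*)\" \"([^\"]*)\"$");

    // Same heuristic as the split-based code: the token after "(KHTML, like Gecko)" up to the '/'
    private static final Pattern BROWSER = Pattern.compile(
            "\\(KHTML, like Gecko\\)\\s+([^/\\s]+)/");

    // Case-insensitive so that both "Aug" and "aug" are accepted
    private static final DateTimeFormatter LOG_TIME = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd/MMM/yyyy:HH:mm:ss Z")
            .toFormatter(Locale.US);

    final String ip;
    final OffsetDateTime time;
    final String url;
    final String referrer;
    final String browser;

    private LogEntry(String ip, OffsetDateTime time, String url, String referrer, String browser) {
        this.ip = ip;
        this.time = time;
        this.url = url;
        this.referrer = referrer;
        this.browser = browser;
    }

    // Parses one line; throws IllegalArgumentException if the line does not match the assumed layout.
    static LogEntry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        Matcher b = BROWSER.matcher(m.group(5));
        String browser = b.find() ? b.group(1) : "unknown"; // fallback value is an assumption
        return new LogEntry(m.group(1), OffsetDateTime.parse(m.group(2), LOG_TIME),
                m.group(3), m.group(4), browser);
    }

    public static void main(String[] args) {
        String line = "203.100.80.88 - - [01/Aug/2014:19:04:58 +0800] "
                + "\"GET /uc_server/avatar.php?uid=3841&size=small HTTP/1.1\" 301 463 "
                + "\"http://www.aboutyun.com/forum.php\" "
                + "\"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 SE 2.X MetaSr 1.0\"";
        LogEntry e = LogEntry.parse(line);
        System.out.println(e.ip);
        System.out.println(e.time);
        System.out.println(e.url);
        System.out.println(e.referrer);
        System.out.println(e.browser);
    }
}
```

Compared with the split version, a line that does not fit the expected layout is rejected in one place with a clear error, and a user agent without the "(KHTML, like Gecko)" marker yields the fallback value where the split version would throw an index exception. The trade-off is a pattern that is harder to read than the individual split calls.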


