How to clean the data in log analysis

When we analyze logs, the raw log data is often disorganized, or it is simply not the data we want to see. So we need to clean it, which, to be blunt, means filtering the strings inside. Here is the raw record we need to filter:

    183.131.11.98 - - [01/Aug/2014:01:01:05 +0800] "GET /thread-5981-1-1.html HTTP/1.1" "//www.baidu.com/s?wd=cocos2dx%203.2%20wp8%E6%94%AF%E6%8C%81&pn=30&oq=cocos2dx%203.2%20wp8%E6%94%AF%E6%8C%81&tn=28035039_2_pg&ie=utf-8&rsv_page=1" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 UBrowser/1.0.349.1252 Safari/537.36"

From this record we need to extract the following data:

1. IP address
2. Access time
3. URL
4. Browser used

Task decomposition

1. Getting the IP address

The IP address is the easiest field to filter out. Splitting on the delimiter "- -" gives us the data we want:

    ipField = line.split("- -")[0].trim();

2. Getting the access time

Getting the time is easy, but doing it the way an elegant programmer would takes a bit more effort. Given [01/Aug/2014:01:01:05 +0800], most people would simply take the string 01/Aug/2014:01:01:05 as it is. There is nothing wrong with that; it is what an ordinary programmer would do. So how do we do it with a little more elegance? We take the full 01/Aug/2014:01:01:05 +0800, time zone offset included, and parse it:

    dt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US).parse(time);

This converts the text into a regular Date. But we want a time that our readers can recognize at a glance, such as:

    August 1, 2014, 07:04:58 PM

If you output something like 20140801070458 instead, that is the work of neither an elegant programmer nor an ordinary one; the only job title left is "2B programmer" (slang for a clumsy one).

So how do we get the readable form? By concatenating getYear() + getMonth() + ... and so on? Finish that, and you have stepped into the ranks of the 2B programmers.

Here is an easy way to do it:

    DateFormat df1 = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG);
    dateField = df1.format(dt);

This solves the problem neatly: no concatenation is needed, only the right style arguments to getDateTimeInstance. The output follows the JVM's default locale.

3. Browser and URL

The key here is getting the escape characters right, for example when using a double quotation mark or a parenthesis as the delimiter:

    package www.fuyunnet.com;

    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;

    public class Test {

        public static void StringResolves(String line) throws ParseException {
            String ipField, dateField, urlField, browserField;

            // Get the IP address: everything before the "- -" placeholders
            ipField = line.split("- -")[0].trim();

            // Get the time between [ and ], then convert the format
            int getTimeFirst = line.indexOf("[");
            int getTimeLast = line.indexOf("]");
            String time = line.substring(getTimeFirst + 1, getTimeLast).trim();
            DateFormat df1 = DateFormat.getDateTimeInstance(DateFormat.LONG, DateFormat.LONG);
            // HH is the 24-hour clock; Locale.US is needed to parse "Aug"
            Date dt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US).parse(time);
            dateField = df1.format(dt);

            // Get the URL: split on double quotes; [1] is the request, [3] the referrer
            String[] getUrl = line.split("\"");
            String firtGetUrl = getUrl[1].substring(3).trim(); // drop the leading "GET"
            String secondGetUrl = getUrl[3].trim();
            urlField = firtGetUrl + "delimiter" + secondGetUrl;

            // Get the browser: [5] is the user agent string
            String[] getBrowse = line.split("\"");
            String strBrowse = getBrowse[5];
            String str = "(KHTML, like Gecko)";
            int i = strBrowse.indexOf(str);
            strBrowse = strBrowse.substring(i);
            // Keep the text before the first "/", e.g. "(KHTML, like Gecko) Chrome"
            String[] strBrowse1 = strBrowse.split("\\/");
            strBrowse = strBrowse1[0];
            // Keep the text after the closing parenthesis, e.g. "Chrome"
            String[] strBrowse2 = strBrowse.split("\\)");
            browserField = strBrowse2[1].trim();

            System.out.println(ipField);
            System.out.println(dateField);
            System.out.println(urlField);
            System.out.println(browserField);
        }

        public static void main(String[] args) throws ParseException {
            String browser = "203.100.80.88 - - [01/Aug/2014:19:04:58 +0800] \"GET /uc_server/avatar.php?uid=3841&size=small HTTP/1.1\" 301 463 \"http://www.aboutyun.com/forum.php\" \"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 SE 2.X MetaSr 1.0\"";
            Test.StringResolves(browser);
        }
    }
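The split-based code above relies on fixed array positions, so a line with an unexpected shape fails with an index error. For comparison, here is a minimal sketch of the same extraction done with one regular expression and java.time. It is not the author's method. It assumes an IPv4 client address followed only by "-" placeholders, and a quoted request, referrer and user agent as in the sample record. The names LogLineParser, Fields, LINE, ACCESS_TIME and BROWSER are hypothetical and chosen only for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    /** Hypothetical holder for the extracted fields. */
    public static final class Fields {
        public final String ip;
        public final OffsetDateTime time;
        public final String url;
        public final String referrer;
        public final String browser;

        Fields(String ip, OffsetDateTime time, String url, String referrer, String browser) {
            this.ip = ip;
            this.time = time;
            this.url = url;
            this.referrer = referrer;
            this.browser = browser;
        }
    }

    // Assumed layout of one access log line (an assumption, not a full log grammar).
    private static final Pattern LINE = Pattern.compile(
            "^([0-9.]+)[\\s-]*"              // IPv4 address, then "-" placeholders
            + "\\[([^\\]]+)\\]\\s+"          // [access time]
            + "\"\\S+\\s+(\\S+)[^\"]*\""     // "METHOD url protocol"
            + "(?:\\s+\\d+\\s+\\S+)?"        // optional status code and size
            + "\\s+\"([^\"]*)\""             // "referrer"
            + "\\s+\"([^\"]*)\"\\s*$");      // "user agent"

    // Same pattern letters as the SimpleDateFormat call; month names parsed in English.
    private static final DateTimeFormatter ACCESS_TIME = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd/MMM/yyyy:HH:mm:ss Z")
            .toFormatter(Locale.US);

    // The product token right after "(KHTML, like Gecko)", as in the original code.
    private static final Pattern BROWSER = Pattern.compile(
            "\\(KHTML, like Gecko\\)\\s+([^/\\s]+)/", Pattern.CASE_INSENSITIVE);

    public static Fields parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized log line: " + line);
        }
        OffsetDateTime time = OffsetDateTime.parse(m.group(2), ACCESS_TIME);
        Matcher b = BROWSER.matcher(m.group(5));
        String browser = b.find() ? b.group(1) : "unknown";
        return new Fields(m.group(1), time, m.group(3), m.group(4), browser);
    }

    public static void main(String[] args) {
        Fields f = parse("203.100.80.88 - - [01/Aug/2014:19:04:58 +0800] "
                + "\"GET /uc_server/avatar.php?uid=3841&size=small HTTP/1.1\" 301 463 "
                + "\"http://www.aboutyun.com/forum.php\" "
                + "\"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 SE 2.X MetaSr 1.0\"");
        System.out.println(f.ip);
        System.out.println(f.time);
        System.out.println(f.url);
        System.out.println(f.referrer);
        System.out.println(f.browser);
    }
}
```

In this sketch a malformed line is rejected with an explicit exception instead of an out-of-range array index, and the time is kept as an OffsetDateTime so it can be formatted for display later in whatever style is needed.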