Given a log file of 1 GB or more, the task is to find every line that contains a specified string, together with its line number and the position of the string within the line. There are two approaches: search with Java functions directly, or call a Linux shell command (grep) to filter the file and then process its output in Java. The following example program implements both:
# cat TestIO.java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

public class TestIO {
    private int lineNum = 0;
    private String path = "";
    private String searchStr = "";

    public void setPath(String value) { path = value; }
    public String getPath() { return path; }
    public void setSearchStr(String value) { searchStr = value; }
    public String getSearchStr() { return searchStr; }

    /** Method 1: read the file in Java and search each line with indexOf. */
    public void start() {
        if (null == path || path.length() < 1) return;
        long startMili = System.currentTimeMillis();
        System.out.println("Start search \"" + searchStr + "\" in file: " + path);
        File file = new File(path);
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new BufferedInputStream(new FileInputStream(file)), "utf-8"))) {
            String line;
            lineNum = 0;
            while ((line = reader.readLine()) != null) {
                lineNum++;
                String rs = this.searchStr(line, searchStr);
                if (rs.length() > 0) {
                    // System.out.println("Find in Line[" + lineNum + "], index: " + rs);
                }
            }
            System.out.println("Finished!");
            long endMili = System.currentTimeMillis();
            System.out.println("Total times: " + (endMili - startMili) + " ms");
            System.out.println("");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Method 2: let grep select the matching lines, then locate the keyword in each one. */
    public void startByShell() {
        try {
            long startMili = System.currentTimeMillis();
            System.out.println("Start search \"" + searchStr + "\" in file: " + path + " by shell");
            // grep -n prefixes every matching line with "lineNumber:".
            // searchStr and path reach the shell unquoted, so they must not
            // contain spaces or shell metacharacters.
            String[] cmd = {"/bin/sh", "-c", "grep -n " + searchStr + " " + path};
            Process p = Runtime.getRuntime().exec(cmd);
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new BufferedInputStream(p.getInputStream())))) {
                String line;
                lineNum = 0;
                while ((line = reader.readLine()) != null) {
                    lineNum++;
                    int colon = line.indexOf(':');
                    String rs = this.searchStr(line.substring(colon + 1), searchStr);
                    if (rs.length() > 0) {
                        String linebyshell = line.substring(0, colon);
                        // System.out.println("Find in Line[" + linebyshell + "], index: " + rs);
                    }
                }
            }
            p.waitFor();
            System.out.println("Finished!");
            long endMili = System.currentTimeMillis();
            System.out.println("Total times: " + (endMili - startMili) + " ms");
            System.out.println("");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Returns the offsets of all occurrences of value in src, e.g. "3,17,". */
    public String searchStr(String src, String value) {
        StringBuilder result = new StringBuilder();
        if (value.isEmpty()) return ""; // an empty keyword would never advance the index
        int index = src.indexOf(value, 0);
        while (index > -1) {
            result.append(index).append(',');
            index = src.indexOf(value, index + value.length());
        }
        return result.toString();
    }

    /** Unused helper: true if str contains only digits (also true for an empty string). */
    public static boolean isNumeric(String str) {
        Pattern pattern = Pattern.compile("[0-9]*");
        return pattern.matcher(str).matches();
    }

    /**
     * @param args optional: args[0] = file path, args[1] = keyword
     */
    public static void main(String[] args) {
        String file = "./testfile.txt";
        TestIO test = new TestIO();
        if (args.length > 0) test.setPath(args[0]); else test.setPath(file);
        if (args.length > 1) test.setSearchStr(args[1]); else test.setSearchStr("hello");
        test.start();
        test.startByShell();
    }
}
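To make the "index" result of the program concrete, here is a small standalone sketch. The class name SearchStrDemo and the method name findAll are illustrative and not part of the program above. The loop mirrors the one in searchStr, with a guard for an empty keyword added as an assumption about how that case should be handled.

```java
public class SearchStrDemo {
    /**
     * Returns the zero-based offsets of every non-overlapping occurrence
     * of value in src, each followed by a comma.
     */
    static String findAll(String src, String value) {
        StringBuilder result = new StringBuilder();
        if (value.isEmpty()) {
            return ""; // an empty keyword would never advance the index
        }
        int index = src.indexOf(value);
        while (index > -1) {
            result.append(index).append(',');
            // continue after the end of the current match
            index = src.indexOf(value, index + value.length());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(findAll("hello world, hello log", "hello")); // prints 0,13,
        System.out.println(findAll("aaaa", "aa"));                      // prints 0,2,
        System.out.println(findAll("no match here", "error").isEmpty()); // prints true
    }
}
```

An empty result string means the line does not contain the keyword, which is the condition both search methods test with rs.length() > 0.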
The test file is a 1.4 GB log containing millions of records, in which:
The keyword hello appears in fewer than 50 records;
chipkill appears in about 20% of records;
error appears in about 50% of records;
mainbuild166 appears in about 99% of records.
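The original test file is not available, so to reproduce a similar experiment one could generate a synthetic log with roughly the same keyword proportions. The following is only a sketch under that assumption: the class MakeTestLog, the line layout (record, host=, level=, ecc=, msg=) and the default of one million lines are all made up for illustration and do not describe the real log.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeTestLog {
    /** Builds line i; keyword frequencies are fixed by simple modulo rules. */
    static String makeLine(long i) {
        StringBuilder sb = new StringBuilder("record ").append(i);
        if (i % 100 != 0) sb.append(" host=mainbuild166"); // 99% of lines
        if (i % 2 == 0)   sb.append(" level=error");        // 50% of lines
        if (i % 5 == 0)   sb.append(" ecc=chipkill");       // 20% of lines
        if (i < 40)       sb.append(" msg=hello");          // first 40 lines only
        return sb.toString();
    }

    /** Writes the given number of lines to out as UTF-8 text. */
    static void write(Path out, long lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (long i = 0; i < lines; i++) {
                w.write(makeLine(i));
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "./testfile.txt");
        long lines = args.length > 1 ? Long.parseLong(args[1]) : 1_000_000L;
        write(out, lines);
        System.out.println("Wrote " + lines + " lines to " + out);
    }
}
```

The line count needed to reach 1.4 GB depends on the line length, so it is left as a command-line argument rather than guessed here.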
Test results:
[root@mainbuild166 io]# java TestIO ./testfile.txt hello
Start search "hello" in file: ./testfile.txt
Finished!
Total times: 7825 ms
Start search "hello" in file: ./testfile.txt by shell
Finished!
Total times: 3080 ms
[root@mainbuild166 io]# java TestIO ./testfile.txt chipkill
Start search "chipkill" in file: ./testfile.txt
Finished!
Total times: 8760 ms
Start search "chipkill" in file: ./testfile.txt by shell
Finished!
Total times: 3732 ms
[root@mainbuild166 io]# java TestIO ./testfile.txt error
Start search "error" in file: ./testfile.txt
Finished!
Total times: 11339 ms
Start search "error" in file: ./testfile.txt by shell
Finished!
Total times: 8163 ms
[root@mainbuild166 io]# java TestIO ./testfile.txt mainbuild166
Start search "mainbuild166" in file: ./testfile.txt
Finished!
Total times: 9938 ms
Start search "mainbuild166" in file: ./testfile.txt by shell
Finished!
Total times: 12531 ms
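To compare the two methods at a glance, the timings above can be turned into a Java/shell ratio per keyword. This is just a small arithmetic sketch; the class name TimingRatios is illustrative, and the numbers are copied from the test output above.

```java
public class TimingRatios {
    /** Ratio of the pure-Java time to the shell (grep) time; above 1 means the shell was faster. */
    static double ratio(long javaMs, long shellMs) {
        return (double) javaMs / shellMs;
    }

    public static void main(String[] args) {
        String[] keywords = {"hello", "chipkill", "error", "mainbuild166"};
        long[] javaMs  = {7825, 8760, 11339, 9938};   // "Total times" of start()
        long[] shellMs = {3080, 3732, 8163, 12531};   // "Total times" of startByShell()
        for (int i = 0; i < keywords.length; i++) {
            System.out.printf("%-13s java/shell = %.2f%n",
                    keywords[i], ratio(javaMs[i], shellMs[i]));
        }
    }
}
```

The ratios come out at roughly 2.54, 2.35, 1.39 and 0.79, so the shell method leads for the three rarer keywords and falls behind only for mainbuild166.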
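The test results show that when the result set is much smaller than the data set (hello, chipkill), calling the shell is more than twice as fast as using Java functions directly. This matches the typical case in practice, where only a small fraction of the lines match. The advantage shrinks as more lines match (error, about 50%), and when nearly every line matches (mainbuild166, about 99%) the shell method is actually slower, presumably because almost the whole file then has to be piped back to Java and scanned a second time.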
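If the shell method is chosen, one possible refinement is to start grep directly instead of going through /bin/sh, so that keywords or paths containing spaces or shell metacharacters need no quoting. The sketch below is not what the test program does. The class GrepRunner and its method names are illustrative, and the -F option (treat the keyword as a fixed string rather than a regular expression) is an added assumption about the desired matching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class GrepRunner {
    /** Argument vector for grep: -n prints line numbers, -F matches a fixed string. */
    static List<String> buildCommand(String keyword, String path) {
        // "--" ends the options, so a keyword starting with '-' is not read as a flag
        return Arrays.asList("grep", "-n", "-F", "--", keyword, path);
    }

    /** Splits one "lineNumber:content" line of grep -n output at the first colon. */
    static String[] splitMatch(String grepLine) {
        int colon = grepLine.indexOf(':');
        if (colon < 0) {
            return new String[] {"", grepLine}; // not in the expected format
        }
        return new String[] {grepLine.substring(0, colon), grepLine.substring(colon + 1)};
    }

    /** Runs grep without a shell and returns the number of matching lines. */
    static long run(String keyword, String path) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(keyword, path));
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // show grep's own error messages
        Process p = pb.start();
        long matches = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = splitMatch(line);
                // parts[0] is the line number, parts[1] the matching line;
                // a real program would search parts[1] for the keyword offsets here
                matches++;
            }
        }
        p.waitFor();
        return matches;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "./testfile.txt";
        String keyword = args.length > 1 ? args[1] : "hello";
        System.out.println(run(keyword, path) + " matching lines");
    }
}
```

Usage follows the test program: java GrepRunner ./testfile.txt hello. The timings in this article were measured with the /bin/sh version, so they say nothing about how this variant performs.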
This article is from the "little he Beibei's technical space" blog. Please keep this source link when reposting: http://babyhe.blog.51cto.com/1104064/1358167