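I tried some code I found online and did not expect the difference to be this large. I currently have a project that has to scan 50 GB to 70 GB of source code for 260 keywords, and I urgently need a reasonably fast approach.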
    $ wc -l 1
    234033 1
    $ perl 1.pl
    cost 1 seconds
    zjtel : 32606
    $ perl 2.pl
    cost 111 seconds
    zjtel : 32606
1.pl
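    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count the lines of file "1" that contain ":zjtel:", in pure Perl.
    my $time  = time();
    my $zjtel = 0;
    open(my $fh, '<', '1') or die "Can't open 1: $!";
    while (<$fh>) {
        chomp;
        $zjtel++ if /:zjtel:/;
    }
    close($fh);
    $time = time() - $time;
    print "cost $time seconds\n";
    print "zjtel : $zjtel\n";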
2.pl
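    #!/usr/bin/perl
    use strict;
    use warnings;

    # Same count, delegated to an external "grep | wc -l" pipeline.
    my $time  = time();
    my $count = `grep zjtel 1 | wc -l`;
    chomp $count;
    $time = time() - $time;
    print "cost $time seconds\n";
    print "zjtel : $count\n";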
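Both scripts time themselves with the built-in `time()`, which counts whole seconds only, so "cost 1 seconds" means anything from a few milliseconds to just under two seconds. A minimal sketch of finer-grained timing with the core `Time::HiRes` module; `timed` and `count_matching_lines` are hypothetical helper names, and the file `1` and pattern `:zjtel:` are taken from 1.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # overrides time() so it returns fractional seconds

# Hypothetical helper: run a code reference, return (elapsed seconds, its result).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Hypothetical helper: count the lines of $path matching $re (same loop as 1.pl).
sub count_matching_lines {
    my ($path, $re) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ $re;
    }
    close($fh);
    return $n;
}

# Only runs when the test file "1" from the transcript is present.
if (-e '1') {
    my ($secs, $count) = timed(sub { count_matching_lines('1', qr/:zjtel:/) });
    printf "cost %.3f seconds\nzjtel : %d\n", $secs, $count;
}
```

The same `timed` wrapper can be put around the backtick call from 2.pl to compare both on a millisecond scale.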
My test code, not yet benchmarked:
pattern-match:
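    #!/usr/bin/perl
    use strict;
    use warnings;

    # Search the files in one directory for lines containing any keyword and
    # collect them as <file name>:<line number>:<line content>.
    # Usage: perl pattern-match.pl <dir> <keywords file>
    my ($dir, $keywords) = @ARGV;

    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    # readdir returns bare names (plus . and ..): keep plain files only
    my @filenames = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    open(my $kfh, '<', $keywords) or die "Can't open $keywords: $!";
    chomp(my @keywords = <$kfh>);               # a trailing "\n" in a key would prevent any match
    @keywords = grep { length($_) } @keywords;  # an empty key would match every line
    close($kfh);
    my $num_key  = scalar @keywords;
    my @patterns = map { qr/$_/ } @keywords;    # compile each keyword regex once

    my @match_lines;
    my $time = time();
    foreach my $file (@filenames) {
        open(my $fh, '<', "$dir/$file") or die "Can't open $dir/$file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            foreach my $re (@patterns) {
                if ($line =~ $re) {
                    # $. holds the line number of the filehandle last read
                    my $context = "$file:$.:$line\n";
                    push @match_lines, $context;
                }
            }
        }
        close($fh);
    }

    open(my $rs, '>', 'result_file_pattern') or die "Can't write result_file_pattern: $!";
    print {$rs} @match_lines;
    close($rs);
    $time = time() - $time;
    print "Pattern-match ($num_key keywords) end: $time seconds\n";

Would it make any difference to print $context straight to the result filehandle instead of collecting everything in @match_lines first?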
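On printing $context directly versus collecting @match_lines: collecting holds every match in memory until the end, and with 50 GB to 70 GB of input and 260 keywords that list could grow large. Printing each match as it is found writes the same lines in the same order (Perl buffers filehandle output, so it is not one system call per line) while keeping memory use flat. The larger cost is probably the inner loop, which tries 260 separate regexes on every line. A sketch under stated assumptions: the keywords are plain strings (escaped with `quotemeta`), they are joined into one alternation so each line is tested once (Perl 5.10 and later compile an alternation of literals into a trie), and a line containing several keywords is printed once rather than once per keyword. `scan_dir` is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan every plain file in $dir once, test each line
# against a single combined regex, and print matches to $out immediately as
# <file name>:<line number>:<line content>. Returns the number of matches.
sub scan_dir {
    my ($dir, $keywords, $out) = @_;

    # Assumption: keywords are literal strings, so escape regex metacharacters.
    my $alt = join '|', map { quotemeta($_) } grep { length($_) } @$keywords;
    die "No keywords given\n" unless length $alt;
    my $re = qr/$alt/;   # one regex instead of one per keyword

    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    my $matches = 0;
    for my $file (@files) {
        open(my $fh, '<', "$dir/$file") or die "Can't open $dir/$file: $!";
        while (my $line = <$fh>) {
            next unless $line =~ $re;
            chomp $line;
            print {$out} "$file:$.:$line\n";   # stream: nothing accumulates in memory
            $matches++;
        }
        close($fh);
    }
    return $matches;
}

# Usage: perl scan.pl <dir> <keywords file>
if (@ARGV == 2) {
    my ($dir, $keyfile) = @ARGV;
    open(my $kfh, '<', $keyfile) or die "Can't open $keyfile: $!";
    chomp(my @keywords = <$kfh>);
    close($kfh);
    open(my $rs, '>', 'result_file_pattern') or die "Can't write result_file_pattern: $!";
    scan_dir($dir, \@keywords, $rs);
    close($rs);
}
```

If the keywords really are regexes, dropping `quotemeta` keeps the single-pass structure, though the trie optimization then applies only to the literal parts.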
grep:
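    #!/usr/bin/perl
    use strict;
    use warnings;

    # Same search, delegated to the external grep: one grep run per
    # (file, keyword) pair.
    # Usage: perl grep.pl <dir> <keywords file>
    my ($dir, $keywords) = @ARGV;

    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @filenames = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    open(my $kfh, '<', $keywords) or die "Can't open $keywords: $!";
    chomp(my @keywords = <$kfh>);
    @keywords = grep { length($_) } @keywords;
    close($kfh);
    my $num_key = scalar @keywords;

    my @match_lines;
    my $time1 = time();
    foreach my $file (@filenames) {
        foreach my $key (@keywords) {
            # -H/-n prefix each line with the file path and line number.
            # The list form of open bypasses the shell, so keys need no quoting.
            open(my $gh, '-|', 'grep', '-Hn', '--', $key, "$dir/$file")
                or die "Can't run grep: $!";
            push @match_lines, <$gh>;
            close($gh);   # exit status 1 only means "no match"
        }
    }

    open(my $rs, '>', 'result_file_grep') or die "Can't write result_file_grep: $!";
    print {$rs} @match_lines;
    close($rs);
    my $time2 = time();
    print "Grep ($num_key keywords) end: ", $time2 - $time1, " seconds\n";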
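The grep version starts one grep process per (file, keyword) pair, so every file is read 260 times. grep can take all keywords in one run: `-f FILE` reads one pattern per line from FILE, and `-F` treats them as fixed strings rather than regexes (an assumption about the keywords). A sketch that runs one grep per file instead of 260; `grep_all` is a hypothetical helper, and it assumes a grep supporting `-H`, `-n`, `-F` and `-f` (GNU and BSD grep both do):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one grep process per file, covering all keywords.
# Returns the matching lines as <path>:<line number>:<line content>.
# Caveat: a blank line in $keyfile is an empty pattern and matches every line.
sub grep_all {
    my ($dir, $keyfile) = @_;

    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir($dh);
    closedir($dh);

    my @match_lines;
    for my $file (@files) {
        # -H/-n: file path and line number; -F: fixed strings; -f: keywords file.
        # List-form open runs grep directly, without a shell.
        open(my $gh, '-|', 'grep', '-HnF', '-f', $keyfile, '--', "$dir/$file")
            or die "Can't run grep: $!";
        push @match_lines, <$gh>;
        close($gh);   # exit status 1 only means "no match"
    }
    return @match_lines;
}

# Usage: perl grep_all.pl <dir> <keywords file>
if (@ARGV == 2) {
    print grep_all(@ARGV);
}
```

For a whole directory tree rather than a single level, `grep -rHnF -f keywords_file dir` does the same in a single process.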
Difference in running time between 【linux】grep and a 【perl】 script implementing grep