I tried some code found on the Internet and did not expect the difference to be so big. I have a project with 50GB~70GB of code that has to be scanned for 260 keywords, so I need a relatively fast solution.
[gzhy@nearby stat]$ wc -l 1
234033 1
[gzhy@nearby stat]$ perl 1.pl
cost 1 seconds
zjtel : 32606
[gzhy@nearby stat]$ perl 2.pl
cost 111 seconds
zjtel : 32606
1.pl
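#!/usr/bin/perl
# Count the lines of file "1" that contain ":zjtel:" using a Perl regex.
my $time = time();
my $zjtel = 0;
open(my $fh, '<', '1') or die "Can't open 1: $!";
while (<$fh>)
{
    chomp;
    if (m/:zjtel:/)
    {
        $zjtel++;
    }
}
close($fh);
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $zjtel\n";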
2.pl
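#!/usr/bin/perl
# Count the matching lines by shelling out to grep and wc.
my $time = time();
my $count = `grep zjtel 1 | wc -l`;
chomp $count;   # drop the trailing newline from wc's output
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $count\n";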
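Since the built-in time() only has one-second resolution, a figure like "cost 1 seconds" is coarse. Below is a minimal sketch of taking the same measurement with sub-second precision, assuming the core Time::HiRes module is available. The helper name time_it is hypothetical and only used for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces the built-in time() with a floating-point version

# Hypothetical helper: run a code reference and return (elapsed seconds, result).
sub time_it {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Count the lines of file "1" that contain ":zjtel:", as 1.pl does.
# If the file cannot be opened, the count is reported as 0.
my ($elapsed, $count) = time_it(sub {
    my $n = 0;
    open(my $fh, '<', '1') or return 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /:zjtel:/;
    }
    close($fh);
    return $n;
});

printf "cost %.3f seconds\n", $elapsed;
print  "zjtel : $count\n";
```

The same time_it wrapper could be put around the backtick call in 2.pl so that both figures are measured the same way.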
The code I am going to test:
Pattern-match:
use strict;
use warnings;
use File::Basename;
# Find the keywords in the files of a directory and record each hit as
# <file name>:<line number>:<line content>
my ($dir, $keywords) = @ARGV;
opendir(DIRHANDLE, $dir) or die "Can't open $dir: $!";
my @filenames = sort grep { -f "$dir/$_" } readdir(DIRHANDLE);   # plain files only, skips . and ..
closedir(DIRHANDLE);
open(KEY, "<", $keywords) or die "Can't open $keywords: $!";
my @keywords = <KEY>;
close(KEY);
chomp @keywords;                           # otherwise every pattern ends in a newline and never matches a chomped line
@keywords = grep { length } @keywords;     # ignore empty lines in the keyword file
my $num_key = scalar @keywords;
my @match_lines;
my $time = time();
foreach my $file (@filenames) {
    open(FILE, "<", "$dir/$file") or die "Can't open $dir/$file: $!";
    my $n = 0;
    while (my $line = <FILE>) {
        $n++;
        chomp $line;
        foreach my $key (@keywords) {
            if ($line =~ m/$key/) {
                my $context = "$file:$n:$line\n";
                push @match_lines, $context;
            }
        }
    }
    close(FILE);
}
open(RS, ">", "result_file_pattern") or die "Can't open result_file_pattern: $!";
foreach (@match_lines) {
    print RS $_;
}
close(RS);
$time = time() - $time;
print "Pattern-match ($num_key keywords) end: $time seconds\n";
# Would printing $context directly to the RS handle, instead of collecting it in @match_lines first, make a difference?
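With 260 keywords, the inner loop of the pattern-match script runs up to 260 regex matches for every line. One possible alternative, sketched here under the assumption that the keywords are literal strings (the script above treats them as regexes), is to join them into a single alternation compiled once with qr//. The helper names build_matcher and scan_lines are hypothetical. Note that this variant reports a line once even if several keywords match it, whereas the loop above records it once per matching keyword.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled regex that matches any keyword.
# Assumption: keywords are literal strings, so they are escaped with quotemeta.
sub build_matcher {
    my @keywords = grep { length } @_;
    die "no keywords given\n" unless @keywords;
    my $alternation = join '|', map { quotemeta } @keywords;
    return qr/$alternation/;
}

# Hypothetical helper: return "name:line_no:line" for every matching line.
sub scan_lines {
    my ($name, $matcher, @lines) = @_;
    my @hits;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        chomp(my $text = $line);
        push @hits, "$name:$n:$text" if $text =~ $matcher;
    }
    return @hits;
}

# Small demonstration with in-memory lines instead of a real file.
my $matcher = build_matcher('zjtel', 'a.b');
my @hits = scan_lines('demo', $matcher, "x:zjtel:y\n", "aXb\n", "a.b\n");
print "$_\n" for @hits;
```

Because of quotemeta, the keyword a.b matches only the literal text "a.b" and not "aXb". If the keywords are meant to be regexes, the quotemeta step would have to be dropped.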
Grep:
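use strict;
use warnings;
use File::Basename;
# Find the keywords in the files of a directory and record each hit as
# <file name>:<line number>:<line content>
my ($dir, $keywords) = @ARGV;
opendir(DIRHANDLE, $dir) or die "Can't open $dir: $!";
my @filenames = sort grep { -f "$dir/$_" } readdir(DIRHANDLE);   # plain files only, skips . and ..
closedir(DIRHANDLE);
open(KEY, "<", $keywords) or die "Can't open $keywords: $!";
my @keywords = <KEY>;
close(KEY);
chomp @keywords;
@keywords = grep { length } @keywords;     # ignore empty lines in the keyword file
my $num_key = scalar @keywords;
my @match_lines;
my $time1 = time();
foreach my $file (@filenames) {
    foreach my $key (@keywords) {
        # -H and -n make grep print <file name>:<line number>:<line content>.
        # $key and the path are pasted into a shell command, so they must not contain single quotes.
        my @sub_match_lines = `grep -Hn '$key' '$dir/$file'`;
        push @match_lines, @sub_match_lines;
    }
}
open(RS, ">", "result_file_grep") or die "Can't open result_file_grep: $!";
foreach (@match_lines) {
    print RS $_;
}
close(RS);
my $time2 = time();
print "Grep ($num_key keywords) end: ", $time2 - $time1, " seconds\n";
# Would printing each batch of grep output directly to the RS handle, instead of collecting it in @match_lines first, make a difference?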
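The grep script starts one grep process per file per keyword, which with 260 keywords means 260 processes for every file. A possible way to avoid that is to hand grep the whole keyword file with -f and run it only once. This is a sketch under two assumptions: the keywords are literal strings (hence -F), and the installed grep supports -H, as GNU and BSD grep do. The helper name grep_all is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run grep once over all files with all keywords.
# Assumptions: keywords are literal strings (-F), one per line in $keyword_file,
# and the installed grep supports -H.
sub grep_all {
    my ($keyword_file, @files) = @_;
    return () unless @files;
    # List-form pipe open: no shell is involved, so file names need no quoting.
    open(my $grep, '-|', 'grep', '-H', '-n', '-F', '-f', $keyword_file, '--', @files)
        or die "Can't run grep: $!";
    my @match_lines = <$grep>;
    close($grep);   # grep exits non-zero when nothing matched; not treated as an error here
    return @match_lines;
}

# Usage: perl script.pl <directory> <keyword file>
my ($dir, $keywords) = @ARGV;
if (defined $dir && defined $keywords) {
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @files = map { "$dir/$_" } grep { -f "$dir/$_" } sort readdir($dh);
    closedir($dh);
    print grep_all($keywords, @files);
}
```

For a directory with a very large number of files, the argument list may exceed the system's command-line length limit, so the files would have to be passed in batches.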
Run-time differences between grep functionality implemented with Linux grep and with Perl scripts