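This post compares three ways of traversing a directory tree.
1. Using File::Find.
2. Recursive traversal (the traversal function is lsr).
3. Traversal with a queue or a stack (the traversal function is lsr_s).
1. use File::Find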
CODE:
#!/usr/bin/perl -W
#
# File: find.pl
# Author: 路小佳
# License: GPL-2
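use strict;
use warnings;
use File::Find;

my ($size, $dircnt, $filecnt) = (0, 0, 0);

sub process {
    # File::Find chdir()s into each directory before calling this callback,
    # so test $_ (the base name). $File::Find::name is relative to the
    # starting directory and would not resolve from here.
    my $file = $_;
    #print $File::Find::name, "\n";
    if (-d $file) {
        $dircnt++;
    }
    else {
        $filecnt++;
        # -s yields undef for e.g. a dangling symlink; count that as 0 bytes.
        $size += (-s $file) || 0;
    }
}

find(\&process, '.');
print "$filecnt files, $dircnt directory. $size bytes.\n";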
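By default File::Find chdir()s into each directory before invoking the callback, so inside the callback `$_` holds the base name while `$File::Find::name` is the path relative to the starting directory. As a hedged sketch of the alternative, the documented `no_chdir` option keeps the working directory fixed so `$File::Find::name` can be tested directly. The helper name `count_tree` and the throwaway test tree are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# count_tree is a hypothetical helper name, not from the original post.
# Returns (file count, directory count, total bytes) for the tree under $top.
sub count_tree {
    my ($top) = @_;
    my ($size, $dircnt, $filecnt) = (0, 0, 0);
    find({
        no_chdir => 1,    # do not chdir(); paths stay valid as given
        wanted   => sub {
            my $path = $File::Find::name;
            if (-d $path) {
                $dircnt++;
            }
            else {
                $filecnt++;
                $size += (-s $path) || 0;
            }
        },
    }, $top);
    return ($filecnt, $dircnt, $size);
}

# Throwaway tree: a.txt (5 bytes) and sub/b.txt (3 bytes).
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
for my $spec (["$top/a.txt", "hello"], ["$top/sub/b.txt", "abc"]) {
    open(my $fh, '>', $spec->[0]) or die "open $spec->[0]: $!";
    print $fh $spec->[1];
    close($fh);
}

my ($files, $dirs, $bytes) = count_tree($top);
print "$files files, $dirs directory. $bytes bytes.\n";
```

Note that File::Find also passes the starting directory itself to the callback, so it is included in the directory count.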
2. lsr: recursive traversal
CODE:
#!/usr/bin/perl -W
#
# File: lsr.pl
# Author: 路小佳
# License: GPL-2
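use strict;
use warnings;

# Recursively walk $cwd, calling process() on every entry.
# (Prototypes dropped: they triggered "called too early to check prototype".)
sub lsr {
    my $cwd = shift;
    local *DH;
    if (!opendir(DH, $cwd)) {
        warn "Cannot opendir $cwd: $! $^E";
        return;
    }
    foreach (readdir(DH)) {
        if ($_ eq '.' || $_ eq '..') {
            next;
        }
        my $file = $cwd.'/'.$_;
        # -l does an lstat; -d _ reuses that result, so symlinks are not followed.
        if (!-l $file && -d _) {
            $file .= '/';
            lsr($file);
        }
        process($file, $cwd);
    }
    closedir(DH);
}

my ($size, $dircnt, $filecnt) = (0, 0, 0);

# A trailing '/' marks a directory; anything else is counted as a file.
sub process {
    my $file = shift;
    #print $file, "\n";
    if (substr($file, length($file)-1, 1) eq '/') {
        $dircnt++;
    }
    else {
        $filecnt++;
        # -s yields undef for e.g. a dangling symlink; count that as 0 bytes.
        $size += (-s $file) || 0;
    }
}

lsr('.');
print "$filecnt files, $dircnt directory. $size bytes.\n";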
3. lsr_s: stack-based traversal
CODE:
#!/usr/bin/perl -W
#
# File: lsr_s.pl
# Author: 路小佳
# License: GPL-2
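use strict;
use warnings;

# Walk the tree under $cwd iteratively, keeping pending directories on a stack.
# (Prototypes dropped: they triggered "called too early to check prototype".)
sub lsr_s {
    my $cwd = shift;
    my @dirs = ($cwd.'/');
    my ($dir, $file);
    # Every entry ends in '/', so it is always true and the loop stops only when empty.
    while ($dir = pop(@dirs)) {
        local *DH;
        if (!opendir(DH, $dir)) {
            warn "Cannot opendir $dir: $! $^E";
            next;
        }
        foreach (readdir(DH)) {
            if ($_ eq '.' || $_ eq '..') {
                next;
            }
            $file = $dir.$_;
            # -l does an lstat; -d _ reuses that result, so symlinks are not followed.
            if (!-l $file && -d _) {
                $file .= '/';
                push(@dirs, $file);
            }
            process($file, $dir);
        }
        closedir(DH);
    }
}

my ($size, $dircnt, $filecnt) = (0, 0, 0);

# A trailing '/' marks a directory; anything else is counted as a file.
sub process {
    my $file = shift;
    #print $file, "\n";
    if (substr($file, length($file)-1, 1) eq '/') {
        $dircnt++;
    }
    else {
        $filecnt++;
        # -s yields undef for e.g. a dangling symlink; count that as 0 bytes.
        $size += (-s $file) || 0;
    }
}

lsr_s('.');
print "$filecnt files, $dircnt directory. $size bytes.\n";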
Test results on my hard disk partition /dev/hda6.
1: File::Find
CODE:
26881 files, 1603 directory. 9052479946 bytes.
real 0m9.140s
user 0m3.124s
sys 0m5.811s
2: lsr
CODE:
26881 files, 1603 directory. 9052479946 bytes.
real 0m8.266s
user 0m2.686s
sys 0m5.405s
3: lsr_s
CODE:
26881 files, 1603 directory. 9052479946 bytes.
real 0m6.532s
user 0m2.124s
sys 0m3.952s
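Because of caching, run each test several times and take the average. Also, do not print file names while timing: the console is a slow device and would become the bottleneck.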
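A minimal sketch of averaging over several runs, assuming wall-clock time from the core Time::HiRes module is good enough. The helper names `average_time` and `mean` and the dummy workload are hypothetical; in practice the code reference would wrap a call to one of the traversal functions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: arithmetic mean of a list of numbers (0 for an empty list).
sub mean {
    my $sum = 0;
    $sum += $_ for @_;
    return @_ ? $sum / @_ : 0;
}

# Hypothetical helper: run $code $runs times, return the mean wall-clock seconds.
sub average_time {
    my ($code, $runs) = @_;
    my @samples;
    for (1 .. $runs) {
        my $start = time();
        $code->();
        push(@samples, time() - $start);
    }
    return mean(@samples);
}

# Dummy workload standing in for a traversal such as lsr_s('.').
my $avg = average_time(sub { my $x = 0; $x += $_ for 1 .. 10_000; }, 5);
printf "average: %.6f s over 5 runs\n", $avg;
```

The first run often pays for a cold cache, so it may be reasonable to discard it before averaging; that choice is left out of the sketch.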
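lsr_s traverses with a stack rather than a queue because Perl's push, shift and pop all operate on arrays, and a paired push/pop may be better optimized. Memory usage follows the same order as the timings, 1 > 2 > 3; CPU load is highest for 1 and equal for 2 and 3.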
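To illustrate the stack-versus-queue choice, here is a small sketch on an in-memory tree: taking pending nodes with pop gives a depth-first order, taking them with shift gives a breadth-first order, and both visit the same nodes. The `%children` data and the `walk` helper are made up for illustration, and the sketch makes no claim about relative speed.

```perl
use strict;
use warnings;

# A toy tree standing in for a directory hierarchy (illustrative data only).
my %children = (
    '/'  => ['a/', 'b/'],
    'a/' => ['a1/', 'a2/'],
    'b/' => ['b1/'],
);

# Visit every node; $take selects which end of the pending list is consumed.
sub walk {
    my ($take) = @_;
    my @pending = ('/');
    my @order;
    while (@pending) {
        my $node = $take eq 'pop' ? pop(@pending) : shift(@pending);
        push(@order, $node);
        push(@pending, @{ $children{$node} || [] });
    }
    return @order;
}

my @stack_order = walk('pop');      # stack: newest pending entry first
my @queue_order = walk('shift');    # queue: oldest pending entry first
print "stack: @stack_order\n";      # stack: / b/ b1/ a/ a2/ a1/
print "queue: @queue_order\n";      # queue: / a/ b/ a1/ a2/ b1/
```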
CODE:
                 CPU load   memory
use File::Find   97%        4540K
lsr              95%        3760K
lsr_s            95%        3590K
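Conclusion: I strongly recommend using lsr_s to traverse directories.
============= A few more words ======================
In terms of execution efficiency, the gap between find.pl and lsr.pl is mainly in user time: the File::Find module has many options and spends more time on conditional checks. lsr_s.pl spends less time in system calls than lsr.pl because, during recursion, the program is still holding the open directory handles and the information needed to restore each function call, so its sys time is higher. So it is not without reason that lsr_s wins on both sys and user.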
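The explanation above attributes part of the recursive version's cost to directory handles that stay open while recursing. As a hedged sketch, a recursive variant can read all entries and close the handle before descending. The name `lsr_close_first`, its callback argument and the throwaway test tree are assumptions for illustration, and whether this changes the timings was not measured here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# lsr_close_first is a hypothetical variant, not from the original post.
# It calls $visit->($path) for every entry; directories get a trailing '/'.
sub lsr_close_first {
    my ($cwd, $visit) = @_;
    my $dh;
    if (!opendir($dh, $cwd)) {
        warn "Cannot opendir $cwd: $!";
        return;
    }
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);    # handle is released before any recursion starts
    foreach my $name (@entries) {
        my $file = "$cwd/$name";
        if (!-l $file && -d _) {
            lsr_close_first($file, $visit);
            $file .= '/';
        }
        $visit->($file);
    }
    return;
}

# Throwaway tree: a.txt and sub/b.txt.
my $top = tempdir(CLEANUP => 1);
mkdir "$top/sub" or die "mkdir: $!";
for my $path ("$top/a.txt", "$top/sub/b.txt") {
    open(my $fh, '>', $path) or die "open $path: $!";
    print $fh "x";
    close($fh);
}

my ($dircnt, $filecnt) = (0, 0);
lsr_close_first($top, sub { $_[0] =~ m{/$} ? $dircnt++ : $filecnt++ });
print "$filecnt files, $dircnt directory.\n";
```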