1. Introduction
The task is to extract records of various types from a raw data file, strip the headers, and write the data to separate files named after each data type. This kind of sorting and formatting is a basic part of data analysis and processing. It can be implemented with awk alone, with a combination of grep, sed, and awk, or with Perl.
To compare the performance of these three approaches, we implemented the same function in each of them, ran all three against the same test data, and compared their running times.
The tools and operating system versions we use are as follows:
grep: GNU grep 2.5.1
sed: GNU sed version 4.1.2
awk: GNU awk 3.1.3
Perl: v5.8.5 built for i386-linux-thread-multi
Linux: version 2.6.9-42.ELsmp (bhcompile@hs20-bc1-1.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-2)) #1 SMP Wed Jul 12 23:27:17 EDT 2006
2. Algorithm Overview
Of the three methods, the awk and Perl programs scan all files once, while the grep/sed/awk combination scans them multiple times.
The awk program relies on awk's built-in field splitting and uses the built-in gsub function to remove the headers.
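As a rough illustration of the awk approach, the sketch below assumes an input layout that the article does not specify: tab-separated fields of the form NAME=value, where the first field is DT=<type>. The demo_awk directory, the file names, and the sample records are all made up for the example.

```shell
mkdir -p demo_awk
# Hypothetical sample input (layout assumed, not taken from the article).
printf 'DT=cpu\tHOST=web01\tVAL=0.93\nDT=mem\tHOST=web01\tVAL=512\nDT=cpu\tHOST=web02\tVAL=0.41\n' > demo_awk/input.txt

# Single pass: awk splits each line on tabs, gsub strips the assumed "NAME="
# header from every field, and the record goes to a file named after its type.
awk -F'\t' -v OFS='\t' '
{
    # Assigning to a field makes awk rebuild $0 using OFS.
    for (i = 1; i <= NF; i++) gsub(/^[^=]*=/, "", $i)
    # After stripping, $1 holds the data type.
    print > ("demo_awk/" $1 ".out")
}' demo_awk/input.txt
```

Running it should leave the two cpu records in demo_awk/cpu.out and the single mem record in demo_awk/mem.out, each with the NAME= prefixes removed.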
In the grep/sed/awk approach, each tool does what it is best at: grep searches for strings, sed performs text substitution, and awk splits fields and writes the output.
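A minimal sketch of such a pipeline follows. It assumes a hypothetical input layout (tab-separated NAME=value fields with DT=<type> first) and a list of data types known in advance. Neither assumption comes from the article, and the demo_pipe directory and file names are illustrative only.

```shell
mkdir -p demo_pipe
# Hypothetical sample input (layout assumed, not taken from the article).
printf 'DT=cpu\tHOST=web01\tVAL=0.93\nDT=mem\tHOST=web01\tVAL=512\nDT=cpu\tHOST=web02\tVAL=0.41\n' > demo_pipe/input.txt

tab=$(printf '\t')
# One scan of the input per data type (the type list is an assumption):
#   grep selects the records of one type,
#   sed strips the "NAME=" header from the first and from the remaining fields,
#   awk splits the fields on tabs and writes them to the per-type file.
for dt in cpu mem; do
    grep "^DT=${dt}${tab}" demo_pipe/input.txt |
        sed -e 's/^[^=]*=//' -e "s/${tab}[^=${tab}]*=/${tab}/g" |
        awk -F'\t' -v OFS='\t' -v out="demo_pipe/${dt}.out" '{ print $1, $2, $3 > out }'
done
```

Because the loop rereads the input once per type, the number of scans grows with the number of data types.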
The Perl program uses the split function in two steps: one split divides each line into tab-separated fields, and another separates the header from the data in each field. The data is then written to different files according to the DT header field.
3. Running Results
Method | awk | grep/sed/awk | Perl
Running result | Start... t = (16:37:19) End. t = (19:01:53) | Start... t = (21:13:56) End. t = (22:01:47) | Start... t = (11:34:01) End. t = (12:21:50)
Time | 144m34s | 47m51s | 47m49s
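The elapsed times can be rechecked from the recorded start and end timestamps. The helper functions to_seconds and elapsed below are hypothetical names introduced for this sketch, which assumes each run started and ended on the same day.

```shell
# Convert HH:MM:SS to seconds since midnight (awk avoids the octal
# interpretation that shell arithmetic applies to values such as "08").
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the time between two same-day timestamps as <minutes>m<seconds>s.
elapsed() {
    s=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
    printf '%dm%02ds\n' $(( s / 60 )) $(( s % 60 ))
}

elapsed 16:37:19 19:01:53   # awk
elapsed 21:13:56 22:01:47   # grep/sed/awk
elapsed 11:34:01 12:21:50   # Perl
```

The three calls print 144m34s, 47m51s, and 47m49s, matching the reported times.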
4. Conclusion
Of the three methods for this text-processing task, the grep/sed/awk combination and Perl take almost the same time to run, while the pure awk program takes about 3 times as long as either of them.
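The factor of about 3 follows directly from the measured times of 144m34s, 47m51s, and 47m49s. A quick check, with variable names chosen only for this sketch:

```shell
# Ratio of the pure awk running time to each of the other two methods.
ratios=$(awk 'BEGIN {
    awk_s  = 144 * 60 + 34   # 144m34s
    pipe_s =  47 * 60 + 51   # 47m51s (grep/sed/awk)
    perl_s =  47 * 60 + 49   # 47m49s (Perl)
    printf "%.2f %.2f", awk_s / pipe_s, awk_s / perl_s
}')
echo "$ratios"
```

Both ratios come out at roughly 3.02.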