Sometimes we need to count the number of rows in a file that contain a specific string.
The first thing that comes to my mind is grep + wc, but as you can see below, there is more than one way to do it.
Suppose our file is msg, and it contains 23,380,092 rows of data.
Some of the rows look like this:
receive: msg1
Our task is to count those rows.
1. grep Method
grep 'msg1' msg | wc -l
This method takes 1 s.
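As a side note, grep can count matching lines by itself with its -c option, so the pipe to wc is not strictly necessary. A minimal equivalent:
grep -c 'msg1' msg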
2. awk Method
awk 'BEGIN {c = 0} {if ($0 ~ /msg1/) c = c + 1} END {print c}' msg
This method takes 8 s.
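For reference, the same loop can be written in awk's more idiomatic pattern-action form; this is just a sketch, equivalent to the version above (the c + 0 forces numeric output even when no line matches):
awk '/msg1/ {c = c + 1} END {print c + 0}' msg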
3. Another awk Method
awk 'BEGIN {FS = ":"; c = 0} {if ($2 == " msg1") c = c + 1} END {print c}' msg
Two pitfalls here: the comparison must use == (a single = is an assignment, which would count every row), and with FS set to ":" the second field keeps the leading space after the colon, so the string to compare is " msg1".
This method takes 14 s.
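The same field comparison can also be written with the -F option instead of setting FS in a BEGIN block; a sketch of the equivalent form:
awk -F: '$2 == " msg1" {c++} END {print c + 0}' msg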
4. File Descriptor Traversal
Shell code:
#!/bin/bash
count=0
exec 4<msg                      # open msg on file descriptor 4
while IFS= read -r line <&4     # read one line at a time from fd 4
do
    if [ "$line" = "receive: msg1" ]; then
        count=$((count + 1))
    fi
done
exec 4<&-                       # close fd 4
echo "$count"
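Assuming the script above is saved as count.sh (a file name chosen here just for illustration), it can be timed the same way as the one-liners:
time bash count.sh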
Of course, the correctness of each script was first verified on a small file.
The grep + wc approach is remarkably efficient: even with more than 23 million rows to filter, the wc count comes back in about one second.
However, traversing the file yourself is very inefficient, whether in awk or in a hand-written script that opens the file.
It seems that being a little lazy pays off here. Haha.
From liuzhiqiangruc