Comparison of string search efficiency in Linux commands

Source: Internet
Author: User

 

Sometimes we need to count how many lines of a file contain a particular string.

The first thing that comes to my mind is grep + wc. I don't know what comes to yours, but there are several ways to do it.

 

 

 

Suppose our file is named msg and contains 23380092 lines of data.

Some of the lines look like this:

receive: msg1

Our task is to count these lines.

 

 

 

1. grep Method

 

grep 'msg1' msg | wc -l

This method takes 1 s.
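grep can also do the counting itself: its -c option prints the number of matching lines, so the pipe to wc is not needed. The sketch below runs both forms on a tiny, hypothetical sample file, and adds -F (treat the pattern as a fixed string rather than a regular expression) and -x (the whole line must match) to count only lines that are exactly receive: msg1. These variants were not timed on the article's file, so no speed claim is made for them.

```shell
#!/bin/bash
# Hypothetical sample data: two lines are exactly "receive: msg1",
# and "send: msg10" also contains the substring "msg1".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'receive: msg1' 'receive: msg2' 'receive: msg1' 'send: msg10' > "$sample"

# Print the matching lines, then count them with wc.
grep 'msg1' "$sample" | wc -l          # prints 3

# grep counts the matching lines itself when given -c.
grep -c 'msg1' "$sample"               # prints 3

# -F: fixed string, no regex; -x: the pattern must match the whole line.
grep -cxF 'receive: msg1' "$sample"    # prints 2
```

A plain substring pattern also counts lines such as send: msg10, which is why the last command reports a smaller number.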

 

 

 

2. awk Method

 

awk 'BEGIN {c = 0} {if ($0 ~ /msg1/) c = c + 1} END {print c}' msg

This method takes 8 s.
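The same count can be written in more idiomatic awk by using the regular expression as a pattern instead of an if statement. Uninitialized awk variables start at zero, so no BEGIN block is needed, and printing c + 0 yields 0 rather than an empty line when nothing matches. This is a sketch on hypothetical sample data; it has not been timed against the 8 s figure above.

```shell
#!/bin/bash
# Hypothetical sample data: two of the three lines contain "msg1".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'receive: msg1' 'receive: msg2' 'receive: msg1' > "$sample"

# Pattern-action form: { c++ } runs only for lines matching /msg1/.
# "c + 0" forces a numeric result, so 0 is printed when nothing matches.
awk '/msg1/ { c++ } END { print c + 0 }' "$sample"   # prints 2

# No line matches /msg9/, so the same program prints 0.
awk '/msg9/ { c++ } END { print c + 0 }' "$sample"   # prints 0
```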

 

 

 

3. Another awk Method

 

awk 'BEGIN {FS = ": "; c = 0} {if ($2 == "msg1") c = c + 1} END {print c}' msg

This method takes 14 s.
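Two details decide whether a field-based awk count is correct. In awk, = assigns and == compares, so a condition written as $2 = "msg1" stores a non-empty string in $2 and is true for every line. And with FS set to ":", the second field of receive: msg1 is " msg1", with a leading space. The sketch below shows both effects on a three-line, hypothetical sample file.

```shell
#!/bin/bash
# Hypothetical sample data: one of the three lines is "receive: msg1".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'receive: msg1' 'receive: msg2' 'receive: msg3' > "$sample"

# Bug: "=" assigns "msg1" to $2; a non-empty string is true, so every line counts.
awk 'BEGIN { FS = ":" } { if ($2 = "msg1") c++ } END { print c + 0 }' "$sample"    # prints 3

# Comparison with FS = ":" never matches, because $2 is " msg1" (leading space).
awk 'BEGIN { FS = ":" } { if ($2 == "msg1") c++ } END { print c + 0 }' "$sample"   # prints 0

# Splitting on ": " makes $2 exactly "msg1", so only the matching line counts.
awk 'BEGIN { FS = ": " } { if ($2 == "msg1") c++ } END { print c + 0 }' "$sample"  # prints 1
```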

 

 

 

4. Traversing the file through a file descriptor

 

 


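#!/bin/bash
# Count the lines of msg that are exactly "receive: msg1".
count=0
exec 4<msg                  # open msg for reading on file descriptor 4
while IFS= read -r line <&4
do
    if [ "$line" = "receive: msg1" ]; then
        count=$((count + 1))
    fi
done
exec 4<&-                   # close file descriptor 4
echo "$count"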
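The explicit descriptor 4 is not strictly needed: redirecting the file into the whole loop with done < file reads it the same way, and IFS= read -r keeps leading whitespace and backslashes in each line intact. The function name count_lines and the sample lines below are hypothetical. The loop still runs once per line in the shell, so this variant should not be expected to be noticeably faster than the script above.

```shell
#!/bin/bash
# Hypothetical helper: print how many lines of file $1 are exactly equal to $2.
# The file is redirected into the loop ("done < file") instead of "exec 4<".
count_lines() {
    local file=$1 target=$2 line count=0
    while IFS= read -r line; do
        if [ "$line" = "$target" ]; then
            count=$((count + 1))
        fi
    done < "$file"
    echo "$count"
}

# Hypothetical sample data: two of the three lines are "receive: msg1".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'receive: msg1' 'receive: msg2' 'receive: msg1' > "$sample"
count_lines "$sample" 'receive: msg1'   # prints 2
```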

 


 

Of course, the correctness of each script was first verified on small files.
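A check of that kind might be scripted as follows: build a small file whose expected count is known, run each method on it, and report any disagreement. The sample lines, the expected value 3, and the function names by_grep, by_awk1, by_awk2 and by_loop are hypothetical; the commands use the pattern msg1 and, for the field-based awk variant, FS set to ": ".

```shell
#!/bin/bash
# Hypothetical small test file: 3 of its 5 lines are "receive: msg1".
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'receive: msg1' 'receive: msg2' 'receive: msg1' \
    'receive: msg3' 'receive: msg1' > "$sample"
expected=3

# Hypothetical wrappers around the four methods; each prints a count.
by_grep() { grep 'msg1' "$1" | wc -l; }
by_awk1() { awk 'BEGIN { c = 0 } { if ($0 ~ /msg1/) c = c + 1 } END { print c }' "$1"; }
by_awk2() { awk 'BEGIN { FS = ": "; c = 0 } { if ($2 == "msg1") c = c + 1 } END { print c }' "$1"; }
by_loop() {
    local count=0 line
    exec 4<"$1"                       # open the file on descriptor 4
    while IFS= read -r line <&4; do
        if [ "$line" = "receive: msg1" ]; then count=$((count + 1)); fi
    done
    exec 4<&-                         # close descriptor 4
    echo "$count"
}

# Run every method and flag any result that differs from the expected count.
status=0
for method in by_grep by_awk1 by_awk2 by_loop; do
    result=$("$method" "$sample")
    if [ "$result" -eq "$expected" ]; then
        echo "$method: ok ($result)"
    else
        echo "$method: MISMATCH (got $result, expected $expected)"
        status=1
    fi
done
```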

 

 

 

The grep method is remarkably efficient: even with this much data to filter, grep followed by wc finishes in only one second.

 

By contrast, traversing the file yourself is very slow, whether with awk or with a shell script that opens the file and reads it line by line.
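To reproduce the comparison on one's own machine, the bash time keyword can wrap each command. The harness below is a sketch, not the author's procedure: it generates a hypothetical file of 100000 lines by default (override with the hypothetical SAMPLE_LINES variable), far smaller than the article's 23380092 lines, and prints the elapsed time of each method to standard error. Absolute numbers depend on the machine, the grep and awk implementations and the locale, so they are not expected to match the figures above.

```shell
#!/bin/bash
# Hypothetical benchmark harness; file size and contents are assumptions.
n=${SAMPLE_LINES:-100000}
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# Every 10th line is "receive: msg1"; all other lines are "receive: msg2".
awk -v n="$n" 'BEGIN { for (i = 1; i <= n; i++) print (i % 10 == 0 ? "receive: msg1" : "receive: msg2") }' > "$sample"

TIMEFORMAT='%R s elapsed'   # output format of the bash "time" keyword

echo "grep | wc:"
time grep 'msg1' "$sample" | wc -l

echo "awk:"
time awk '/msg1/ { c++ } END { print c + 0 }' "$sample"

echo "shell read loop:"
time {
    count=0
    while IFS= read -r line; do
        [ "$line" = "receive: msg1" ] && count=$((count + 1))
    done < "$sample"
    echo "$count"
}
```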

 

 

 

It seems it pays to be a little lazy after all. Haha.

 

From liuzhiqiangruc
