How to count word and character frequencies on the Linux command line


The Linux command line is a lot of fun, and it lets us carry out many tedious tasks quickly and easily. One example is counting how often words and characters appear in a text file, which is the subject of this article.

The first command that comes to mind for counting words and characters in a text file is wc. However, wc only reports totals, so to get the frequency of each word or character we have to combine a few other tools.
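As a quick illustration of what wc does and does not give us, here is a minimal sketch. The sample sentence and the variable names are made up for this example.

```shell
# Hypothetical sample text, used only for illustration.
sample='the cat and the hat'

# wc -w prints the total number of words, wc -c the total number of bytes.
# tr -d ' ' strips the padding some wc implementations put around the number.
total_words=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
total_bytes=$(printf '%s\n' "$sample" | wc -c | tr -d ' ')

echo "words: $total_words"
echo "bytes: $total_bytes"
```

Both numbers are totals for the whole input. Nothing here says that "the" appears twice, which is what the pipelines in the rest of the article work out.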

Before we can parse a text file, we need one. To keep things consistent, we will create a text file from the output of the man command, as shown below.

The code is as follows:

$ man man > Man.txt

The command above saves the manual page of the man command in the file Man.txt.

We want to find the most common words, so we run the following pipeline on the newly created file.

The code is as follows:

$ cat Man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head

Sample Output

7557
262 the
163 to
112 is
112 a
... of
... manual
...
... if
...

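The pipeline above prints the 10 most frequently used words. The first line, a count with no word after it, is the number of empty lines: every space is turned into a newline, so runs of consecutive spaces and the blank lines already in the file all end up as empty lines.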
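If the count of empty lines is not wanted, one possible variation is to squeeze every run of non-letters into a single newline with tr -cs. The following is only a sketch: the function name word_freq and the sample sentence are made up for this example, and the final awk step is there only to normalise the padding that uniq -c adds.

```shell
# word_freq: read text on stdin, print "count word" lines, most frequent first.
word_freq() {
    tr '[:upper:]' '[:lower:]' |   # fold case so "The" and "the" are one word
    tr -cs '[:alpha:]' '\n' |      # turn every run of non-letters into one newline
    grep -v '^$' |                 # drop the empty line left by leading non-letters
    sort | uniq -c |               # count identical adjacent words
    sort -k1,1nr -k2 |             # highest count first, ties in alphabetical order
    awk '{print $1, $2}'           # normalise the padding uniq -c adds
}

# Hypothetical sample text, used only for illustration.
printf 'The cat and the hat. The end!\n' | word_freq | head -n 3
```

To try it on the file from this article, run word_freq < Man.txt | head.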

What about single letters? First, let's see how to break a piece of text into individual characters with the following command.

The code is as follows:

$ echo 'tecmint team' | fold -w1

Sample Output

t
e
c
m
i
n
t
 
t
e
a
m

Note: -w1 sets the line width to 1, so fold puts every character on a line of its own.
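To make the role of the width option more concrete, here is a small sketch with a width of 3 instead of 1. The sample word and the variable name are made up for this example.

```shell
# Hypothetical sample word; -w3 starts a new line after every third character.
wrapped=$(echo 'tecmint' | fold -w3)

# Prints three lines: "tec", "min" and "t".
echo "$wrapped"
```

With -w1 the same input would come out as seven lines of one character each.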

Now we break the text file into single characters, sort them, and count the duplicates to get the 10 most frequent characters.

$ fold -w1 < Man.txt | sort | uniq -c | sort -rn | head

Sample Output

8579
2413 e
1987 a
1875 t
1644 i
1553 n
1522 o
1514 s
1224 r
1021 l
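The first line of the sample output has a count but no visible character, which is presumably the space character. If whitespace should not be counted, it can be deleted before the text is folded. The following is a minimal sketch, assuming plain ASCII text; the function name char_freq and the sample string are made up for this example.

```shell
# char_freq: read text on stdin, print "count char" lines, most frequent first.
char_freq() {
    tr -d '[:space:]' |      # remove spaces, tabs and newlines up front
    fold -w1 |               # one character per line
    sort | uniq -c |         # count identical adjacent characters
    sort -k1,1nr -k2 |       # highest count first, ties in character order
    awk '{print $1, $2}'     # normalise the padding uniq -c adds
}

# Hypothetical sample text, used only for illustration.
printf 'tecmint team\n' | char_freq | head -n 3
```

On the file from this article, the equivalent call is char_freq < Man.txt | head.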

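What about uppercase and lowercase letters? So far they have been counted separately. To count them together, convert everything to uppercase before sorting, so that identical letters end up next to each other for uniq -c:

$ fold -w1 < Man.txt | tr '[:lower:]' '[:upper:]' | sort | uniq -c | sort -rn | head -20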

Sample Output

11636
2504 E
2079 A
... T
1729 I
1645 N
1632 S
1580 O
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
344 .
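One detail is worth checking here. uniq -c only merges identical lines that are adjacent, so the case conversion has to happen before sort whenever the locale sorts uppercase and lowercase letters apart, as the C locale does. The sketch below forces the C locale to show the difference; the two function names and the three-letter sample are made up for this example.

```shell
# Convert case AFTER sorting: in the C locale "A", "B" and "a" sort in that
# order, so the two A's are no longer adjacent once "a" becomes "A".
sort_then_convert() {
    fold -w1 | LC_ALL=C sort | tr '[:lower:]' '[:upper:]' | uniq -c | awk '{print $1, $2}'
}

# Convert case BEFORE sorting: identical letters end up next to each other.
convert_then_sort() {
    fold -w1 | tr '[:lower:]' '[:upper:]' | LC_ALL=C sort | uniq -c | awk '{print $1, $2}'
}

# Hypothetical sample: lowercase a, uppercase B, uppercase A.
echo 'sort, then tr:'; printf 'aBA\n' | sort_then_convert
echo 'tr, then sort:'; printf 'aBA\n' | convert_then_sort
```

The first variant reports the letter A twice with a count of 1 each, while the second reports it once with a count of 2.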

Look at the output above: punctuation is included. Let's get rid of it with the tr command:

The code is as follows:

$ fold -w1 < Man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20

Sample Output

11636
2504 E
2079 A
... T
1729 I
1645 N
1632 S
1580 O
1550
1269 R
1055 L
836 H
791 P
766 D
753 C
725 M
690 U
605 F
504 G
352 Y
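The line with the count 1550 and no character next to it is most likely the punctuation itself: tr -d deletes the character but leaves its now-empty line behind, and uniq -c then counts those empty lines. One way to avoid this, sketched below under the assumption of plain ASCII text, is to delete punctuation and whitespace before the text is folded. The function name letter_freq and the sample string are made up for this example.

```shell
# letter_freq: read text on stdin, print "count LETTER" lines, most frequent first.
letter_freq() {
    tr -d '[:punct:][:space:]' |   # drop punctuation and whitespace first
    tr '[:lower:]' '[:upper:]' |   # count upper and lower case together
    fold -w1 |                     # one character per line
    sort | uniq -c |               # count identical adjacent characters
    sort -k1,1nr -k2 |             # highest count first, ties in character order
    awk '{print $1, $2}'           # normalise the padding uniq -c adds
}

# Hypothetical sample text, used only for illustration.
printf 'Hello, world!\n' | letter_freq | head -n 2
```

Because nothing but letters and digits reaches fold, no unlabelled lines appear in the result.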

Suppose we now have three text files. The following command analyzes all of them together.

The code is as follows:

$ cat *.txt | fold -w1 | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -8

Sample Output

11636
2504 E
2079 A
... T
1729 I
1645 N
1632 S
1580 O
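Because cat simply concatenates every file that matches *.txt, the counts are totals across all the files rather than per-file figures. Here is a small self-contained sketch of that behaviour. The temporary directory, the file names one.txt and two.txt, and their contents are made up for this example.

```shell
# Create two tiny hypothetical files in a throwaway directory.
tmpdir=$(mktemp -d)
printf 'aaa b\n' > "$tmpdir/one.txt"
printf 'bb c\n'  > "$tmpdir/two.txt"

# Count characters over both files at once; whitespace is removed first.
combined=$(cat "$tmpdir"/*.txt | tr -d '[:space:]' | fold -w1 |
           sort | uniq -c | sort -k1,1nr -k2 | awk '{print $1, $2}')
echo "$combined"

rm -r "$tmpdir"
```

The letter b appears once in the first file and twice in the second, and the combined result reports it with a count of 3.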

Next, we will list rare words that are at least 10 letters long. Here's a simple script:

The code is as follows:

$ cat Man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | sort | uniq -c | sort -n | grep -E '..........' | head

Sample Output

1 ──────────────────────────────────────────
1 a all
1 abc or all arguments within are optional
1 able setlocale for precise details
1 ab options delimited by cannot be used together
1 achieved by using the less environment variable
1 a child process returned a nonzero exit status
1 act as if this option is supplied using the name as a filename
1 activate local mode format and display local manual files
1 acute accent

Note: The run of dots in the grep pattern above can be written more compactly as .{10}, which has the same effect.
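Two things are worth checking when running this pipeline. First, grep sees the count column that uniq -c puts in front of each word, so a pattern of ten dots can also match lines whose word is much shorter than ten letters. Second, the sample output shows whole lines rather than single words, which suggests the text was not split on spaces in that run. A minimal sketch that selects words by length before counting them could look like this; the function name long_words and the sample sentence are made up for this example.

```shell
# long_words: read text on stdin, print "count word" lines for words of
# 10 or more letters, rarest first.
long_words() {
    tr '[:upper:]' '[:lower:]' |   # fold case
    tr -cs '[:alpha:]' '\n' |      # one word per line, non-letters removed
    grep -E '^[a-z]{10,}$' |       # keep only words of at least 10 letters
    sort | uniq -c |               # count identical adjacent words
    sort -k1,1n -k2 |              # lowest count first, ties in alphabetical order
    awk '{print $1, $2}'           # normalise the padding uniq -c adds
}

# Hypothetical sample text, used only for illustration.
printf 'Documentation helps. Configuration and documentation matter.\n' | long_words
```

Filtering before uniq -c means the pattern only ever sees the word itself, so the length test is exact.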

These simple pipelines show us the most frequently occurring words and characters in an English text.

That's all for now. Next time I'll be back with another interesting topic that you should enjoy reading. And don't forget to give us your valuable feedback.
