The Fun of the Linux Terminal

Source: Internet
Author: User

The Linux command line is a lot of fun, and it lets us carry out many tedious tasks easily and thoroughly. For example, we can calculate how often words and characters occur in a text file, which is what this article discusses.

The first command that comes to mind for counting the words and characters in a text file is wc. However, wc only reports totals, so to get frequencies we combine several other tools in a pipeline.
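As a quick illustration of what wc reports, here is a minimal sketch on a made-up sample string (the variable names are hypothetical, not from the article). It prints totals only, which is why the frequency counts in the rest of this article rely on other tools.

```shell
# Hypothetical sample text; any string works here.
sample='the cat and the hat'

# wc -w counts words; wc -c counts bytes (including the trailing newline).
words=$(printf '%s\n' "$sample" | wc -w)
bytes=$(printf '%s\n' "$sample" | wc -c)

echo "words: $words, bytes: $bytes"
```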

Before we can analyze a text file with scripts, we need a text file. To keep things consistent, we will create one from the output of the man command, as shown below.

  $ man man > man.txt

This command redirects the manual page of the man command into the file man.txt.

To find the most common words, we run the following script on the file we just created.

  $ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | grep -v '[^a-z]' | sort | uniq -c | sort -rn | head
Sample Output
  7557
  262
  163
  Token is
  112
  78
  78 manual
  76 and
  64 if
  63 be

The above script outputs the ten most commonly used words.
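To see what each stage contributes, the same pipeline can be tried on a tiny made-up sentence. This is only a sketch: the function name top_words and the sample text are assumptions for illustration, not part of the original script.

```shell
# top_words (hypothetical name): read text on stdin, print the most frequent words.
top_words() {
  tr ' ' '\012' |                  # put each space-separated word on its own line
    tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -d '[:punct:]' |            # delete punctuation characters
    grep -v '[^a-z]' |             # drop lines containing anything other than a-z
    sort |                         # group identical words together
    uniq -c |                      # prefix each distinct word with its count
    sort -rn |                     # highest count first
    head                           # keep the first ten lines
}

printf 'The cat saw the dog. The dog ran!\n' | top_words
```

Here "the" appears three times once case is folded, so it ends up on the first line of the output.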

What about individual letters? The following command splits a string into single characters, one per line.

  $ echo 'tecmint team' | fold -w1
Sample Output
  t
  e
  c
  m
  i
  n
  t
  t
  e
  a
  m

Note: -w1 sets the output width to one character per line.
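The effect of the width option is easier to see with a couple of different values. A small sketch, using made-up input strings:

```shell
# Width 2: the line is broken into two-character pieces (ab, cd, ef).
printf 'abcdef\n' | fold -w 2

# Width 1: every character, including the space, gets its own line.
pieces=$(printf 'ab cd\n' | fold -w 1 | wc -l)
echo "lines produced: $pieces"
```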

Now we will split the text file into single characters, sort them, and count them to obtain the 10 most frequent characters.

  $ fold -w1 < man.txt | sort | uniq -c | sort -rn | head
Sample Output
  8579
  2413 e
  1987
  1875 t
  1644 i
  1553 n
  1522 o
  1514 s
  1224 r
  1021 l

What about letter case? So far we have counted uppercase and lowercase letters separately. To treat them as the same letter, use the following command.

  $ fold -w1 < man.txt | sort | tr '[:lower:]' '[:upper:]' | uniq -c | sort -rn | head -20
Sample Output
  11636
  2504 E
  2079
  2005 T
  1729 I
  1645 N
  1632 S
  1580 O
  1269 R
  1055 L
  836 H
  791 P
  766 D
  753 C
  725 M
  690 U
  605 F
  504 G
  352 Y
  344 .

Look at the output above: punctuation marks are included. Let's remove them with the tr command:

  $ fold -w1 < man.txt | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -20
Sample Output
  11636
  2504 E
  2079
  2005 T
  1729 I
  1645 N
  1632 S
  1580 O
  1550
  1269 R
  1055 L
  836 H
  791 P
  766 D
  753 C
  725 M
  690 U
  605 F
  504 G
  352 Y
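One detail worth keeping in mind is that uniq -c only merges identical lines that are adjacent. If characters are changed or deleted after sort, equal lines may no longer sit next to each other, and one character's total can be split across several rows. A possible variant, assuming we are free to reorder the stages, does all the transformations before sorting. The function name char_freq and the sample input are hypothetical.

```shell
# char_freq (hypothetical name): case-insensitive character counts,
# with all transformations applied before sort so uniq sees equal lines together.
char_freq() {
  tr -d '[:punct:]' |              # remove punctuation first
    tr '[:lower:]' '[:upper:]' |   # fold to uppercase
    fold -w1 |                     # one character per line
    sort |                         # group identical characters
    uniq -c |                      # count each group
    sort -rn                       # most frequent first
}

printf 'Aa, bA.\n' | char_freq
```

With this small input, the three forms of the letter a are merged into a single row with a count of 3.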

Now suppose we have three text files. The following command shows the combined results for all of them.

  $ cat *.txt | fold -w1 | tr '[:lower:]' '[:upper:]' | sort | tr -d '[:punct:]' | uniq -c | sort -rn | head -8
Sample Output
  11636
  2504 E
  2079
  2005 T
  1729 I
  1645 N
  1632 S
  1580 O
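The way cat *.txt merges several files into one stream can be tried safely in a scratch directory. The sketch below uses made-up file names and contents and cleans up after itself; none of these names come from the article.

```shell
# Create a temporary directory with three small, hypothetical text files.
dir=$(mktemp -d)
printf 'aab\n' > "$dir/one.txt"
printf 'abc\n' > "$dir/two.txt"
printf 'a\n'   > "$dir/three.txt"

# cat concatenates all files; the counts therefore cover the three files together.
combined=$(cat "$dir"/*.txt | fold -w1 | sort | uniq -c | sort -rn | head -n 1)
echo "$combined"

# Remove the scratch directory again.
rm -r "$dir"
```

The letter a occurs twice in the first file and once in each of the other two, so the top row reports a combined count of 4.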

Next, we will list rare words that are at least 10 letters long. The following is a simple script:

  $ cat man.txt | tr ' ' '\012' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | tr -d '[0-9]' | sort | uniq -c | sort -n | grep -E '..................' | head
Sample Output
  1 ── ─
  1 a all
  1 abc any or all arguments within are optional
  1 able see setlocale for precise details
  1 AB options delimited by cannot be used together
  1 achieved byusing the less environment variable
  1 a child process returned a nonzero exit status
  1 act asifthis option was supplied using the name as a filename
  1 activate local mode format and display local manual files
  1 acute accent

Note: The more dots in the pattern, the longer a line must be to match. Instead of typing the dots out, we can use a repetition count such as .{10} to match ten characters.
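As a sketch of the repetition-count form, the length filter can also be applied to the bare words before they are counted, so the pattern does not have to account for the count column that uniq -c adds. The function name long_words and the sample sentence are assumptions made for this example.

```shell
# long_words (hypothetical name): count words of at least ten characters,
# rarest first.
long_words() {
  tr ' ' '\012' |                      # one word per line
    tr '[:upper:]' '[:lower:]' |       # fold to lowercase
    tr -d '[:punct:][:digit:]' |       # drop punctuation and digits
    grep -E '^.{10,}$' |               # keep lines with ten or more characters
    sort |                             # group identical words
    uniq -c |                          # count each word
    sort -n                            # lowest count first
}

printf 'Environment variables and configuration, environment.\n' | long_words
```

Only "configuration" and "environment" are long enough to pass the filter here; "variables" has nine letters and is dropped.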

These simple scripts let us find the most frequently occurring words and characters in English text.

That's all for now. Next time, I will cover another interesting topic that you should enjoy reading. Don't forget to leave us your valuable comments.
