Counting the frequency of the words used in a file is an interesting problem, and it can be solved in several different ways using associative arrays, awk, sed, grep, and so on.
First, we need a test file; save it as Word.txt.
The contents are as follows:
Word used
This counting
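To follow along, the test file can be created with a here-document containing the sample lines above:

```shell
# Create the test file Word.txt with the sample contents
cat > Word.txt <<'EOF'
Word used
This counting
EOF
```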
Next you need to write a shell script, as follows:
#!/bin/bash
#Name: word_freq.sh
#Description: Find out frequency of words in a file
if [ $# -ne 1 ];
then
    echo "Usage: $0 filename";
    exit -1
fi
filename=$1
egrep -o "\b[[:alpha:]]+\b" $filename | \
awk '{ count[$0]++ }
END { printf("%-14s%s\n", "Word", "Count"); \
for (ind in count) { printf("%-14s%d\n", ind, count[ind]); } }'
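Assuming the script is saved as word_freq.sh and made executable, it would be run as `./word_freq.sh Word.txt`. The pipeline it executes is shown inline below so the example is self-contained; the sample file contents here are illustrative, not from the original post:

```shell
# Illustrative input file
printf 'the quick brown fox\nthe lazy dog\n' > sample.txt

# The same egrep/awk pipeline the script runs:
# count each word in an associative array, then print the table
egrep -o "\b[[:alpha:]]+\b" sample.txt | \
awk '{ count[$0]++ }
END { printf("%-14s%s\n", "Word", "Count");
for (ind in count) { printf("%-14s%d\n", ind, count[ind]); } }'
```

Here "the" would be counted twice. Note that `for (ind in count)` visits the array keys in no particular order, so the rows may appear in any sequence.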
How it works:
1. egrep -o "\b[[:alpha:]]+\b" $filename outputs only the words: the -o option prints each matched character sequence on a line of its own, so we get one word per line.
2. \b is a word-boundary marker, and [[:alpha:]] is the POSIX character class that matches letters.
3. The awk command does the counting: count[$0]++ increments an associative-array entry keyed by each word, and the END block iterates over the array to print every word alongside its count.
A screenshot of the operation is given below:
For more on the awk command, please refer to the blogger's other posts.
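As an aside (this variant is not from the original post), the same frequency table can be produced without an awk associative array, using the classic sort/uniq pipeline; the input file here is illustrative:

```shell
# Illustrative input file
printf 'word used\nthis counting word\n' > demo.txt

# Extract words, group identical ones with sort, count each group
# with uniq -c, then order by descending frequency
egrep -o "\b[[:alpha:]]+\b" demo.txt | sort | uniq -c | sort -rn
```

This prints each count followed by its word, most frequent first; the awk version in the script avoids the two sorts but does not order its output.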