Requirements:
1. Design a word frequency program that counts how often each word occurs in an English article.
2. Punctuation in the article is not counted.
3. Output the results sorted by frequency, from highest to lowest.
Design:
1. Because I changed majors and have not learned C++ or Java, I could only write this in C, the one language I have studied, and even that was laborious.
2. Define a structure with two members, the word and its frequency, to hold the counts (the memory is allocated dynamically, so large texts can be handled).
3. Use the fopen function to open the specified document.
4. Use the fgetc function to read one character at a time, and handle it differently depending on whether it is a letter.
5. Use quick sort to sort the results.
6. Loop over the results and print them all.
Part of the code:
Structure definition:
struct fre_word { int num; char a[
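Allocate initial memory:
// BLOCK stands for the number of entries added per allocation step; its numeric value is missing from this listing
struct fre_word *w;
w = (struct fre_word *)malloc(BLOCK * p * sizeof(struct fre_word)); // allocate the initial memory for the structs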
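The allocation step can also be written as a small growable table. The following is only a sketch under assumptions: the names `struct entry`, `struct table`, `table_push` and the limit `WORD_MAX` are hypothetical and do not come from the program above, and the capacity doubles on each growth instead of increasing by a fixed block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 32          /* assumed maximum word length, not from the original */

struct entry {               /* hypothetical stand-in for struct fre_word */
    int  num;
    char a[WORD_MAX];
};

struct table {
    struct entry *items;
    size_t len;              /* entries in use */
    size_t cap;              /* entries allocated */
};

/* Append a word with count 1; returns 0 on success, -1 if memory runs out. */
int table_push(struct table *t, const char *word)
{
    assert(t != NULL && word != NULL);
    if (t->len == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 16;
        /* Keep the old pointer until realloc succeeds, so nothing leaks. */
        struct entry *p = realloc(t->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        t->items = p;
        t->cap = new_cap;
    }
    t->items[t->len].num = 1;
    strncpy(t->items[t->len].a, word, WORD_MAX - 1);
    t->items[t->len].a[WORD_MAX - 1] = '\0';   /* strncpy may not terminate */
    t->len++;
    return 0;
}
```

Starting from `struct table t = {0};` works because `realloc` with a null pointer behaves like `malloc`. Assigning the result of `realloc` to a temporary first means the old block is still reachable if the call fails.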
Read text:
printf("Enter the name of the file to read: ");
scanf("%s", filename); // name of the file whose word frequencies are to be counted
if ((fp = fopen(filename, "r")) == NULL) {
    printf("cannot open file\n");
    exit(1);
}
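Once the file is open, the characters can be read in a loop. A minimal sketch of the usual idiom follows; the function name `count_letters` is hypothetical. It tests the return value of `fgetc` against `EOF` instead of calling `feof` first, because `feof` only becomes true after a read has already failed, so a loop on `!feof(fp)` processes one extra value.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Count the alphabetic characters in an already opened stream. */
size_t count_letters(FILE *fp)
{
    assert(fp != NULL);
    size_t letters = 0;
    int ch;                                /* int, so EOF stays distinguishable */
    while ((ch = fgetc(fp)) != EOF) {
        if (isalpha((unsigned char)ch))    /* cast avoids negative char values */
            letters++;
    }
    return letters;
}
```

The same loop shape can carry the word-building logic: the body only needs to branch on whether `ch` is a letter.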
Word Matching:
/**************** Set word occurrences to 1 ****************/
// BLOCK stands for the number of entries added per allocation step; its numeric value is missing from this listing
for (i = 0; i < BLOCK; i++) {
    (w + i)->num = 1;
}
/**************** Word matching ****************/
i = 0;
while (!feof(fp)) { // the file has not been read to the end
    ch = fgetc(fp);
    (w + i)->a[j] = '\0';
    if (ch >= 65 && ch <= 90 || ch >= 97 && ch <= 122) { // ch is a letter: store it
        (w + i)->a[j] = ch;
        j++;
        flag = 0; // flag marks whether we are inside a run of punctuation or spaces
    } else if (!(ch >= 65 && ch <= 90 || ch >= 97 && ch <= 122) && flag == 0) { // ch is not a letter and the previous character was a letter
        i++;
        j = 0;
        flag = 1;
        for (m = 0; m < i - 1; m++) { // match words: if the word already exists, num+1
            if (stricmp((w + m)->a, (w + i - 1)->a) == 0) { // stricmp is a non-standard, case-insensitive compare
                (w + m)->num++;
                i--; // reuse the slot of the duplicate word
                break; // stop after the first match so the count is added only once
            }
        }
    }
    /**************** Dynamically allocating memory ****************/
    if (i == (p * BLOCK)) { // use i to determine whether the current memory is full
        p++;
        w = (struct fre_word *)realloc(w, BLOCK * p * sizeof(struct fre_word));
        for (n = i; n < BLOCK * p; n++) // assign the initial value to the newly allocated structures
            (w + n)->num = 1;
    }
}
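The matching logic is easier to check on an in-memory string. Below is a sketch under assumptions, not the program's actual code: `struct entry`, `same_word`, `count_words` and `WORD_MAX` are hypothetical names, the table has a fixed capacity, and the case-insensitive comparison is written with the standard `tolower` so that it does not depend on `stricmp`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define WORD_MAX 32          /* assumed maximum word length */

struct entry { int num; char a[WORD_MAX]; };

/* Case-insensitive equality using only standard C. */
int same_word(const char *x, const char *y)
{
    while (*x && *y) {
        if (tolower((unsigned char)*x) != tolower((unsigned char)*y))
            return 0;
        x++;
        y++;
    }
    return *x == *y;         /* both strings must end together */
}

/* Split text at non-letters and tally each word in out[0..cap).
   Returns the number of distinct words stored. */
size_t count_words(const char *text, struct entry *out, size_t cap)
{
    assert(text != NULL && out != NULL);
    size_t used = 0;
    char word[WORD_MAX];
    size_t len = 0;

    for (const char *p = text; ; p++) {
        if (*p != '\0' && isalpha((unsigned char)*p)) {
            if (len < WORD_MAX - 1)      /* drop letters beyond the buffer */
                word[len++] = *p;
            continue;
        }
        if (len > 0) {                   /* a word just ended */
            word[len] = '\0';
            size_t k = 0;
            while (k < used && !same_word(out[k].a, word))
                k++;
            if (k < used) {
                out[k].num++;            /* seen before: bump its count */
            } else if (used < cap) {
                strcpy(out[used].a, word);
                out[used].num = 1;
                used++;
            }
            len = 0;
        }
        if (*p == '\0')
            break;
    }
    return used;
}
```

For the text "The cat, the dog... THE end" this stores four distinct words, with the first entry "The" counted three times. The linear search makes the whole pass quadratic in the number of distinct words, which matches the approach in the listing above.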
Quick sort:
void quick(struct fre_word *f, int i, int j)
{
    int m, n, k;
    struct fre_word temp;
    m = i;
    n = j;
    k = f[(i + j) / 2].num; // the selected pivot
    do {
        while (f[m].num > k && m < j) m++; // from the left, skip elements larger than k
        while (f[n].num < k && n > i) n--; // from the right, skip elements smaller than k
        if (m <= n) { // a pair in the wrong order was found: swap the whole structures
            temp = f[m];
            f[m] = f[n];
            f[n] = temp;
            m++;
            n--;
        }
    } while (m <= n);
    if (m < j) quick(f, m, j); // recurse on the right part
    if (n > i) quick(f, i, n); // recurse on the left part
}
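As an alternative to a hand-written quick sort, the standard library function `qsort` can do the same job with a comparator. This is a swapped-in technique, not the method used above; `struct entry`, `by_count_desc`, `sort_entries` and `WORD_MAX` are hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define WORD_MAX 32          /* assumed maximum word length */

struct entry { int num; char a[WORD_MAX]; };

/* Comparator for qsort: entries with larger counts sort first. */
int by_count_desc(const void *x, const void *y)
{
    const struct entry *a = x;
    const struct entry *b = y;
    /* Yields -1, 0 or 1 without the overflow risk of subtracting counts. */
    return (a->num < b->num) - (a->num > b->num);
}

/* Sort n entries from the highest count to the lowest. */
void sort_entries(struct entry *items, size_t n)
{
    assert(items != NULL);
    qsort(items, n, sizeof *items, by_count_desc);
}
```

Note that `qsort` is not guaranteed to be stable, so words with equal counts may come out in any relative order.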
Result output:
for (n = 0; n < i; n++) {
    printf("word appearing in the document: ");
    printf("%-18s", (w + n)->a);
    printf("number of occurrences: ");
    printf("%d\n", (w + n)->num);
}
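The `%-18s` conversion left-aligns the word and pads it with spaces to 18 columns, so the counts line up. A small sketch that formats one line into a buffer makes the padding easy to inspect; the function name `format_entry` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write "word<padding>count" into buf; returns what snprintf returns,
   i.e. the length the full line needs. */
int format_entry(char *buf, size_t size, const char *word, int count)
{
    assert(buf != NULL && word != NULL);
    return snprintf(buf, size, "%-18s%d", word, count);
}
```

Words longer than 18 characters are not truncated by `%-18s`; they simply push the count further right.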
Test Case:
After reading earlier students' blogs and the teacher's comments, I tested the program with a long text: a presidential inaugural speech.
Some test results:
C language implementation of Word frequency statistics