1. Before starting this project, I had done several projects in my OO class and was already familiar with this kind of algorithm. So, after learning basic lexical processing and file operations in C#, I expected to finish the program in one day.
2. In practice, I found that C# differs somewhat from the Java I had learned before. The algorithm itself is simple, but learning to use C# took a lot of time, and the whole project took two days to complete.
3. I had always assumed that the most resource-intensive part of the program would be sorting words by frequency. However, analysis of the algorithm showed that the time-consuming part is actually checking the "word + space + word" combinations.
My algorithm is as follows: read all characters in a text file and store them as a single string, then traverse the string from start to end. Uppercase letters, lowercase letters, and digits all count as "characters" (each consecutive run of them is treated as one unit), and everything else is a separator, so the entire string has the form:
Character + separator + character + separator +...
Store all characters in one array in order, and store all separators in a second array.
In this way, the characters to the left and right of the i-th separator are the i-th and (i + 1)-th characters (provided i + 1 is still within the bounds of the character array).
A character is not necessarily a word as defined in the requirements, so each one must be checked.
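The splitting step described above can be sketched roughly as follows. This is an illustrative Python version, not the original C# code; the names `is_char` and `split_runs` are hypothetical, and the choice to drop any separators that appear before the first character run is an assumption made so that the indices of the two arrays line up.

```python
import string

# Letters and digits count as "characters"; everything else is a separator.
CHAR_SET = set(string.ascii_letters + string.digits)


def is_char(c):
    return c in CHAR_SET


def split_runs(text):
    """Split text into character runs and separator runs.

    Returns (chars, seps) such that seps[i] sits between chars[i] and
    chars[i + 1]. A trailing separator, if present, is kept as the last
    entry of seps. Leading separators are skipped (assumption).
    """
    chars, seps = [], []
    i, n = 0, len(text)
    # Skip separators before the first character run.
    while i < n and not is_char(text[i]):
        i += 1
    while i < n:
        start = i
        while i < n and is_char(text[i]):
            i += 1
        chars.append(text[start:i])
        start = i
        while i < n and not is_char(text[i]):
            i += 1
        if start < i:
            seps.append(text[start:i])
    return chars, seps


chars, seps = split_runs("Hello, world 42!")
print(chars)
print(seps)
```

With the two arrays aligned this way, the neighbours of a separator can be found by index alone, without scanning the text again.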
Create a Word class, which consists of a string "word" and an integer "quantity".
Function 1: Create a new Word array and traverse the character array from front to back. If a character meets the word conditions, add it to the array. When adding a word, first check whether the word (case-insensitive) already exists. If it does not, add it as a new word; if it does, add 1 to its quantity and update the case of the stored word. Then sort the array by word frequency; words with the same frequency are sorted alphabetically.
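A minimal sketch of Function 1, in Python rather than the original C#. Several things here are assumptions rather than the program's actual behaviour: the word rule is not stated in the text, so `is_word` uses a placeholder rule (the run starts with a letter); the "update the case" step is assumed to keep whichever spelling sorts first; and a dictionary keyed by the lowercase spelling replaces the array scan described above, which is a substitution made to keep the lookup cheap.

```python
from dataclasses import dataclass


@dataclass
class Word:
    word: str          # spelling that will be printed
    quantity: int = 1  # number of occurrences


def is_word(run):
    # ASSUMPTION: placeholder word rule, not the one from the requirements.
    return run[:1].isalpha()


def count_words(chars):
    """Count words case-insensitively, then sort by frequency and name."""
    words = {}  # lowercase spelling -> Word
    for run in chars:
        if not is_word(run):
            continue
        key = run.lower()
        entry = words.get(key)
        if entry is None:
            words[key] = Word(run)
        else:
            entry.quantity += 1
            # ASSUMPTION: keep the spelling that compares smallest.
            entry.word = min(entry.word, run)
    # Highest frequency first; ties are broken alphabetically.
    return sorted(words.values(), key=lambda w: (-w.quantity, w.word.lower()))


for w in count_words(["The", "cat", "the", "Cat", "dog", "the", "42"]):
    print(w.word, w.quantity)
```

Sorting on the tuple `(-quantity, lowercase word)` handles both ordering rules in a single pass.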
Function 2: Create a new "double word" array and traverse the separator array. If a separator is a single space, check whether the characters on both sides of it are words (if an index would be out of bounds, skip the check). If both are words, add the string "left word" + " " + "right word" to the "double word" array. Processing is then the same as in Function 1, and the top 10 entries by frequency are output.
Function 3: Create a "three word" array and traverse the separator array. If two consecutive separators are both single spaces, check whether all three characters around these two separators are words (if an index would be out of bounds, skip the check). If all three are words, add the string "left word" + " " + "middle word" + " " + "right word" to the "three word" array. Processing is then the same as in Function 1, and the top 10 entries by frequency are output.
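Functions 2 and 3 differ only in how many words are joined, so both can be sketched with one helper. This is an illustrative Python version, not the original C# code. The name `top_phrases` is hypothetical, `is_word` is again a placeholder for the real word rule, and phrases are simply lowercased for counting, which simplifies the case handling of Function 1. The inputs are assumed to be aligned so that `seps[i]` lies between `chars[i]` and `chars[i + 1]`.

```python
from collections import Counter


def is_word(run):
    # ASSUMPTION: placeholder word rule, not the one from the requirements.
    return run[:1].isalpha()


def top_phrases(chars, seps, n, limit=10):
    """Return the most frequent n-word phrases joined by single spaces."""
    counts = Counter()
    for i in range(len(chars) - n + 1):
        group = chars[i:i + n]
        between = seps[i:i + n - 1]
        # Bounds check: all n - 1 separators must actually exist.
        if len(between) != n - 1:
            continue
        if all(s == " " for s in between) and all(is_word(w) for w in group):
            counts[" ".join(group).lower()] += 1
    # Highest frequency first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


chars = ["a", "b", "c", "a", "b", "c"]
seps = [" ", " ", ", ", " ", " "]
print(top_phrases(chars, seps, 2))
print(top_phrases(chars, seps, 3))
```

Calling it with `n=2` corresponds to the double-word case and `n=3` to the three-word case.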
4. Test cases: a total of 10 articles from the New York Times, placed under the test file directory.
I ran the tests together with my teammates and compared our results.
5. A good algorithm is essential to program efficiency, and it requires careful analysis before programming. In addition, I noticed that the program I wrote has poor portability. The overall functionality is acceptable, but the style of several internal parts is quite rigid, so when I write other programs I often end up rewriting functions I have already written. This is not a good habit, and I will pay attention to it in my future programming.
Individual project-Word Frequency Program