Individual project-Word Frequency Program

Source: Internet
Author: User

1. Before starting this project, I had done similar projects in my OO (object-oriented programming) class and was very familiar with the algorithm. I therefore only had to learn simple lexical analysis and file operations in C#, and I expected to finish the program in one day.
2. In practice, I found that C# is somewhat different from Java, which I had learned before. The algorithm itself is simple, but learning to use C# took a lot of time, and the whole program took two days to complete.

3. I had always assumed that the most resource-intensive part of the program would be sorting words by frequency. However, after analyzing the algorithm, I found that the most time-consuming part is identifying "word + space + word" sequences.

My algorithm is as follows. Read all characters in the text file into one string, then traverse the string from start to end. A run of uppercase letters, lowercase letters, and digits counts as a "character" (a token), and every other run is a separator, so the entire string has the form:

Character + separator + character + separator +...

Store all "characters" in one array, in order, and all separators in another array.

In this way, separator i lies between character i (on its left) and character i + 1 (on its right), provided that i + 1 is still within the bounds of the character array.
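The splitting step can be sketched in Python as follows (the author's program is in C#, so this is only an illustration). It assumes a "character" is a maximal run of ASCII letters and digits; `split_text` and `neighbors` are hypothetical helper names. One extra assumption: a leading separator (text that starts with punctuation or whitespace) is dropped, so that separator i always lies between character i and character i + 1.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a "character" is a maximal run of ASCII letters and digits,
# and a separator is any maximal run of other characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")
SEP_RE = re.compile(r"[^A-Za-z0-9]+")


def split_text(text: str) -> Tuple[List[str], List[str]]:
    """Split text into a character (token) array and a separator array."""
    tokens = TOKEN_RE.findall(text)
    seps = SEP_RE.findall(text)
    if SEP_RE.match(text):
        # Text starts with a separator: drop it so that seps[i] always
        # lies between tokens[i] and tokens[i + 1].
        seps = seps[1:]
    return tokens, seps


def neighbors(tokens: List[str], seps: List[str], i: int) -> Optional[Tuple[str, str]]:
    """Return the (left, right) characters of separator i, or None if out of range."""
    if i < len(seps) and i + 1 < len(tokens):
        return tokens[i], tokens[i + 1]
    return None
```

For "Hello, world! 42 apples." this yields the characters Hello, world, 42, apples and the separators ", ", "! ", " ", "."; the last separator has no right neighbor, so `neighbors` returns None for it.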

A "character" is not necessarily a word as defined in the requirements, so each one must be checked against the word rules.
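The post does not restate which tokens count as words, so the check below uses a made-up rule purely for illustration: a word starts with at least three ASCII letters and may continue with letters or digits. `MIN_LETTERS`, `WORD_RE`, and `is_word` are hypothetical names; the assignment's actual rule should be substituted.

```python
import re

# Hypothetical word rule (not stated in the post): at least MIN_LETTERS
# leading ASCII letters, optionally followed by letters or digits.
MIN_LETTERS = 3
WORD_RE = re.compile(r"[A-Za-z]{%d}[A-Za-z0-9]*" % MIN_LETTERS)


def is_word(token: str) -> bool:
    """Return True if the whole token satisfies the assumed word rule."""
    return WORD_RE.fullmatch(token) is not None


# Filtering a character array down to the words it contains.
words = [t for t in ["Hello", "to", "42", "world2"] if is_word(t)]
```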

Create a Word class consisting of a string "word" and an integer "quantity".

Function 1: Create a new Word array and traverse the character array from front to back. Each character that satisfies the word conditions is added to the array. Before adding a word, check (case-insensitively) whether it already exists in the array: if it does not, add it as a new word; if it does, add 1 to its quantity and update the case of the stored word. Finally, sort the array by word frequency, with words of equal frequency sorted alphabetically.
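A Python sketch of the Word class and Function 1 (not the author's C# code). It makes three assumptions that the post leaves open: words follow a hypothetical rule of at least three leading ASCII letters, a dictionary keyed by the lowercased word replaces a linear "does it already exist?" scan of the Word array, and "updating the case" means keeping the ASCII-smallest spelling seen so far (so "The" wins over "the"). `count_words` is an illustrative name.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical word rule: at least three ASCII letters, then letters or digits.
WORD_RE = re.compile(r"[A-Za-z]{3}[A-Za-z0-9]*")


@dataclass
class Word:
    """Counterpart of the post's Word class: a spelling and its count."""
    word: str
    quantity: int = 0


def count_words(tokens: Iterable[str]) -> List[Word]:
    """Function 1: count words case-insensitively, then sort them."""
    table: Dict[str, Word] = {}  # lowercased word -> Word entry
    for token in tokens:
        if WORD_RE.fullmatch(token) is None:
            continue  # not a word under the assumed rule
        key = token.lower()
        entry = table.get(key)
        if entry is None:
            # First occurrence: add it as a new word.
            entry = Word(token)
            table[key] = entry
        elif token < entry.word:
            # Same word in a different case: keep the ASCII-smallest spelling.
            entry.word = token
        entry.quantity += 1
    # Highest frequency first; equal frequencies sorted alphabetically.
    return sorted(table.values(), key=lambda e: (-e.quantity, e.word.lower()))
```

The dictionary turns each existence check into an average constant-time lookup instead of a scan over the whole Word array, which matters once an article contains thousands of distinct words.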

Function 2: Create a new "double word" array and traverse the separator array. If a separator is a single space, check whether the characters on both sides of it are words (if either side is out of bounds, skip the check). If both are words, add the string "left word" + " " + "right word" to the "double word" array. The processing is then the same as in Function 1, and the 10 most frequent phrases are output.
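Function 2 can be sketched in Python like this. The sketch assumes ASCII letter/digit tokens, a hypothetical word rule (at least three leading letters), and that case is resolved by keeping the ASCII-smallest spelling; `split_text`, `is_word`, `top_phrases`, and `double_words` are illustrative names, not the author's.

```python
import re
from typing import Dict, List, Tuple

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")            # a "character": run of letters/digits
SEP_RE = re.compile(r"[^A-Za-z0-9]+")             # a separator: any other run
WORD_RE = re.compile(r"[A-Za-z]{3}[A-Za-z0-9]*")  # hypothetical word rule


def split_text(text: str) -> Tuple[List[str], List[str]]:
    tokens = TOKEN_RE.findall(text)
    seps = SEP_RE.findall(text)
    if SEP_RE.match(text):
        # Drop a leading separator so seps[i] sits between tokens[i] and tokens[i + 1].
        seps = seps[1:]
    return tokens, seps


def is_word(token: str) -> bool:
    return WORD_RE.fullmatch(token) is not None


def top_phrases(phrases: List[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count case-insensitively, keep the ASCII-smallest spelling, return the top k."""
    counts: Dict[str, int] = {}
    spelling: Dict[str, str] = {}
    for p in phrases:
        key = p.lower()
        counts[key] = counts.get(key, 0) + 1
        if key not in spelling or p < spelling[key]:
            spelling[key] = p
    # Highest count first; ties broken alphabetically.
    ranked = sorted(counts, key=lambda p_key: (-counts[p_key], p_key))
    return [(spelling[key], counts[key]) for key in ranked[:k]]


def double_words(text: str, k: int = 10) -> List[Tuple[str, int]]:
    tokens, seps = split_text(text)
    phrases = []
    for i, sep in enumerate(seps):
        # Only a single space links two words; skip if there is no right neighbor.
        if sep != " " or i + 1 >= len(tokens):
            continue
        left, right = tokens[i], tokens[i + 1]
        if is_word(left) and is_word(right):
            phrases.append(left + " " + right)
    return top_phrases(phrases, k)
```

For example, in "The quick fox. The quick fox, the QUICK dog" the ". " and ", " separators break the chains, so "fox The" is never counted.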

Function 3: Create a "three word" array and traverse the separator array. If two consecutive separators are both single spaces, check whether the three characters around them are all words (if any of them is out of bounds, skip the check). If all three are words, add the string "left word" + " " + "middle word" + " " + "right word" to the "three word" array. The processing is the same as in Function 1, and the 10 most frequent phrases are output.
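Function 3 differs from Function 2 only in requiring two consecutive single-space separators. The Python sketch below makes the same assumptions (ASCII letter/digit tokens, a hypothetical three-leading-letters word rule, ASCII-smallest spelling wins); `triple_words` and its helpers are illustrative names.

```python
import re
from typing import Dict, List, Tuple

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")            # a "character": run of letters/digits
SEP_RE = re.compile(r"[^A-Za-z0-9]+")             # a separator: any other run
WORD_RE = re.compile(r"[A-Za-z]{3}[A-Za-z0-9]*")  # hypothetical word rule


def split_text(text: str) -> Tuple[List[str], List[str]]:
    tokens = TOKEN_RE.findall(text)
    seps = SEP_RE.findall(text)
    if SEP_RE.match(text):
        # Drop a leading separator so seps[i] sits between tokens[i] and tokens[i + 1].
        seps = seps[1:]
    return tokens, seps


def is_word(token: str) -> bool:
    return WORD_RE.fullmatch(token) is not None


def top_phrases(phrases: List[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count case-insensitively, keep the ASCII-smallest spelling, return the top k."""
    counts: Dict[str, int] = {}
    spelling: Dict[str, str] = {}
    for p in phrases:
        key = p.lower()
        counts[key] = counts.get(key, 0) + 1
        if key not in spelling or p < spelling[key]:
            spelling[key] = p
    ranked = sorted(counts, key=lambda p_key: (-counts[p_key], p_key))
    return [(spelling[key], counts[key]) for key in ranked[:k]]


def triple_words(text: str, k: int = 10) -> List[Tuple[str, int]]:
    tokens, seps = split_text(text)
    phrases = []
    for i in range(len(seps) - 1):
        # seps[i] and seps[i + 1] must both be single spaces, and the
        # third character tokens[i + 2] must exist.
        if seps[i] != " " or seps[i + 1] != " " or i + 2 >= len(tokens):
            continue
        left, middle, right = tokens[i], tokens[i + 1], tokens[i + 2]
        if is_word(left) and is_word(middle) and is_word(right):
            phrases.append(left + " " + middle + " " + right)
    return top_phrases(phrases, k)
```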


4. Test cases: a total of 10 articles from the New York Times, stored under the test file directory.

I tested the program together with my teammates and compared our results.


5. To make a program efficient, a good algorithm is essential, and choosing one requires careful analysis before coding. I also noticed that my program has poor portability. Its overall functionality is acceptable, but the coding style in several places is rather rigid, and when writing other programs I often rewrite functions I have already written instead of reusing them. This is not a good habit, and I will pay attention to it in future programming.
