Individual Project: Word Frequency Program

Last Update: 2014-09-25 Source: Internet

Author: User

1. Before starting this project, I had done several projects in the OO class and was already familiar with this kind of algorithm. I therefore expected that, after learning simple lexical processing and file operations in C#, I could finish it in one day.
2. In practice, C# turned out to differ somewhat from the Java I had learned before. The algorithm itself is simple, but learning to use C# took a lot of time, and the whole project took two days to complete.

3. I had always assumed that sorting words by frequency was the most resource-intensive part of the program. Analysis of the algorithm showed, however, that the time-consuming part is detecting "word + space + word" sequences.

My algorithm is as follows. Read all characters of the text file into a single string and traverse it from start to end. Uppercase letters, lowercase letters, and digits form "character strings"; everything else is a separator. The entire string therefore has the form:

Character string + separator + character string + separator + ...

Store all character strings in one array, in order, and all separators in a second array.

This way, the character strings to the left and right of separator i are character strings i and i + 1 (provided i + 1 is still within the bounds of the character-string array).

A character string is not necessarily a word as defined in the requirements, so each one must be checked.
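The splitting step described above can be sketched in Python (not the author's C# code, which is not shown here). The function name `split_tokens` is hypothetical, and the sketch assumes that separator runs before the first character string and after the last one are dropped, so that separator i always sits between character strings i and i + 1.

```python
def split_tokens(text):
    """Split text into alphanumeric character strings and the separators between them.

    Invariant: separators[i] sits between tokens[i] and tokens[i + 1].
    Assumption: leading and trailing separator runs are discarded.
    """
    tokens, separators = [], []
    current, gap = [], []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            # A separator run ends here; keep it only if a token precedes it.
            if gap and tokens:
                separators.append("".join(gap))
            gap = []
            current.append(ch)
        else:
            # A token ends here; flush it and start collecting the separator.
            if current:
                tokens.append("".join(current))
                current = []
            gap.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens, separators


tokens, separators = split_tokens("Hello, world  foo")
print(tokens)      # ['Hello', 'world', 'foo']
print(separators)  # [', ', '  ']
```

Keeping the two lists aligned this way makes the later phrase checks a matter of index arithmetic rather than re-scanning the text.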

Create a Word class consisting of a string, "word", and an integer, "quantity".

Function 1: Create a new Word array and traverse the array of character strings from front to back, adding each one that meets the word conditions. Before adding, check whether the word already exists in the Word array (case-insensitively). If it does, do not add a new entry; instead add 1 to its quantity and update the stored case of the word. Finally, sort the array by word frequency; words with the same quantity are sorted alphabetically.
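A minimal Python sketch of function 1 follows, under several assumptions. The word conditions are not spelled out here, so `is_word` is a hypothetical stand-in (a character string that starts with a letter). The rule for "updating the case" is also a guess: the sketch keeps the spelling that sorts first. Finally, it swaps the array lookup for a dictionary keyed by the lowercased word, which avoids a linear search per word.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str       # spelling to display
    count: int = 1  # the "quantity" field


def is_word(token):
    # Hypothetical word rule: the character string starts with a letter.
    return token[:1].isalpha()


def count_words(tokens):
    table = {}  # lowercased word -> Word (a dictionary instead of an array scan)
    for tok in tokens:
        if not is_word(tok):
            continue
        entry = table.get(tok.lower())
        if entry is None:
            table[tok.lower()] = Word(tok)
        else:
            entry.count += 1
            # Assumed casing rule: keep the spelling that sorts first.
            entry.text = min(entry.text, tok)
    # Highest frequency first; ties broken alphabetically, ignoring case.
    return sorted(table.values(), key=lambda w: (-w.count, w.text.lower()))


for w in count_words(["The", "cat", "the", "2nd", "Cat", "the"]):
    print(w.text, w.count)
```

With the sample input, "2nd" is rejected by the assumed word rule, and the three spellings of "the" collapse into one entry.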

Function 2: Create a new "double word" array and traverse the separator array. If a separator is a single space, check whether the character strings on both sides of it are words (skip the check if an index would be out of bounds). If both are words, add the string "left word" + " " + "right word" to the "double word" array. Processing is the same as in function 1, and the top 10 entries by frequency are output.
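Function 2 might look like the following Python sketch. It assumes two parallel lists as described earlier, where `separators[i]` lies between `tokens[i]` and `tokens[i + 1]`. The names `top_pairs` and `is_word` are hypothetical, the word rule is a placeholder, and phrases are simply lowercased for counting rather than tracking their original case.

```python
from collections import Counter


def is_word(token):
    # Hypothetical word rule: the character string starts with a letter.
    return token[:1].isalpha()


def top_pairs(tokens, separators, limit=10):
    counts = Counter()
    for i, sep in enumerate(separators):
        if sep != " ":            # only a single space joins a two-word phrase
            continue
        if i + 1 >= len(tokens):  # right-hand neighbour would be out of bounds
            continue
        left, right = tokens[i], tokens[i + 1]
        if is_word(left) and is_word(right):
            counts[(left + " " + right).lower()] += 1
    # Highest frequency first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


tokens = ["new", "york", "times", "New", "York"]
separators = [" ", " ", ". ", " "]
print(top_pairs(tokens, separators))
# [('new york', 2), ('york times', 1)]
```

Because the separator list is walked once and each check touches only two neighbours, this pass is linear in the number of separators.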

Function 3: Create a "three word" array and traverse the separator array. If two consecutive separators are both single spaces, check whether all three character strings around those two separators are words (skip the check if an index would be out of bounds). If all three are words, add the string "left word" + " " + "middle word" + " " + "right word" to the "three word" array. Processing is the same as in function 1, and the top 10 entries by frequency are output.
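The three-word case can be sketched in Python as a window over the same two lists, again assuming `separators[i]` lies between `tokens[i]` and `tokens[i + 1]`. The parameter `n` is an addition for illustration (n = 3 gives function 3). The names `top_phrases` and `is_word` are hypothetical, and the word rule is a placeholder.

```python
from collections import Counter


def is_word(token):
    # Hypothetical word rule: the character string starts with a letter.
    return token[:1].isalpha()


def top_phrases(tokens, separators, n=3, limit=10):
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gaps = separators[i:i + n - 1]  # the n - 1 separators inside the window
        if len(gaps) != n - 1 or any(g != " " for g in gaps):
            continue                    # out of bounds, or not all single spaces
        words = tokens[i:i + n]
        if all(is_word(w) for w in words):
            counts[" ".join(words).lower()] += 1
    # Highest frequency first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


tokens = ["I", "love", "New", "York", "i", "love", "new", "jersey"]
separators = [" ", " ", " ", ", ", " ", " ", " "]
print(top_phrases(tokens, separators))
# [('i love new', 2), ('love new jersey', 1), ('love new york', 1)]
```

The comma after "York" breaks the window, so "new york i" and "york i love" are never counted.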

4. Test cases: a total of 10 articles from the New York Times, placed under the test file directory.

I ran the tests together with my teammates and compared the results.

5. A good algorithm is essential to program efficiency, and it requires careful analysis before any code is written. I also noticed that the program I wrote has poor portability. The overall functionality is acceptable, but the coding style in several places is rigid, and when I write other programs I often end up rewriting functions I have already written. This is not a good habit, and I will pay attention to it in future work.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.