[JAVA] Basic Data Set Analysis Techniques

Source: Internet
Author: User

Extract information and statistics, sort

The task is to preprocess the corpus and extract useful information and statistics about the answers and replies to 1000 questions.

First, analyze the content and format of the text and think about how to extract the data. There are three types of file: answer, comment, and vote, so the file type is determined at the beginning from the first few words.
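As a minimal sketch of that first check, the snippet below classifies a piece of text by its leading word. The original does not show what the files actually start with, so the prefixes "answer", "comment", and "vote", as well as the names RecordTypeDetector and detect, are assumptions made for illustration.

```java
import java.util.Locale;

class RecordTypeDetector {
    enum RecordType { ANSWER, COMMENT, VOTE, UNKNOWN }

    // Assumption: each file begins with a word naming its type.
    // The real corpus may use different markers.
    static RecordType detect(String text) {
        String head = text.trim().toLowerCase(Locale.ROOT);
        if (head.startsWith("answer")) return RecordType.ANSWER;
        if (head.startsWith("comment")) return RecordType.COMMENT;
        if (head.startsWith("vote")) return RecordType.VOTE;
        return RecordType.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(detect("Answer 17 by alice"));
        System.out.println(detect("something else"));
    }
}
```

Returning UNKNOWN instead of throwing lets the caller decide whether to skip or report a file that matches none of the three types.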

Then define the classes and methods the task requires, and build a list of objects to store the extracted information.

The extraction itself uses a delimiter to split the text into an array, then splits each piece again in the same way until the wanted field is isolated, and adds that field to the list.
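The repeated splitting can be sketched as follows. The line layout used here ("author:alice|question:q12|answer:a3|reply:r7"), the delimiters "|" and ":", and the names SplitExtractor and Entry are all hypothetical, since the original does not show the corpus format.

```java
import java.util.ArrayList;
import java.util.List;

class SplitExtractor {
    // Hypothetical holder for the four extracted fields.
    static final class Entry {
        final String author, questionId, answerId, replyId;
        Entry(String author, String questionId, String answerId, String replyId) {
            this.author = author;
            this.questionId = questionId;
            this.answerId = answerId;
            this.replyId = replyId;
        }
    }

    static Entry parse(String line) {
        String author = "", questionId = "", answerId = "", replyId = "";
        // First split: cut the line into "key:value" fields.
        for (String field : line.split("\\|", -1)) {
            // Second split: separate the key from its value (limit 2 keeps an empty value).
            String[] kv = field.split(":", 2);
            if (kv.length < 2) continue;
            String value = kv[1].trim();
            switch (kv[0].trim()) {
                case "author":   author = value; break;
                case "question": questionId = value; break;
                case "answer":   answerId = value; break;
                case "reply":    replyId = value; break;
                default: break; // ignore fields we do not need
            }
        }
        return new Entry(author, questionId, answerId, replyId);
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(parse("author:alice|question:q12|answer:a3|reply:r7"));
        entries.add(parse("author:bob|question:q12|answer:a4|reply:"));
        for (Entry e : entries) {
            System.out.println(e.author + " " + e.questionId + " " + e.answerId + " " + e.replyId);
        }
    }
}
```

Note that String.split takes a regular expression, so the pipe character has to be escaped as "\\|".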

Get all the author names, question IDs, answer IDs, and reply IDs, and write them to new files.

The second step is to count the number of answers and the number of replies for each author, using a HashMap<String, Object> to hold the results. When the author is already in the map, the entry is counted as an answer or as a reply depending on whether its reply field is empty.
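A minimal sketch of this counting step is shown below. It assumes that an entry with an empty reply ID is an answer and that any other entry is a reply; the original only says the decision depends on whether the reply is empty. The names AuthorCounter and Stats are hypothetical, and a typed value class is used in place of a bare Object.

```java
import java.util.HashMap;
import java.util.Map;

class AuthorCounter {
    // Hypothetical per-author tally.
    static final class Stats {
        final String author;
        int answers;
        int replies;
        Stats(String author) { this.author = author; }
    }

    // Each row is {author, replyId}; an empty or null replyId is treated as an answer (assumption).
    static Map<String, Stats> count(String[][] rows) {
        Map<String, Stats> byAuthor = new HashMap<>();
        for (String[] row : rows) {
            // Reuse the author's existing tally, or create one on first sight.
            Stats stats = byAuthor.computeIfAbsent(row[0], Stats::new);
            if (row[1] == null || row[1].isEmpty()) {
                stats.answers++;
            } else {
                stats.replies++;
            }
        }
        return byAuthor;
    }

    public static void main(String[] args) {
        String[][] rows = { {"alice", ""}, {"bob", "r1"}, {"alice", "r2"} };
        for (Stats s : count(rows).values()) {
            System.out.println(s.author + ", " + s.answers + ", " + s.replies);
        }
    }
}
```

HashMap does not preserve insertion order, so the printed order of authors is not guaranteed; sorting happens in a later step.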

The final result is stored in the new list, in the format: Author name, number of answers, number of replies.

Finally, construct a Comparator to sort in ascending order by the number of posts written (answers + replies); authors with the same total are sorted by their number of answers.

Then compute the overall statistics: the total number of authors, the total number of answers, and the total number of replies, and find the average number of answers and replies written by each author. Note that the results are expressed in decimals.
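The sorting and averaging can be sketched as below. The class and method names (AuthorRanking, Stats, sortAscending, average) are hypothetical, and the tie-break is assumed to be ascending by answer count, which the original does not state explicitly. The cast to double is what keeps the averages as decimals instead of truncated integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class AuthorRanking {
    // Hypothetical result row: author name, number of answers, number of replies.
    static final class Stats {
        final String author;
        final int answers;
        final int replies;
        Stats(String author, int answers, int replies) {
            this.author = author;
            this.answers = answers;
            this.replies = replies;
        }
        int total() { return answers + replies; }
        int answerCount() { return answers; }
    }

    // Ascending by total posts; ties broken by answer count (assumed ascending).
    static final Comparator<Stats> BY_TOTAL_THEN_ANSWERS =
            Comparator.comparingInt(Stats::total).thenComparingInt(Stats::answerCount);

    static List<Stats> sortAscending(List<Stats> input) {
        List<Stats> sorted = new ArrayList<>(input); // leave the caller's list untouched
        sorted.sort(BY_TOTAL_THEN_ANSWERS);
        return sorted;
    }

    // Cast before dividing so the result is not truncated by integer division.
    static double average(int total, int authorCount) {
        return authorCount == 0 ? 0.0 : (double) total / authorCount;
    }

    public static void main(String[] args) {
        List<Stats> stats = Arrays.asList(
                new Stats("alice", 2, 1), new Stats("bob", 1, 0), new Stats("carol", 1, 2));
        int totalAnswers = 0, totalReplies = 0;
        for (Stats s : sortAscending(stats)) {
            totalAnswers += s.answers;
            totalReplies += s.replies;
            System.out.println(s.author + ", " + s.answers + ", " + s.replies);
        }
        System.out.println(String.format(Locale.ROOT, "authors=%d avgAnswers=%.2f avgReplies=%.2f",
                stats.size(), average(totalAnswers, stats.size()), average(totalReplies, stats.size())));
    }
}
```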
