Term Frequency Statistics of Software Engineering

Source: Internet
Author: User

Code: https://github.com/jackroos/word_frequency

How did you collaborate: working separately? Pair programming? VS Live Share? Some other style?

First, we discussed the code structure and how to use Python to compute word frequency statistics quickly. We then divided the work: my teammate was responsible for the Python implementation, and I was responsible for code review, unit testing, and regression testing. We also analyzed and optimized the performance bottlenecks by programming together.

How did you discuss design guidelines and coding conventions, and how did you reach agreement?

When designing the code modules, we followed the principle that each module's functionality should be relatively independent. For code style, we used PyCharm with 4-space indentation, and we tried to make every function name as self-explanatory as possible. Based on these rules, we designed the functions and interfaces of each module:

    • Freq_dict.py: the parent class of all frequency classes. It provides top-k retrieval and recursive traversal of file paths.
    • Charater_freq_dict.py: letter frequency statistics.
    • Word_freq_dict.py: word frequency statistics, with stop-word support.
    • Phrase_freq_dict.py: phrase frequency statistics, with support for stop words and for reducing verbs to their base form.
    • Preprocessing.py: reads and parses the various input files into the required formats, such as the word list, and handles stop-word filtering.
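The module layout above can be illustrated with a minimal sketch. It is not the repository's actual code: the class names, the `add_text` and `list_files` methods, and the regular-expression tokenizer are all assumptions made for illustration. It only shows how a shared parent class could provide top-k retrieval and recursive file discovery while subclasses decide what to count.

```python
import os
import re
from collections import Counter


class FreqDict:
    """Hypothetical parent class: stores counts, offers top-k and file discovery."""

    def __init__(self):
        self.counts = Counter()

    def update(self, items):
        self.counts.update(items)

    def topk(self, k):
        # Sort by descending count, then alphabetically so ties are deterministic.
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    @staticmethod
    def list_files(root, suffix=".txt"):
        # Walk the directory tree recursively and collect matching file paths.
        paths = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffix):
                    paths.append(os.path.join(dirpath, name))
        return sorted(paths)


class CharacterFreqDict(FreqDict):
    """Counts letters, ignoring case and non-alphabetic characters."""

    def add_text(self, text):
        self.update(ch for ch in text.lower() if ch.isalpha())


class WordFreqDict(FreqDict):
    """Counts words, skipping any word in the stop-word set."""

    def __init__(self, stop_words=()):
        super().__init__()
        self.stop_words = set(stop_words)

    def add_text(self, text):
        # Assumed tokenizer: runs of ASCII letters and digits, lowercased.
        words = re.findall(r"[a-z0-9]+", text.lower())
        self.update(w for w in words if w not in self.stop_words)
```

Keeping `topk` and `list_files` in the parent class means each subclass only has to define how a text is split into the items it counts.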

How did the two of you aim high and try to deliver the optimal result within your own time constraints? Is this the best you could do? What prevented you from doing your best?

Before the project started, we discussed it thoroughly and divided the work according to each person's strengths to keep efficiency high. I also feel that I learned a lot about design from my teammate. Because the code is written in Python, it is not particularly well optimized in many places; our main optimization was to use multiple processes. With more time, further optimization might be possible, but I don't think there is much room left.

List 3 strengths and 1 weak area of your partner

Strengths: familiar with Python, clear interface and module design, and clear thinking. Weak area: sometimes careless when writing code.

How did you use profiling tools to find the performance bottleneck and improve speed? Show some screenshots of your analysis.

At the beginning, we planned to use multiple processes to handle multiple files in parallel. After the code was written, we used cProfile and gprof2dot for performance testing and analysis, and found that when processing multiple files, the code did not actually appear to run in parallel.
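As a rough illustration of this kind of profiling, the sketch below runs a function under the standard-library `cProfile` module and prints the entries with the highest cumulative time. The `count_words` function and the `profile_call` helper are hypothetical stand-ins, not code from the project.

```python
import cProfile
import io
import pstats
from collections import Counter


def count_words(lines):
    """Toy workload (hypothetical): count whitespace-separated words."""
    cnt = Counter()
    for line in lines:
        cnt.update(line.lower().split())
    return cnt


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _result, report = profile_call(count_words, ["a b a", "b c"] * 1000)
    print(report)
```

To get a call graph instead of a text report, one common workflow is to save the profile with `python -m cProfile -o out.pstats script.py` and convert it with `gprof2dot -f pstats out.pstats | dot -Tpng -o out.png`, which requires Graphviz to be installed.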


We found that most of the time was spent waiting for the processes to acquire a lock. Python's multiprocessing copies the shared variable object_lists to each process, and copying the parsed sentence lists takes a long time. We therefore moved the call to parse_raw_text_to_sentences into the worker function, so that the sentence list is computed inside the function. Now only text_path_list is copied, which is much faster.
Before modification:

    def _get_freq_dict_for_text(self, index):
        cnt = Counter()
        object_list = self.object_lists[index]

After modification:

    def _get_freq_dict_for_text(self, index):
        cnt = Counter()
        object_list = parse_raw_text_to_sentences(self.text_path_list[index])

With this change, the running time was halved.
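The idea behind this change can be sketched as a standalone program, under stated assumptions: the `parse_raw_text_to_sentences` below is a simplified stand-in for the project's parser, and `merge_counts` and `count_files_parallel` are hypothetical helpers. Each worker receives only a short path string and does the parsing itself, so no large parsed list has to be pickled and sent between processes.

```python
import re
import sys
from collections import Counter
from multiprocessing import Pool


def parse_raw_text_to_sentences(text_path):
    """Simplified stand-in parser: read a file, return a list of word lists."""
    with open(text_path, encoding="utf-8") as f:
        text = f.read().lower()
    return [re.findall(r"[a-z0-9]+", s)
            for s in re.split(r"[.!?]+", text) if s.strip()]


def get_freq_dict_for_text(text_path):
    # Only the path string crosses the process boundary;
    # the expensive parsing happens inside the worker.
    cnt = Counter()
    for sentence in parse_raw_text_to_sentences(text_path):
        cnt.update(sentence)
    return cnt


def merge_counts(counters):
    """Sum per-file counters into one overall counter."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total


def count_files_parallel(text_path_list, processes=None):
    # Each worker returns a small Counter, which is cheap to send back.
    with Pool(processes=processes) as pool:
        return merge_counts(pool.map(get_freq_dict_for_text, text_path_list))


if __name__ == "__main__":
    print(count_files_parallel(sys.argv[1:]).most_common(10))
```

The trade-off is that parsing is no longer done once up front, but since every file is parsed exactly once either way, moving it into the worker also lets the parsing itself run in parallel.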

Unit test / regression test

See the GitHub repository linked above.
