How to implement a code statistics tool in Python

Source: Internet
Author: User
This article shows how to implement a code statistics tool in Python and what to watch out for along the way. The following is a practical example; let's take a look.

Problem

Design a program that counts the lines of code in a project, reporting the number of files, lines of code, comment lines, and blank lines. Make it as flexible as possible, so that projects in different languages can be counted by passing different parameters, such as:

# --type specifies the file type
python counter.py --type python

Output:

files:10
code_lines:200
comments:100
blanks:20

Analysis

This problem looks simple but is a little tricky to get right. We can break it down: once the lines of a single file are counted correctly, counting a whole directory is straightforward. The hardest part is multi-line comments. Taking Python as an example, comment lines come in the following forms:

1. A single-line comment that starts with the pound sign (#)

# single comment

2. Multi-line comment delimiters that open and close on the same line

"""This is a multi-line comment"""
'''This is also a multi-line comment'''

3. Multi-line comment delimiters that span several lines

"""
These 3 lines are all comment lines
"""

We parse the file line by line. Multi-line comments need an extra flag, in_multi_comment, that records whether the current line is inside a multi-line comment. It is False by default, set to True when a multi-line comment starts, and set back to False when the closing delimiter is encountered. Every line from the opening delimiter through the closing delimiter counts as a comment line.
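The flag-based idea can be sketched on an in-memory list of lines. This is only a minimal illustration, not the full parser: the function name `label_lines` is hypothetical, and it assumes that each `"""` delimiter stands alone on its own line.

```python
def label_lines(lines):
    """Label each line as 'blank', 'comment', or 'code' using a toggle flag."""
    in_multi_comment = False  # False by default
    labels = []
    for raw in lines:
        line = raw.strip()
        if line == '"""':
            # A delimiter line flips the flag and is itself a comment line
            in_multi_comment = not in_multi_comment
            labels.append("comment")
        elif in_multi_comment:
            # Everything between the delimiters belongs to the comment
            labels.append("comment")
        elif line == "":
            labels.append("blank")
        elif line.startswith("#"):
            labels.append("comment")
        else:
            labels.append("code")
    return labels


sample = ['"""', "inside the comment", '"""', "", "x = 1", "# note"]
print(label_lines(sample))
```

Note that the check for the flag comes before the blank-line check, so an empty line inside a multi-line comment is labeled as a comment rather than as a blank.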

Knowledge points

How to read files correctly, how to process the strings that are read, and the common string methods.

Simplified version

Let's build this up iteratively, starting with a simplified program that only counts a single Python file and ignores multi-line comments, which anyone just getting started with Python can write. The key step is to read each line and first call the strip() method to remove whitespace, including the trailing newline, from both ends of the string.

# -*- coding: utf-8 -*-
"""
Can only count single-line comments in a .py file
"""


def parse(path):
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding='utf-8') as f:
        for line in f.readlines():
            line = line.strip()
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

Multi-line annotated version

A tool that only handles single-line comments is of limited use. Only once multi-line comments are counted correctly can it be considered a real code statistics tool.

# -*- coding: utf-8 -*-
"""
Can count a .py file that contains multi-line comments
"""


def parse(path):
    in_multi_comment = False  # flag: currently inside a multi-line comment
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f.readlines():
            line = line.strip()
            # Blank lines inside a multi-line comment are treated as comments
            if line == "" and not in_multi_comment:
                blanks += 1
            # Comments come in 4 kinds
            # 1. Single-line comment starting with the pound sign
            # 2. Multi-line comment delimiters on the same line
            # 3. Lines between multi-line comment delimiters
            elif line.startswith("#") or \
                    (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                    (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                    (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                comments += 1
            # 4. The start line and end line of a multi-line comment
            elif line.startswith('"""') or line.startswith("'''"):
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


if __name__ == '__main__':
    print(parse("xxx.py"))

In the 4th case above, when a multi-line comment delimiter is encountered, the key operation is to negate the in_multi_comment flag rather than simply setting it to False or True. The first time """ is encountered it starts a multi-line comment and the flag becomes True; the second time it is the terminator, so negation yields False; the third time it is a start again, so negation yields True, and so on.

Do we need to write a new parse function for every other language? Looking closely, the 4 comment cases can be abstracted into 4 conditions, because most languages have both single-line and multi-line comments and differ only in the symbols they use.

conf = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}

# exstansion is the file extension used as the key, e.g. "py"
start_comment = conf.get(exstansion).get("start_comment")
end_comment = conf.get(exstansion).get("end_comment")

cond2 = False
cond3 = False
cond4 = False

for index, item in enumerate(start_comment):
    cond2 = line.startswith(item) and line.endswith(end_comment[index]) and len(line) > len(item)
    if cond2:
        break

for item in end_comment:
    if line.startswith(item):
        cond3 = True
        break

for item in start_comment + end_comment:
    if line.startswith(item):
        cond4 = True
        break

if line == "" and not in_multi_comment:
    blanks += 1
# Comments come in 4 kinds
# 1. Single-line comment starting with the single-line symbol
# 2. Multi-line comment delimiters on the same line
# 3. Lines between multi-line comment delimiters
elif line.startswith(conf.get(exstansion).get("single")) or cond2 or \
        (in_multi_comment and not cond3):
    comments += 1
# 4. Multi-line comment spread across several lines: its start and end lines
elif cond4:
    in_multi_comment = not in_multi_comment
    comments += 1
else:
    codes += 1

A single configuration constant that records the single-line and multi-line comment symbols of each language is all that is needed; the conditions cond1 (the single-line check) to cond4 then cover the cases. The remaining task is to parse multiple files, which can be done with the os.walk method.
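To see the four conditions working together, the following self-contained sketch wraps them in a function and runs it on in-memory lines. The names `COMMENT_SYMBOLS` and `count_lines` are hypothetical, and the logic assumes that comment symbols appear at the start of a stripped line.

```python
COMMENT_SYMBOLS = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}


def count_lines(lines, extension):
    """Count comment, blank and code lines for the given language extension."""
    symbols = COMMENT_SYMBOLS[extension]
    starts, ends = symbols["start_comment"], symbols["end_comment"]
    in_multi_comment = False
    stats = {"comments": 0, "blanks": 0, "codes": 0}
    for raw in lines:
        line = raw.strip()
        # cond2: a multi-line comment opened and closed on the same line
        cond2 = any(
            line.startswith(s) and line.endswith(e) and len(line) > len(s)
            for s, e in zip(starts, ends)
        )
        # cond3: the line starts with an end symbol
        cond3 = any(line.startswith(e) for e in ends)
        # cond4: the line starts with a start or an end symbol
        cond4 = any(line.startswith(s) for s in starts + ends)
        if line == "" and not in_multi_comment:
            stats["blanks"] += 1
        elif line.startswith(symbols["single"]) or cond2 or (in_multi_comment and not cond3):
            stats["comments"] += 1
        elif cond4:
            # Start or end line of a multi-line comment: flip the flag
            in_multi_comment = not in_multi_comment
            stats["comments"] += 1
        else:
            stats["codes"] += 1
    return stats


java_lines = ["/*", "block comment", "*/", "", "// single", "int x = 1;"]
print(count_lines(java_lines, "java"))
```

Supporting another language in this sketch only requires a new entry in `COMMENT_SYMBOLS`; the counting function itself stays unchanged.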

import os


def counter(path):
    """
    Can count a directory or a single file
    :param path:
    :return:
    """
    if os.path.isdir(path):
        comments, blanks, codes = 0, 0, 0
        list_dirs = os.walk(path)
        for root, dirs, files in list_dirs:
            for f in files:
                file_path = os.path.join(root, f)
                stats = parse(file_path)
                comments += stats.get("comments")
                blanks += stats.get("blanks")
                codes += stats.get("codes")
        return {"comments": comments, "blanks": blanks, "codes": codes}
    else:
        return parse(path)
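Because os.walk yields every file, a directory scan would also try to parse images, logs, and other non-source files. One way to narrow it down, sketched here with a hypothetical helper `iter_source_files` and an assumed set of extensions, is to filter on the file extension before parsing.

```python
import os
import tempfile


def iter_source_files(root, extensions):
    """Yield paths under root whose extension (without the dot) is in extensions."""
    for parent, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lstrip(".")
            if ext in extensions:
                yield os.path.join(parent, name)


# Small demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "pkg"))
    for rel in ("a.py", "notes.txt", os.path.join("pkg", "b.py")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write("x = 1\n")
    found = sorted(os.path.relpath(p, tmp) for p in iter_source_files(tmp, {"py"}))
    print(found)
```

The generator could replace the inner loop of a directory counter, so that only matching paths are handed to the parse function.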

Of course, there is still plenty of work to do to finish this program, including command-line parsing so that only one language is counted according to the specified parameter.
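The command-line part could look roughly like the sketch below, which uses the standard argparse module. The --type flag comes from the problem statement; the positional `path` argument, the `TYPE_TO_EXT` mapping, and the function name `build_parser` are assumptions made for illustration.

```python
import argparse

# Hypothetical mapping from the --type value to a file extension
TYPE_TO_EXT = {"python": "py", "java": "java"}


def build_parser():
    """Build a parser for: counter.py [path] --type {java,python}"""
    parser = argparse.ArgumentParser(description="Count code, comment and blank lines")
    parser.add_argument("path", nargs="?", default=".", help="file or directory to scan")
    parser.add_argument("--type", choices=sorted(TYPE_TO_EXT), default="python",
                        help="language of the files to count")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line
args = build_parser().parse_args(["--type", "python"])
print(args.path, TYPE_TO_EXT[args.type])
```

In a real script the call would be `parse_args()` without arguments, so that the values are taken from `sys.argv`.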

Addendum:

A code line counting tool in Python

We often want to count the lines of code in a project, but a ready-made counting tool is not always at hand. Here is how to implement a code line counting tool using Python.

Ideas:

First get all the files, then count the number of lines of code in each file, and finally add up the line counts.

Features implemented:

Count the number of lines in each file;
Count the total number of lines;
Measure the running time;
Support specifying the file types to count, excluding file types that should not be counted;
Recursively count the lines of files under a folder, including its subfolders;
Exclude empty lines;

# coding=utf-8
import os
import time

basedir = '/root/script'
filelists = []
# Specifies the file types to be counted
whitelist = ['php', 'py']


# Traverse the files, recursively walking all files in the folder
def getFile(basedir):
    global filelists
    for parent, dirnames, filenames in os.walk(basedir):
        # for dirname in dirnames:
        #     getFile(os.path.join(parent, dirname))  # recursion
        for filename in filenames:
            ext = filename.split('.')[-1]
            # Only count the specified file types; skip some log and cache files
            if ext in whitelist:
                filelists.append(os.path.join(parent, filename))


# Count the number of lines of a file
def countline(fname):
    count = 0
    with open(fname) as f:
        for file_line in f:
            if file_line != '' and file_line != '\n':  # filter out empty lines
                count += 1
    print(fname + '----', count)
    return count


if __name__ == '__main__':
    # time.perf_counter() replaces time.clock(), which was removed in Python 3.8
    startTime = time.perf_counter()
    getFile(basedir)
    totalline = 0
    for filelist in filelists:
        totalline = totalline + countline(filelist)
    print('Total lines:', totalline)
    print('done! Cost time:%0.2f second' % (time.perf_counter() - startTime))
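The feature list mentions excluding unwanted file types, while the script itself only applies a whitelist. A hedged sketch of combining both, using a hypothetical `should_count` helper with assumed `whitelist` and `blacklist` parameters, could look like this:

```python
import os


def should_count(filename, whitelist=(), blacklist=()):
    """Decide whether a file should be counted, based on its extension.

    A blacklisted extension is always skipped. If a whitelist is given,
    only whitelisted extensions are counted; otherwise everything else is.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext in blacklist:
        return False
    return not whitelist or ext in whitelist


for name in ("smtp.php", "debug.log", "README"):
    print(name, should_count(name, whitelist={"php", "py"}, blacklist={"log"}))
```

Unlike `filename.split('.')[-1]`, `os.path.splitext` returns an empty extension for a name without a dot, so a file called `README` is not mistaken for one with the extension `README`.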

Results:

[root@pythontab script]# python countcodeline.py
/root/script/test/gametest.php----16
/root/script/smtp.php----284
/root/script/gametest.php----16
/root/script/countcodeline.py----33
/root/script/sendmail.php----17
/root/script/test/gametest.php----16
Total lines:382
done! Cost time:0.00 Second
[root@pythontab script]#

