Designing a Code Statistics Tool Using Python

Source: Internet
Author: User
This article describes how to design a code statistics tool in Python that reports the number of files, lines of code, comment lines, and blank lines. Interested readers can follow along.


Design a program that counts the lines of code in a project, including the number of files, lines of code, comment lines, and blank lines. Make it as flexible as possible: it should be able to count projects in different languages by accepting different parameters, such as:

# "type" specifies the file type, e.g. python




This design problem looks simple but is a little tricky in practice. We can break it down: as long as the lines of a single file can be counted correctly, counting a whole directory is not a problem. The most complex part is multi-line comments. Taking Python as an example, a comment line can take the following forms:

1. A single-line comment starting with the pound sign (#)

# single-line comment

2. Multi-line comment delimiters opening and closing on the same line

"""This is a multi-line comment"""
'''This is also a multi-line comment'''

3. Multi-line comment delimiters spanning several lines

"""
These 3 lines all count as comment lines
"""

Our approach parses the file line by line. Multi-line comments require an additional flag, in_multi_comment, that records whether the current line is inside a multi-line comment. It is False by default, is set to True when a multi-line comment starts, and is set back to False when the next multi-line comment delimiter is encountered. Every line from the opening delimiter through the closing delimiter counts as a comment line.

Key points

How to read files correctly, how to process the strings read from a file, and the common string methods.
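To make these points concrete, here is a minimal sketch of reading a file with an explicit encoding and cleaning each line with strip() and startswith(). The temporary file and its contents are made up for illustration and are not part of the original article.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("x = 1\n\n    # indented comment\t\n")
    path = tmp.name

# Always pass the encoding explicitly; the platform default may differ.
with open(path, encoding="utf-8") as f:
    raw_lines = f.readlines()  # each item still ends with "\n"

# strip() removes leading/trailing spaces, tabs, and the newline.
stripped = [line.strip() for line in raw_lines]
is_comment = [s.startswith("#") for s in stripped]
os.remove(path)

print(raw_lines)
print(stripped)
print(is_comment)
```

Without strip(), the indented comment line would not start with "#" and the empty line would compare unequal to "", so both checks used later in the article depend on it.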

Simplified version

Let's iterate step by step, starting with a stripped-down program that only counts a single Python file and ignores multi-line comments, which anyone getting started with Python can write. The key is to read each line and first use the strip() method to remove the whitespace and newline characters from both ends of the string.

    # -*- coding: utf-8 -*-
    """
    Can only count the single-line comments of a .py file
    """


    def parse(path):
        comments = 0
        blanks = 0
        codes = 0
        with open(path, encoding='utf-8') as f:
            for line in f.readlines():
                line = line.strip()
                if line == "":
                    blanks += 1
                elif line.startswith("#"):
                    comments += 1
                else:
                    codes += 1
        return {"comments": comments, "blanks": blanks, "codes": codes}


    if __name__ == '__main__':
        # the file path is empty in the source; pass the path of a .py file here
        print(parse(""))

Multi-line comment version

A tool that can only handle single-line comments is of limited use. Only once multi-line comments are counted correctly can it be considered a real code statistics tool.

    # -*- coding: utf-8 -*-
    """
    Can count a .py file that contains multi-line comments
    """


    def parse(path):
        in_multi_comment = False  # multi-line comment flag
        comments = 0
        blanks = 0
        codes = 0
        with open(path, encoding="utf-8") as f:
            for line in f.readlines():
                line = line.strip()
                # blank lines inside a multi-line comment are treated as comments
                if line == "" and not in_multi_comment:
                    blanks += 1
                # There are 4 kinds of comment lines:
                # 1. a single-line comment starting with the pound sign
                # 2. multi-line comment delimiters on the same line
                # 3. lines between multi-line comment delimiters
                elif line.startswith("#") or \
                        (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                        (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                        (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                    comments += 1
                # 4. the start line and end line of a multi-line comment
                elif line.startswith('"""') or line.startswith("'''"):
                    in_multi_comment = not in_multi_comment
                    comments += 1
                else:
                    codes += 1
        return {"comments": comments, "blanks": blanks, "codes": codes}


    if __name__ == '__main__':
        # the file path is empty in the source; pass the path of a .py file here
        print(parse(""))

In the 4th case above, when a multi-line comment delimiter is encountered, the key operation is to invert in_multi_comment rather than simply set it to False or True. The first time """ is encountered it opens a multi-line comment, so the flag becomes True. The second time it is the terminator of that comment, so inverting gives False. The third time it is an opening delimiter again, so inverting gives True, and so on.
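The inversion can be traced in isolation. The sketch below uses a hypothetical list of already-stripped lines and records the flag after each line; it only illustrates the toggle, not the full counting logic.

```python
# Hypothetical sample lines, already stripped, for illustration only.
lines = [
    '"""',             # 1st delimiter: a comment starts
    "docstring text",
    '"""',             # 2nd delimiter: the comment ends
    "x = 1",
    '"""',             # 3rd delimiter: a new comment starts
    "more text",
    '"""',             # 4th delimiter: the comment ends
]

in_multi_comment = False
trace = []
for line in lines:
    if line.startswith('"""'):
        # Flip on every delimiter instead of assigning True or False.
        in_multi_comment = not in_multi_comment
    trace.append(in_multi_comment)

print(trace)  # [True, True, False, False, True, True, False]
```

Because the opening and closing delimiters of a Python docstring are identical, the code cannot tell them apart by looking at the line alone; inverting the flag is what keeps track of which one it is.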

Do we need to rewrite the parsing function for every other language? If you look closely, the 4 cases of comment lines can be abstracted into 4 conditions, because most languages have both single-line and multi-line comments and differ only in the symbols they use.

    conf = {"py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
            "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"}}

    # This fragment runs inside the per-line loop of parse();
    # exstansion holds the file extension, e.g. "py" (name as in the source).
    start_comment = conf.get(exstansion).get("start_comment")
    end_comment = conf.get(exstansion).get("end_comment")

    cond2 = False
    cond3 = False
    cond4 = False
    # cond2: a multi-line comment opened and closed on the same line
    for index, item in enumerate(start_comment):
        cond2 = line.startswith(item) and line.endswith(end_comment[index]) and len(line) > len(item)
        if cond2:
            break
    # cond3: the line starts with a closing symbol
    for item in end_comment:
        if line.startswith(item):
            cond3 = True
            break
    # cond4: the line starts with an opening or closing symbol
    for item in start_comment + end_comment:
        if line.startswith(item):
            cond4 = True
            break

    if line == "" and not in_multi_comment:
        blanks += 1
    # There are 4 kinds of comment lines:
    # 1. a single-line comment starting with the single-line symbol
    # 2. multi-line comment symbols on the same line
    # 3. lines between multi-line comment symbols
    elif line.startswith(conf.get(exstansion).get("single")) or cond2 or \
            (in_multi_comment and not cond3):
        comments += 1
    # 4. multi-line comment spread across several lines: start and end lines
    elif cond4:
        in_multi_comment = not in_multi_comment
        comments += 1
    else:
        codes += 1

A single configuration constant is enough to record the single-line and multi-line comment symbols of every language, and the conditions cond1 to cond4 (cond1 being the single-line check) then cover each case. The remaining task is to parse multiple files, which can be done with the os.walk method.
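Putting the configuration and the conditions together, a complete parser might look like the sketch below. The names CONF, classify, parse_lines, and parse_file are introduced here for illustration and do not appear in the original; the conditions mirror cond2 to cond4 from the fragment above, written with any() instead of explicit loops.

```python
import os

# Comment symbols per file extension; same structure as conf in the text.
CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}


def classify(line, symbols, in_multi_comment):
    """Return (kind, new_in_multi_comment) for one stripped line."""
    start, end = symbols["start_comment"], symbols["end_comment"]
    # cond2: a multi-line comment opened and closed on the same line
    cond2 = any(line.startswith(s) and line.endswith(e) and len(line) > len(s)
                for s, e in zip(start, end))
    # cond3: the line starts with a closing symbol
    cond3 = any(line.startswith(e) for e in end)
    # cond4: the line starts with an opening or closing symbol
    cond4 = any(line.startswith(s) for s in start + end)

    if line == "" and not in_multi_comment:
        return "blanks", in_multi_comment
    if line.startswith(symbols["single"]) or cond2 or (in_multi_comment and not cond3):
        return "comments", in_multi_comment
    if cond4:
        return "comments", not in_multi_comment  # toggle on a delimiter line
    return "codes", in_multi_comment


def parse_lines(lines, extension):
    """Count blanks, comments, and codes in an iterable of raw lines."""
    symbols = CONF[extension]
    stats = {"comments": 0, "blanks": 0, "codes": 0}
    in_multi_comment = False
    for raw in lines:
        kind, in_multi_comment = classify(raw.strip(), symbols, in_multi_comment)
        stats[kind] += 1
    return stats


def parse_file(path):
    """Pick the symbols from the file extension, then count the file's lines."""
    extension = os.path.splitext(path)[1].lstrip(".")
    with open(path, encoding="utf-8") as f:
        return parse_lines(f, extension)


java_source = ["/*", " * header", " */", "", "// note", "int x = 1;", "/* inline */"]
print(parse_lines(java_source, "java"))
# {'comments': 5, 'blanks': 1, 'codes': 1}
```

Like the logic it mirrors, this sketch only recognizes a closing symbol at the start of a line, so a comment that ends with text followed by the closing symbol on the same line is not detected as closed.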

    import os


    def counter(path):
        """
        Can count a directory or a single file
        :param path:
        :return:
        """
        if os.path.isdir(path):
            comments, blanks, codes = 0, 0, 0
            list_dirs = os.walk(path)
            for root, dirs, files in list_dirs:
                for f in files:
                    file_path = os.path.join(root, f)
                    stats = parse(file_path)
                    comments += stats.get("comments")
                    blanks += stats.get("blanks")
                    codes += stats.get("codes")
            return {"comments": comments, "blanks": blanks, "codes": codes}
        else:
            return parse(path)

Of course, there is more work to do before this program is complete, including command-line parsing so that only one language is counted, based on the specified parameter.
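One possible way to add the command-line part is the standard argparse module. The sketch below is an assumption about how such an interface could look: the option name --type, the positional path argument, and the helper matches_type are hypothetical and are not taken from the original program.

```python
import argparse


def build_parser():
    """Build a command-line parser; the option names here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Count comment, blank, and code lines.")
    parser.add_argument("path", help="file or directory to count")
    parser.add_argument("--type", dest="file_type", default="py",
                        choices=["py", "java"],
                        help="only count files with this extension")
    return parser


def matches_type(filename, file_type):
    """True when the filename ends with the requested extension."""
    return filename.endswith("." + file_type)


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["src", "--type", "java"])
print(args.path, args.file_type)
print(matches_type("Main.java", args.file_type),
      matches_type("tool.py", args.file_type))
```

In a real script, parse_args() would be called without arguments so that it reads sys.argv, and matches_type would be used to skip files while walking the directory.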


Implementing a Line-of-Code Counting Tool in Python

We often want to count the lines of code in a project, but a ready-made counting feature is not always at hand. Today we will look at how to implement a line-counting tool in Python.


The idea: first collect all the files, then count the lines of code in each file, and finally add up the counts.

The implemented features:

Count the number of lines in each file;
Count the total number of lines;
Measure the running time;
Support specifying which file types to count, excluding file types that should not be counted;
Recursively count the lines of files under a folder, including subfolders;
Exclude blank lines.

    # coding=utf-8
    import os
    import time

    basedir = '/root/script'
    filelists = []
    # file types to be counted
    whitelist = ['php', 'py']


    # walk the folder recursively and collect all matching files
    def getFile(basedir):
        global filelists
        # os.walk already descends into subfolders, so no explicit recursion is needed
        for parent, dirnames, filenames in os.walk(basedir):
            for filename in filenames:
                ext = filename.split('.')[-1]
                # only count the specified file types; skip log and cache files
                if ext in whitelist:
                    filelists.append(os.path.join(parent, filename))


    # count the number of lines in one file
    def countline(fname):
        count = 0
        with open(fname) as f:
            for file_line in f:
                if file_line != '' and file_line != '\n':  # filter out blank lines
                    count += 1
        print(fname + '----', count)
        return count


    if __name__ == '__main__':
        # time.perf_counter() is used because time.clock() was removed in Python 3.8
        startTime = time.perf_counter()
        getFile(basedir)
        totalline = 0
        for filelist in filelists:
            totalline = totalline + countline(filelist)
        print('total lines:', totalline)
        print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))


[root@pythontab script]# python
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#

Here only PHP and Python files are counted, which is very convenient.
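The feature list also mentions excluding file types that should not be counted. One way this could be done, sketched here under assumptions, is to combine a whitelist with a blacklist of extensions. The function names collect_files and count_nonblank, the blacklist parameter, and the sample files are all hypothetical; note that count_nonblank also skips whitespace-only lines, which is slightly stricter than comparing against '\n'.

```python
import os
import tempfile


def collect_files(basedir, whitelist=None, blacklist=None):
    """Recursively collect files under basedir, filtered by extension."""
    whitelist = set(whitelist or [])
    blacklist = set(blacklist or [])
    selected = []
    for parent, dirnames, filenames in os.walk(basedir):
        for filename in filenames:
            ext = os.path.splitext(filename)[1].lstrip(".")
            if ext in blacklist:
                continue  # explicitly excluded type
            if whitelist and ext not in whitelist:
                continue  # a whitelist is given and this type is not on it
            selected.append(os.path.join(parent, filename))
    return sorted(selected)


def count_nonblank(path):
    """Count lines that contain something other than whitespace."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if line.strip())


# Build a throwaway directory tree so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    samples = {
        "a.py": "x = 1\n\ny = 2\n",
        "b.log": "noise\n",
        os.path.join("sub", "c.php"): "<?php\n\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)

    files = collect_files(tmp, whitelist=["php", "py"])
    names = [os.path.relpath(p, tmp) for p in files]
    total = sum(count_nonblank(p) for p in files)

print(names, total)
```

Using os.path.splitext instead of filename.split('.')[-1] means a file without any extension yields an empty string rather than its whole name, so it cannot accidentally match a whitelist entry.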
