How to Write a Python Script That Replaces Multi-line Text in Files

Last updated: 2018-12-06 Source: Internet

Uploader: User


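Until about three months ago, Python was a mystery to me. Then my manager gave me a task: delete (replace) every comment containing certain specific lines from all of the project's source files. The project's source tree is about 1 GB and takes more than half an hour to compile on a mid-to-high-end Linux server, so there was far too much code to change by hand, one file at a time. It had to be done with a script, and Python came to mind. I had never used Python before, but after two weeks of intensive cramming I managed to finish the job.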

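I have long wanted to share the problems I ran into and how I solved them (my solution may not be a good one, but it did solve my problem). I put it off until now because I felt I still did not know Python well enough. Because I had to finish the assigned task quickly, I only skimmed Python while learning it, skipping anything not directly related to my problem, and I could not concentrate on the material because my head was full of the task. A few days ago I sat down and read a Python book from cover to cover, and now I feel it is time to write up a summary.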

This article covers the following:

Problem description
Approach
Implementation
Python's features

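1. Problem Description
The project's source tree is large: a mix of C and C++ written in many different styles, with '.c', '.cc', '.cpp', '.h', and '.hh' files. My task was to delete the comments that contain certain specific lines, such as the one below (disclaimer: this is just an example I picked; the project's source code does not contain this text):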

/*
 * Copyright 2002 Sun Microsystems, Inc. All rights reserved.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * - Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * - Redistribution in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.
 * Neither the name of Sun Microsystems, Inc. or the names of
 * contributors may be used to endorse or promote products derived
 * from this software without specific prior written permission.

But the formats vary a lot. Some files have a description of the source file before "Copyright 2002 Sun Microsystems, Inc. All rights reserved."; some have one after "from this software without specific prior written permission."; some use C++-style "//" comments instead of "/* */"; and some lack the lines

 * - Redistribution in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in
 * the documentation and/or other materials provided with the
 * distribution.

and there are still other variations. In short, the comments I had to delete, all containing these specific lines, come in many different formats!

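So I decided to write the script in Python. To match specific content, regular expressions came to mind, but I could not figure out how to build a regex that matches everything described above (if you know how, please tell me!). So I had to find another way.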
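For readers who share the question about regular expressions, here is one possible regex-based sketch. It is not the method used in this article, and it rests on stated assumptions: the comments to remove are either /* ... */ blocks or runs of consecutive lines starting with //, they stand on their own lines, and the Copyright line from the example is the marker. It does not understand string literals, so a "/*" inside a string would confuse it. The names MARKER, COMMENT_RE and strip_matching_comments are hypothetical. The trick is to let the regex find every comment and let a small function decide whether each one contains the marker, instead of trying to encode the marker and all its format variations in a single pattern.

```python
import re

# Assumption: this marker line identifies the comments to delete.
MARKER = "Copyright 2002 Sun Microsystems, Inc. All rights reserved."

# Either a /* ... */ block (non-greedy, may span lines, plus one trailing
# newline) or a run of consecutive lines whose first non-blank text is //.
COMMENT_RE = re.compile(
    r"/\*.*?\*/\n?"
    r"|(?:^[ \t]*//[^\n]*(?:\n|$))+",
    re.DOTALL | re.MULTILINE,
)


def strip_matching_comments(text, marker=MARKER):
    """Remove every comment in `text` that contains `marker`; keep the rest."""
    def repl(match):
        comment = match.group(0)
        # Returning the match unchanged keeps unrelated comments intact.
        return "" if marker in comment else comment
    return COMMENT_RE.sub(repl, text)
```

Because the marker test happens in Python rather than in the pattern, descriptions before or after the license text, or a missing paragraph, do not matter: any comment containing the marker line is removed as a whole.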

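2. Approach
My plan: to delete every comment containing the specific lines from all of the project's source files, the script must be able to:

Traverse all source files ('.c', '.cc', '.cpp', '.h', '.hh') and process only those file types
Find the comments that contain the specific lines, and delete them
Handle special cases such as symbolic links
These steps can be expressed as follows: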

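Step 1: Take as input the name of the source directory or source file to process.

Step 2: If it is a file, check whether its type is '.c', '.cc', '.cpp', '.h', or '.hh'; otherwise skip it.

Step 3: Check whether the file is a symbolic link; if it is, skip it.

Step 4: Search the file for a matching comment; if one exists, delete it, otherwise leave the file alone.

Step 5: If it is a directory, process each file and subdirectory in it, going back to Step 2.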
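Steps 1 to 3 and 5 amount to a recursive walk with two filters. The sketch below is a hedged illustration of just that part, assuming Python 3; the generator name iter_source_files and the SOURCE_EXTS set are hypothetical, and Step 4 is left to whatever per-file function you use. Unlike a plain os.walk, it refuses to follow symbolic links to directories as well as to files, which also prevents endless recursion through a link that points back up the tree.

```python
import os

# The file types listed in Step 2.
SOURCE_EXTS = {".c", ".cc", ".cpp", ".h", ".hh"}


def iter_source_files(path):
    """Yield the source files to process under `path` (Steps 1-3 and 5).

    `path` may name a single file or a directory. Symbolic links are skipped
    (Step 3), for directories as well as files, so the walk cannot loop.
    """
    if os.path.islink(path):
        return                                    # Step 3: skip links
    if os.path.isfile(path):
        if os.path.splitext(path)[1] in SOURCE_EXTS:
            yield path                            # Step 2: right file type
    elif os.path.isdir(path):
        for name in sorted(os.listdir(path)):     # Step 5: every entry
            yield from iter_source_files(os.path.join(path, name))
```

Step 4 would then be applied to each yielded path, for example `for f in iter_source_files(sys.argv[1]): commentFile(f, fileList)`.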

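The plan is clear; the crux is how to find the matching content in a file and delete it. Besides, for someone who had never used Python or any other scripting language, actually writing the code was a challenge in itself.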

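How do I decide whether a comment contains the specific lines? My approach is as follows (I am not good at regular expressions, so this was my only option):

When a line contains /* or //, record the current line number as startLine
Search line by line for the specific lines, such as "Copyright 2002 Sun Microsystems, Inc. All rights reserved."
Continue until */ is reached, or until the comment ends (for //). If the specific lines were found, record the line number where the comment ends as endLine
Finally, delete the lines from startLine to endLine.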
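The startLine/endLine idea can be sketched as a small function that scans a list of lines. This is a hedged illustration under assumptions, not the article's script: find_comment_span and delete_span are hypothetical names, the marker is the Copyright line from the example, and unlike the script in Section 3 it also treats a run of consecutive // lines as one comment. Like the article's method, it works on whole lines, so code sharing a line with the comment would be deleted too.

```python
MARKER = "Copyright 2002 Sun Microsystems, Inc. All rights reserved."


def find_comment_span(lines, marker=MARKER):
    """Return (startLine, endLine), 1-based and inclusive, of the first
    comment that contains `marker`, or None if no comment does.

    A comment is a /* ... */ block or a run of consecutive lines whose first
    non-blank characters are //.
    """
    kind = None        # None outside comments, else "block" or "line"
    startLine = 0
    matched = False
    for lineNo, line in enumerate(lines, start=1):
        isSlashLine = line.lstrip().startswith("//")
        if kind == "line" and not isSlashLine:
            if matched:                  # the // run ended on the previous line
                return startLine, lineNo - 1
            kind = None
        if kind is None:                 # does a new comment start here?
            if isSlashLine:
                kind, startLine, matched = "line", lineNo, False
            elif "/*" in line:
                kind, startLine, matched = "block", lineNo, False
        if kind is not None and marker in line:
            matched = True
        if kind == "block" and "*/" in line:
            if matched:
                return startLine, lineNo
            kind = None                  # comment closed without the marker
    if kind == "line" and matched:       # the // run reached end of file
        return startLine, len(lines)
    return None


def delete_span(lines, span):
    """Return `lines` without lines startLine..endLine (1-based, inclusive)."""
    startLine, endLine = span
    return lines[:startLine - 1] + lines[endLine:]
```

Resetting `matched` whenever a new comment starts is what keeps a marker found in one comment from triggering the deletion of a different one.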
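3. Implementation
Without further ado, here is the code, implementing the approach above directly. If you are not familiar with Python, please consult a Python reference.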
#!/usr/bin/env python3
# Filename: comment.py
# Deletes the first /* ... */ comment that contains MARKER from each
# C/C++ source file. // comments are not handled by this script.

import os
import sys

# The line that identifies the comment to delete (the example in Section 1).
MARKER = 'Copyright 2002 Sun Microsystems, Inc. All rights reserved.'
# The source file types listed in Section 1.
SOURCE_EXTS = ('.c', '.cc', '.cpp', '.h', '.hh')

#-------------------------------------------------------------
def usage():
    print('''
help: comment.py <filename | dirname>

[dirname]: Option, select a directory to operate
[filename]: Option, select a file to operate

    Example: python3 comment.py /home/saylor/test
    ''')

#--------------------------------------------------------------
def commentFile(src, fileList):
    '''
    description: delete the matching comment from one file
    param src: file to operate on
    param fileList: paths of successfully modified files are appended here
    '''
    # Does the file exist?
    if not os.path.exists(src):
        print("Error: file - %s doesn't exist." % src)
        return False
    if os.path.islink(src):
        print('Error: file - %s is just a link, will not handle it.' % src)
        return False
    filetype = os.path.splitext(src)[1]
    if filetype not in SOURCE_EXTS:
        return False
    try:
        if not os.access(src, os.W_OK):
            os.chmod(src, 0o664)
    except OSError:
        print("Error: you can not change %s's mode." % src)
        return False

    # latin-1 maps every byte to one character, so any file round-trips
    # unchanged; newline='' keeps the original line endings.
    try:
        with open(src, 'r', encoding='latin-1', newline='') as inputf:
            lines = inputf.readlines()
    except OSError as err:
        print('Error: cannot read %s: %s' % (src, err))
        return False

    #-----find the beginLine and endLine -------------------
    beginLine = 0
    endLine = 0
    inComment = False
    isMatched = False
    for lineNo, eachline in enumerate(lines, start=1):
        if '/*' in eachline:
            # A new comment starts; forget any earlier partial match.
            inComment = True
            isMatched = False
            beginLine = lineNo
        if inComment and MARKER in eachline:
            isMatched = True
        if inComment and '*/' in eachline:
            if isMatched:
                endLine = lineNo
                break
            inComment = False

    # No matching comment: leave the file untouched.
    if endLine == 0:
        return False

    #-----delete the content between beginLine and endLine-----
    print('%s: deleting lines %d-%d' % (src, beginLine, endLine))
    outputfilename = src + '.tmp'
    try:
        with open(outputfilename, 'w', encoding='latin-1', newline='') as outputf:
            for lineNo, eachline in enumerate(lines, start=1):
                if lineNo < beginLine or lineNo > endLine:
                    outputf.write(eachline)
        os.replace(outputfilename, src)
    except OSError as err:
        print('Error: unexpected error on %s: %s' % (src, err))
        return False
    fileList.append(src)
    return True

#--------------------------------------------------------------
def commentDir(src, fileList):
    '''
    description:
         comment files in src(dir)
    param src:
         operate files in src(dir)
    '''
    # Does the directory exist?
    if not os.path.exists(src):
        print('Error: dir - %s does not exist.' % src)
        return False
    for eachfile in os.listdir(src):
        eachfile = os.path.join(src, eachfile)
        # Do not follow symbolic links to directories (avoids loops).
        if os.path.isdir(eachfile) and not os.path.islink(eachfile):
            commentDir(eachfile, fileList)
        elif os.path.isfile(eachfile):
            commentFile(eachfile, fileList)
    return True

#--------------------------------------------------------------
def main():
    if len(sys.argv) < 2:
        usage()
        sys.exit(1)
    src = os.path.abspath(sys.argv[1])
    fileList = []
    if os.path.isdir(src):
        commentDir(src, fileList)
    elif os.path.isfile(src):
        commentFile(src, fileList)
    else:
        print('Error: %s is neither a file nor a directory.' % src)
        sys.exit(1)
    if fileList:
        print('Successfully handled files:')
        for eachfile in fileList:
            print(eachfile)
    print('Done')
    return True

#--------------------------------------------------------------
if __name__ == '__main__':
    main()
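The script writes its output to src + '.tmp' and then renames that file over the original. A possible refinement of that last step, offered as a sketch rather than as part of the original script, is the hypothetical helper rewrite_lines below (Python 3.3+ assumed). It writes to a uniquely named temporary file in the same directory (tempfile.mkstemp), so two runs cannot collide on the same '.tmp' name. It copies the original permission bits with shutil.copymode, because mkstemp creates files readable only by their owner, and it swaps the result in with os.replace, which overwrites the destination on both POSIX and Windows. Which lines to keep is decided by a callback.

```python
import os
import shutil
import tempfile


def rewrite_lines(path, keep):
    """Rewrite `path`, keeping only lines for which keep(lineNo, line) is true.

    The new content goes to a temporary file in the same directory, which then
    replaces the original via os.replace(); if an exception interrupts the
    write, the temporary file is removed and the original is left unchanged.
    """
    # latin-1 maps each byte to one character, so arbitrary bytes round-trip;
    # newline="" preserves the file's original line endings.
    with open(path, encoding="latin-1", newline="") as src:
        lines = src.readlines()
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="latin-1", newline="") as out:
            for lineNo, line in enumerate(lines, start=1):
                if keep(lineNo, line):
                    out.write(line)
        shutil.copymode(path, tmp)   # mkstemp creates the file as 0600
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)               # clean up, then re-raise the error
        raise
```

For example, commentFile could call `rewrite_lines(src, lambda no, line: not beginLine <= no <= endLine)` instead of writing outputfilename by hand.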

This article was originally written in Chinese and published on aliyun.com, and is provided for informational purposes only.
