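I recently needed to work with .inf files, and my usual way of reading text files failed, so I searched around and found the cause. The files in the directory I needed to read use two different encodings: some are ANSI and the others are Unicode. Unicode text, however, can be stored in several formats, such as UTF-8 and UTF-16. UTF stands for Unicode Transformation Format, that is, a way of transforming Unicode code points into a particular byte format. When reading Unicode text you have to specify which storage format it uses, otherwise the read will either fail or return garbage.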
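To make the point about storage formats concrete, here is a minimal sketch showing that the same string becomes different bytes in UTF-8 and UTF-16, and that decoding with the wrong format fails. The variable names are only for illustration and do not come from the original script.

```python
import codecs

text = u"done"

# The same four characters, stored in two different formats.
utf8_bytes = text.encode("utf-8")    # 1 byte per ASCII character
utf16_bytes = text.encode("utf-16")  # 2-byte BOM + 2 bytes per character

print(len(utf8_bytes))   # 4
print(len(utf16_bytes))  # 10

# The "utf-16" codec writes a byte order mark (BOM) at the start.
has_bom = utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom)  # True

# Decoding with the matching format gives the text back.
roundtrip = utf16_bytes.decode("utf-16")

# Decoding with the wrong format raises an error, because the BOM bytes
# (0xFF / 0xFE) are never valid in UTF-8.
try:
    utf16_bytes.decode("utf-8")
    wrong_format_failed = False
except UnicodeDecodeError:
    wrong_format_failed = True
print(wrong_format_failed)  # True
```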
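The code below reads the text files in a given directory whose names start with "test" and end with ".txt" (the files may be in subdirectories) and checks whether they contain the string "done". For each match it prints the folder path, the file's full path, and the line containing the string. Once the search finishes, the full paths of the matching files are stored in a list and printed again.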
import os
import codecs

rootdir = "C:\\test\\code_python\\testfile"


def lookupstring(lookup):
    filelist = []
    # Walk rootdir and all of its subdirectories.
    for parent, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.startswith("test") and filename.endswith(".txt"):
                fullpath = os.path.join(parent, filename)
                try:
                    # Try UTF-16 first (files saved as "Unicode").
                    f = codecs.open(fullpath, 'r', 'utf-16')
                    try:
                        ls = [line.strip() for line in f]
                    finally:
                        f.close()
                except UnicodeError:
                    # Not UTF-16: fall back to a plain (ANSI) read.
                    f = open(fullpath, 'r')
                    try:
                        ls = [line.strip() for line in f]
                    finally:
                        f.close()
                judge = False
                for line in ls:
                    if lookup in line:
                        print("parent is:" + parent)
                        print("filename with full path :" + fullpath)
                        print(line)
                        judge = True
                # Record each matching file only once.
                if judge:
                    filelist.append(fullpath)
    return filelist


def main():
    test = lookupstring("done")
    for element in test:
        print(element)


if __name__ == '__main__':
    main()
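As an alternative to trying UTF-16 and falling back when it fails, the encoding can be guessed up front from the file's byte order mark (BOM). The sketch below is only one possible approach, not the script above. The helper names `guess_encoding`, `find_lines`, and `demo` are hypothetical, and the `cp1252` fallback is an assumption: "ANSI" on Windows means the system code page, which differs between machines. It also does not try to recognise UTF-32 or UTF-16 files saved without a BOM.

```python
import codecs
import io
import os
import shutil
import tempfile

# Known BOMs and the codec to use for each (the codec strips the BOM itself).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(path, fallback="cp1252"):
    """Return a codec name based on the leading BOM, else `fallback` (assumed)."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return fallback


def find_lines(path, lookup, fallback="cp1252"):
    """Return the stripped lines of `path` that contain `lookup`."""
    encoding = guess_encoding(path, fallback)
    with io.open(path, "r", encoding=encoding) as f:
        return [line.strip() for line in f if lookup in line]


def demo():
    """Write one UTF-16 file and one ANSI file, then search both."""
    tmpdir = tempfile.mkdtemp()
    try:
        unicode_path = os.path.join(tmpdir, "test_unicode.txt")
        ansi_path = os.path.join(tmpdir, "test_ansi.txt")
        with io.open(unicode_path, "w", encoding="utf-16") as f:
            f.write(u"step one\nall done\n")
        with io.open(ansi_path, "w", encoding="cp1252") as f:
            f.write(u"done here\nnothing\n")
        return find_lines(unicode_path, u"done"), find_lines(ansi_path, u"done")
    finally:
        shutil.rmtree(tmpdir)


if __name__ == "__main__":
    unicode_hits, ansi_hits = demo()
    print(unicode_hits)
    print(ansi_hits)
```

Checking the BOM first means each file is opened with a single, explicit encoding, so no exception handling is needed to tell the two kinds of file apart.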