My work requires extracting certain fixed strings from a large number of logs. Doing this by hand is laborious given the data volume, so I naturally thought of writing a small extraction tool in Python to replace the tedious manual work. Because the logs contain Chinese characters, regular expressions are awkward to match with, though not impossible; I will come back to that in a later optimization.
Requirements Description:
A parent directory contains multiple subfolders, and each subfolder holds several log files in TXT format. The task is to find the cardid value of every entry with cardtype=9 and cardno=0 across all the logs and store the results in a text file, with no cardid duplicated.
Requirements Analysis:
First obtain the full path of every log file. Then load each log by its path and scan it for matching entries. Finally, store the results in the given text file.
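The steps in this analysis can be sketched as two small functions: one that enumerates the log files and one that scans them. This is only a minimal sketch, assuming the logs are UTF-8 text files whose names end in .txt; the names iter_log_paths and collect_matches are illustrative and not part of the original tool.

```python
import os


def iter_log_paths(root):
    """Yield the full path of every .txt file under root, including subfolders."""
    for home, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith('.txt'):
                yield os.path.join(home, name)


def collect_matches(root, needle, encoding='utf-8'):
    """Return the set of stripped lines under root that contain needle."""
    found = set()
    for path in iter_log_paths(root):
        # errors='replace' keeps the scan going if a log contains stray bytes
        with open(path, 'r', encoding=encoding, errors='replace') as log:
            for line in log:
                if needle in line:
                    found.add(line.strip())
    return found
```

Collecting into a set removes duplicates as the lines are read, so no intermediate file is needed; the trade-off is that all matching lines are held in memory at once.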
Solution:
To keep things as simple as possible, a configuration file supplies the input variables. Without further ado, here is the code:
The configuration file is as follows:
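The script reads a section named Setting with the keys ifilepath, ofilepath and Searchstr. A minimal sketch of such a file, parsed from a string so that it can be tried without touching the disk; the three values are hypothetical placeholders, and only the section and key names come from the script.

```python
import configparser

# Hypothetical values; only the section and key names come from the script.
SAMPLE_CONFIG = """\
[Setting]
ifilepath = ./logs
ofilepath = ./Result.txt
Searchstr = cardtype=9
"""

conf = configparser.ConfigParser()
conf.read_string(SAMPLE_CONFIG)

ifile = conf.get('Setting', 'ifilepath')   # directory to scan
ofile = conf.get('Setting', 'ofilepath')   # file that receives matching lines
chars = conf.get('Setting', 'Searchstr')   # option names are case-insensitive
```

Note that configparser splits each line at the first delimiter only, so a search string that itself contains an equals sign is read back intact.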
The folder 103 contains two files, Log1.txt and Log2.txt, with content similar to the following:
The Python code is implemented as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# filename: picktools.py
# codedtime: 2015-3-25

import os
import configparser


# Traverse a directory and yield the full path of every file
def itemsbrowse(path):
    for home, dirs, files in os.walk(path):
        for filename in files:
            yield os.path.join(home, filename)


# Yield every line of the file that contains the given string
def findchars(filename, chars):
    file = open(filename, 'r')
    for eachline in file:
        if eachline.find(chars) >= 0:
            yield eachline
    file.close()


# Append the generated lines to the specified file
def addtofile(filename, mygenerator):
    file = open(filename, 'a')  # open in append mode
    for line in mygenerator:
        file.write(line)
    file.close()


# Filter out duplicate lines
def filter(filename):
    mylist = []
    file = open(filename, 'r')
    for eachline in file:
        mylist.append(eachline.strip())
    file.close()
    file2 = open(os.path.splitext(filename)[0] + '_filter.txt', 'w')
    for line in list(set(mylist)):
        print(line, file=file2)
        # file2.write(line)
    file2.close()


def excute():
    iniconf = configparser.ConfigParser()
    iniconf.read('config.ini')
    ifile = iniconf.get('Setting', 'ifilepath')
    ofile = iniconf.get('Setting', 'ofilepath')
    chars = iniconf.get('Setting', 'Searchstr')
    for fullname in itemsbrowse(ifile):
        mygenerator = findchars(fullname, chars)
        addtofile(ofile, mygenerator)
    filter(ofile)


if __name__ == '__main__':
    excute()
Output: two files, Result.txt and Result_filter.txt.
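One thing to be aware of: the script removes duplicates by passing the lines through set(), which does not keep them in the order they appeared in the logs. If the order matters, one possible alternative (a sketch, not the behaviour of the original tool) is dict.fromkeys, which keeps the first occurrence of each line in insertion order on Python 3.7 and later. The name dedupe_lines is illustrative, and unlike the original this version also drops empty lines.

```python
def dedupe_lines(lines):
    """Return unique, stripped, non-empty lines in first-seen order."""
    stripped = (line.strip() for line in lines)
    return list(dict.fromkeys(s for s in stripped if s))


print(dedupe_lines(['b\n', 'a\n', 'b\n', '\n', 'a']))  # ['b', 'a']
```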
Experience:
1. Python makes it very convenient to handle small everyday tasks like this one; compared with C++, productivity for this kind of work is much higher.
2. The processing involves Chinese characters, so regular expressions are not very convenient here, though they are not impossible to use; regular expression support will be added in a later version.
3. As I am a beginner, the code is not yet refined or concise; it will be optimized later.
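As a pointer toward the regular expression support mentioned in point 2: in Python 3, str is Unicode, so Chinese text around the fields does not get in the way of a pattern as long as the file is decoded with the right encoding. The sketch below assumes a hypothetical line layout of the form "... cardtype=9, cardno=0, cardid=<value> ..."; the real log format is not shown here, so the pattern and the name extract_cardids are illustrative only.

```python
import re

# Assumed (hypothetical) layout: "... cardtype=9, cardno=0, cardid=<value> ..."
CARD_RE = re.compile(
    r'cardtype\s*=\s*9\b.*?cardno\s*=\s*0\b.*?cardid\s*=\s*(\w+)',
    re.IGNORECASE,
)


def extract_cardids(lines):
    """Return the unique cardid values from matching lines, in first-seen order."""
    seen = {}
    for line in lines:
        match = CARD_RE.search(line)
        if match:
            # A dict remembers insertion order, so duplicates are skipped
            # without losing the order of first appearance.
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Capturing only the cardid value, rather than the whole line, means that two lines which differ only in their timestamp still count as the same card.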