Writing a Python script to extract Chinese text from logs

Source: Internet
Author: User
Tags: log
My work requires extracting certain fixed strings from a large number of logs. Doing this by hand is impractical given the data volume and the labor involved, so I naturally thought of writing an extraction tool in Python to replace the tedious manual process. Because the logs contain Chinese characters, regular expressions are awkward to match against them, though not impossible; I will revisit this in a later optimization.

Requirements Description:

A parent directory contains multiple subfolders, each holding several log files in TXT format. The task is to find, across all of the logs, the cardid values on lines where cardtype=9 and cardno=0, and store them in a text file with no duplicate cardid values.

Requirements Analysis:

First obtain the full path of every log file, then load each log into memory by its path for extraction and analysis, and store the results in the given text file.
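The three steps in this analysis (collect paths, scan each file, store unique results) can be sketched in a few functions. This is a minimal illustration rather than the author's tool: the names `iter_log_paths`, `matching_lines` and `collect_unique` are hypothetical, and the UTF-8 default encoding is an assumption about the logs.

```python
import os


def iter_log_paths(root):
    """Yield the full path of every .txt file under root, including subfolders."""
    for home, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".txt"):
                yield os.path.join(home, name)


def matching_lines(path, needle, encoding="utf-8"):
    """Yield the stripped lines of one file that contain needle."""
    # errors="replace" keeps the scan going if a log has undecodable bytes.
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        for line in handle:
            if needle in line:
                yield line.strip()


def collect_unique(root, needle):
    """Return matching lines from all logs, duplicates removed, first-seen order kept."""
    seen = {}  # dict keys keep insertion order, so this doubles as an ordered set
    for path in sorted(iter_log_paths(root)):
        for line in matching_lines(path, needle):
            seen.setdefault(line, None)
    return list(seen)
```

Using a dict as an ordered set keeps the first occurrence of each line in the order it was found, whereas a plain `set` returns the lines in arbitrary order.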

Solution:

To keep the script as concise as possible, the input variables are read from a configuration file. Without further ado, here is the code:

The configuration file is as follows:
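The script reads three options from a [Setting] section of config.ini: ifilepath, ofilepath and searchstr. A minimal sketch of a configuration with those keys, parsed with the same configparser module, might look like the following. All three values are placeholders assumed for illustration, not the author's actual settings, and `load_settings` is a hypothetical helper.

```python
import configparser

# Hypothetical contents for config.ini. Only the section and option names
# come from the script; every value here is an assumed placeholder.
SAMPLE_CONFIG = """\
[Setting]
ifilepath = ./logs
ofilepath = ./Result.txt
searchstr = cardtype=9, cardno=0
"""


def load_settings(text):
    """Parse the configuration text and return (input dir, output file, search string)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Setting"]
    return section["ifilepath"], section["ofilepath"], section["searchstr"]


ifile, ofile, chars = load_settings(SAMPLE_CONFIG)
print(ifile, ofile, chars)
```

configparser splits each line at the first `=` only, so a search string that itself contains `=` is read back intact. Section names are case-sensitive, while option names are not.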

The 103 folder contains two files, Log1.txt and Log2.txt, with content similar to the following:

The Python code is implemented as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# filename: picktools.py
# codedtime: 2015-3-25

import os
import configparser


# Traverse a directory and yield the full path of every file in it
def itemsbrowse(path):
    for home, dirs, files in os.walk(path):
        for filename in files:
            yield os.path.join(home, filename)


# Yield the lines of a file that contain the given string
def findchars(filename, chars):
    with open(filename, 'r') as file:
        for eachline in file:
            if eachline.find(chars) >= 0:
                yield eachline


# Append the generated lines to the specified file
def addtofile(filename, mygenerator):
    with open(filename, 'a') as file:  # open in append mode
        for line in mygenerator:
            file.write(line)


# Write the file's unique lines to <name>_filter.txt
# (note: this name shadows the built-in filter)
def filter(filename):
    mylist = []
    with open(filename, 'r') as file:
        for eachline in file:
            mylist.append(eachline.strip())
    with open(os.path.splitext(filename)[0] + '_filter.txt', 'w') as file2:
        for line in list(set(mylist)):  # set() removes duplicates; order is not preserved
            print(line, file=file2)


# Read the settings, scan every file, then remove duplicate lines
def excute():
    iniconf = configparser.ConfigParser()
    iniconf.read('config.ini')
    ifile = iniconf.get('Setting', 'ifilepath')
    ofile = iniconf.get('Setting', 'ofilepath')
    chars = iniconf.get('Setting', 'Searchstr')
    for fullname in itemsbrowse(ifile):
        mygenerator = findchars(fullname, chars)
        addtofile(ofile, mygenerator)
    filter(ofile)


if __name__ == '__main__':
    excute()
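The findchars function opens each log with the platform's default encoding, which may not match logs that contain Chinese text. The article does not say how the logs are encoded, so the sketch below simply tries a list of candidate encodings in turn. The name `findchars_encoded` and the UTF-8 and GBK candidates are assumptions for illustration.

```python
def findchars_encoded(filename, chars, encodings=("utf-8", "gbk")):
    """Yield lines containing chars, trying each candidate encoding in turn."""
    for encoding in encodings:
        try:
            # Read the whole file first so a decoding error surfaces
            # before any line has been yielded.
            with open(filename, "r", encoding=encoding) as handle:
                lines = handle.readlines()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
        for line in lines:
            if chars in line:
                yield line
        return
    raise ValueError("could not decode %s with any of %r" % (filename, encodings))
```

Order matters here: UTF-8 is tried first because it rejects most non-UTF-8 byte sequences, while GBK accepts many byte sequences and could silently produce wrong characters if it were tried first.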


Output: two files, Result.txt and Result_filter.txt.

Experience:

1. Python makes it very convenient to handle small everyday tasks like this one; compared with C++, productivity for this kind of work is much higher.

2. Because the processing involves Chinese characters, regular expressions were not very convenient to use here, although they are not impossible; a later version will add regular expression support.

3. As I am a beginner, the code is not yet refined or concise; I will optimize it later.
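On the second point, Python 3's re module works on Unicode strings, so Chinese characters can be matched with an ordinary character range. The following is a minimal sketch, not the regular expression support the author planned: the sample line and the `cardid=<value>` layout are hypothetical, since the article does not show the real log format.

```python
import re

# Basic CJK Unified Ideographs block; rarer characters in the extension
# blocks are not covered by this range.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")

# Assumed field layout: "cardid=<value>", with the value ending at a comma
# or whitespace. The real log format may differ.
CARDID = re.compile(r"cardid=([^,\s]+)")


def chinese_runs(line):
    """Return every run of consecutive Chinese characters in the line."""
    return CHINESE_RUN.findall(line)


def cardid_of(line):
    """Return the cardid value in the line, or None if there is none."""
    match = CARDID.search(line)
    return match.group(1) if match else None


# Hypothetical log line, for illustration only
sample = "2015-03-25 刷卡成功 cardtype=9, cardno=0, cardid=A1B2C3"
print(chinese_runs(sample))  # ['刷卡成功']
print(cardid_of(sample))     # A1B2C3
```

Collecting the values returned by `cardid_of` in a set would then give the deduplicated cardid list directly, instead of deduplicating whole lines.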
