Python: read a text file and output the lines containing a specified Chinese string

Source: Internet
Author: User

Because of a business requirement, I needed to extract every line of a text file that contains the word "Check".

The sample is as follows:

1 input 10kVB, c female segment 820 latching prepared self-cast platen
2 exit 10kVB, c female segment 820 standby jump 803 platen
3 exit 10kVB, c female segment 820 prepare appeal 820 platen
4 Check 2, 3rd main transformer Split position consistent
5 closed 820 circuit Breaker
6 Check 820 circuit breaker with load
7 check 2nd, 3rd main transformer load distribution normal
8 open 802 circuit Breaker
9

We're going to use two modules: re (Python's regular expression module) and codecs (used here for encoding conversion when reading the file).
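As a rough illustration of the encoding step, the sketch below writes a small UTF-8 file and reads it back line by line. It assumes Python 3, where the built-in open() accepts an encoding argument and serves the same purpose as codecs.open; the sample content and the temporary file are made up for the example (检查 is the Chinese word for "check").

```python
import os
import tempfile

# Hypothetical UTF-8 content, not taken from the article's data file.
SAMPLE = "1 检查 A\n2 合上 B\n"

# Create a throwaway file so the example does not depend on any real path.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    # Reading with an explicit encoding yields properly decoded text lines.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
finally:
    os.remove(path)

# Keep only the lines that contain the Chinese keyword.
matching = [line.strip() for line in lines if "检查" in line]
print(matching)
```

Declaring the encoding explicitly avoids depending on the platform default, which matters when the file contains Chinese text.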

Idea: one approach is to use a regular expression to find the "Check" items directly. Based on the sample, the pattern could be \d{1,2}\s followed by the keyword, which solves the problem in one step. The other, more manual approach is to read the file with Python's readlines() method and then test whether each line contains 'Check'. This is more troublesome because the leading number in each sample line also has to be removed, which is what def func() is for. As for why I chose the second approach, I do not know. :)
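The first, regex-only approach is not shown in the article's code, so here is a minimal sketch of how it might look. The anchoring with ^ and $, the re.MULTILINE flag, and the name find_checks are assumptions made for this example; the sample lines are adapted from the sample above.

```python
import re

# A few numbered steps, one per line, adapted from the article's sample.
SAMPLE = (
    "4 Check 2, 3rd main transformer Split position consistent\n"
    "5 closed 820 circuit Breaker\n"
    "6 Check 820 circuit breaker with load\n"
    "8 open 802 circuit Breaker\n"
)

# One or two digits, whitespace, then the keyword and the rest of the line.
# The capture group excludes the leading number.
CHECK_LINE = re.compile(r"^\d{1,2}\s+(Check.*)$", re.MULTILINE)


def find_checks(text):
    """Return the text of every numbered step that starts with 'Check'."""
    return [item.strip() for item in CHECK_LINE.findall(text)]


checks = find_checks(SAMPLE)
print(checks)
```

Because the number sits outside the capture group, this variant needs no separate step to strip it.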

The code for the second approach is as follows:

import re
import codecs

# Skip the leading step number and capture the rest of the line.
line_pattern = r'\s*\d+\s?(.*)'

def func(text):
    c = re.compile(line_pattern)
    lists = []
    lines = text.split("\n")
    for line in lines:
        r = c.findall(line)
        if r:
            lists.append(r[0])
    return "\n".join(lists)

# Read every line of the UTF-8 encoded file.
f = codecs.open('F:/parseword/tmp/f1040ez.content.txt', 'r', 'utf-8')
s = f.readlines()
f.close()

for fileline in s:
    # Keep only the lines that contain the keyword.
    if u'Check' in fileline:
        result = func(fileline)
        print(result)

Results:

>>> ================================ RESTART ================================
>>>
Check 2, No. 3rd main transformer split position uniform
Check 820 circuit breaker with load
check 2nd, 3rd main transformer load distribution normal
Check 802 circuit breaker in the gate position
Check No. 3rd main transformer but load

Of course, we can also create a new list and use its append method to collect the results produced by the for loop:

test = []
# ... inside the for loop, once result has been computed ...
test.append(result)
print(test)
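Putting the pieces together, the filtering, number stripping, and list collection can be sketched as one small function that works on lines already held in memory. The function name extract_checks and its keyword parameter are hypothetical, and the input lines are adapted from the sample rather than read from the real file.

```python
import re

# Same pattern as in the article: optional whitespace, the step number,
# an optional space, then the rest of the line.
LINE_PATTERN = re.compile(r"\s*\d+\s?(.*)")


def extract_checks(lines, keyword="Check"):
    """Collect the lines containing keyword, without their leading numbers."""
    results = []
    for line in lines:
        if keyword not in line:
            continue
        match = LINE_PATTERN.match(line)
        # Fall back to the raw line when it has no leading number.
        results.append(match.group(1).strip() if match else line.strip())
    return results


lines = [
    "5 closed 820 circuit Breaker\n",
    "6 Check 820 circuit breaker with load\n",
    "Check without a number\n",
]
test = extract_checks(lines)
print(test)
```

Note that the keyword test is case-sensitive, so a line that starts with a lowercase "check" would be skipped unless the comparison is normalized first.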
