Because of a business requirement, we need to extract every line of a text that contains the word 'check'.
The sample is as follows:
1 input 10kVB, c female segment 820 latching prepared self-cast platen
2 exit 10kVB, c female segment 820 standby jump 803 platen
3 exit 10kVB, c female segment 820 prepare appeal 820 platen
4 Check 2, 3rd main transformer Split position consistent
5 closed 820 circuit Breaker
6 Check 820 circuit breaker with load
7 check 2nd, 3rd main transformer load distribution normal
8 open 802 circuit Breaker
9
We will use two modules: re (Python's powerful regular expression module) and codecs (used here for encoding conversion).
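As a rough sketch of the file-reading step, codecs.open can be used inside a with block so that the file is closed automatically. The helper name read_lines_utf8 and the temporary file are hypothetical, added only so the example runs on its own.

```python
import codecs
import os
import tempfile


def read_lines_utf8(path):
    """Read a UTF-8 text file and return its lines as Unicode strings."""
    # The with block closes the file even if reading raises an error.
    with codecs.open(path, 'r', 'utf-8') as f:
        return f.readlines()


if __name__ == '__main__':
    # Write a small hypothetical file so the sketch can run on its own.
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with codecs.open(path, 'w', 'utf-8') as f:
            f.write(u"1 open 802 circuit breaker\n"
                    u"2 Check 820 circuit breaker with load\n")
        for line in read_lines_utf8(path):
            print(line.rstrip())
    finally:
        os.remove(path)
```

Each returned line keeps its trailing newline, just as with readlines() on an ordinary file object.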
Idea: one approach is to use a regular expression to find 'check' directly. Based on the sample, the pattern could be written as \d{1,2}\scheck, which handles everything in one step. The code below takes a more bare-bones approach instead: it first reads every line of the text with Python's readlines() method, then tests whether 'check' is in each line. This approach is more troublesome, because the leading number in the sample also has to be removed, which is what def func() is for. As for why I went with the second approach, I do not know. :)
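For comparison, the first idea (a single regular expression) might look roughly like the sketch below. The names SAMPLE, CHECK_LINE and extract_check_lines are hypothetical, the sample text is a shortened stand-in for the one shown earlier, and the case-insensitive flag is an assumption so that both "Check" and "check" are accepted.

```python
import re

# Hypothetical sample text, modelled on the numbered lines shown earlier.
SAMPLE = u"""1 input 10kVB, c female segment 820 latching
4 Check 2, 3rd main transformer split position consistent
5 closed 820 circuit breaker
6 Check 820 circuit breaker with load
7 check 2nd, 3rd main transformer load distribution normal
8 open 802 circuit breaker
"""

# ^\s*\d{1,2}\s  -> the leading line number and the space after it
# (check.*)$     -> the keyword and the rest of the line (captured)
# re.MULTILINE lets ^ and $ match at every line; re.IGNORECASE is an
# assumption so that "Check" and "check" are both accepted.
CHECK_LINE = re.compile(r'^\s*\d{1,2}\s(check.*)$',
                        re.MULTILINE | re.IGNORECASE)


def extract_check_lines(text):
    """Return the 'check' lines of text with their line numbers removed."""
    return CHECK_LINE.findall(text)


if __name__ == '__main__':
    for line in extract_check_lines(SAMPLE):
        print(line)
```

On this sample the function returns the three lines that contain the keyword, with the numbers already stripped, so no separate number-removing step is needed.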
The code is as follows:
import re
import codecs

# Pattern: optional leading whitespace, the line number, an optional space,
# then the rest of the line (captured).
line_pattern = r'\s*\d+\s?(.*)'

def func(text):
    """Strip the leading number from each line of text."""
    c = re.compile(line_pattern)
    lists = []
    lines = text.split("\n")
    for line in lines:
        r = c.findall(line)
        if r:
            lists.append(r[0])
    return "\n".join(lists)

# Read the whole file as Unicode lines, then close it.
f = codecs.open('F:/parseword/tmp/f1040ez.content.txt', 'r', 'utf-8')
s = f.readlines()
f.close()

# Keep only the lines that contain the keyword, without their numbers.
for fileline in s:
    if u'Check' in fileline:
        result = func(fileline)
        print(result)
Results:
>>> ================================ RESTART ================================
>>>
Check 2, No. 3rd main transformer split position uniform
Check 820 circuit breaker with load
check 2nd, 3rd main transformer load distribution normal
Check 802 circuit breaker in the gate position
Check No. 3rd main transformer but load
Of course, we can also create a new list and use its append method to collect the results of the for loop:
test = []
# ... inside the for loop, instead of printing result:
test.append(result)
# ... after the loop:
print(test)
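Put together, the list-collecting variant could be sketched as follows. The function name collect_check_lines and the in-memory lines list are hypothetical; the list stands in for what f.readlines() would return. The pattern is the one from the code above, but it is applied with match instead of findall, which is a small substitution of my own.

```python
import re

# Same pattern as in the code above: optional whitespace, the line number,
# an optional space, then the text to keep.
LINE_PATTERN = re.compile(r'\s*\d+\s?(.*)')


def collect_check_lines(lines, keyword=u'Check'):
    """Collect the numbered lines containing keyword, without their numbers."""
    collected = []
    for line in lines:
        if keyword in line:
            match = LINE_PATTERN.match(line)
            if match:
                collected.append(match.group(1))
    return collected


# Hypothetical input standing in for the result of f.readlines().
lines = [
    u"5 closed 820 circuit breaker\n",
    u"6 Check 820 circuit breaker with load\n",
    u"8 open 802 circuit breaker\n",
]
test = collect_check_lines(lines)
print(test)
```

Returning the list from a function, rather than printing inside the loop, makes it easy to join the results, write them to another file, or process them further.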
Python read text, output specified Chinese (string)