When processing a document, you may need to split the comment section of a TXT file (for example, user comments) into separate lines, breaking at every punctuation mark. For example:
Shuai Fu ward，mobile phone animation。old Hu Sifan，is to the Russian side。
Processed into:
Shuai Fu ward
mobile phone animation
old Hu Sifan
is to the Russian side
This makes the text easier to process further in Python. Two approaches are commonly tried:
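    import re
    from string import punctuation

    # The sample strings use full-width Chinese punctuation (， and 。).
    lis = ['Shuai Fu ward，mobile phone animation。', 'old Hu Sifan，is to the Russian side。']

    # Approach 1: drop characters found in punctuation, then join the rest with '/'.
    b = ["/".join([c for c in x if c not in punctuation]) for x in lis]
    print(b)
    # Every remaining character is separated by '/', and ， and 。 are kept:
    # 'S/h/u/a/i/ /F/u/ /w/a/r/d/，/m/...'

    # Approach 2: replace each run of punctuation with '/'.
    # re.escape keeps characters such as ] and \ from being read as regex syntax.
    c = [re.sub(r'[{}]+'.format(re.escape(punctuation)), '/', x) for x in lis]
    print(c)
    # ['Shuai Fu ward，mobile phone animation。', 'old Hu Sifan，is to the Russian side。']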
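Clearly, the first approach tests every character individually and joins all of them with '/', so it does not give the expected result, while the second still prints the punctuation marks unchanged.
The reason is that string.punctuation contains only English (ASCII) punctuation. Once the Chinese punctuation marks are added to it, the problem goes away.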
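    import re
    from string import punctuation

    # Assumption: the original snippet does not show how haha was defined;
    # here it is the English punctuation plus the full-width ， and 。.
    haha = punctuation + '，。'
    lis = ['Shuai Fu ward，mobile phone animation。', 'old Hu Sifan，is to the Russian side。']

    # Replace each run of English or Chinese punctuation with '/'.
    e = [re.sub(r'[{}]+'.format(re.escape(haha)), '/', x) for x in lis]
    print(e)
    for i in e:
        ee = i.split('/')
        print(ee)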
The loop prints:
['Shuai Fu ward', 'mobile phone animation', '']
['old Hu Sifan', 'is to the Russian side', '']
The trailing empty string appears because each string ends with a punctuation mark.
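The substitute-then-split steps can also be folded into one call to re.split, which avoids the trailing empty string. The following is only a sketch under stated assumptions: the names CJK_PUNCTUATION, SPLIT_PATTERN and split_on_punctuation are hypothetical, and the list of full-width marks is a small sample chosen for illustration, not a complete set.

```python
import re
from string import punctuation

# Assumption: only these full-width marks are needed; extend the string for other text.
CJK_PUNCTUATION = "，。！？；：、"

# re.escape keeps characters such as ] and \ from being read as regex syntax.
SPLIT_PATTERN = re.compile("[{}]+".format(re.escape(punctuation + CJK_PUNCTUATION)))


def split_on_punctuation(text):
    """Split text at runs of punctuation and drop empty or whitespace-only pieces."""
    return [part.strip() for part in SPLIT_PATTERN.split(text) if part.strip()]


print(split_on_punctuation("Shuai Fu ward，mobile phone animation。"))
# ['Shuai Fu ward', 'mobile phone animation']
print(split_on_punctuation("old Hu Sifan, is to the Russian side."))
# ['old Hu Sifan', 'is to the Russian side']
```

Because string.punctuation also contains the apostrophe and the hyphen, this sketch would split words such as "don't" or "well-known" as well; remove those characters from the pattern if that is not wanted.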
Python: replacing punctuation and other characters in text