For a text string, you can use Python's str.split() method to split it into words. Let's see how it behaves in practice.
mysent = 'This book is the best book on python!'
print(mysent.split())
Output:
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python!']
As you can see, the split works, but the punctuation stays attached to the last word. This can be handled with a regular expression that treats any run of non-word characters (anything other than letters, digits, and the underscore) as a separator.
import re
# \W+ matches one or more non-word characters
reg = re.compile(r'\W+')
mysent = 'This book is the best book on python!'
listof = reg.split(mysent)
print(listof)
The output is:
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python', '']
Now you have a list of words, but it also contains an empty string, which needs to be removed.
You can check the length of each string and keep only those with a length greater than 0.
import re
reg = re.compile(r'\W+')
mysent = 'This book is the best book on python!'
listof = reg.split(mysent)
# keep only the non-empty tokens
new_list = [tok for tok in listof if len(tok) > 0]
print(new_list)
The output is:
['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
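The length check above can also be avoided by matching the words themselves rather than splitting on the separators. The following is a small sketch of that alternative using re.findall; it is a different technique from the split-and-filter approach in this article, shown only for comparison.

```python
import re

mysent = 'This book is the best book on python!'

# Match runs of word characters directly instead of splitting on separators,
# so no empty strings are produced.
words = re.findall(r'\w+', mysent)
print(words)
```

Because findall returns only the matched runs, the trailing punctuation never produces an empty entry.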
Finally, notice that the first word of the sentence is capitalized. To treat every word uniformly, we need to convert them all to the same case. Python strings have built-in methods that convert a string to all lowercase (.lower()) or all uppercase (.upper()).
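import re
reg = re.compile(r'\W+')
mysent = 'This book is the best book on python!'
listof = reg.split(mysent)
# drop empty tokens and convert the rest to lowercase
new_list = [tok.lower() for tok in listof if len(tok) > 0]
print(new_list)
The output is:
['this', 'book', 'is', 'the', 'best', 'book', 'on', 'python']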
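The three steps so far (split on non-word characters, drop empty strings, convert to lowercase) can be collected into one reusable function. This is a minimal sketch; the names tokenize and _SEPARATOR are illustrative and do not come from the original text.

```python
import re

# One or more non-word characters act as a separator.
_SEPARATOR = re.compile(r'\W+')


def tokenize(text):
    """Split text on non-word characters, drop empty strings, lowercase the rest."""
    return [tok.lower() for tok in _SEPARATOR.split(text) if len(tok) > 0]


print(tokenize('This book is the best book on python!'))
```

Compiling the pattern once at module level means it is not rebuilt on every call, which helps when many strings are tokenized.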
Now let's look at a complete email. The file email.txt contains:
Hi Peter,
With Jose out of town, do you want to
meet once in a while to keep things
going and do some interesting stuff?
Let me know
Eugene
import re
reg = re.compile(r'\W+')
# read the whole email into one string
email_text = open('email.txt').read()
listof = reg.split(email_text)
new_txt = [tok for tok in listof if len(tok) > 0]
print(new_txt)
Output:
['Hi', 'Peter', 'With', 'Jose', 'out', 'of', 'town', 'do', 'you', 'want', 'to', 'meet', 'once', 'in', 'a', 'while', 'to', 'keep', 'things', 'going', 'and', 'do', 'some', 'interesting', 'stuff', 'Let', 'me', 'know', 'Eugene']
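When tokenizing files rather than a single string, it can help to wrap the file handling in a function that closes the file when it is done. The sketch below assumes Python 3 and a UTF-8 encoded file; the name tokenize_file is hypothetical, and unlike the example above it also lowercases the tokens.

```python
import re


def tokenize_file(path):
    """Return the lowercase word tokens found in the text file at path."""
    # The with block closes the file even if reading raises an error.
    # UTF-8 is an assumption; adjust the encoding to match your files.
    with open(path, encoding='utf-8') as f:
        text = f.read()
    return [tok.lower() for tok in re.split(r'\W+', text) if len(tok) > 0]
```

Calling tokenize_file('email.txt') on the email above would give the same words as the previous output, all in lowercase.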