Python: a simple text string processing method.

Source: Internet
Author: User

This example shows how to do simple text-string processing in Python. The details are as follows:

A text string can be cut into words with Python's string.split() method. Let's look at what it does in practice.

mySent = 'This book is the best book on python!'
print(mySent.split())

Output:

['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python!']
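As a small aside on how split() behaves, here is a minimal sketch. The sample strings are made up for illustration and are not part of the original example. It shows that split() with no arguments treats any run of whitespace as one separator and leaves punctuation attached to the words, while an explicit separator is matched literally.

```python
# With no arguments, str.split() splits on runs of whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
sample = '  This   book\tis the\nbest!  '   # hypothetical sample text
words = sample.split()
print(words)

# With an explicit separator, consecutive separators yield empty strings.
fields = 'a,,b'.split(',')
print(fields)
```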

As you can see, the split works well, but the punctuation mark is kept as part of the last word. Regular expressions can handle this: use as the separator any run of characters that are not letters, digits, or underscores (\W).

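import re

# \W+ matches one or more non-word characters. The pattern \W* can also
# match an empty string, which Python 3.7+ would split on as well.
reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
print(listof)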

Output:

['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python', '']
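The trailing empty string appears because the sentence ends with a separator ('!'), so the split leaves an empty piece after it. The sketch below illustrates this. It assumes the pattern \W+ (one or more non-word characters), because on Python 3.7 and later a pattern that can match an empty string, such as \W*, also splits between individual characters.

```python
import re

sentence = 'This book is the best book on python!'

# The final '!' is a separator with nothing after it,
# so the split produces an empty string at the end.
pieces = re.split(r'\W+', sentence)
print(pieces[-1] == '')

# A separator at the start produces a leading empty string as well.
leading = re.split(r'\W+', '!python')
print(leading)
```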

Now we have a list of words, but it contains an empty string that needs to be removed.

You can check the length of each string and keep only those whose length is greater than 0.

import re

reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
# Keep only the non-empty tokens.
new_list = [tok for tok in listof if len(tok) > 0]
print(new_list)

Output:

['This', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
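An alternative sketch, not part of the original example: because empty strings are falsy, the length check can be shortened, and re.findall with \w+ collects the words directly so that no empty strings appear in the first place.

```python
import re

sentence = 'This book is the best book on python!'

# Empty strings are falsy, so "if tok" is equivalent to "if len(tok) > 0".
filtered = [tok for tok in re.split(r'\W+', sentence) if tok]

# re.findall returns every run of word characters; separators never appear.
found = re.findall(r'\w+', sentence)

print(filtered == found)
```

Both approaches give the same list for this sentence; which one reads better is largely a matter of taste.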

Finally, the first word of the sentence starts with an uppercase letter. To put all words in a uniform format, convert them to lowercase. Python strings have built-in methods that convert a string to all lowercase (.lower()) or all uppercase (.upper()).

import re

reg = re.compile(r'\W+')
mySent = 'This book is the best book on python!'
listof = reg.split(mySent)
# Drop empty tokens and convert the rest to lowercase.
new_list = [tok.lower() for tok in listof if len(tok) > 0]
print(new_list)

Output:

['this', 'book', 'is', 'the', 'best', 'book', 'on', 'python']
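The three steps so far (split, drop empty strings, lowercase) can be gathered into one helper. This is only a sketch: the function name tokenize is hypothetical and does not come from the original example.

```python
import re

def tokenize(text):
    """Split text on non-word characters, drop empties, lowercase."""
    return [tok.lower() for tok in re.split(r'\W+', text) if tok]

tokens = tokenize('This book is the best book on python!')
print(tokens)

# lower() and upper() return new strings; the original is unchanged.
word = 'Python'
print(word.lower(), word.upper(), word)
```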

Here is a complete email:

Content of email.txt:

Hi Peter,

With Jose out of town, do you want to
meet once in a while to keep things
going and do some interesting stuff?

Let me know
Eugene

import re

reg = re.compile(r'\W+')
email = open('email.txt').read()
tokens = reg.split(email)
new_txt = [tok.lower() for tok in tokens if len(tok) > 0]
print(new_txt)

Output:

['hi', 'peter', 'with', 'jose', 'out', 'of', 'town', 'do', 'you', 'want', 'to', 'meet', 'once', 'in', 'a', 'while', 'to', 'keep', 'things', 'going', 'and', 'do', 'some', 'interesting', 'stuff', 'let', 'me', 'know', 'eugene']
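To try the email example without preparing a file by hand, the sketch below writes the email text to a temporary file first and then reads it back. The helper name tokenize_file is hypothetical, and the line breaks in the email text are an assumption, since the original layout of the file is not shown. The with statement is used so that the file is closed after reading.

```python
import os
import re
import tempfile

# Email text from the example; the line breaks are assumed.
EMAIL_TEXT = (
    'Hi Peter,\n\n'
    'With Jose out of town, do you want to\n'
    'meet once in a while to keep things\n'
    'going and do some interesting stuff?\n\n'
    'Let me know\n'
    'Eugene\n'
)

def tokenize_file(path):
    """Read a text file and return its lowercase word tokens."""
    # The with-block closes the file even if reading fails.
    with open(path) as handle:
        text = handle.read()
    return [tok.lower() for tok in re.split(r'\W+', text) if tok]

# Write the sample email to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    email_path = os.path.join(tmp_dir, 'email.txt')
    with open(email_path, 'w') as handle:
        handle.write(EMAIL_TEXT)
    email_tokens = tokenize_file(email_path)

print(len(email_tokens))
print(email_tokens[:4])
```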
