Natural language 14_stemming words with NLTK

Source: Internet
Author: User
Tags: nltk

https://www.pythonprogramming.net/stemming-nltk-tutorial/?completed=/stop-words-nltk-tutorial/

Stemming words with NLTK




Stemming is a kind of normalization. Many variations of a word carry the same meaning, differing only in tense or some other inflection.

We stem to shorten the lookup and to normalize sentences.

Consider:

I was taking a ride in the car.
I was riding in the car.

These two sentences mean the same thing. "In the car" is the same. "I was" is the same. Both clearly describe a past activity, so is it truly necessary to differentiate between ride and riding when we are just trying to figure out what that past activity was?

No, not really.

This is just one minor example, but imagine every word in the English language, and every possible tense and affix you can put on a word. Having individual dictionary entries per version would be highly redundant and inefficient, especially since, once we convert to numbers, the "value" is going to be identical.
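
To make the redundancy point concrete, here is a minimal pure-Python sketch. The toy_stem function and its suffix list are hypothetical stand-ins invented for this illustration; they are not the Porter algorithm and not part of NLTK. The only aim is to show how several surface forms collapse into one dictionary entry, and therefore one number.

```python
# Toy illustration only: a crude suffix stripper, NOT the Porter algorithm.
SUFFIXES = ("ing", "es", "ed", "s", "e")  # hypothetical list, chosen for this example


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocab(tokens):
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


words = ["ride", "riding", "rides", "car", "cars"]

raw_vocab = build_vocab(words)
stemmed_vocab = build_vocab(toy_stem(w) for w in words)

print(len(raw_vocab))      # one entry per surface form
print(len(stemmed_vocab))  # one entry per stem
print(stemmed_vocab)
```

With the raw words, every variant gets its own id; after stemming, "ride", "riding" and "rides" share a single id. A real stemmer such as Porter's applies far more careful rules, but the effect on the lookup table is the same kind of reduction.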

One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.

First, we're going to grab and define our stemmer:

from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()

Now, let's choose some words with a similar stem, like:

example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

Next, we can easily stem by doing something like:

for w in example_words:
    print(ps.stem(w))

Our output:

python
python
python
python
pythonli

Now let's try stemming a typical sentence, rather than some words:

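new_text = "It is important to be very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."

words = word_tokenize(new_text)

for w in words:
    print(ps.stem(w))

Now our result is:

It
is
import
to
be
veri
pythonli
while
you
are
python
with
python
.
All
python
have
python
poorli
at
least
onc
.

(Recent NLTK releases lowercase each stem by default, so you may see "it" and "all" instead of "It" and "All".)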
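
One way to see the "shorter lookup" in practice is to count words by stem rather than by surface form. The sketch below assumes NLTK is installed. It uses a simple regular expression in place of word_tokenize, so no tokenizer data has to be downloaded; that substitution, along with the names text, tokens and stem_counts, is an assumption made for this illustration and not part of the tutorial's own code.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer

ps = PorterStemmer()

text = ("It is important to be very pythonly while you are pythoning "
        "with python. All pythoners have pythoned poorly at least once.")

# Simplification: lowercase alphabetic runs stand in for word_tokenize.
tokens = re.findall(r"[a-z]+", text.lower())

# Count occurrences per stem instead of per distinct word.
stem_counts = Counter(ps.stem(token) for token in tokens)

print(len(set(tokens)), "distinct words")
print(len(stem_counts), "distinct stems")
print(stem_counts.most_common(1))
```

Several different words ("pythoning", "python", "pythoners", "pythoned") end up under the single key "python", so the table of counts has fewer entries than the list of distinct words.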

Next up, we're going to discuss something a bit more advanced from the NLTK module: part-of-speech tagging, where we use NLTK to identify the part of speech of each word in a sentence.
