https://www.pythonprogramming.net/stemming-nltk-tutorial/?completed=/stop-words-nltk-tutorial/
Stemming words with NLTK
Stemming is a kind of normalization. Many variations of a word carry the same meaning, differing only in tense or some other affix.
The reason why we stem is to shorten the lookup and normalize sentences.
Consider:
I was taking a ride in the car.
I was riding in the car.
These two sentences mean the same thing. "In the car" is the same. "I was" is the same. The "was" together with the -ing form describes a past activity in both cases, so is it truly necessary to differentiate between ride and riding when we are just trying to figure out what this past-tense activity was?
No, not really.
This is just one minor example, but imagine every word in the English language, with every possible tense and affix you can put on a word. Having individual dictionary entries per version would be highly redundant and inefficient, especially since, once we convert to numbers, the "value" is going to be identical.
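To make the redundancy point concrete, here is a minimal sketch of what happens to a word-to-number dictionary with and without normalization. The `naive_stem` and `build_vocab` helpers are hypothetical names made up for this illustration, and the suffix stripper is deliberately crude: it is not the Porter algorithm, just a stand-in to show the idea.

```python
SUFFIXES = ("ing", "ed", "er", "s")  # illustrative only, checked in this order


def naive_stem(word):
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_vocab(words, normalize=None):
    """Map each distinct (optionally normalized) word to an integer ID."""
    vocab = {}
    for w in words:
        key = normalize(w) if normalize else w
        vocab.setdefault(key, len(vocab))
    return vocab


words = ["jump", "jumping", "jumped", "jumps"]
raw_vocab = build_vocab(words)
stemmed_vocab = build_vocab(words, normalize=naive_stem)

print(len(raw_vocab), "entries without stemming")
print(len(stemmed_vocab), "entry with stemming:", stemmed_vocab)
```

Without normalization every variant gets its own ID; with it, all four collapse onto a single entry. A real stemmer handles far more cases than this toy does.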
One of the most popular stemming algorithms is the Porter stemmer, which has been around since 1979.
First, we're going to grab and define our stemmer:
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

ps = PorterStemmer()
Now, let's choose some words with a similar stem, like:
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]
Next, we can easily stem by doing something like:
for w in example_words:
    print(ps.stem(w))
Our output:
python
python
python
python
pythonli
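One way to see what the stemmer buys us is to group the surface forms by the stem they reduce to. The following is a small sketch along those lines; the `groups` dictionary is just an illustrative name, and the only NLTK call used is `PorterStemmer.stem`.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

ps = PorterStemmer()
words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

# Collect every surface form under the stem it reduces to.
groups = defaultdict(list)
for w in words:
    groups[ps.stem(w)].append(w)

for stem, forms in groups.items():
    print(stem, "<-", forms)
```

Four of the five words end up under the same key, while "pythonly" lands on its own stem, which matches the output listed above.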
Now let's try stemming a typical sentence, rather than some words:
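new_text = "It is important to be very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once."
words = word_tokenize(new_text)
for w in words:
    print(ps.stem(w))
Now our result, shown here on one line rather than one token per line, is:
It is import to be veri pythonli while you are python with python . All python have python poorli at least onc .
(Recent NLTK releases lowercase each token before stemming by default, so you may see it and all instead of It and All.)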
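If `word_tokenize` raises a lookup error, it is most likely because it relies on NLTK's Punkt tokenizer data, which may need a one-time download. As a rough alternative for experimenting, the sketch below swaps in `wordpunct_tokenize`, a regular-expression tokenizer from `nltk.tokenize` that needs no extra data. This is a substitution for illustration, not what the tutorial itself uses, and the two tokenizers can split some text differently.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

ps = PorterStemmer()
text = "All pythoners have pythoned poorly at least once."

# wordpunct_tokenize splits on a regular expression, so no model download is needed.
tokens = wordpunct_tokenize(text)
stems = [ps.stem(t) for t in tokens]

print(" ".join(stems))
```

For plain sentences like this one the stems should line up with the result above; contractions and similar edge cases are where the tokenizers tend to disagree.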
Next up, we're going to discuss something a bit more advanced from the NLTK module, part of speech tagging, where we can use the NLTK module to identify the parts of speech for each word in a sentence.