[Search] Porter stemming algorithm in detail (3)

Source: Internet
Author: User



Continued from: [Search] Porter stemming algorithm in detail (2)

Stemming proceeds in five steps, each applying the substitution rules and conditions introduced in the previous part.

Each rule is shown on the left. On the right, in lowercase, are example words for which the rule succeeds or fails.
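The rules below rely on the conditions m, *v*, *d and *o. The following is a minimal sketch of those conditions, assuming the standard definitions from Porter's algorithm (m is the number of vowel-consonant sequences in the stem, *v* means the stem contains a vowel, *d means it ends in a double consonant, *o means it ends consonant-vowel-consonant where the last consonant is not w, x or y). The function names are illustrative and do not come from the original text.

```python
VOWELS = "aeiou"

def is_consonant(word, i):
    """A letter is a consonant unless it is a, e, i, o, u, or a y preceded by a consonant."""
    if word[i] in VOWELS:
        return False
    if word[i] == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True

def measure(stem):
    """m: the number of vowel-run to consonant-run transitions, i.e. [C](VC)^m[V]."""
    forms = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return forms.count("vc")

def has_vowel(stem):
    """*v*: the stem contains a vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def ends_double_consonant(stem):
    """*d: the stem ends with a double consonant such as -tt or -ss."""
    return (len(stem) >= 2 and stem[-1] == stem[-2]
            and is_consonant(stem, len(stem) - 1))

def ends_cvc(stem):
    """*o: the stem ends consonant-vowel-consonant, the last one not w, x or y."""
    n = len(stem)
    return (n >= 3
            and is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")
```

With these helpers, a condition such as (m=1 and *o) becomes `measure(stem) == 1 and ends_cvc(stem)`, where `stem` is the word with the candidate suffix already removed.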

Step 1

SSES -> SS                    caresses     ->  caress
IES  -> I                     ponies       ->  poni
                              ties         ->  ti
SS   -> SS                    caress       ->  caress
S    ->                       cats         ->  cat

(m>0) EED -> EE               feed         ->  feed
                              agreed       ->  agree
(*v*) ED  ->                  plastered    ->  plaster
                              bled         ->  bled
(*v*) ING ->                  motoring     ->  motor
                              sing         ->  sing

If the ED or ING rule succeeds, the following rules are then applied to the result:

AT -> ATE                     conflat(ed)  ->  conflate
BL -> BLE                     troubl(ed)   ->  trouble
IZ -> IZE                     siz(ed)      ->  size
(*d and not (*L or *S or *Z))
   -> single letter           hopp(ing)    ->  hop
                              tann(ed)     ->  tan
                              fall(ing)    ->  fall
                              hiss(ing)    ->  hiss
                              fizz(ed)     ->  fizz
(m=1 and *o) -> E             fail(ing)    ->  fail
                              fil(ing)     ->  file

(*v*) Y -> I                  happy        ->  happi
                              sky          ->  sky

Step 1 deals with plurals and past participles.
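The Step 1 rules can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names are made up for this example, the helper conditions are redefined so the block runs on its own, and the input is assumed to be a lowercase word.

```python
VOWELS = "aeiou"

def is_consonant(w, i):
    if w[i] in VOWELS:
        return False
    if w[i] == "y":  # y counts as a vowel only when it follows a consonant
        return i == 0 or not is_consonant(w, i - 1)
    return True

def measure(stem):
    forms = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return forms.count("vc")

def has_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))

def ends_cvc(stem):
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")

def plural_rules(w):
    # SSES -> SS, IES -> I, SS -> SS, S -> (nothing)
    if w.endswith("sses") or w.endswith("ies"):
        return w[:-2]
    if w.endswith("ss"):
        return w
    return w[:-1] if w.endswith("s") else w

def cleanup(stem):
    # Applied only after ED or ING has been removed.
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"
    if (len(stem) >= 2 and stem[-1] == stem[-2]
            and is_consonant(stem, len(stem) - 1) and stem[-1] not in "lsz"):
        return stem[:-1]  # double consonant -> single letter
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"
    return stem

def participle_rules(w):
    if w.endswith("eed"):  # (m>0) EED -> EE
        return w[:-1] if measure(w[:-3]) > 0 else w
    for suffix in ("ed", "ing"):  # (*v*) ED -> , (*v*) ING ->
        if w.endswith(suffix) and has_vowel(w[:-len(suffix)]):
            return cleanup(w[:-len(suffix)])
    return w

def y_rule(w):
    # (*v*) Y -> I
    return w[:-1] + "i" if w.endswith("y") and has_vowel(w[:-1]) else w

def step1(word):
    return y_rule(participle_rules(plural_rules(word)))
```

For instance, `step1("hopping")` yields `hop` and `step1("filing")` yields `file`, matching the examples in the table.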

Step 2

(m>0) ATIONAL -> ATE          relational      ->  relate
(m>0) TIONAL  -> TION         conditional     ->  condition
                              rational        ->  rational
(m>0) ENCI    -> ENCE         valenci         ->  valence
(m>0) ANCI    -> ANCE         hesitanci       ->  hesitance
(m>0) IZER    -> IZE          digitizer       ->  digitize
(m>0) ABLI    -> ABLE         conformabli     ->  conformable
(m>0) ALLI    -> AL           radicalli       ->  radical
(m>0) ENTLI   -> ENT          differentli     ->  different
(m>0) ELI     -> E            vileli          ->  vile
(m>0) OUSLI   -> OUS          analogousli     ->  analogous
(m>0) IZATION -> IZE          vietnamization  ->  vietnamize
(m>0) ATION   -> ATE          predication     ->  predicate
(m>0) ATOR    -> ATE          operator        ->  operate
(m>0) ALISM   -> AL           feudalism       ->  feudal
(m>0) IVENESS -> IVE          decisiveness    ->  decisive
(m>0) FULNESS -> FUL          hopefulness     ->  hopeful
(m>0) OUSNESS -> OUS          callousness     ->  callous
(m>0) ALITI   -> AL           formaliti       ->  formal
(m>0) IVITI   -> IVE          sensitiviti     ->  sensitive
(m>0) BILITI  -> BLE          sensibiliti     ->  sensible

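Because every Step 2 rule has the same shape, (m>0) SUFFIX -> REPLACEMENT, the step lends itself to a table-driven sketch. The version below assumes, as in Porter's definition, that only the longest matching suffix is tried, which is why "rational" is left unchanged: it matches ATIONAL, but the remaining stem "r" has m=0. The names `STEP2_RULES` and `step2` are illustrative, and the `measure` helper is redefined so the block runs on its own.

```python
VOWELS = "aeiou"

def is_consonant(w, i):
    if w[i] in VOWELS:
        return False
    if w[i] == "y":  # y counts as a vowel only when it follows a consonant
        return i == 0 or not is_consonant(w, i - 1)
    return True

def measure(stem):
    forms = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return forms.count("vc")

STEP2_RULES = [
    ("ational", "ate"), ("tional", "tion"), ("enci", "ence"), ("anci", "ance"),
    ("izer", "ize"), ("abli", "able"), ("alli", "al"), ("entli", "ent"),
    ("eli", "e"), ("ousli", "ous"), ("ization", "ize"), ("ation", "ate"),
    ("ator", "ate"), ("alism", "al"), ("iveness", "ive"), ("fulness", "ful"),
    ("ousness", "ous"), ("aliti", "al"), ("iviti", "ive"), ("biliti", "ble"),
]

# Try longer suffixes first so that e.g. IZATION wins over ATION.
_BY_LENGTH = sorted(STEP2_RULES, key=lambda rule: len(rule[0]), reverse=True)

def step2(word):
    for suffix, replacement in _BY_LENGTH:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            # Only the longest match is considered, even if its condition fails.
            return stem + replacement if measure(stem) > 0 else word
    return word
```

The same loop works for Step 3 by swapping in the Step 3 suffix table, since those rules also use the condition m>0.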
Step 3

(m>0) ICATE -> IC             triplicate   ->  triplic
(m>0) ATIVE ->                formative    ->  form
(m>0) ALIZE -> AL             formalize    ->  formal
(m>0) ICITI -> IC             electriciti  ->  electric
(m>0) ICAL  -> IC             electrical   ->  electric
(m>0) FUL   ->                hopeful      ->  hope
(m>0) NESS  ->                goodness     ->  good

Step 4

(m>1) AL    ->                revival      ->  reviv
(m>1) ANCE  ->                allowance    ->  allow
(m>1) ENCE  ->                inference    ->  infer
(m>1) ER    ->                airliner     ->  airlin
(m>1) IC    ->                gyroscopic   ->  gyroscop
(m>1) ABLE  ->                adjustable   ->  adjust
(m>1) IBLE  ->                defensible   ->  defens
(m>1) ANT   ->                irritant     ->  irrit
(m>1) EMENT ->                replacement  ->  replac
(m>1) MENT  ->                adjustment   ->  adjust
(m>1) ENT   ->                dependent    ->  depend
(m>1 and (*S or *T)) ION ->   adoption     ->  adopt
(m>1) OU    ->                homologou    ->  homolog
(m>1) ISM   ->                communism    ->  commun
(m>1) ATE   ->                activate     ->  activ
(m>1) ITI   ->                angulariti   ->  angular
(m>1) OUS   ->                homologous   ->  homolog
(m>1) IVE   ->                effective    ->  effect
(m>1) IZE   ->                bowdlerize   ->  bowdler
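Step 4 only deletes suffixes, and it requires m>1 for the remaining stem. ION carries the extra condition that the stem ends in S or T. A minimal sketch, assuming longest-match suffix selection as in Porter's definition. The names `STEP4_SUFFIXES` and `step4` are illustrative, and `measure` is redefined so the block is self-contained.

```python
VOWELS = "aeiou"

def is_consonant(w, i):
    if w[i] in VOWELS:
        return False
    if w[i] == "y":  # y counts as a vowel only when it follows a consonant
        return i == 0 or not is_consonant(w, i - 1)
    return True

def measure(stem):
    forms = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return forms.count("vc")

STEP4_SUFFIXES = [
    "al", "ance", "ence", "er", "ic", "able", "ible", "ant", "ement", "ment",
    "ent", "ion", "ou", "ism", "ate", "iti", "ous", "ive", "ize",
]

# Longest first, so that EMENT is tried before MENT, and MENT before ENT.
_BY_LENGTH = sorted(STEP4_SUFFIXES, key=len, reverse=True)

def step4(word):
    for suffix in _BY_LENGTH:
        if word.endswith(suffix):
            stem = word[:-len(suffix)]
            # ION is only removed when the stem ends in s or t.
            extra_ok = suffix != "ion" or stem.endswith(("s", "t"))
            return stem if measure(stem) > 1 and extra_ok else word
    return word
```

The stricter m>1 threshold keeps short words intact: "rate" ends in ATE, but the stem "r" has m=0, so the word is returned unchanged.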

After the first four steps the suffixes have been removed. The last step only tidies up the result.

Step 5

(m>1) E ->                    probate      ->  probat
                              rate         ->  rate
(m=1 and not *o) E ->         cease        ->  ceas

(m>1 and *d and *L)
   -> single letter           controll     ->  control
                              roll         ->  roll
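The two parts of Step 5 can be sketched as below: first a final E is dropped when the stem is long enough, then a trailing double L is reduced to a single L. This is an illustrative sketch under the standard definitions of m and *o. The name `step5` and the helper names are assumptions made for this example.

```python
VOWELS = "aeiou"

def is_consonant(w, i):
    if w[i] in VOWELS:
        return False
    if w[i] == "y":  # y counts as a vowel only when it follows a consonant
        return i == 0 or not is_consonant(w, i - 1)
    return True

def measure(stem):
    forms = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return forms.count("vc")

def ends_cvc(stem):
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")

def step5(word):
    # (m>1) E ->   and   (m=1 and not *o) E ->
    if word.endswith("e"):
        stem = word[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            word = stem
    # (m>1 and *d and *L) -> single letter
    if measure(word) > 1 and word.endswith("ll"):
        word = word[:-1]
    return word
```

The *o test is what protects "rate": its stem "rat" has m=1 but ends consonant-vowel-consonant, so the E stays. In "cease" the stem "ceas" does not end that way, so the E is removed.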


Evaluations of the Porter algorithm have found that stemming significantly improves recall. Light stemming has little effect on precision, whereas deep (aggressive) stemming can seriously reduce it. The recommendation is therefore to use light stemming first, and to fall back on deep stemming only when the query returns too few results.
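The "light first, deep as a fallback" recommendation can be illustrated with a toy search routine. Everything here is hypothetical: `light_stem` and `deep_stem` are crude stand-ins rather than the Porter steps, the `min_hits` threshold is an arbitrary choice, and the index is rebuilt on every call purely to keep the sketch short.

```python
def light_stem(word):
    # Stand-in for light stemming: strip a plural s only.
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word

def deep_stem(word):
    # Stand-in for deep stemming: light stemming plus removal of -ment.
    word = light_stem(word)
    return word[:-4] if word.endswith("ment") else word

def build_index(docs, stem):
    """Map each stemmed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index

def search_with_fallback(term, docs, min_hits):
    """Use light stemming first; retry with deep stemming if there are too few hits."""
    hits = build_index(docs, light_stem).get(light_stem(term), set())
    if len(hits) >= min_hits:
        return hits
    return build_index(docs, deep_stem).get(deep_stem(term), set())

docs = ["adjustments made", "adjust the dial", "adjustment needed"]
precise = search_with_fallback("adjustments", docs, min_hits=2)  # light stemming suffices
broad = search_with_fallback("adjustments", docs, min_hits=3)    # falls back to deep stemming
```

In this toy corpus the light pass matches only the two documents containing "adjustment(s)", while the deep pass also matches "adjust", which shows how recall goes up as the stemmer becomes more aggressive.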
