Python + NLTK natural language processing study notes, part 5: dictionary resources

Source: Internet
Author: User
Tags: name, database, nltk

Earlier parts of this series described many of the dictionary resources that ship with NLTK. These resources are useful when working with text. For example, we can implement a function that finds words built from the letters of 'egivrvonl': each letter may be used no more often than it occurs in 'egivrvonl', and each word must be at least 6 letters long.

To implement such a function, we first call nltk.FreqDist to count how many times each letter occurs in the sample letters.

import nltk

puzzle_letters = nltk.FreqDist('egivrvonl')
for k in puzzle_letters:
    print(k, puzzle_letters[k])

The results are as follows. puzzle_letters is an iterable object that behaves like a dictionary: each key is a letter and each value is the number of times that letter appears.

e 1
g 1
i 1
v 2
r 1
o 1
n 1
l 1

FreqDist objects can also be compared with each other, so we can compare the letters of a word against the puzzle letters. See the following example:

# Compare two FreqDist objects
print(nltk.FreqDist('eg') <= puzzle_letters)
print(nltk.FreqDist('ae') <= puzzle_letters)

Result: the comparison returns True when puzzle_letters contains every letter of the left-hand object at least as many times as it occurs there. 'eg' is contained in 'egivrvonl', so the first comparison returns True. For 'ae', e is contained in 'egivrvonl' but a is not, so the second comparison returns False.

True

False
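To make the comparison rule concrete, here is a minimal sketch of the same check written out by hand. It uses collections.Counter from the standard library instead of nltk.FreqDist (FreqDist is a subclass of Counter), and the helper name fits_in is made up for this illustration.

```python
from collections import Counter


def fits_in(word, letters):
    """Return True if `word` can be spelled from `letters`,
    using each letter no more often than it occurs in `letters`."""
    available = Counter(letters)
    needed = Counter(word)
    # A missing key in a Counter counts as 0, so an unknown letter fails the test.
    return all(count <= available[ch] for ch, count in needed.items())


print(fits_in('eg', 'egivrvonl'))   # True: e and g are both available
print(fits_in('ae', 'egivrvonl'))   # False: there is no 'a'
print(fits_in('vvv', 'egivrvonl'))  # False: only two 'v' are available
```

The third call shows that the counts matter, not just the presence of a letter.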

Now that we know what FreqDist does, we have a general idea of how to implement the function. We build one FreqDist from the puzzle letters 'egivrvonl' and one from each word in nltk.corpus.words.words(), then compare the two to keep the words that satisfy the conditions.

puzzle_letters = nltk.FreqDist('egivrvonl')
obligatory = 'r'
wordlist = nltk.corpus.words.words()
ret = [w for w in wordlist if len(w) >= 6 and obligatory in w and nltk.FreqDist(w) <= puzzle_letters]
print(ret)

obligatory means that the word must contain the letter r. The list comprehension [w for w in wordlist if len(w) >= 6 and obligatory in w and nltk.FreqDist(w) <= puzzle_letters] keeps the words that satisfy three conditions: (1) the word has at least 6 letters, (2) the word contains r, and (3) every letter of the word comes from 'egivrvonl' and is used no more often than it occurs there.

The results are as follows:

['glover', 'gorlin', 'govern', 'grovel', 'ignore', 'involver', 'lienor', 'linger', 'longer', 'lovering', 'noiler', 'overling', 'region', 'renvoi', 'revolving', 'ringle', 'roving', 'violer', 'virole']

This function works like a word puzzle game, and it is easy to build with the functions and dictionary resources in NLTK.
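The same filter can be wrapped in a reusable function. The sketch below is only an illustration under assumptions: it uses collections.Counter rather than nltk.FreqDist, the function name solve_puzzle and the short sample_words list are hypothetical, and in practice nltk.corpus.words.words() would be passed in as the word list.

```python
from collections import Counter


def solve_puzzle(letters, obligatory, wordlist, min_len=6):
    """Return words of at least `min_len` letters that contain `obligatory`
    and use each letter no more often than it occurs in `letters`."""
    available = Counter(letters)
    result = []
    for w in wordlist:
        if len(w) < min_len or obligatory not in w:
            continue
        # Counter subtraction keeps only positive counts, so an empty
        # result means the word needs nothing beyond the available letters.
        if not (Counter(w) - available):
            result.append(w)
    return result


# Hypothetical sample list standing in for nltk.corpus.words.words()
sample_words = ['govern', 'violin', 'region', 'river', 'revolving', 'groove', 'longer']
print(solve_puzzle('egivrvonl', 'r', sample_words))
# ['govern', 'region', 'revolving', 'longer']
```

Here 'violin' is rejected because it lacks r, 'river' because it is too short, and 'groove' because it needs two o's while the puzzle offers only one.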

Let's look at another task: finding the names that men and women share. These are names that can be used by either men or women, so the gender cannot be determined from the name alone.

NLTK has a names corpus with two files, one storing male names and one storing female names. The code is as follows:

name = nltk.corpus.names
print(name.fileids())
male_name = name.words('male.txt')
female_name = name.words('female.txt')
print([w for w in male_name if w in female_name])

The results of the operation are as follows:

['female.txt', 'male.txt']

['Abbey', 'Abbie', 'Abby', 'Addie', 'Adrian', 'Adrien', 'Ajay', 'Alex', 'Alexis', 'Alfie', 'Ali', 'Alix', 'Allie', 'Allyn', 'Andie', 'Andrea', 'Andy', 'Angel', 'Angie', 'Ariel', 'Ashley', 'Aubrey', 'Augustine', 'Austin', 'Averil', 'Barrie', 'Barry', 'Beau', 'Bennie', 'Benny', 'Bernie', 'Bert', 'Bertie', 'Bill', 'Billie', 'Billy', 'Blair', 'Blake', 'Bo', 'Bobbie', 'Bobby', 'Brandy', 'Brett', 'Britt', 'Brook', 'Brooke', 'Brooks', 'Bryn', 'Cal', 'Cam', 'Cammy', 'Carey', 'Carlie', 'Carlin', 'Carmine', 'Carroll', 'Cary', 'Caryl', 'Casey', 'Cass', 'Cat', 'Cecil', 'Chad', 'Chris', 'Chrissy', 'Christian', 'Christie', 'Christy', 'Clair', 'Claire', 'Clare', 'Claude', 'Clem', 'Clemmie', 'Cody', 'Connie', 'Constantine', 'Corey', 'Corrie', 'Cory', 'Courtney', 'Cris', 'Daffy', 'Dale', 'Dallas', 'Dana', 'Dani', 'Daniel', 'Dannie', 'Danny', 'Darby', 'Darcy', 'Darryl', 'Daryl', 'Deane', 'Del', 'Dell', 'Demetris', 'Dennie', 'Denny', 'Devin', 'Devon', 'Dion', 'Dionis', 'Dominique', 'Donnie', 'Donny', 'Dorian', 'Dory', 'Drew', 'Eddie', 'Eddy', 'Edie', 'Elisha', 'Emmy', 'Erin', 'Esme', 'Evelyn', 'Felice', 'Fran', 'Francis', 'Frank', 'Frankie', 'Franky', 'Fred', 'Freddie', 'Freddy', 'Gabriel', 'Gabriell', 'Gail', 'Gale', 'Gay', 'Gayle', 'Gene', 'George', 'Georgia', 'Georgie', 'Geri', 'Germaine', 'Gerri', 'Gerry', 'Gill', 'Ginger', 'Glen', 'Glenn', 'Grace', 'Gretchen', 'Gus', 'Haleigh', 'Haley', 'Hannibal', 'Harley', 'Hazel', 'Heath', 'Henrie', 'Hilary', 'Hillary', 'Holly',
'Ike', 'Ikey', 'Ira', 'Isa', 'Isador', 'Isadore', 'Jackie', 'Jaime', 'Jamie', 'Jan', 'Jean', 'Jere', 'Jermaine', 'Jerrie', 'Jerry', 'Jess', 'Jesse', 'Jessie', 'Jo', 'Jodi', 'Jodie', 'Jody', 'Joey', 'Jordan', 'Juanita', 'Jude', 'Judith', 'Judy', 'Julie', 'Justin', 'Karel', 'Kellen', 'Kelley', 'Kelly', 'Kelsey', 'Kerry', 'Kim', 'Kip', 'Kirby', 'Kit', 'Kris', 'Kyle', 'Lane', 'Lanny', 'Lauren', 'Laurie', 'Lee', 'Leigh', 'Leland', 'Lesley', 'Leslie', 'Lin', 'Lind', 'Lindsay', 'Lindsey', 'Lindy', 'Lonnie', 'Loren', 'Lorne', 'Lorrie', 'Lou', 'Luce', 'Lyn', 'Lynn', 'Maddie', 'Maddy', 'Marietta', 'Marion', 'Marlo', 'Martie', 'Marty', 'Mattie', 'Matty', 'Maurise', 'Max', 'Maxie', 'Mead', 'Meade', 'Mel', 'Meredith', 'Merle', 'Merrill', 'Merry', 'Meryl', 'Michal', 'Michel', 'Michele', 'Mickie', 'Micky', 'Millicent', 'Morgan', 'Morlee', 'Muffin', 'Nat', 'Nichole', 'Nickie', 'Nicky', 'Niki', 'Nikki', 'Noel', 'Ollie', 'Page', 'Paige', 'Pat', 'Patrice', 'Patsy', 'Pattie', 'Patty', 'Pen', 'Pennie', 'Penny', 'Perry', 'Phil', 'Pooh', 'Quentin', 'Quinn', 'Randi', 'Randie', 'Randy', 'Ray', 'Regan', 'Reggie', 'Rene', 'Rey', 'Ricki', 'Rickie', 'Ricky', 'Rikki', 'Robbie', 'Robin', 'Ronnie', 'Ronny', 'Rory', 'Ruby', 'Sal', 'Sam', 'Sammy', 'Sandy', 'Sascha', 'Sasha', 'Saundra', 'Sayre', 'Scotty', 'Sean', 'Shaine', 'Shane', 'Shannon', 'Shaun', 'Shawn', 'Shay', 'Shayne', 'Shea', 'Shelby', 'Shell', 'Shelley', 'Sibyl', 'Simone', 'Sonnie', 'Sonny', 'Stacy', 'Sunny', 'Sydney', 'Tabbie',
'Tabby', 'Tallie', 'Tally', 'Tammie', 'Tammy', 'Tate', 'Ted', 'Teddie', 'Teddy', 'Terri', 'Terry', 'Theo', 'Tim', 'Timmie', 'Timmy', 'Tobe', 'Tobie', 'Toby', 'Tommie', 'Tommy', 'Tony', 'Torey', 'Trace', 'Tracey', 'Tracie', 'Tracy', 'Val', 'Vale', 'Valentine', 'Van', 'Vin', 'Vinnie', 'Vinny', 'Virgie', 'Wallie', 'Wallis', 'Wally', 'Whitney', 'Willi', 'Willie', 'Willy', 'Winnie', 'Winny', 'Wynn']
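The list comprehension above scans female_name once for every male name. A possible variation, sketched here with two short made-up lists in place of the corpus files, converts one list to a set so that each membership test becomes a hash lookup.

```python
# Made-up sample data standing in for name.words('male.txt')
# and name.words('female.txt')
male_name = ['Adrian', 'Alex', 'Bob', 'Chris', 'David', 'Leslie']
female_name = ['Alex', 'Alice', 'Chris', 'Leslie', 'Mary']

female_set = set(female_name)
# Keep the order of male_name while testing membership against the set
shared = [w for w in male_name if w in female_set]
print(shared)  # ['Alex', 'Chris', 'Leslie']
```

The result is the same as with the plain list; only the cost of each `in` test changes.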

Of course, if we want to add names, we can put them in our own files and load those files as a corpus. Here's how:

from nltk.corpus import PlaintextCorpusReader

corpus_root = '/home/zhf/word'
wordlists = PlaintextCorpusReader(corpus_root, '.*')
print(wordlists.fileids())
for w in wordlists.words('filename'):
    print(w)
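As an illustration of what such a directory might contain, the following sketch writes a one-name-per-line file into a temporary directory and reads it back using only the standard library. It does not use PlaintextCorpusReader; the file name extra_names.txt and the helper read_wordlist are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def read_wordlist(path):
    """Read a text file with one entry per line, skipping blank lines."""
    text = Path(path).read_text(encoding='utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]


with tempfile.TemporaryDirectory() as corpus_root:
    # Hypothetical file of additional names
    extra = Path(corpus_root) / 'extra_names.txt'
    extra.write_text('Robin\nSkyler\n\nJordan\n', encoding='utf-8')

    fileids = sorted(p.name for p in Path(corpus_root).iterdir())
    names = read_wordlist(extra)

print(fileids)  # ['extra_names.txt']
print(names)    # ['Robin', 'Skyler', 'Jordan']
```

A directory laid out this way, one plain-text file per word list, is the kind of corpus_root that the code above points at.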

Vocabulary tools:

We often need to replace a word in a text with one of its synonyms. WordNet can help with this:

from nltk.corpus import wordnet as wn

lemma = wn.synsets('motorcar')
print(lemma)

Result: motorcar has only one possible meaning, car. car.n.01 is called a synset, or synonym set. Here car is the name of the synset, n is the part of speech (noun), and 01 is the index of the sense.

[Synset('car.n.01')]

All synonyms in this synset can be obtained with wn.synset('car.n.01').lemma_names().

['car', 'auto', 'automobile', 'machine', 'motorcar']

We can also get the definition of this synset and its usage examples:

print(wn.synset('car.n.01').definition())
print(wn.synset('car.n.01').examples())

a motor vehicle with four wheels; usually propelled by an internal combustion engine
['he needs a car to get to work']

In WordNet, synsets are organized into hypernyms (more general terms) and hyponyms (more specific terms). For example, there are many kinds of car, and each of them is a hyponym of car.n.01.

motorcar = wn.synset('car.n.01')
types_of_motorcar = motorcar.hyponyms()
print([lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas()])

You can see a variety of different car types and brands.

['ambulance', 'beach_wagon', 'station_wagon', 'wagon', 'estate_car', 'beach_waggon', 'station_waggon', 'waggon', 'bus', 'jalopy', 'heap', 'cab', 'hack', 'taxi', 'taxicab', 'compact', 'compact_car', 'convertible', 'coupe', 'cruiser', 'police_cruiser', 'patrol_car', 'police_car', 'prowl_car', 'squad_car', 'electric', 'electric_automobile', 'electric_car', 'gas_guzzler', 'hardtop', 'hatchback', 'horseless_carriage', 'hot_rod', 'hot-rod', 'jeep', 'landrover', 'limousine', 'limo', 'loaner', 'minicar', 'minivan', 'Model_T', 'pace_car', 'racer', 'race_car', 'racing_car', 'roadster', 'runabout', 'two-seater', 'sedan', 'saloon', 'sport_utility', 'sport_utility_vehicle', 'S.U.V.', 'SUV', 'sports_car', 'sport_car', 'Stanley_Steamer', 'stock_car', 'subcompact', 'subcompact_car', 'touring_car', 'phaeton', 'tourer', 'used-car', 'secondhand_car']

The hypernym and hyponym relations can be understood as is-a relations: a hyponym is a kind of its hypernym. Given this, we can check whether several synsets share a common hypernym. If two synsets share a very specific hypernym, we can conclude that they are closely related. For example, consider the following code:

right = wn.synset('right_whale.n.01')  # right whale
orca = wn.synset('orca.n.01')  # killer whale
minke = wn.synset('minke_whale.n.01')  # minke whale
print(right.lowest_common_hypernyms(minke))

Result:

[Synset('baleen_whale.n.01')]

These are three different kinds of whale. lowest_common_hypernyms finds the most specific hypernym shared by right and minke, which is baleen_whale.n.01 (baleen whales).
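To illustrate the idea behind a lowest common hypernym, here is a minimal sketch over a small hand-written is-a table. The table is a simplified assumption for this example and is not read from WordNet (where a synset can also have more than one hypernym); the names PARENT, hypernym_path and lowest_common_hypernym are hypothetical.

```python
# Hand-written, simplified is-a links (child -> parent); not read from WordNet
PARENT = {
    'right_whale': 'baleen_whale',
    'minke_whale': 'rorqual',
    'rorqual': 'baleen_whale',
    'baleen_whale': 'whale',
    'orca': 'dolphin',
    'dolphin': 'toothed_whale',
    'toothed_whale': 'whale',
}


def hypernym_path(node):
    """Return the node followed by all its ancestors, nearest first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_hypernym(a, b):
    """Return the most specific node that lies on both hypernym paths."""
    ancestors_of_b = set(hypernym_path(b))
    for node in hypernym_path(a):
        if node in ancestors_of_b:
            return node
    return None


print(lowest_common_hypernym('right_whale', 'minke_whale'))  # baleen_whale
print(lowest_common_hypernym('right_whale', 'orca'))         # whale
```

In this toy table the right whale and the minke whale meet at a specific node, while the right whale and the orca only meet at the more general whale, which mirrors the intuition that the first pair is more closely related.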
