Problem description: In the discussion of comparative wordlists, we created an object called translate, which can be used to look up German and Italian words and find the corresponding English words. What problem might arise with this approach? Can you suggest a way to avoid it?
The book's approach is to call the entries() method with a list of languages to get the cognate words across those languages, and then convert the result into a simple dictionary. The code is as follows:
    from nltk.corpus import swadesh
    swadesh.fileids()
    it2en = swadesh.entries(['it', 'en'])
    de2en = swadesh.entries(['de', 'en'])
    translate = dict(it2en)
    translate.update(dict(de2en))
    translate['Hund']
This approach has a problem, however: some entries in the original word lists are many-to-many, with several comma-separated words packed into a single string on each side, as in it2en:
    (u'tu, Lei', u'you (singular), thou')
    (u'lui, egli', u'he')
    (u'loro, essi', u'they')
    (u'qui, qua', u'here')
    (u'udire, sentire', u'hear')
    (u'odorare, annusare', u'smell')
    (u'dividere, separare', u'split')
    (u'aguzzo, affilato', u'sharp')
    (u'asciutto, secco', u'dry')
Because the whole string 'tu, Lei' is the dictionary key, looking up translate['tu'] does not return 'you (singular), thou' as expected but raises KeyError: 'tu':
    >>> translate['tu']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'tu'
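The failure can be reproduced without loading the corpus. The following is a minimal sketch, assuming a two-entry sample copied from the it2en listing above; the names sample_it2en and sample_translate are illustrative and do not come from the book:

```python
# Hypothetical two-entry sample standing in for swadesh.entries(['it', 'en']).
sample_it2en = [
    ('tu, Lei', 'you (singular), thou'),
    ('udire, sentire', 'hear'),
]

sample_translate = dict(sample_it2en)

# The whole comma-separated string is the key, so this lookup works.
print(sample_translate['tu, Lei'])

# The individual words are not keys, so a plain lookup of 'tu' would
# raise KeyError; membership testing and .get() show this safely.
print('tu' in sample_translate)
print(sample_translate.get('tu'))
```

Using .get() here only avoids the exception for demonstration purposes; it does not solve the underlying problem that 'tu' is missing from the dictionary.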
Solution idea:
Traverse each word list. When a many-to-many entry is detected, split its source-language string into individual words and append each word, paired with the entry's English string, to the original list.
Code:
    from nltk.corpus import swadesh
    swadesh.fileids()
    it2en = swadesh.entries(['it', 'en'])
    de2en = swadesh.entries(['de', 'en'])

    # Iterate over a copy, because the list is extended inside the loop.
    for key in list(it2en):
        if ',' in key[0]:
            # Split 'tu, Lei' into ['tu', 'Lei'] and add one entry per word.
            words = key[0].split(', ')
            for eachword in words:
                newword = (eachword, key[1])
                it2en.append(newword)

    for key in list(de2en):
        if ',' in key[0]:
            words = key[0].split(', ')
            for eachword in words:
                newword = (eachword, key[1])
                de2en.append(newword)

    translate = dict(it2en)
    translate.update(dict(de2en))
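The two loops differ only in the list they process, so they could be folded into a single helper. The following is a minimal sketch of that idea, not the book's solution: the function name expand_entries is hypothetical, and small hardcoded samples stand in for the swadesh word lists so that the example runs without the corpus.

```python
def expand_entries(entries):
    """Return a new list in which every comma-separated source word
    also appears as its own (word, translation) pair.
    The original pairs are kept unchanged."""
    expanded = list(entries)
    for source, target in entries:
        if ',' in source:
            for word in source.split(','):
                expanded.append((word.strip(), target))
    return expanded


# Hypothetical samples standing in for swadesh.entries([...]).
sample_it2en = [
    ('tu, Lei', 'you (singular), thou'),
    ('udire, sentire', 'hear'),
]
sample_de2en = [('Hund', 'dog')]

translate = dict(expand_entries(sample_it2en))
translate.update(dict(expand_entries(sample_de2en)))

print(translate['tu'])
print(translate['sentire'])
print(translate['Hund'])
```

As in the code above, only the source-language side is split. The English side stays a single string, so translate['tu'] returns the full 'you (singular), thou'.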
"Python Natural Language Processing", Chapter 2 exercise solutions: Exercise 6