[Language Processing and Python] 11.4 Using XML / 11.5 Using Toolbox data

Source: Internet
Author: User
Tags nltk

11.4 Using XML

Using XML for linguistic structures

(2) <entry>

The role of XML

(For background on XML itself, please consult other references.)
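As a minimal sketch of how a lexical entry might be represented in XML, the following builds one with Python's standard xml.etree.ElementTree module, serializes it, and parses it back. The tag names entry, headword, pos and gloss, the helper make_entry, and the entry's content are illustrative assumptions, not a fixed schema.

```python
import xml.etree.ElementTree as ET

def make_entry(headword, pos, gloss):
    """Build an <entry> element with one child element per field (hypothetical schema)."""
    entry = ET.Element('entry')
    ET.SubElement(entry, 'headword').text = headword
    ET.SubElement(entry, 'pos').text = pos
    ET.SubElement(entry, 'gloss').text = gloss
    return entry

entry = make_entry('whale', 'noun', 'any of the larger cetacean mammals')

# encoding='unicode' returns a str without an XML declaration.
xml_text = ET.tostring(entry, encoding='unicode')
print(xml_text)

# Parsing the string again recovers the same nested structure.
parsed = ET.fromstring(xml_text)
print(parsed.findtext('headword'))  # whale
```

Because every field is its own element, a program can add, reorder or look up fields by name without caring how the entry is laid out on the page.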

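The ElementTree interface

Python's xml.etree.ElementTree module lets us load an XML file and navigate its elements. Here we load The Merchant of Venice from NLTK's Shakespeare corpus:

>>> import nltk
>>> from xml.etree.ElementTree import ElementTree
>>> merchant_file = nltk.data.find('corpora/shakespeare/merchant.xml')
>>> merchant = ElementTree().parse(merchant_file)
>>> merchant
<Element 'PLAY' at 0x22fa800>
>>> merchant[0]
<Element 'TITLE' at 0x22fa828>
>>> merchant[0].text
'The Merchant of Venice'
>>> list(merchant)
[<Element 'TITLE' at 0x22fa828>, <Element 'PERSONAE' at 0x22fa7b0>, <Element 'SCNDESCR' at 0x2300170>, <Element 'PLAYSUBT' at 0x2300198>, <Element 'ACT' at 0x23001e8>, <Element 'ACT' at ...>, <Element 'ACT' at 0x23c87d8>, <Element 'ACT' at 0x2439198>, <Element 'ACT' at ...>]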
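merchant.xml is part of the NLTK data package, so here is a self-contained sketch of the same interface that parses a tiny, hypothetical play fragment from a string with ET.fromstring() rather than from a file. It assumes only that the fragment uses merchant.xml-style element names (PLAY, TITLE, PERSONAE, ACT, SCENE, SPEECH, SPEAKER, LINE); the text itself is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment that mimics the element names of merchant.xml.
PLAY_XML = """<PLAY>
  <TITLE>A Tiny Play</TITLE>
  <PERSONAE><PERSONA>ANNA</PERSONA><PERSONA>BEN</PERSONA></PERSONAE>
  <ACT>
    <TITLE>ACT I</TITLE>
    <SCENE>
      <TITLE>SCENE I</TITLE>
      <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>Good morrow.</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""

play = ET.fromstring(PLAY_XML)          # the root element, here <PLAY>
print(play.tag)                         # PLAY
print(play[0].text)                     # A Tiny Play (the first child is <TITLE>)
print([child.tag for child in play])    # ['TITLE', 'PERSONAE', 'ACT']

# Elements nest like lists, so indexing walks down the tree:
# ACT (play[2]) -> SCENE (its child 1) -> SPEECH (the scene's child 1).
speech = play[2][1][1]
print(speech.findtext('SPEAKER'), speech.findtext('LINE'))  # ANNA Good morrow.
```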

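ElementTree also lets us search the structure. Nested findall() calls walk from acts to scenes to speeches to lines, so we can print every line that mentions a given word (here 'music') together with its act, scene and speech number:

>>> for i, act in enumerate(merchant.findall('ACT')):
...     for j, scene in enumerate(act.findall('SCENE')):
...         for k, speech in enumerate(scene.findall('SPEECH')):
...             for line in speech.findall('LINE'):
...                 if 'music' in str(line.text):
...                     print("Act %d Scene %d Speech %d: %s" % (i+1, j+1, k+1, line.text))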
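The nested findall() loops can be wrapped in a small function and tried without the corpus. This sketch runs them over a hypothetical two-scene fragment and, for comparison, uses the single path expression 'ACT/SCENE/SPEECH/LINE' for the case where act, scene and speech numbers are not needed. The function name find_lines and the sample text are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one act, two scenes.
PLAY_XML = """<PLAY>
  <ACT>
    <SCENE>
      <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>Let music play.</LINE></SPEECH>
      <SPEECH><SPEAKER>BEN</SPEAKER><LINE>I hear nothing.</LINE></SPEECH>
    </SCENE>
    <SCENE>
      <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>Such sweet music!</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>"""

def find_lines(root, word):
    """Return (act, scene, speech, text) for every LINE containing `word`.

    Positions are 1-based and counted within the enclosing element.
    """
    hits = []
    for i, act in enumerate(root.findall('ACT'), start=1):
        for j, scene in enumerate(act.findall('SCENE'), start=1):
            for k, speech in enumerate(scene.findall('SPEECH'), start=1):
                for line in speech.findall('LINE'):
                    if word in (line.text or ''):   # guard against empty lines
                        hits.append((i, j, k, line.text))
    return hits

root = ET.fromstring(PLAY_XML)
for act, scene, speech, text in find_lines(root, 'music'):
    print("Act %d Scene %d Speech %d: %s" % (act, scene, speech, text))

# Without positions, one path expression reaches every LINE directly.
all_lines = [line.text for line in root.findall('ACT/SCENE/SPEECH/LINE')]
print(all_lines)
```

enumerate(..., start=1) replaces the i+1 arithmetic, and returning a list rather than printing makes the search reusable.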

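We can also extract the sequence of speakers and use a frequency distribution to see who speaks most often:

>>> speaker_seq = [s.text for s in merchant.findall('ACT/SCENE/SPEECH/SPEAKER')]
>>> speaker_freq = nltk.FreqDist(speaker_seq)
>>> top5 = [speaker for speaker, count in speaker_freq.most_common(5)]
>>> print(top5)   # the five speakers with the most speeches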
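In NLTK 3, nltk.FreqDist is a subclass of Python's collections.Counter, so the counting step can be sketched with the standard library alone. The speakers and speeches below are hypothetical, and the sketch keeps the top two rather than the top five to suit the tiny sample.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical scene with six speeches by three speakers.
PLAY_XML = """<PLAY><ACT><SCENE>
  <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>One.</LINE></SPEECH>
  <SPEECH><SPEAKER>BEN</SPEAKER><LINE>Two.</LINE></SPEECH>
  <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>Three.</LINE></SPEECH>
  <SPEECH><SPEAKER>CARL</SPEAKER><LINE>Four.</LINE></SPEECH>
  <SPEECH><SPEAKER>ANNA</SPEAKER><LINE>Five.</LINE></SPEECH>
  <SPEECH><SPEAKER>BEN</SPEAKER><LINE>Six.</LINE></SPEECH>
</SCENE></ACT></PLAY>"""

root = ET.fromstring(PLAY_XML)
speaker_seq = [s.text for s in root.findall('ACT/SCENE/SPEECH/SPEAKER')]

speaker_freq = Counter(speaker_seq)       # number of speeches per speaker
top2 = [name for name, _ in speaker_freq.most_common(2)]
print(speaker_freq.most_common(2))        # [('ANNA', 3), ('BEN', 2)]
```

most_common() returns (item, count) pairs sorted by decreasing count, which is also how FreqDist reports its most frequent samples.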

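We can also look at who tends to follow whom in the dialogue. Each of the top five speakers is abbreviated to the first four letters of the name, everyone else is mapped to OTH, and a conditional frequency distribution over pairs of consecutive speakers is tabulated:

>>> from collections import defaultdict
>>> mapping = defaultdict(lambda: 'OTH')
>>> for s in top5:
...     mapping[s] = s[:4]
...
>>> speaker_seq2 = [mapping[s] for s in speaker_seq]
>>> cfd = nltk.ConditionalFreqDist(nltk.bigrams(speaker_seq2))
>>> cfd.tabulate()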
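A conditional frequency distribution over speaker pairs amounts to a table of counters keyed by the preceding speaker. The sketch below builds that table directly with defaultdict and Counter as a stand-in for nltk.ConditionalFreqDist, on a hypothetical speaker sequence in which only ANNABEL and BENEDICT count as main speakers.

```python
from collections import Counter, defaultdict

# Hypothetical speaker sequence and main speakers.
speaker_seq = ['ANNABEL', 'BENEDICT', 'ANNABEL', 'CARL', 'ANNABEL', 'BENEDICT', 'DORA']
main_speakers = ['ANNABEL', 'BENEDICT']

mapping = defaultdict(lambda: 'OTH')   # any unlisted speaker falls back to OTH
for s in main_speakers:
    mapping[s] = s[:4]                 # abbreviate to the first four letters

speaker_seq2 = [mapping[s] for s in speaker_seq]

# cfd[prev][nxt] counts how often speaker `nxt` immediately follows `prev`.
cfd = defaultdict(Counter)
for prev, nxt in zip(speaker_seq2, speaker_seq2[1:]):
    cfd[prev][nxt] += 1

for prev in sorted(cfd):
    print(prev, dict(cfd[prev]))
```

zip(seq, seq[1:]) produces the same consecutive pairs as nltk.bigrams(seq).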

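Using ElementTree to access Toolbox data

NLTK's toolbox corpus reader provides a toolbox.xml() method that loads a Toolbox file as an ElementTree structure. Here we load the Rotokas lexicon:

>>> from nltk.corpus import toolbox
>>> lexicon = toolbox.xml('rotokas.dic')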
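toolbox.xml() hides the conversion from Toolbox's backslash-marker format to XML. To illustrate the idea, here is a deliberately simplified sketch of such a conversion; it is not NLTK's implementation. It assumes each field line has the form of a backslash, a marker, a space and a value, that every lx line starts a new record, and it ignores file headers and multi-line field values. The function sfm_to_xml, the root tag lexicon and the three records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-record lexicon in backslash-marker format.
SFM = r"""\lx kaa
\ps N
\ge cooking banana

\lx kakarau
\ps N
\ge frog

\lx kakapu
\ps V
\ge place in sling
"""

def sfm_to_xml(text, record_marker='lx'):
    """Simplified converter: each marker line becomes a child element,
    and a new <record> starts at every `record_marker` line."""
    root = ET.Element('lexicon')
    record = None
    for line in text.splitlines():
        if not line.startswith('\\'):
            continue                      # skip blank lines (no continuation support)
        marker, _, value = line[1:].partition(' ')
        if marker == record_marker or record is None:
            record = ET.SubElement(root, 'record')
        ET.SubElement(record, marker).text = value
    return root

lexicon = sfm_to_xml(SFM)
print(lexicon[1][0].tag, lexicon[1][0].text)   # lx kakarau
print([r.findtext('ge') for r in lexicon])
```

Once the data is a tree of record elements, the same indexing and findall() calls used on the play apply unchanged.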

Individual records and their fields can then be accessed by index:

>>> lexicon[3][0]
<Element 'lx' at 0x77bd28>
>>> lexicon[3][0].tag
'lx'
>>> lexicon[3][0].text
'kaa'

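We can also use a path to select elements. For example, 'record/lx' picks out the lexeme of every record:

>>> [lexeme.text.lower() for lexeme in lexicon.findall('record/lx')]   # all lexemes, in lower case

To view a complete record as XML, we indent the tree and write a single record to standard output:

>>> import sys
>>> from nltk.util import elementtree_indent
>>> from xml.etree.ElementTree import ElementTree
>>> elementtree_indent(lexicon)
>>> tree = ElementTree(lexicon[3])
>>> tree.write(sys.stdout, encoding='unicode')
<record>
  <lx>kaa</lx>
  <ps>N</ps>
  <pt>MASC</pt>
  <cl>isi</cl>
  <ge>cooking banana</ge>
  <tkp>banana bilong kukim</tkp>
  <pt>itoo</pt>
  <sf>FLORA</sf>
  <dt>12/Aug/2005</dt>
  <ex>Taeavi iria kaa isi kovopaueva kaparapasia.</ex>
  <xp>Taeavi i bin planim gaden banana bilong kukim tasol long paia.</xp>
  <xe>Taeavi planted banana in order to cook it.</xe>
</record>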
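The path query and the pretty-printing step can also be tried on a small inline lexicon. In this sketch the records are hypothetical, and ET.indent(), available in the standard library from Python 3.9, adds the line breaks and indentation that make a record readable.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record lexicon; note the capitalized first lexeme.
RECORDS = """<lexicon>
<record><lx>Kaa</lx><ps>N</ps><ge>cooking banana</ge></record>
<record><lx>kakarau</lx><ps>N</ps><ge>frog</ge></record>
</lexicon>"""

lexicon = ET.fromstring(RECORDS)

# 'record/lx' selects the lx child of every record child of the root.
lexemes = [lx.text.lower() for lx in lexicon.findall('record/lx')]
print(lexemes)   # ['kaa', 'kakarau']

# ET.indent modifies the element in place, inserting whitespace text/tails.
first = lexicon[0]
ET.indent(first)
pretty = ET.tostring(first, encoding='unicode')
print(pretty)
```

Since indentation is stored as element text and tails, indent only the copy you intend to display if whitespace matters elsewhere.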

Formatting entries

We can also generate output in whatever format we need. For example, the following builds an HTML table from ten of the entries:

>>> html = "<table>\n"
>>> for entry in lexicon[70:80]:
...     lx = entry.findtext('lx')
...     ps = entry.findtext('ps')
...     ge = entry.findtext('ge')
...     html += "  <tr><td>%s</td><td>%s</td><td>%s</td></tr>\n" % (lx, ps, ge)
...
>>> html += "</table>"
>>> print(html)
<table>
  <tr><td>kakae</td><td>???</td><td>small</td></tr>
  <tr><td>kakae</td><td>CLASS</td><td>child</td></tr>
  <tr><td>kakaevira</td><td>ADV</td><td>small-like</td></tr>
  <tr><td>kakapikoa</td><td>???</td><td>small</td></tr>
  <tr><td>kakapikoto</td><td>N</td><td>newborn baby</td></tr>
  <tr><td>kakapu</td><td>V</td><td>place in sling for purpose of carrying</td></tr>
  <tr><td>kakapua</td><td>N</td><td>sling for lifting</td></tr>
  <tr><td>kakara</td><td>N</td><td>arm band</td></tr>
  <tr><td>Kakarapaia</td><td>N</td><td>village name</td></tr>
  <tr><td>kakarau</td><td>N</td><td>frog</td></tr>
</table>
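Building HTML by string formatting does not escape its input, so a gloss containing & or < would produce malformed HTML, and a missing field would print as None. Here is a sketch that escapes every cell with html.escape() and substitutes a placeholder for missing fields. The function to_html_table, the dash placeholder and the second record are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical records: the second lacks a ps field and has & in its gloss.
RECORDS = """<lexicon>
<record><lx>kakara</lx><ps>N</ps><ge>arm band</ge></record>
<record><lx>example</lx><ge>salt &amp; pepper</ge></record>
</lexicon>"""

def to_html_table(records, fields=('lx', 'ps', 'ge'), missing='-'):
    """Render one <tr> per record, escaping text and filling absent fields."""
    rows = []
    for entry in records:
        cells = ''.join('<td>%s</td>' % escape(entry.findtext(f) or missing)
                        for f in fields)
        rows.append('  <tr>%s</tr>' % cells)
    return '<table>\n' + '\n'.join(rows) + '\n</table>'

table = to_html_table(ET.fromstring(RECORDS))
print(table)
```

Collecting rows in a list and joining once is also cheaper than repeated string concatenation on large lexicons.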

11.5 Using Toolbox data

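Adding a field to each entry

Example 11-2 defines cv(), which maps a word to its consonant-vowel (CV) template, and add_cv_field(), which stores that template in a new cv field of an entry:

import re
from xml.etree.ElementTree import SubElement

def cv(s):
    s = s.lower()
    s = re.sub(r'[^a-z]', r'_', s)   # non-letters -> _
    s = re.sub(r'[aeiou]', r'V', s)  # vowels -> V
    s = re.sub(r'[^V_]', r'C', s)    # remaining letters -> C
    return s

def add_cv_field(entry):
    for field in list(entry):        # copy, since we append to entry
        if field.tag == 'lx':
            cv_field = SubElement(entry, 'cv')
            cv_field.text = cv(field.text)

>>> from nltk.corpus import toolbox
>>> from nltk.toolbox import to_sfm_string
>>> lexicon = toolbox.xml('rotokas.dic')
>>> add_cv_field(lexicon[53])
>>> print(to_sfm_string(lexicon[53]))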
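Once entries carry a CV template, the templates can be counted to see which word shapes are common. This sketch reimplements the CV conversion from Example 11-2 so that it runs on its own, and applies it to a handful of lexemes taken from the table in the previous section.

```python
import re
from collections import Counter

def cv(s):
    """Map a word to its consonant/vowel template, e.g. 'kakarau' -> 'CVCVCVV'."""
    s = s.lower()
    s = re.sub(r'[^a-z]', r'_', s)   # non-letters become _
    s = re.sub(r'[aeiou]', r'V', s)  # vowels become V
    s = re.sub(r'[^V_]', r'C', s)    # all remaining letters become C
    return s

words = ['kaa', 'kakae', 'kakara', 'kakarau', 'kakapu', 'Kakarapaia']
patterns = Counter(cv(w) for w in words)

for template, count in patterns.most_common():
    print(template, count)
```

The order of the substitutions matters: vowels must be rewritten to uppercase V before the final step, otherwise they would be turned into C along with the consonants.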

Validating a Toolbox lexicon

Many lexicons in Toolbox format do not conform to any particular schema. Some entries may include extra fields, or may order the existing fields in a new way.

A frequency distribution over the sequence of field markers in each entry makes unusual entries easy to spot:

>>> fd = nltk.FreqDist(':'.join(field.tag for field in entry) for entry in lexicon)
>>> fd.most_common()

The most frequent field sequences occur 41, 37, 27 and 20 times, while at the bottom of the list are sequences that occur only once.
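Counting field sequences shows what is unusual; checking entries against an explicit schema shows what is wrong. The sketch below writes a hypothetical schema as a regular expression over the colon-joined field markers and reports entries that do not match it. A regular expression is one simple way to do this; a grammar-based parser is another. The schema, the helper names and the records are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical schema: lx, optional rt, ps, optional pt, one or more ge,
# then any number of (ex xp xe) example triples.
SCHEMA = re.compile(r'lx(:rt)?:ps(:pt)?(:ge)+(:ex:xp:xe)*')

# Hypothetical records; the third has ps and ge in the wrong order.
RECORDS = """<lexicon>
<record><lx>kaa</lx><ps>N</ps><ge>cooking banana</ge><ex>e</ex><xp>p</xp><xe>x</xe></record>
<record><lx>kakarau</lx><rt>r</rt><ps>N</ps><pt>MASC</pt><ge>frog</ge></record>
<record><lx>kakapu</lx><ge>place in sling</ge><ps>V</ps></record>
</lexicon>"""

def field_sequence(entry):
    """Join an entry's field markers in order, e.g. 'lx:ps:ge'."""
    return ':'.join(field.tag for field in entry)

def invalid_entries(lexicon):
    """Return (lexeme, field sequence) for entries not fully matching SCHEMA."""
    return [(entry.findtext('lx'), field_sequence(entry))
            for entry in lexicon
            if not SCHEMA.fullmatch(field_sequence(entry))]

invalid = invalid_entries(ET.fromstring(RECORDS))
print(invalid)   # [('kakapu', 'lx:ge:ps')]
```

fullmatch() is used rather than match() so that trailing unexpected fields also count as violations.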
