Using spaCy for NLP in Python



Original: http://mp.weixin.qq.com/s/sqa-Ca2oXhvcPHJKg9PuVg

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The big grey dog ate all of the chocolate, but fortunately he wasn't sick!")

# Split on whitespace only
print(doc.text.split())

# token.orth_ gives each token's text, so punctuation shows up as separate tokens
print([token.orth_ for token in doc])

# Attributes ending in an underscore return strings; those without it return integer IDs
print([(token, token.orth_, token.orth) for token in doc])

# Tokenize, dropping punctuation and whitespace
print([token.orth_ for token in doc if not (token.is_punct or token.is_space)])

# Lemmatization: reduce each word to its base form
practice = "practice practiced practicing"
nlp_practice = nlp(practice)
print([word.lemma_ for word in nlp_practice])

# Part-of-speech tagging: .pos_ gives the coarse-grained tag, .tag_ the fine-grained tag
doc2 = nlp("Conor's dog's toy was hidden under the man's sofa in the woman's house")
pos_tags = [(i, i.tag_) for i in doc2]
print(pos_tags)

# The 's token is tagged POS; use that tag to extract owners and what they own
owners_possessions = []
for i in pos_tags:
    if i[1] == "POS":
        owner = i[0].nbor(-1)
        possession = i[0].nbor(1)
        owners_possessions.append((owner, possession))
print(owners_possessions)

# The same extraction as a list comprehension
print([(i[0].nbor(-1), i[0].nbor(1)) for i in pos_tags if i[1] == "POS"])

# Named entity recognition: PERSON is self-explanatory; NORP is nationalities or
# religious groups; GPE marks locations (cities, countries, etc.); DATE marks a
# specific date or date range; ORDINAL marks a word or number expressing some kind of order.
wiki_obama = """Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He is the first African American to have served as president, as well as the first born outside the contiguous United States."""
nlp_obama = nlp(wiki_obama)
print([(i, i.label_, i.label) for i in nlp_obama.ents])

# Split the text into sentences
for ix, sent in enumerate(nlp_obama.sents, 1):
    print("Sentence number {}: {}".format(ix, sent))
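The owner/possession step relies on every POS-tagged token having a neighbour on both sides; as far as I know, Token.nbor raises an IndexError when asked for a position outside the document. The sketch below shows the same neighbour lookup with an explicit bounds check. It uses plain (text, tag) tuples instead of spaCy tokens so it runs without a model; the names `tagged` and `possessive_pairs` are hypothetical, and the tags are written by hand for illustration rather than taken from real model output.

```python
# Hand-written (text, tag) pairs standing in for spaCy tokens. The tags are
# assumed for illustration and are not real model output.
tagged = [
    ("Conor", "NNP"), ("'s", "POS"), ("dog", "NN"), ("'s", "POS"),
    ("toy", "NN"), ("was", "VBD"), ("hidden", "VBN"),
]


def possessive_pairs(tagged_tokens):
    """Return (owner, possession) text pairs around every POS-tagged token."""
    pairs = []
    last = len(tagged_tokens) - 1
    for idx, (_, tag) in enumerate(tagged_tokens):
        # Skip a possessive marker that lacks a neighbour on either side.
        if tag == "POS" and 0 < idx < last:
            owner = tagged_tokens[idx - 1][0]
            possession = tagged_tokens[idx + 1][0]
            pairs.append((owner, possession))
    return pairs


pairs = possessive_pairs(tagged)
print(pairs)
```

With real spaCy tokens, the equivalent guard would be a check on token.i against len(doc) before calling nbor.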

 
