Common Chinese Word Segmentation Tools in Python

Source: Internet
Author: User

I have recently been processing microblog (Weibo) text, and for word segmentation I tried three tools: jieba, NLPIR, and LTP.

Here is a short write-up of all three.


1. jieba segmentation:

# -*- coding: utf-8 -*-
import os
import codecs
import jieba

# jieba.cut returns a generator, so materialize it before using it twice
seg_list = list(jieba.cut('Deng Chao, born in Nanchang, Jiangxi, 1979 Mainland Chinese actor, film director, investment producer, internet investor.'))

f1 = codecs.open("d2w_ltp.txt", "w")
print "/".join(seg_list)

for i in seg_list:
    f1.write(i.encode("utf-8"))  # unicode -> utf-8
    f1.write(str(" "))
f1.close()

Result:

Deng Chao/,/1979//born/Jiangxi/Nanchang/,/China/mainland/actor/, Film/director/,/investment/producer/,/Internet/investor/.

This covers both segmenting with jieba and writing the result to a file.

Note that jieba returns its tokens as unicode strings, so they have to be encoded from unicode to UTF-8 before they are written out.
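To make the unicode-to-UTF-8 step concrete, here is a minimal sketch that uses only the standard library and Python 3 syntax. The token list merely stands in for segmenter output, and the helper name write_tokens and the temporary file path are assumptions made for illustration; they are not part of jieba.

```python
import io
import os
import tempfile


def write_tokens(tokens, path):
    """Write tokens to `path` as UTF-8, separated by single spaces."""
    # io.open encodes text to UTF-8 on write, so no per-token .encode() is needed.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u" ".join(tokens))


# Hypothetical tokens standing in for segmenter output.
tokens = [u"江西", u"南昌"]
path = os.path.join(tempfile.mkdtemp(), "tokens.txt")
write_tokens(tokens, path)

# Reading the raw bytes back shows what was actually stored on disk.
with open(path, "rb") as handle:
    raw = handle.read()
print(raw.decode("utf-8"))
```

Passing the encoding once when the file is opened keeps the encoding decision in one place instead of repeating it for every token.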


2. Zhang Huaping's NLPIR (https://github.com/NLPIR-team/NLPIR)

The link above is Zhang Huaping's GitHub repository. Anyone who wants to use the tool can get a licence there; two kinds are offered, valid for 10 days and for one month.

I have also uploaded the full code and the installation package to CSDN for anyone interested (the licence may need to be updated).


It is worth mentioning that most papers published in China use this segmentation tool, so it is generally regarded as the more authoritative choice.

# seg() and codecs are assumed to be imported already; i[0] is the word in each result item
r = open('text_no_seg.txt', 'r')
list_senten = []
sentence = 'Deng Chao, 1979 born in Nanchang, Jiangxi province, mainland China actor, film director, investment producer, internet investor.'
for i in seg(sentence):
    list_senten.append(i[0])

print "/".join(list_senten)

f1 = codecs.open("d2w_ltp.txt", "w")
for i in seg(sentence):
    f1.write(i[0])
    f1.write(str(" "))
f1.close()

Result:

Deng Chao/,/1979/born/in/Jiangxi/Nanchang/,/China/mainland/male/actor/,/film/director/,/investment/production/person/,/Internet/investor/.

NLPIR also gives good results for named entity recognition:

Deng Chao nr
, wd
1979 t
born vi
in p
Jiangxi ns
Nanchang ns
, wd
China ns
mainland s
male b
actor n
, wn
film n
director n
, wn
investment n
produced vi
people n
, wn
internet n
investor n
. wj
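The tagged output above can be filtered to pull out the named entities. The sketch below assumes the (word, tag) pairs are already available as Python tuples with tags written in lower case, and that tags beginning with nr mark person names while tags beginning with ns mark place names, as in the listing. The function name extract_entities is hypothetical and is not part of NLPIR.

```python
def extract_entities(tagged, prefixes=("nr", "ns")):
    """Return the (word, tag) pairs whose tag starts with one of `prefixes`."""
    return [(word, tag) for word, tag in tagged if tag.startswith(prefixes)]


# A few pairs taken from the tagged listing.
tagged = [
    ("Deng Chao", "nr"),
    (",", "wd"),
    ("1979", "t"),
    ("born", "vi"),
    ("in", "p"),
    ("Jiangxi", "ns"),
    ("Nanchang", "ns"),
    ("actor", "n"),
]

for word, tag in extract_entities(tagged):
    print(word, tag)
```

Matching on the tag prefix rather than the exact tag also catches finer-grained subtags, if the tagger emits any, without changing the function.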

3. LTP
# -*- coding: utf-8 -*-
import os
import codecs

from pyltp import Segmentor


# word segmentation
def segmentor(sentence):
    segmentor = Segmentor()  # initialize instance
    segmentor.load('ltp_data/cws.model')  # load model
    words = segmentor.segment(sentence)  # segment the sentence
    words_list = list(words)
    segmentor.release()  # release model
    return words_list

f1 = codecs.open("d2w_ltp.txt", "w")
sentence = 'Deng Chao, born in Nanchang, Jiangxi Province in 1979, China\'s mainland actor, film director, investment producer, Internet investor.'
words = segmentor(sentence)  # segment once so the model is not loaded twice
print '/'.join(words)

for i in words:
    f1.write(i)
    f1.write(str(" "))
f1.close()

Result:
Deng//,/1979 Year/born/In/Jiangxi/Nanchang/,/China/mainland/male/actress/,/Film/director/,/investment/producer/,/Internet/investor/.
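
The three tools split the same sentence slightly differently, for example around the person's name and around "producer". One way to compare two segmentations is to turn each into the set of character offsets at which a token ends and then look at the difference. The sketch below is only an illustration under that assumption and does not come from any of the three tools; the function names and the toy token lists are hypothetical.

```python
def boundaries(tokens):
    """Return the set of character offsets at which a token ends."""
    offsets = set()
    position = 0
    for token in tokens:
        position += len(token)
        offsets.add(position)
    return offsets


def compare(tokens_a, tokens_b):
    """Return the boundaries found only in A and those found only in B."""
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("both segmentations must cover the same text")
    a, b = boundaries(tokens_a), boundaries(tokens_b)
    return sorted(a - b), sorted(b - a)


# Toy example: the same five characters split two ways.
seg_a = ["投资", "出品人"]
seg_b = ["投资", "出品", "人"]
only_a, only_b = compare(seg_a, seg_b)
print(only_a, only_b)
```

An empty list on one side means that segmentation introduces no cut the other one lacks, so it is the coarser of the two.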