What are the recommended Python libraries for Chinese word segmentation, data mining, and AI for Python chatbots?

Source: Internet
Author: User
This article responds to a question about Python chatbots: which Python libraries or open-source projects are recommended for Chinese word segmentation, data mining, and AI? It compares four Chinese word segmentation tools on a handful of test texts.

Accuracy test (run through each project's online demo, with no user-defined dictionary added)
Jieba Chinese word segmentation: 209.222.69.242:9000/
Chinese Academy of Sciences word segmentation system (ICTCLAS): ictclas.org/ictclas_demo.html
Smallseg: smallseg.appspot.com/smallseg
Snailseg: snilsegdemo.appspot.com/
(Reaching the last two sites requires bypassing the firewall.)
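
In case the demo sites are unreachable, the sketch below shows one way the word/tag output format seen in the results could be produced locally with jieba. It assumes jieba is installed (`pip install jieba`). The helper names `format_tagged` and `tag_with_jieba` and the `user_words` parameter are made up for illustration and are not part of jieba; only `jieba.add_word` and `jieba.posseg.cut` are library calls.

```python
from typing import Iterable, List, Optional, Tuple


def format_tagged(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (word, tag) pairs in the word/tag style used in the results."""
    return " ".join(f"{word}/{tag}" for word, tag in pairs)


def tag_with_jieba(text: str, user_words: Optional[List[str]] = None) -> str:
    """Segment and POS-tag `text` with jieba (hypothetical wrapper)."""
    # Imported lazily so format_tagged stays usable without jieba installed.
    import jieba
    import jieba.posseg as pseg

    # The tests in this article add no user-defined words; this hook is
    # optional and only registers extra words when the caller passes some.
    for word in user_words or []:
        jieba.add_word(word)

    # pseg.cut yields pair objects exposing .word and .flag (the POS tag).
    return format_tagged((p.word, p.flag) for p in pseg.cut(text))
```

Calling `tag_with_jieba(sentence)` on the original Chinese test sentence should give output in the same shape as the jieba results below, although the exact tokens and tags may differ between jieba versions.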

Test text 1
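Every month, the MIIT female officer personally briefs each subordinate department she passes through on the installation work for 24-port switches and other technical devices. (In the original Chinese, several character sequences in this sentence can be split in more than one way, which is why it is used to test ambiguity.)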

Test results:
Jieba Chinese Word Segmentation:
MIIT/n female officer/n monthly/r pass/p subordinate/v Department/n all/d/v user/n account/n 24/m Port/q switch/n and others/u technical/n devices/n/uj installation/v Operation/vn
Chinese Academy of Sciences Word Segmentation System:
Work/n letter/n Virgin/n Officer/n monthly/r pass/p subordinate/v Department/n all/d to/v user/d account/v 24/n Port/q switch/n, etc./udeng technical/n device/n/ude1 installation/vn work/vn
Smallseg:

Each month, the Ministry of Industry and Information Technology (MIIT) female officers will be responsible for the installation of 24-port switches and other technical devices in subordinate departments.

Snailseg:

MIIT/female/Officer/monthly/post/subordinate/department/All/need/user/account/24/port/switch/etc/technical/device/installation/work
-----------------------------------------------------------------------------------------
Test text 2
(The same sentence as test text 1, written in Traditional Chinese characters.)

Test results:
Jieba Chinese Word Segmentation:
Work/n emails/n locations/zg female/B users/zg events/n monthly/r Users/zg past/zg/m users/zg departments/n all/d requests/v parent/zg port/q description/n 24/m oral sex/n slave/zg machine/zg and so on/u Technology/ng slave/zg performance/ng device/n/uj Security/v networking/zg work/vn
Chinese Academy of Sciences Word Segmentation System:
Engineer/n letter/n Department female/n department/r department/p department/n department/d department/v Department/d department/v 24/n port/q ER/n, etc./udeng technical features/n device/n/ude1 security module/vn operation/vn
Smallseg:

Work/email/location/female/Workshop/event/monthly/post/department/All/need/kiss/mouth/account/24/mouth/hand/ hosts // machine/other/technology/Drivers/parts/security/installation/work

Snailseg:

Work/email/location/female/Workshop/event/monthly/post/department/All/need/kiss/mouth/account/24/mouth/hand/ hosts // machine/other/technology/Drivers/parts/security/installation/work

-----------------------------------------------------------------------------------------

Test text 3
SCANV Website Security Center (scanv.com) is a comprehensive website security service platform. Through the website Security Center, you can easily check whether the website to be accessed has malicious behaviors, and report and expose illegal and malicious websites online in SCANV.

Test results:
Jieba Chinese Word Segmentation:
SCANV/eng URL/n Security/an center/n scanv/eng com/eng is/v A/m comprehensive/n/uj URL/n Security/an service platform/n /p URL/n Security/an center/n users/n can/c convenience/a/uj query/v to access/v/uj URL/n whether/v exists/v malicious/v behavior/v simultaneously/c can/c in/p SCANV/eng/f Online/B Report/v exposure/nz illegal/vn malicious/v website/n
Chinese Academy of Sciences Word Segmentation System:
SCANV/x URL/n Security/an center/n (/wkz scanv.com/x )/wky is/vshi A/mq comprehensive/n/ude1 URL/n Security/an service platform/n ./wj via/p URL/n Security/an center/n ,/wd user/n can/v for convenience/a/ude1 query/vn to/v to access/v/ude1 URL/n whether/v exists/v malicious/n behavior/n ,/wd at the same time/c can/v in/p SCANV/x/f in/p line/n Report/vn exposure/vn illegal/vn malicious/n website/n ./wj
Smallseg:

SCANV web site security center scanv.com is a comprehensive web site security service platform. Through the Web Site Security Center, users can conveniently query whether the website to be accessed has malicious behaviors and report online exposure violations in SCANV. malicious website

Snailseg:

SCANV/URL/security/center/scanv/com/Yes/A/Comprehensive/URL/security/service platform/via/URL/security/center/user/CAN/conveniently/query/to/access/URL/yes/no/existence/malicious/behavior/at the same time/In/SCANV/on/line/Report/exposure/illegal/malicious/website

-----------------------------------------------------------------------------------------

Test text 4
With the rise of web games and their current popularity, designs that rely on local saved data for logic checks have become less common, but they cannot be ignored entirely. Some features still need to read local saved data. For example, the "remember password" function in a login module stores the password information locally. Taking the IE browser as an example, the SOL file that stores the password can be found in the folder C:\Documents and Settings\(your Windows user name)\Application Data\Macromedia\Flash Player\#SharedObjects\(random numbers and letters)\. The file can be viewed with the minerva tool, as shown in the figure. The password is stored in plaintext, and the SOL file is kept permanently unless it is cleared manually. If a player logs in from a public environment, there is a risk of account theft.

Test results:
Jieba Chinese Word Segmentation:
With the rise of/p page/m game/n/v to/v now/uj page game/n flourishing/a depends on/v archive/v for/v logic/ n determine/v/uj design/vn reduce/v/ul but/c This/r also/d cannot/v complete/ad ignore/d drop/zg always/n some/v/m functions/n Yes/v need/v call/vn local/r archive/v/uj for example/v login/v module/n/f remember /v password/n function/n will/v will/d password/n information/n storage/j in/p local/r to/p IE/eng Browser/n/ p example/v in/p C/eng Documents ents/eng and/eng Settings/eng your/r/uj Windows/eng user name/n Application/eng Data/eng Macromedia/eng nbsp/eng Flash/eng Player/eng # export dobjects/eng some/m random/d numbers/n and/c letters/n Folder/n under/f on/d can/c view/v Storage/j password/n/uj SOL/eng file/n available/c Use/v minerva/eng tool/n View/v as shown below/t figure/n /v password/n plaintext/nr storage/j/uj SOL/eng file/n is/v permanent/nr storage/v/uj unless/c manual/ n clear/v if/c player/n is in/p PUBLIC/B environment/n/f login/v,/d will/v have/v steal number/n threat/vn from/v http/eng www/eng baidu/eng com/eng and/c http/eng www/eng baidu/eng com/eng
Chinese Academy of Sciences Word Segmentation System:
With/p page/q youxing/n/vf to/v now/t/ude1 page/q game/v flourishing/, /wd depends on/v on/p archive/vi to perform/vx logic/n judgment/v/ude1 design/vn to reduce/v/y, /wd but/c This/rzv block/q also/d cannot/v completely/ad ignore/v drop/v. /Wj total/d/v has/vyou some/mq functions/n is/vshi needs/v call/v local/rzs archive/vi/ude1. /Wj such as/v login/v module/n/f,/wd remember/v password/n function/n, /wd stores/p password information/n in/p local/rzs,/wd uses/p IE/x Browser/n AS/p example/n, /wd in/p C: /x \/x Documents ents/x/w and/x/w Settings/x \/x (/wkz your/rr/ude1 Windows/x user/n name/q) /wky \/x Application/x/w Data/x \/x Macromedia/x &/x nbsp/x; /wf \/x Flash/x/w Player/x \/x #/x export dobjects/x \/x (/wkz some/mq random/B Numbers/n and/cc letter/n) /wky \/x/w Folder/under n/f/d/v can see/v Storage/vn password/n/ud E1 SOL/x file/n,/wd can/v Use/v minerva/x tool/n View/v, /wd:/v/vf/n/usuo/vg,/wd password/n Ming/ag civilization/n text/ng storage/v/ude1, /wd SOL/x file/n is/vshi permanent/n save/ude1/v,/wd unless/c manual/B clear/vn, /wd if/c play/v Home/n is in/p PUBLIC/B environment/n/f login/v, /wd/d will/v has/vyou steal/vg/n threat/vn. /Wj from/v http:/x // w www.baidu.com/x // w and/v http:/x,/wn www.baidu...com/x
Smallseg:

With the rise of web games, the increasing popularity of web games relies on archiving to reduce the number of logic judgment designs, but this part cannot be completely ignored. There are always some functions that need to be called for local archiving. For example in the login module, remember the password function will store the password information locally. Take the IE browser as an example in C \ Documents and Settings \ your Windows USERNAME \ Application Data \ Macromedia & nbsp; \ Flash Player \ # export dobjects \ some random numbers and letters \ folder, you can see the SOL file storing the password. You can use minerva tool to view the SOL file stored in plaintext as shown in the password is permanently saved unless manually cleared. If a player logs on to the system in a public environment, the account is compromised.

Snailseg:

With/page/youxing/up/now/page/GAME/flourishing/dependency/on/archiving/proceeding/logic/judgment/design/decrease// this/block/cannot/completely/ignore/drop/always/Some/functions/Yes/need/call/local/archived/For example/login/module/ in/remember/password/function/will/Put/password/information/store/locally/In/IE/Browser/example/In/C/Documents/and/ settings/Your/Windows/user name/Application/Data/Macromedia/nbsp/Flash/Player/# export dobjects/Some/random numbers/characters/AND/Letters/files/folders//You can/View/store/password/SOL/file/you can/use/minerva/tool/View/as shown below/figure/show/password/plaintext /storage/SOL/file/Yes/permanent/save/unless/manual/clear/If/player/Under/public/environment/login/will//With/steal/No./threat/come/self/http/www/baidu/com/AND/http/www/baidu/com
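
One way to put a number on how far two tools disagree is to compare token boundaries. The sketch below is not part of the original test. It assumes each tool's output is available as a list of tokens that concatenate to the same text, and it reports the share of each tool's tokens that the other tool also produced at the same character positions. The function names and the sample tokens are hypothetical.

```python
from typing import List, Set, Tuple


def token_spans(tokens: List[str]) -> Set[Tuple[int, int]]:
    """Map a token list to the set of (start, end) character spans it covers."""
    spans = set()
    start = 0
    for token in tokens:
        end = start + len(token)
        spans.add((start, end))
        start = end
    return spans


def agreement(tokens_a: List[str], tokens_b: List[str]) -> Tuple[float, float]:
    """Return (share of A's tokens also in B, share of B's tokens also in A)."""
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("both segmentations must cover the same text")
    if not tokens_a or not tokens_b:
        raise ValueError("segmentations must not be empty")
    spans_a = token_spans(tokens_a)
    spans_b = token_spans(tokens_b)
    shared = len(spans_a & spans_b)
    return shared / len(spans_a), shared / len(spans_b)


# Hypothetical example: one tool splits a domain name, the other keeps it whole.
split_domain = ["scanv", ".", "com", "is", "safe"]
whole_domain = ["scanv.com", "is", "safe"]
scores = agreement(split_domain, whole_domain)
```

Here two of the five tokens in `split_domain` and two of the three tokens in `whole_domain` line up. Measured against a hand-segmented reference instead of a second tool, the same two ratios act as precision and recall.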


Conclusion: These tests did not measure segmentation speed or performance on texts beyond a certain length, and they did not cover user-defined dictionaries, which are also a major factor in segmentation quality. Both points are outside the scope of the results above.

Overall, the Chinese Academy of Sciences word segmentation system is the best on Traditional Chinese characters and on website addresses. For text that is prone to ambiguity, jieba does well. Both also offer more features than the other two tools.

For word segmentation in Python, I recommend either jieba or calling the Chinese Academy of Sciences system through its C library. If calling a C library is a concern, jieba is a good choice, with Traditional-to-Simplified conversion applied before segmentation. The Chinese Academy of Sciences system combined with a user-defined dictionary is also a good choice, but when I called its C library from Python I ran into several problems: user-defined dictionary entries take precedence (for example, after importing a user-defined dictionary containing "CITIC", the text [people who believe in Buddhism in US] is segmented into [US, CITIC, Yang, Buddhism, ...]), dictionary imports sometimes fail, and there are function-call safety issues. Choose the segmenter that fits your needs. I will test further when I have time.
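
The speed and long-text measurements left out above could be gathered with a small harness along these lines. This is only a sketch: `time_segmenter` is a hypothetical helper, and `str.split` stands in for a real segmenter so the example runs without any third-party library. Any callable that takes a string and returns tokens (for instance `jieba.lcut`, if jieba is installed) could be passed instead.

```python
import time
from typing import Callable, Dict, Iterable


def time_segmenter(
    segment: Callable[[str], Iterable[str]],
    base_text: str,
    repeats: Iterable[int] = (1, 10, 100),
) -> Dict[int, float]:
    """Time `segment` on progressively longer copies of `base_text`.

    Returns a dict mapping text length (in characters) to elapsed seconds.
    """
    timings: Dict[int, float] = {}
    for n in repeats:
        text = base_text * n
        start = time.perf_counter()
        # list() forces lazy segmenters (generators) to do the actual work.
        list(segment(text))
        timings[len(text)] = time.perf_counter() - start
    return timings


# Stand-in segmenter: whitespace splitting, used only to exercise the harness.
sample_timings = time_segmenter(str.split, "a b ", repeats=(1, 2))
```

A single run per length is noisy, so for a real comparison it would be reasonable to repeat each measurement several times and keep the minimum.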
