What useful Python libraries or open source projects for Chinese word segmentation, data mining, and AI would you recommend for a Python chat bot?

Source: Internet
Author: User
I want to build a Python chat robot. What useful Python libraries or open source projects for Chinese word segmentation, data mining, or AI would you recommend?

Accuracy test (run on each project's online demo, with no user-defined dictionary added)
Jieba ("stuttering") Chinese word segmentation: 209.222.69.242:9000/
Chinese Academy of Sciences word segmentation system (ICTCLAS): ictclas.org/ictclas_demo.html
Smallseg: smallseg.appspot.com/smallseg
Snailseg: snailsegdemo.appspot.com/
(the last two URLs require getting past the firewall)
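
The comparison above was run through each project's online demo. A minimal sketch of running the same kind of comparison locally, assuming each segmenter can be wrapped as a function from a string to a list of tokens. `compare_segmenters` and `char_segmenter` are hypothetical helper names, not part of any of the projects, the sample string is chosen only for illustration, and jieba (through its `jieba.lcut` function) is included only if it happens to be installed:

```python
def compare_segmenters(text, segmenters):
    """Run every segmenter on the same text; return {name: slash-joined tokens}."""
    results = {}
    for name, segment in segmenters.items():
        try:
            results[name] = "/".join(segment(text))
        except Exception as exc:  # one failing segmenter should not stop the others
            results[name] = f"<failed: {exc}>"
    return results


def char_segmenter(text):
    """Hypothetical baseline: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


segmenters = {"char_baseline": char_segmenter}

try:
    import jieba  # optional third-party dependency

    segmenters["jieba"] = jieba.lcut  # returns a list of tokens
except ImportError:
    pass  # jieba is not installed; only the baseline is compared

sample = "用户可以方便的查询要访问的网址是否存在恶意行为"  # illustrative sample text
for name, output in compare_segmenters(sample, segmenters).items():
    print(f"{name}: {output}")
```

Other segmenters can be added to the `segmenters` dictionary the same way, as long as each one is wrapped to return a list of strings.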

Test Text 1
Work Letter Virgo officer every month through subordinate departments have to tell the 24-port switch and other technical device installation work

Test results:
Jieba Chinese word segmentation:

Chinese Academy of Sciences word segmentation system:

SMALLSEG:

Work letter of the Virgin Officer monthly through subordinate departments have to tell the 24-port switch and other technical device installation work

SNAILSEG:

Office/woman/officer/month/pass/subordinate/department//To/From/to/From/24/port/switch/etc/technical/device/installation/work
-----------------------------------------------------------------------------------------
Test Text 2
Work in the Office of Women in the department every month through the departments have to be intimate account of 24 switch machine and other technical devices such as installation work

Test results:
Jieba Chinese word segmentation:
/uj/V Installation/zg work/VN
Chinese Academy of Sciences word segmentation system:

SMALLSEG:

Work/letter/deal/female/STEM/Event/monthly/past/Next/Part/department/both/To/kiss/mouth/confessed/24/mouth/delivery/swap/////////////////////

SNAILSEG:

Work/letter/deal/female/STEM/Event/monthly/past/Next/Part/department/both/To/kiss/mouth/confessed/24/mouth/delivery/swap/////////////////////

-----------------------------------------------------------------------------------------

Test Text 3
SCANV Web Site Security Center (scanv.com) is a comprehensive web site security Service platform. Through the Web Site Security Center, users can easily query the URL to visit whether there is malicious behavior, but also in the SCANV online report exposure to illegal malicious sites.

Test results:
Jieba Chinese word segmentation:
Scanv/eng URL/n Security/an Center/n Scanv/eng Com/eng is/V A/m comprehensive/n/uj URL//Security/an Service Platform/n via/P URL//Security/an Center/n User/n can be convenient for/C//uj Query/V to/V to V/V to access the/uj URL/n whether/V exists/V malicious/V behavior/V at the same time/C can be/C in/P Scanv/eng/F online/b Report/V Exposure/nz illegal/vn malicious/V Web
Chinese Academy of Sciences word segmentation system:
scanv/x URL/n Security/an Center/n (/wkz scanv.com/x)/wky is/vshi a/mq comprehensive/n/ude1 URL/n Security/an Service Platform/N. /WJ via/P URL//Security/an Center/N,/wd user/n can be/V convenient///for/UDE1 query/VN to/V to access/v the/ude1 URL/n or whether/V exists/V malicious/n behavior/N,/WD and/C can at/p scanv/x/F in the/P Line/N report/vn Exposure/VN illegal/VN malicious/n website/n. /wj
SMALLSEG:

SCANV Web Site Security Center scanv.com is a comprehensive web site security Service platform through the URL Security Center users can easily query to the URL to visit whether there is malicious behavior and can be in the SCANV online report exposure Illegal malicious website

SNAILSEG:

scanv/website/security/center/scanv/com/is/one/comprehensive//URL/Security/service Platform/via/URL/security/center/user/accessible/convenient/accessible/enquiry/to/To/access//URL/Yes/no/presence/ Malicious/behavioral/simultaneous/can/in/scanv//In/Line/report/exposure/illegal/malicious/website

-----------------------------------------------------------------------------------------

Test Text 4
From the rise of web games to their current boom, designs that rely on local save files for logic checks have become less common, but this area still cannot be ignored completely. There are always some features that need to call locally stored data. For example, the "remember password" function in a login module stores the password information locally. Taking the IE browser as an example, in the folder C:\Documents and Settings\(your Windows user name)\Application Data\Macromedia\Flash Player\#SharedObjects\(some random numbers and letters)\ you can see the sol file that stores the password, which can be viewed with the Minerva tool. As shown in the figure below, the password is stored in plaintext. The sol file is saved permanently unless it is cleared manually, so if the player logs in in a public environment, there is a risk of account theft.

Test results:
Jieba Chinese word segmentation:
With the/P page/M tour N/a rise/V to/V now///uj page Tour/n flourish//dependent on/V archive/V///////////////uj Design/VN reduced/V/ul but/C this block/R also/d cannot be/ad ignore/d Always/n has/V some/m functions/n is a/V that requires/V to call/VN Local/R archive/V/uj such as/V login/V/////////////////////////////////To/ /eng browser/n for/P/V at/P C/eng Documents/eng And/eng Settings/eng your/R/uj Windows/eng username/n Application/eng data/eng Macrom Edia/eng Nbsp/eng Flash/eng Player/eng #SharedObjects/eng some/m random/D numbers/N and/C letters/N Folder////can see/C in/D to store/j password///uj S Ol/eng file/n You can use the/V Minerva/eng tool/N to view/V as shown in/T diagram///////password//Clear text/nr plaintext/nr store/j/uj Sol/eng file/n is/V permanent/NR save/V/uj unless/ Ww/eng Baidu/eng Com/eng
Chinese Academy of Sciences word segmentation system:
With the/P page/q youxing/n from/VF to/V Now/T/ude1 page/q Tour///flourish/an,/wd dependent/V on/P archive/vi for/VX logic///V for/UDE1 design/VN reduced/v/y,/WD but/C this/rzv Block/q also/d cannot/V completely/ad ignore/V off/V. /WJ Total/d/V has/vyou some/mq function/n is/rzs that/vshi require/V to invoke/V local/VI archive/ude1. /WJ For example,/V Login/V module/F,/wd remember/V password///////////to store/V password information stores/n on/P local/rzs,/wd to/P ie/x browser/n for/p/N,/wd in/P c:/x \/x documents/x/w and/x/w settings/x \/x (/wkz you/rr/ude1 windows/x user/n name/q)/wky \/x application/x/w Data/x \/x Macromed ia/x &/x nbsp/x;/WF \/x flash/x/w player/x \/x #/x sharedobjects/x \/x (/wkz some/mq random/b digits/N and/cc letter/n)/wky \/x/w Folder//////////can see/V store/vn Password//////ude1 sol/x file/N,/WD can/V with/V minerva/x tool/N to view/V,/wd like/V under/VF Diagram/n/usuo show/VG,/WD Password/n Ming/ag Civilization/n/ng store/V/ude1,/wd sol/x file/n is/vshi permanent/n Save/V/ude1,/WD unless/C manual/b clears/VN,/wd if/C play/V home/N in/p public/b Environment/N/F Login/V,/WD/D will have/vyou theft/VG/n threat/vn. /WJ from/V http:/x//w//w www.baidu.com/x//w and/V http:/x,/wn www.baidu...com/x
SMALLSEG:

As the page youxing up to now the page is thriving depends on the archive to make logical judgments of the design is reduced but this block is not completely ignored there will always be some features that need to call local archives, such as the login module, remember that the password function would password information Stored locally in IE browser for example in C \ Documents and Settings \ your Windows user name \ application Data \ Macromedia & nbsp; \ Flash Player \ #SharedObjects \ Some random numbers and letters \ Folders you can see the Sol file where the password is stored can be viewed using the Minerva tool to view the sol text stored as shown in the password-civilized text The piece is permanently saved unless manually cleared if the player is logged in in a public environment, there will be a theft threat.

SNAILSEG:

with/page/youxing/play/present/page/tour/flourish/dependent/on/archive/conduct/logic/judgment/design/reduction///But/this/block/also/not/full/ignore/drop/always/have/some/function/yes/need/call/ Local/archived/For example/Login/module/IN/remember/password/function/will/will/password/info/store/in/local/with/ie/browser/AS/example/IN/c/documents/and/settings/you/Win dows/username/application/data/macromedia/nbsp/flash/player/#SharedObjects/some/random number/word/and/letter/File/clip/down/on/can/See/store/ Password/sol/file/can/use/minerva/tool/view/below/figure/the/Show/password/clear text/Clear text/Storage//sol/File/Yes/permanent/save///except/manual/Clear/If/player/in/public Total/environment/next/Login/ON/will/have/stolen/number/threat/To/From/http/www/baidu/com/and/http/www/baidu/com


Conclusion: These tests did not cover segmentation speed or performance on texts beyond a certain length, and they did not cover custom dictionaries, which also have a large influence on segmentation. Setting those two points aside and looking at the overall results, the Chinese Academy of Sciences word segmentation system did best on traditional Chinese and URL segmentation, jieba was also good, and these two are comparatively richer in functionality than the others.

For word segmentation in Python, I suggest either jieba or the Chinese Academy of Sciences system called through its C library. If you are worried about calling a C library and the problems that come with it, jieba is a good choice, with a simple conversion of the text before segmenting. The Chinese Academy of Sciences system plus a custom dictionary is also a good choice. However, while calling its C library from Python, I found that imported user dictionary entries are given too much priority (for example, after importing the user dictionary word [Citic], the phrase [we believe that the Buddhist people] is segmented as [us, Citic, Yang, Buddhism, people]), that the import sometimes fails, and that there are function call safety issues. The main point is to choose a segmenter according to your needs. I will run performance tests when I have time!
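
The user dictionary priority problem mentioned in the conclusion can be illustrated with a toy segmenter. The sketch below uses forward maximum matching, which is swapped in here only because it is short and shows the effect clearly. It is not the algorithm used by the Chinese Academy of Sciences system or by jieba. The Chinese sentence and vocabulary are a reconstruction guessed from the translated example, not text taken from the original post, and `forward_max_match` is a hypothetical helper name:

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: take the longest known word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Assumed toy vocabulary and sentence (reconstructed for illustration only).
base_vocab = {"我们", "信仰", "佛教", "的", "人"}
sentence = "我们中信仰佛教的人"  # roughly: "those among us who believe in Buddhism"

without_user_word = forward_max_match(sentence, base_vocab)
with_user_word = forward_max_match(sentence, base_vocab | {"中信"})  # add "Citic"

print("/".join(without_user_word))  # 我们/中/信仰/佛教/的/人
print("/".join(with_user_word))     # 我们/中信/仰/佛教/的/人
```

Once the user word is in the vocabulary, the greedy matcher grabs it and breaks the word for "believe" that follows, which matches the kind of over-prioritization described above. With jieba, a comparable experiment would add the word through `jieba.add_word` or `jieba.load_userdict` and compare the output before and after.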
