KTDictSeg word segmentation component version 1.3: new features and download location

Source: Internet
Author: User

1. Changed the dictionary format to speed up dictionary loading.
2. Added support for English technical terms such as C++ and C#, which can now be segmented correctly using the dictionary.
3. Added word-frequency judgment: when no other rule decides between candidate segmentations, the result is chosen by word frequency.
4. Added a word-frequency-first option that determines segmentation granularity dynamically. freqfirst must be enabled.
5. Added Chinese name suffix statistics and the ability to locate names based on those statistics.
6. Added frequency statistics for Chinese names and out-of-vocabulary (unknown) words.
7. Added automatic dictionary updates: names and unknown words whose frequency exceeds the threshold are inserted into the dictionary automatically. autoinsertunknownwords must be enabled and unknownwordsthreshold set. (Automatic insertion is not recommended; manual insertion is preferred.)
8. Added periodic saving of the dictionary and statistics. autosaveinterval must be set.
9. Added the ktdictseg.xml configuration file for configuring word segmentation parameters.
10. Added Lucene.Net support by providing ktdictseganalyzer for Lucene.Net.
11. Added a dictionary management function for adding, deleting, and modifying dictionary entries.
12. Dictionary management includes batch insertion, which helps users manually select suitable unknown words to insert into the dictionary (recommended).
13. Added a simple news search example. The project is named demo.ktdictseganalyzer and uses ktdictseganalyzer + ktdictseg.
14. Changed all ArrayList usages to List<>.
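
To make items 3, 6, and 7 more concrete, here is a minimal Python sketch of the two ideas: choosing between candidate segmentations by word frequency, and counting unknown words so that those above a threshold can be reviewed or inserted. It is not KTDictSeg's code or API. The names pick_by_frequency and UnknownWordTracker are hypothetical, and scoring a segmentation by the sum of its word frequencies is an assumption made only for illustration.

```python
from collections import Counter

# Illustrative sketch only: every name here is hypothetical and does not
# correspond to KTDictSeg's actual classes, methods, or scoring rule.


def pick_by_frequency(candidates, word_freq):
    """Return the candidate segmentation with the highest total word frequency.

    candidates: list of segmentations, each a list of words.
    word_freq:  dict mapping word -> frequency; missing words count as 0.
    Assumption: summing frequencies is just one simple way to compare candidates.
    """
    return max(candidates, key=lambda seg: sum(word_freq.get(w, 0) for w in seg))


class UnknownWordTracker:
    """Counts words missing from the dictionary and flags frequent ones."""

    def __init__(self, dictionary, threshold, auto_insert=False):
        self.dictionary = set(dictionary)
        self.threshold = threshold      # plays the role of unknownwordsthreshold
        self.auto_insert = auto_insert  # plays the role of autoinsertunknownwords
        self.counts = Counter()

    def observe(self, word):
        """Record one occurrence of a word produced by the segmenter."""
        if word in self.dictionary:
            return
        self.counts[word] += 1
        # Insert automatically only when enabled and the count exceeds the threshold.
        if self.auto_insert and self.counts[word] > self.threshold:
            self.dictionary.add(word)

    def candidates(self):
        """Unknown words above the threshold that still await manual review."""
        return sorted(
            w for w, c in self.counts.items()
            if c > self.threshold and w not in self.dictionary
        )
```

With auto_insert left off, candidates() gives a review list, which corresponds to the manual batch insertion workflow that the feature list recommends over automatic insertion.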

Dictionary management tool: dictmanage.exe
Interface: (screenshot not available)

News search example: demo.ktdictseganalyzer.exe
Interface: (screenshot not available)

Download location
Domestic download location
Where:
src_v1.3.01 is the source code.
rel_v1.3.01 contains all executable files and configuration files. Its data directory contains a word dictionary, a stop word table, and the name prefix/suffix word table that I have compiled so far. Its news directory contains the Lucene.Net index built for the news search example.
news.zip contains the XML file used as input for batch insertion. It holds about 30,000 older news articles crawled from Sina and Zhonghua, roughly 20 million words in total. Anyone who needs it is welcome to use it.
Note: If you want to import news.xml, the file must be in the same directory as demo.ktdictseganalyzer.exe.
