KTDictSeg word segmentation component version 1.3: new feature list and download location
1. Modified the dictionary format to speed up dictionary loading.
2. Added support for English technical terms such as C++ and C#, which can be segmented as whole terms through the dictionary.
3. Added word frequency judgment: when segmentation is otherwise ambiguous, the result is chosen based on word frequency.
4. Added a frequency-first option, which dynamically determines the granularity of word segmentation. freqfirst must be enabled.
5. Added statistics on Chinese name suffixes and the ability to locate names based on those statistics.
6. Added frequency statistics for Chinese names and unknown (out-of-dictionary) words.
7. Added automatic dictionary updating: names and unknown words whose counts exceed the threshold are inserted into the dictionary automatically. autoinsertunknownwords and unknownwordsthreshold must be set. (Automatic insertion is not recommended; manual insertion is recommended.)
8. Added periodic saving of the dictionary and statistical results. autosaveinterval must be set.
9. Added the ktdictseg.xml configuration file for configuring word segmentation parameters.
10. Added support for Lucene.Net by providing ktdictseganalyzer for use with Lucene.Net.
11. Added a dictionary management function for adding, deleting, and modifying dictionary entries.
12. Dictionary management provides a batch insertion function that helps users manually select suitable unknown words to insert into the dictionary (recommended).
13. A simple news search example is provided. The project is named demo.ktdictseganalyzer and builds on ktdictseg.
14. Changed all ArrayList usages to List<>.
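
To make items 3 and 4 more concrete, here is a minimal Python sketch of frequency-based disambiguation in dictionary segmentation. It is not KTDictSeg's actual algorithm, which this post does not describe. It uses a standard unigram approach instead: among all ways of splitting the text into dictionary words, pick the split with the highest product of relative word frequencies. The dictionary contents, the counts, the fallback weight for unknown characters, and the names `WORD_FREQ` and `segment` are all assumptions made for illustration.

```python
import math

# Hypothetical toy dictionary: word -> frequency count (illustrative values only).
WORD_FREQ = {
    "C++": 50, "C#": 40, "C": 5,
    "研究": 300, "研究生": 120, "生命": 200,
    "命": 30, "生": 20, "起源": 80,
}


def segment(text, word_freq=WORD_FREQ, max_len=4):
    """Split text into words, maximizing the sum of log relative frequencies."""
    total = sum(word_freq.values())
    n = len(text)
    # best[i] = (best score for text[:i], start index of the last word in it)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_freq:
                count = word_freq[piece]
            elif end - start == 1:
                count = 0.5  # assumed small weight for an unknown single character
            else:
                continue  # multi-character pieces must be dictionary words
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


print(segment("研究生命起源"))
print(segment("C#和C++"))
```

With these toy counts, the ambiguous string "研究生命起源" can be split as 研究/生命/起源 or as 研究生/命/起源, and the frequencies decide between them. The same mechanism keeps terms such as C++ and C# intact as long as they are dictionary entries.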
Dictionary management tool: dictmanage.exe
Interface:
News search example: demo.ktdictseganalyzer.exe
Interface:
Download location
Domestic download location
Where:
src_v1.3.01 is the source code
rel_v1.3.01 contains all executable files and configuration files. The data directory contains the word dictionary, the stop word list, and the name prefix/suffix word table from the statistics I have gathered so far. The news directory contains the index created by the Lucene.Net example for news search.
news.zip contains the XML file used as input for batch insertion. It holds about 30,000 older news articles crawled from Sina and Zhonghua, roughly 20 million words in total. Feel free to use it.
Note: If you want to import news.xml, this file must be in the same directory as demo.ktdictseganalyzer.exe!
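
The batch insertion workflow mentioned above (gather unknown-word statistics from a corpus such as news.xml, then review the frequent ones by hand) can be sketched roughly as follows in Python. This is only an illustration, not the dictmanage.exe implementation. The function name `candidate_unknown_words` and its parameters are hypothetical, and treating the threshold as "count strictly greater than the threshold" is an assumption based on the wording of item 7.

```python
from collections import Counter


def candidate_unknown_words(observed, dictionary, threshold):
    """Return (word, count) pairs for out-of-dictionary words seen more than
    `threshold` times, most frequent first, for manual review."""
    counts = Counter(w for w in observed if w not in dictionary)
    return [(w, c) for w, c in counts.most_common() if c > threshold]


# Hypothetical segmenter output over a small corpus.
observed = ["张三", "李四", "张三", "北京", "张三", "李四"]
dictionary = {"北京"}

for word, count in candidate_unknown_words(observed, dictionary, threshold=1):
    print(word, count)
```

Returning the candidates for review, rather than writing them straight into the dictionary, matches the recommendation above to prefer manual insertion over automatic insertion.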