Saved for later reference:
Reposted from: http://blog.csdn.net/askpp/archive/2009/09/08/4532355.aspx
Loading a custom dictionary in the Paoding word segmenter
Once you have downloaded the Paoding Chinese word segmenter, configured it in MyEclipse, and set the dic dictionary path in the Windows environment variables, the next question is how to load a custom dictionary. It turns out to be very simple. Go to the dic folder and find the file paoding-dic-names.properties. Opened in a text editor, its content looks like this:
#dictionary character encoding
#paoding.dic.charset=utf-8
#dictionaries which is skip
#paoding.dic.skip.prefix=x-
#chinese/CJK charactors that would not token
#paoding.dic.noise-charactor=x-noise-charactor
#chinese/CJK words that would not token
paoding.dic.noise-word=x-noise-word
#unit words, like "ge", "zhi", ...
#paoding.dic.unit=x-unit
#like "Wang", "Zhang", ...
#paoding.dic.confucian-family-name=x-confucian-family-name
#linke "Upan", "Cdhe"
#paoding.dic.for-combinatorics=x-for-combinatorics
You can add your own dictionary to this file, or remove the # in front of one of the existing entries. The change is detected automatically the next time the program runs.
By the way, a note on what the dictionaries do: the ones whose names start with x- are used to filter words out, such as sensitive words, so you would not want to put your own words into those files.
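Because it is easy to lose track of which entries still carry a leading #, a quick check can help. The following is a minimal sketch using only java.util.Properties from the JDK. The class name DicNamesCheck and the method activeSettings are made up for this illustration and are not part of Paoding, and the sample text is a shortened stand-in for the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not part of Paoding: reports which paoding.dic.*
// settings are active (uncommented) in properties-formatted text.
class DicNamesCheck {

    // Returns the uncommented paoding.dic.* entries, sorted by key.
    static Map<String, String> activeSettings(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load skips lines that start with '#'.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> active = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("paoding.dic.")) {
                active.put(key, props.getProperty(key));
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // Shortened stand-in for paoding-dic-names.properties.
        String sample = String.join("\n",
                "#paoding.dic.charset=utf-8",
                "paoding.dic.noise-word=x-noise-word",
                "#paoding.dic.unit=x-unit");
        // Only the noise-word entry is uncommented in the sample.
        System.out.println(activeSettings(sample));
    }
}
```

With the sample above, only paoding.dic.noise-word is reported. Property keys are case-sensitive, so a key written as Paoding.dic.noise-word would not be picked up by this check.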
Reposted from: http://hi.baidu.com/xwx520/blog/item/c288ee3eb0f5b9f0838b137f.html
Custom dictionaries for the Paoding word segmenter
I realized this blog has not been updated in a long time, this section in particular. Learning is like rowing against the current: if you do not move forward, you fall behind. I should keep studying regularly.
First, the reference sources, since this post is not original work:
(1), http://blog.csdn.net/askpp/archive/2009/09/08/4532355.aspx
(2), http://qipei.javaeye.com/blog/365207
Now for the steps:
1. Download paoding-analysis-2.0.4-alpha2.zip from http://code.google.com/p/paoding/downloads/list
2. Unzip it, find the dic folder, and copy it to the location where you want to keep it.
3. Configure the environment variable. If it is not configured, the program fails at run time, and the error message (in Chinese) also says that the environment variable must be configured.
4. Delete the .compiled file.
5. Create a new text file with the .dic extension and save it, encoded as UTF-8, in the E:/paodingtest/dic/locale directory.
6. Next, write a test program for the word segmenter.
7. Run the segmentation with the custom dictionary in place. The first thing printed is the dictionary compilation information.
8. The segmentation result with the custom dictionary.
9. Delete the custom dictionary and the .compiled file, then segment again.
10. Putting the two results side by side, the custom dictionary does make some difference.
11. Suppose that during segmentation we need "move" and "moving" emitted as separate words. By default they are not separated.
12. Add the two words to the dictionary.
13. Of course, to use this segmenter well you also need to understand it and think carefully about how it behaves. Take "I am an athlete": although "athlete" has been added to the custom dictionary, the sentence is still not cut into "I", "am", "athlete". Instead an unrelated word, "mobilization", appears (in Chinese it is a substring of "athlete"). Likewise, "movement" ought to be split into "transport", "move", and "movement", so how to use the results also needs some thought. This is partly down to the nature of the Chinese language itself. For example, "Table tennis Auction is over" is ambiguous without context: the same characters can also be read as "the table tennis bats are sold out".
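The effect described in the last step can be reproduced with a toy dictionary matcher. The sketch below is not Paoding's algorithm and does not use its API. The ToySegmenter class and its two methods are made up for this illustration. It also assumes that the example sentence in the original Chinese post was 我是运动员 ("I am an athlete") and that the dictionary contains 我, 是, 运动, 动员, and 运动员. The Chinese strings are written as Unicode escapes so that the source compiles the same way regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy dictionary matcher for illustration only; not Paoding's implementation.
class ToySegmenter {
    private final Set<String> words;
    private final int maxLen;

    ToySegmenter(String... dictionary) {
        words = new HashSet<>(Arrays.asList(dictionary));
        int longest = 1;
        for (String w : dictionary) {
            longest = Math.max(longest, w.length());
        }
        maxLen = longest;
    }

    // Emits every dictionary word found at every position, overlaps included.
    List<String> allMatches(String text) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + maxLen);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (words.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    // Forward maximum matching: take the longest dictionary word at the
    // current position (or a single character if nothing matches), then
    // continue right after it.
    List<String> longestMatch(String text) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + maxLen);
            while (end > start + 1 && !words.contains(text.substring(start, end))) {
                end--;
            }
            out.add(text.substring(start, end));
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed dictionary: wo (I), shi (am), yundong (sport),
        // dongyuan (mobilize), yundongyuan (athlete).
        ToySegmenter seg = new ToySegmenter(
                "\u6211", "\u662f", "\u8fd0\u52a8", "\u52a8\u5458", "\u8fd0\u52a8\u5458");
        String sentence = "\u6211\u662f\u8fd0\u52a8\u5458"; // assumed: "I am an athlete"
        System.out.println(seg.allMatches(sentence));
        System.out.println(seg.longestMatch(sentence));
    }
}
```

In this toy, allMatches returns 我, 是, 运动, 运动员, 动员: the word "athlete" is found, but so is the overlapping "mobilize", which mirrors the stray word described above. longestMatch returns only 我, 是, 运动员. Which behavior is preferable depends on the use: emitting overlapping words tends to help search recall, while longest-match output reads more like a human segmentation.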