Official website: https://code.google.com/p/friso/
News: Friso 1.6.0 released (the latest version as of 2014.05.08). It is open source, easy to use, and well suited to research on word segmentation technology.
One. The Friso Chinese word segmenter
Friso is a high-performance Chinese word segmenter written in C and built on the popular MMSEG algorithm. It is designed and implemented in a fully modular way, so it can be easily embedded in other programs such as MySQL and PHP. Segmentation of both UTF-8 and GBK encoded text is supported.
Sam: For an explanation of MMSEG, see: http://blog.csdn.net/hzhsan/article/details/25270519
"The source code compiles and runs on a variety of platforms without modification. With 200,000 entries loaded, memory consumption is stable at 14.5M."
1. Current latest version: Friso 1.6.0, which supports segmentation of UTF-8/GBK encoded text.
2. Uses the four filtering rules of the MMSEG algorithm; segmentation accuracy reaches 98.41%. For the original algorithm, see: http://technology.chtsai.org/mmseg/.
3. Supports custom lexicons. Under the dict folder, you can easily add, delete, or change lexicons and lexicon entries, and the lexicons are organized by category.
4. Simplified/traditional/mixed Chinese support: segmentation can be targeted at simplified, traditional, or mixed simplified-traditional text, and can also be used to implement cross-search between simplified and traditional Chinese.
5. Supports recognition of mixed Chinese-English words (any combination can be recognized by maintaining the lexicon). For example: karaoke, beautiful mm, C language, IC card, Doraemon.
7. Good English support, including recognition of words combined with English punctuation, such as c++, c#, e-mail addresses, URLs, decimals, and percentages.
8. (!New) Custom reserved punctuation: you can choose which punctuation marks are kept in the segmentation results, so that complex combinations such as c++, k&r, and code.google.com can be recognized.
9. (!New) Secondary segmentation of complex English tokens: by default Friso keeps the original combination of digits and letters; enabling this feature splits such tokens a second time to raise the search hit rate. For example, qq2013 is segmented as: qq/2013/qq2013.
10. Supports recognition of basic unit words with Arabic numerals or decimals, e.g. 2012, 1.75 m, 5 T, 120 kg, 38.6 ℃.
11. Automatic English full-width/half-width and uppercase/lowercase conversion.
12. Synonym matching: Chinese/English synonyms are appended automatically. (Requires enabling the friso.add_syn option in friso.ini.)
13. Automatic filtering of Chinese and English stop words. (Requires enabling the friso.clr_stw option in friso.ini.)
14. Supports multiple configurations and can be used safely in multi-process/multi-threaded environments.
15. Provides a friso.ini configuration file, making it easy to build a segmenter tailored to your application's needs.
Two. Segmentation speed
Test environment: 2.8GHz / 2G / Ubuntu
Simple mode: 3.8M/sec
Complex mode: 1.8M/sec
Three. Segmentation test:
1. Text 1:
Ambiguity and synonyms: Study the origin of life, mixed words: do b ultrasound body, X-ray nature is what, today go to the odd KTV karaoke karaoke go, Doraemon is a cartoon protagonist, unit and full-width: August 6, 2009 start the University tour, Yueyang today's temperature of 38.6 ℃, that is 101.48℉, English numerals: bug report [email protected] or visit http://code.google.com/p/jcseg, we all admire the hacker spirit! special number: ①⑩⑽㈩.
Friso segmentation results:
Ambiguity and synonyms: Research pondering the origin of life, mixed words: To do b ultrasound body, X-ray essence is what, today go to the odd KTV karaoke karaoke, Doraemon is a cartoon protagonist, unit and full angle: 2009 August 6 start the University tour, Yueyang today's temperature of 38.6 ℃, that is, 101.48℉, English and English numerals: bug report chenxin 619315 gmail com [email protected] or Visit http://code Google COM code.google.com/p/jcseg, we all admire appreciate like love enjoy the hacker spirit Mind! Special numbers: ①⑩⑽㈩.
2. Text 2:
My uncle kissed my mother and kissed me.
Friso segmentation results:
My uncle kissed my mother and kissed me.
Four. How to use
How do I compile and install Friso on Windows?
For more information, please refer to the Friso Development Help documentation in the attachment.
1. Segmentation interface template:
For details, see the tst-friso.c file in the source code:
friso_t friso;
friso_config_t config;
friso_task_t task;

//1. Create a Friso segmenter instance.
friso = friso_new();
//2. Create a Friso segmentation configuration.
config = friso_new_config();
//3. Quickly initialize Friso and the configuration from the given friso.ini.
if ( friso_init_from_ifile(friso, config, __path__) != 1 ) {
    printf("fail to initialize friso and config.");
    goto err;
}
//4. Create a segmentation task.
task = friso_new_task();
//5. Set the text to be segmented on the task.
friso_set_text(task, "text to be segmented");
//6. Main segmentation loop: fetch tokens until none are left.
while ( ( friso_next(friso, config, task) ) != NULL ) {
    //printf("%s[%d,%d]/", task->hits->word, task->hits->type, task->hits->offset);
    printf("%s/", task->hits->word);
}
//7. Release the related resources.
friso_free_task(task);
err:
friso_free_config(config);
friso_free(friso);