A Chinese Word Segmentation Search Tool for ASP.NET

Source: Internet
Author: User

jieba is a Chinese word segmentation and retrieval library for Python. It has been ported to the .NET platform as jieba.NET, which can completely replace the combination of Lucene.Net and Pangu word segmentation.

I am writing this because in an interview yesterday I was asked how to implement keyword search on a website. I talked about SQL fuzzy queries, SQL statement optimization, and caching. I had come across keyword word segmentation before, but the .NET platform has no mature word segmentation and search library. Lucene, from the Java world, has been ported to .NET, but the port is updated slowly. While learning Python I had noticed its word segmentation, search, and word cloud libraries, so I wondered whether one of the Python segmentation libraries had been ported to .NET. I looked up Python's jieba library, and it turns out a port exists.
Introduction from the original article, "jieba Chinese word segmentation, .NET version: jieba.NET":
On the .NET platform, the commonly used word segmentation component is Pangu, but it has not been updated for a long time. The most obvious difference is the built-in dictionary: jieba's dictionary has about 500,000 entries, while Pangu's has about 170,000, so the two produce different segmentation results. In addition, for out-of-vocabulary words, jieba uses an HMM based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm, and the results look good.
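
To make the HMM and Viterbi remark more concrete, here is a minimal sketch of the idea, not jieba.NET's actual code. Each character gets one of four labels (B, M, E, S: begin, middle, or end of a word, or a single-character word), and Viterbi picks the highest-scoring label sequence. All probabilities below are invented for illustration, whereas jieba loads trained tables. The names Segment and Score are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;

// Toy log-probabilities, invented for this illustration; they are NOT jieba's trained values.
const string States = "BMES";                    // Begin, Middle, End of a word; Single-character word
const double Floor = -20.0;                      // emission score for any (state, character) pair not listed
const double Never = double.NegativeInfinity;    // score of an illegal start or transition

// A sentence can only start with B or S.
var start = new Dictionary<char, double> { ['B'] = -0.3, ['S'] = -1.4 };

// Only these transitions are legal; everything else scores Never.
var trans = new Dictionary<(char, char), double>
{
    [('B', 'E')] = -0.2, [('B', 'M')] = -1.8,
    [('M', 'E')] = -0.4, [('M', 'M')] = -1.2,
    [('E', 'B')] = -0.6, [('E', 'S')] = -0.8,
    [('S', 'B')] = -0.7, [('S', 'S')] = -0.7,
};

// Emission scores: how well a character fits a given position inside a word.
var emit = new Dictionary<(char, char), double>
{
    [('B', '网')] = -1.0, [('E', '易')] = -1.0, [('S', '的')] = -0.5,
};

double Score<TKey>(Dictionary<TKey, double> table, TKey key, double fallback) where TKey : notnull
    => table.TryGetValue(key, out double v) ? v : fallback;

List<string> Segment(string text)
{
    var words = new List<string>();
    int n = text.Length;
    if (n == 0) return words;

    var best = new double[n, 4];   // best[t, s]: best score of labeling text[0..t] with state s at t
    var back = new int[n, 4];      // back[t, s]: previous state on that best path

    for (int s = 0; s < 4; s++)
        best[0, s] = Score(start, States[s], Never) + Score(emit, (States[s], text[0]), Floor);

    for (int t = 1; t < n; t++)
        for (int s = 0; s < 4; s++)
        {
            best[t, s] = Never;
            for (int p = 0; p < 4; p++)
            {
                double cand = best[t - 1, p] + Score(trans, (States[p], States[s]), Never);
                if (cand > best[t, s]) { best[t, s] = cand; back[t, s] = p; }
            }
            best[t, s] += Score(emit, (States[s], text[t]), Floor);
        }

    // A sentence must end on E (index 2) or S (index 3); then walk the back pointers.
    int state = best[n - 1, 2] >= best[n - 1, 3] ? 2 : 3;
    var labels = new int[n];
    for (int t = n - 1; t >= 0; t--) { labels[t] = state; state = back[t, state]; }

    // E or S closes a word.
    int wordStart = 0;
    for (int t = 0; t < n; t++)
        if (labels[t] >= 2)
        {
            words.Add(text.Substring(wordStart, t - wordStart + 1));
            wordStart = t + 1;
        }
    return words;
}

Console.WriteLine(string.Join("/", Segment("网易的")));
```

With these toy numbers the labels come out as B, E, S, so the call should print 网易/的. With trained tables the same procedure lets the segmenter propose words that are missing from its dictionary.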

Source code on GitHub: https://github.com/anderscui/jieba.NET
You can also search for the package and install it directly from the NuGet Package Manager in VS2013.

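Someone in the comments suggested trying the sentence 工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作 ("Every month, when passing through the subordinate departments, the female officer of the Industry and Information Office personally gives instructions on installing 24-port switches and other technical devices"), so I tested it myself: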

using System;
using JiebaNet.Segmenter;

var segmenter = new JiebaSegmenter();

// Test sentence: "Every month, when passing through the subordinate departments, the female officer of the
// Industry and Information Office personally gives instructions on installing 24-port switches and other technical devices."
var text = "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作";
Console.WriteLine("Original sentence: {0}", text);

// Full mode: list every dictionary word found in the sentence.
var segments1 = segmenter.Cut(text, cutAll: true);
Console.WriteLine("[Full mode]: {0}", string.Join("/", segments1));

// The remaining calls are assumed to segment the same sentence.
// Exact mode is the default.
var segments2 = segmenter.Cut(text);
Console.WriteLine("[Exact mode]: {0}", string.Join("/", segments2));

// New word recognition: the default call also uses the HMM.
var segments3 = segmenter.Cut(text);
Console.WriteLine("[New word recognition]: {0}", string.Join("/", segments3));

// Search engine mode.
var segments4 = segmenter.CutForSearch(text);
Console.WriteLine("[Search engine mode]: {0}", string.Join("/", segments4));

// Disambiguation.
var segments5 = segmenter.Cut(text);
Console.WriteLine("[Disambiguation]: {0}", string.Join("/", segments5));

Console.Read();

Running result:

The results are good: apart from full mode, every mode segments the sentence in the order a person would read it.
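
Why full mode reads differently can be sketched with a toy example. jieba is generally described as building a graph of all dictionary words found in the sentence and then choosing the most probable path through it. Full mode reports every word in that graph, overlaps included, while exact mode keeps only the best path. The following is a simplified sketch under that description, not jieba.NET's implementation. The dictionary, its frequencies, and the names BuildDag, CutAll, and CutExact are all invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy dictionary with invented frequencies; a real dictionary has hundreds of thousands of entries.
var freq = new Dictionary<string, int>
{
    ["清华"] = 80, ["清华大学"] = 60, ["华大"] = 20, ["大学"] = 100,
};
double logTotal = Math.Log(freq.Values.Sum());

// dag[k] lists every index j such that text[k..j] is a candidate word.
// A single character is always a candidate, so dag[k] always starts with k.
List<int>[] BuildDag(string text)
{
    var dag = new List<int>[text.Length];
    for (int k = 0; k < text.Length; k++)
    {
        dag[k] = new List<int> { k };
        for (int j = k + 1; j < text.Length; j++)
            if (freq.ContainsKey(text.Substring(k, j - k + 1)))
                dag[k].Add(j);
    }
    return dag;
}

// Full mode: report every dictionary word found, even when words overlap.
List<string> CutAll(string text)
{
    var dag = BuildDag(text);
    var words = new List<string>();
    int lastEnd = -1;
    for (int k = 0; k < text.Length; k++)
    {
        if (dag[k].Count == 1 && k > lastEnd)
        {
            words.Add(text.Substring(k, 1));   // a character no longer word has covered
            lastEnd = k;
            continue;
        }
        foreach (int j in dag[k])
        {
            if (j <= k) continue;              // skip the single-character candidate
            words.Add(text.Substring(k, j - k + 1));
            lastEnd = j;
        }
    }
    return words;
}

// Exact mode: dynamic programming from right to left over the same graph.
List<string> CutExact(string text)
{
    var dag = BuildDag(text);
    int n = text.Length;
    var best = new double[n + 1];   // best[i]: best log-probability of segmenting text[i..]
    var next = new int[n];          // next[i]: end index of the word chosen at position i
    for (int i = n - 1; i >= 0; i--)
    {
        best[i] = double.NegativeInfinity;
        foreach (int j in dag[i])
        {
            string word = text.Substring(i, j - i + 1);
            int f = freq.TryGetValue(word, out int v) ? v : 1;   // unknown single character counts as 1
            double score = Math.Log(f) - logTotal + best[j + 1];
            if (score > best[i]) { best[i] = score; next[i] = j; }
        }
    }
    var words = new List<string>();
    for (int i = 0; i < n; i = next[i] + 1)
        words.Add(text.Substring(i, next[i] - i + 1));
    return words;
}

Console.WriteLine(string.Join("/", CutAll("清华大学")));
Console.WriteLine(string.Join("/", CutExact("清华大学")));
```

With this toy dictionary the first call should print 清华/清华大学/华大/大学 and the second 清华大学. The overlapping words in the first line are why full mode output cannot be read back as the original sentence.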
