MySQL full-text search practice (2)


In the previous article, I outlined an implementation approach that combines Chinese word segmentation with Unicode encoding to work around the fact that MySQL full-text indexes cannot be used for Chinese search. This article describes how to implement the Chinese word segmentation part.
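The exact encoding from the previous article is not shown here. As a minimal sketch, assume one plausible scheme: every character of a segmented word is replaced by "u" plus its Unicode code point in hexadecimal. Each word then becomes a single ASCII token that MySQL's built-in full-text parser treats as one word, and even a one-character word (for example "u4e2d") stays above MyISAM's default minimum word length of 4. The function name `encode_words_for_fulltext` is hypothetical, and the code assumes PHP 7.4+ with the mbstring extension.

```php
<?php
// Hypothetical helper (assumed scheme, not necessarily the one from part 1):
// replace every character of each segmented word with "u" + its Unicode code
// point in lowercase hex, so each word becomes one ASCII token.
function encode_words_for_fulltext(string $segmented): string
{
    $tokens = [];
    // Words are expected to be whitespace-separated, like segmenter output.
    foreach (preg_split('/\s+/u', trim($segmented), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $token = '';
        foreach (mb_str_split($word, 1, 'UTF-8') as $char) {
            $token .= 'u' . dechex(mb_ord($char, 'UTF-8'));
        }
        $tokens[] = $token;
    }
    return implode(' ', $tokens);
}
```

Under this scheme the segmented words "中文 搜索" would be stored as "u4e2du6587 u641cu7d22". Search keywords must be segmented and encoded the same way before they are passed to `MATCH ... AGAINST`.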

The first task is Chinese word segmentation, which splits a Chinese sentence into individual words. For example, a headline meaning "high salary, hiring registered cost engineers" is split into separate words such as "high salary", "hiring", "registered", and "cost engineer".

Many Chinese word segmentation techniques are described online, and there are good open-source products and service interfaces. Since our products are developed mainly in PHP, we chose SCWS, an open-source Chinese word segmentation system that is available as a PHP extension module. This lets the project call it directly.
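SCWS's internal algorithm is not covered here. To show only what segmentation produces, here is a toy forward maximum matching segmenter. This technique is much simpler than what SCWS does. The dictionary in the test and the function name `fmm_segment` are hypothetical, and the code assumes PHP 7.4+ with mbstring.

```php
<?php
// Toy forward-maximum-matching segmenter, for illustration only.
// SCWS uses its own dictionary and rules; this is NOT its algorithm.
function fmm_segment(string $text, array $dict): string
{
    $lookup = array_flip($dict);            // word => index, for fast membership tests
    $maxLen = 1;                            // length (in characters) of the longest dictionary word
    foreach ($dict as $w) {
        $maxLen = max($maxLen, mb_strlen($w, 'UTF-8'));
    }
    $chars = mb_str_split($text, 1, 'UTF-8');
    $n = count($chars);
    $words = [];
    $i = 0;
    while ($i < $n) {
        // Try the longest candidate first and shrink it until it is a
        // dictionary word; a single character is the fallback.
        for ($len = min($maxLen, $n - $i); $len > 1; $len--) {
            if (isset($lookup[implode('', array_slice($chars, $i, $len))])) {
                break;
            }
        }
        $words[] = implode('', array_slice($chars, $i, $len));
        $i += $len;
    }
    return implode(' ', $words);
}
```

With the dictionary `['高薪', '诚聘', '注册', '造价', '工程', '工程师']`, the input 高薪诚聘注册造价工程师 comes out as `高薪 诚聘 注册 造价 工程师`. The longer entry 工程师 wins over 工程 because the longest candidate is tried first. A real segmenter like SCWS adds a large dictionary, rules, and ambiguity handling on top of this idea.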

For details about how to install SCWS and build its PHP module, refer to the official SCWS documentation.
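After installation, you can confirm that PHP actually loaded the extension. One way is `php -m`, which lists the loaded modules. Another is a small runtime check like the sketch below. It assumes the module registers itself under the name "scws", and it also checks that `scws_new()` exists. The helper name `scws_available` is hypothetical.

```php
<?php
// Minimal check that the SCWS extension is loaded before calling scws_new().
// Assumption: the module registers itself under the name "scws".
function scws_available(): bool
{
    return extension_loaded('scws') && function_exists('scws_new');
}

echo scws_available()
    ? "scws extension is loaded\n"
    : "scws extension is NOT loaded; check the extension line in php.ini\n";
```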

Once SCWS is installed and its PHP module is built, you can call the API it provides directly. The following PHP function calls SCWS to segment a string and can be used to test that segmentation works.

  function ch_word_segment($text) {
      $so = scws_new();                  // create and return a SimpleCWS operation object
      $so->set_charset('utf8');          // character set of the dictionary, rule set and input text
      // also return short words and two-character (duality) combinations of stray single characters
      $so->set_multi(SCWS_MULTI_SHORT | SCWS_MULTI_DUALITY);
      $so->send_text($text);             // send the text to be segmented
      $words = array();
      // get_result() returns the next batch of words, or false once the text is used up
      while ($result = $so->get_result()) {
          foreach ($result as $item) {
              $words[] = $item['word'];  // each item is an array; 'word' holds the segmented word
          }
      }
      $so->close();                      // release the segmenter
      return implode(' ', $words);       // the return value is a string of words separated by spaces
  }

If the installation succeeded, running the following code prints the sentence split into space-separated words:

  print(ch_word_segment('register a cost engineer with high salaries'));
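The space-separated string returned by `ch_word_segment()` is what ends up in the full-text index. On the search side, one possible approach (a sketch, not necessarily the one this series uses) is to turn the segmented, and if needed encoded, keywords into a MySQL `IN BOOLEAN MODE` string in which every word is required. The function name `build_boolean_search` and the table and column names in the comment are hypothetical.

```php
<?php
// Hypothetical helper: turn space-separated words into a MySQL BOOLEAN MODE
// search string in which every word is required ("+word").
function build_boolean_search(string $words): string
{
    $terms = preg_split('/\s+/u', trim($words), -1, PREG_SPLIT_NO_EMPTY);
    $required = [];
    foreach ($terms as $term) {
        // Drop characters that are operators in boolean mode.
        $clean = preg_replace('/[+\-<>()~*"@]/', '', $term);
        if ($clean !== '') {
            $required[] = '+' . $clean;
        }
    }
    return implode(' ', $required);
}

// Example query (hypothetical table/column), with the result bound as a parameter:
// SELECT id FROM articles WHERE MATCH(content_index) AGAINST (? IN BOOLEAN MODE)
```

Passing the result as a bound parameter (for example with PDO) avoids quoting problems. Stripping the operator characters prevents user input from changing the meaning of the boolean query.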

 

This article is from the "Changing Star" blog. Please keep this source when reposting: http://jabari.blog.51cto.com/5020854/948276
