Chinese fuzzy search in MySQL

Source: Internet
Author: User
What is fuzzy search, and why use it? I believe everyone knows this, so I will not go into it. Today, I will only talk about how to implement fuzzy search.

I. LIKE statements. The familiar LIKE statement is easy to use, widely compatible, and easy to maintain, but inefficient. Everyone already knows how to use it, so I will not discuss it further.

II. FULLTEXT index (natively supported by MySQL).

Implementation method: first, add an index of type FULLTEXT to the target field. Then, in the WHERE clause of the SQL statement, use MATCH () ... AGAINST () to specify the keyword.
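As a rough illustration of these two steps, the following Python sketch only builds the SQL strings and does not connect to a database. The helper name fulltext_statements, the ft_ index name prefix, and the %s placeholder (the parameter style used by common MySQL drivers for Python) are assumptions made for this example and do not come from the original article.

```python
from typing import Tuple


def fulltext_statements(table: str, column: str) -> Tuple[str, str]:
    """Return (DDL that adds a FULLTEXT index, parameterized search query).

    Hypothetical helper for illustration. The table and column names are
    assumed to be trusted identifiers, because identifiers cannot be
    passed as bound parameters.
    """
    add_index = (
        f"ALTER TABLE `{table}` "
        f"ADD FULLTEXT INDEX `ft_{column}` (`{column}`)"
    )
    # The keyword itself is supplied separately as a bound parameter (%s).
    search = (
        f"SELECT * FROM `{table}` "
        f"WHERE MATCH (`{column}`) AGAINST (%s)"
    )
    return add_index, search


add_index_sql, search_sql = fulltext_statements("temp", "char")
print(add_index_sql)
print(search_sql)
```

The first statement would be run once when setting up the table; the second would be executed with the keyword as its only parameter.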

Many articles on the Internet get this wrong. They claim that FULLTEXT does not support Chinese, or that full-text indexes on Linux do not support Chinese, and conclude that Chinese must be converted to pinyin. In fact, this is not the case. The key point is word segmentation. Chinese has no natural word boundaries, whereas English words are separated by spaces, and the pinyin for each character is likewise separated by spaces, just like a word. This is why people come to believe that FULLTEXT cannot handle Chinese characters but works once they are converted to pinyin.

In fact, there is no need to convert Chinese characters to pinyin. You can separate phrases with spaces according to word segmentation rules, or even crudely separate every character with a space, and then store the space-separated text in a dedicated field. That is, each record needs two fields: the original text and the text after word segmentation. Note: the FULLTEXT index must be set on the field that holds the segmented text.

Advantage: it is more efficient than LIKE statements and is natively supported by MySQL.

Disadvantage: you need to maintain an additional field and do the word segmentation yourself. How complicated is it to use? See the key points of the FULLTEXT index further down.

III. Third-party components: Sphinx (Coreseek), xunsearch, and so on.

Implementation method: import the fields to be searched, together with each record's Id, into the third-party component. When searching, call the API provided by the component to get the matching Ids, then query the database by Id.
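The search-then-look-up flow might look roughly like the sketch below. The callables search_ids (standing in for whatever API the third-party component exposes) and run_query (standing in for the database access layer), the column name id, and the default table name are all hypothetical placeholders; no real Sphinx or xunsearch call is shown.

```python
from typing import Any, Callable, List, Sequence, Tuple


def build_id_query(table: str, ids: Sequence[int]) -> Tuple[str, List[int]]:
    """Build a parameterized query that loads the rows for the given Ids."""
    if not ids:
        raise ValueError("no Ids to look up")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = f"SELECT * FROM `{table}` WHERE id IN ({placeholders})"
    return sql, list(ids)


def search_rows(
    keyword: str,
    search_ids: Callable[[str], Sequence[int]],      # hypothetical search API
    run_query: Callable[[str, List[int]], List[Any]],  # hypothetical DB access
    table: str = "temp",
) -> List[Any]:
    """Ask the search component for Ids, then fetch those rows from MySQL."""
    ids = search_ids(keyword)
    if not ids:
        return []  # nothing matched, so the database is not queried at all
    sql, params = build_id_query(table, ids)
    return run_query(sql, params)
```

Keeping the two steps behind plain callables means the search component can be swapped without touching the database code.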

Advantage: it is more efficient than the two solutions above, and you do not have to segment words yourself.

Disadvantage: the third-party component itself has to be maintained, and its data must be updated every time the database is updated.

I chose the third solution and handed all maintenance and synchronization work over to scheduled tasks.

-- Key points of the FULLTEXT index

1. The storage engine of the table needs to be MyISAM. From MySQL 5.6 onward, InnoDB also supports full-text indexes;

2. Field types: CHAR, VARCHAR, and TEXT;

3. MySQL full-text indexes impose a minimum length on query keywords;

=> ft_min_word_len. The default value is 4; changing it to 1 is recommended. Otherwise, the keyword passed to AGAINST () must be at least 4 characters long, and a search for a single letter or a single Chinese character finds nothing.

=> Add the following to the my.ini configuration file:

[mysqld]

ft_min_word_len = 1

=> After setting ft_min_word_len, restart the MySQL service and run SHOW VARIABLES LIKE 'ft_min_word_len' to check that the setting took effect;

=> After changing this setting, you need to rebuild the existing full-text indexes. Otherwise, errors may occur.

I got this error when updating some records: Incorrect key file for table './webm/temp.MYI'; try to repair it.

I fixed it by running mysql> REPAIR TABLE table_name;

5. MATCH (indexed columns). The arguments of MATCH () are the columns covered by the FULLTEXT index, not arbitrary field names;

=> The column list in MATCH (title, content) must be exactly the same as the one in the index definition FULLTEXT (title, content).

6. MATCH (singername, songname). You can search for keywords in several indexed columns at the same time;

7. If a keyword appears in 50% or more of the rows, it is treated as a stopword and ignored. You can use AGAINST ('keyword' IN BOOLEAN MODE) to bypass this restriction;

8. To search for multiple words, separate them with spaces or commas, as shown below:

=> SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a x');

=> SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a,x');

=> In AGAINST ('keyword1 keyword2'), multiple keywords are separated by commas or spaces and combined with OR.

9. The index is updated every time the table is updated, so INSERT and UPDATE operations are slower on a table that has a full-text index;

10. Search syntax rules (these operators apply IN BOOLEAN MODE);

=> + Required: rows that do not contain this keyword are ignored.

=> - Excluded: rows that contain this keyword are ignored.

=> "" Enclose a phrase in double quotation marks to indicate that it must match exactly as written and cannot be split.
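To make the three operators concrete, here is a small sketch that assembles a boolean-mode search expression. The function boolean_query and its parameter names are hypothetical, and the inputs are assumed to be plain words that contain no quotation marks or operator characters.

```python
from typing import Iterable


def boolean_query(
    required: Iterable[str] = (),
    excluded: Iterable[str] = (),
    phrases: Iterable[str] = (),
) -> str:
    """Assemble a search expression for AGAINST (... IN BOOLEAN MODE)."""
    parts = [f"+{word}" for word in required]      # must be present
    parts += [f"-{word}" for word in excluded]     # must be absent
    parts += [f'"{phrase}"' for phrase in phrases]  # exact phrase
    return " ".join(parts)


expression = boolean_query(
    required=["mysql"], excluded=["oracle"], phrases=["full text"]
)
print(expression)  # +mysql -oracle "full text"

# The expression is then passed as the bound parameter of a query such as:
SEARCH_SQL = (
    "SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST (%s IN BOOLEAN MODE)"
)
```

Passing the expression as a bound parameter, rather than concatenating it into the SQL text, keeps user input out of the statement itself.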
