This article describes how to enable Chinese full-text indexing in MySQL with the mysqlcft plug-in. MySQL does not currently provide satisfactory full-text support for Chinese, so let's look at how the mysqlcft plug-in lets your MySQL server support Chinese full-text indexes.
I. Features of the MySQL full-text index plug-in mysqlcft:
1. Advantages:
① High accuracy: the "Three-byte crossover splitting algorithm" splits Chinese statements without a Chinese word segmentation dictionary. Search accuracy is far higher than that of Chinese word segmentation algorithms and matches that of LIKE '%...%';
② Fast queries: searching is 3 to 50 times faster than LIKE '%...%'. There are test results at the end of the article;
③ Standard plug-in: it is developed as a standard MySQL 5.1 full-text index plug-in, so it neither modifies the MySQL source code nor affects other MySQL functions, and it can quickly follow new MySQL versions;
④ Support for multiple versions: all MySQL 5.1 Release Candidate versions are supported, from MySQL 5.1.22 RC through the latest MySQL 5.1.25 RC;
⑤ Supported character sets: the MySQL character sets GBK, GB2312, UTF-8, Latin1, and BIG5 are supported (other character sets have not been tested);
⑥ Good system compatibility: i386 and x86_64 builds are available, supporting 32-bit (i386) and 64-bit (x86_64) CPUs and Linux systems;
⑦ Suitable for distributed deployments: it fits the MySQL Slave distributed system architecture well, with no dictionary maintenance cost and no dictionary synchronization problem.
2. Disadvantages:
① The mysqlcft full-text index applies only to MyISAM tables, because MySQL supports FULLTEXT indexes only on MyISAM tables;
② MySQL must not be statically compiled and installed; otherwise, the mysqlcft plug-in cannot be installed;
③ The index file produced by the "Three-byte crossover splitting algorithm" is slightly larger than one produced by a "Chinese word segmentation algorithm" such as ft-hightman. In my tests, the .MYI index file of a mysqlcft full-text index is 2 to 5 times the size of the .MYD data file.
II. Core idea of mysqlcft: the "Three-byte crossover splitting algorithm"
Note: For illustration, this article uses the digits 0 to 7 as byte positions. Each digit stands for one byte, which may be an English letter, a digit, or half of a Chinese character.
1. Split Chinese statements in three bytes to create a full-text index:
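For example, the phrases "full-text index" or "one X-ray machine", each of which occupies the eight byte positions 0 to 7 in its original Chinese form, are split into six overlapping pieces, and an inverted index is created from them: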
012 123 234 345 456 567
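The split above can be reproduced with a sliding three-byte window. The following is only a minimal Python sketch of that idea, not the plug-in's actual code; the function name split_three_bytes is hypothetical, and returning nothing for inputs shorter than three bytes is an assumption, since the article does not say how mysqlcft handles them.

```python
def split_three_bytes(data: bytes) -> list[bytes]:
    """Return every overlapping three-byte window of data, in order.

    Assumption: inputs shorter than three bytes yield no pieces.
    """
    return [data[i:i + 3] for i in range(len(data) - 2)]


# The article's notation: each digit stands for one byte.
pieces = split_three_bytes(b"01234567")
print(b" ".join(pieces).decode("ascii"))  # prints: 012 123 234 345 456 567
```

An input of n bytes (n ≥ 3) yields n - 2 pieces, which is why the eight-byte example produces six.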
2. Split the search keywords by three bytes and find the corresponding information in the full-text index:
Example 1: Searching for the keyword "wensuo", which occupies positions 2 to 5, splits it into:
234 345
These pieces are then matched against the full-text index.
Example 2: Searching for the keyword "X-ray machine", which occupies positions 3 to 7, splits it into:
345 456 567
These pieces are also matched against the full-text index.
Example 3: Searching for the keyword "1 Machine", which consists of two terms occupying positions 0 to 2 and 4 to 7, splits it into:
012 456 567
In this way, a multi-keyword search is also matched against the full-text index.
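The three examples can be simulated with a toy inverted index. This is a hedged Python sketch rather than how mysqlcft or MySQL actually stores or queries the index: the names build_index and search, the document ids, and the second document are all hypothetical, and the sketch assumes a document matches when it contains every three-byte piece of every search term.

```python
from collections import defaultdict


def split_three_bytes(data: bytes) -> list[bytes]:
    """Return every overlapping three-byte window of data."""
    return [data[i:i + 3] for i in range(len(data) - 2)]


def build_index(docs: dict[int, bytes]) -> dict[bytes, set[int]]:
    """Map each three-byte piece to the ids of the documents containing it."""
    index: dict[bytes, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for piece in split_three_bytes(text):
            index[piece].add(doc_id)
    return index


def search(index: dict[bytes, set[int]], query: bytes) -> set[int]:
    """Return ids of documents that contain every piece of every term.

    Assumption: terms are separated by whitespace and all must match.
    """
    pieces = [p for term in query.split() for p in split_three_bytes(term)]
    if not pieces:
        return set()
    result = set(index.get(pieces[0], set()))
    for piece in pieces[1:]:
        result &= index.get(piece, set())
    return result


# Document 1 uses the article's digit notation; document 2 is made up.
index = build_index({1: b"01234567", 2: b"ABCDEFGH"})
print(search(index, b"2345"))      # Example 1: pieces 234 345
print(search(index, b"34567"))     # Example 2: pieces 345 456 567
print(search(index, b"012 4567"))  # Example 3: pieces 012 456 567
```

All three queries return only document 1. Note that intersecting piece sets does not check that the pieces are adjacent in the document, which is a simplification of this sketch.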