I. Features of mysqlcft, the MySQL Chinese full-text indexing plug-in:
1. Advantages:
① High precision: mysqlcft splits Chinese sentences with the "three-byte cross-segmentation algorithm" and uses no Chinese word-segmentation dictionary, so its search accuracy is far higher than that of word-segmentation algorithms and can match the accuracy of LIKE '%...%'.
② Fast queries: queries run 3 to 50 times faster than a LIKE '%...%' search; test results are given at the end of the article.
③ Standard plug-in: it is developed as a standard MySQL 5.1 full-text indexing plug-in, so it requires no changes to the MySQL source code, does not affect other MySQL functions, and can quickly follow new MySQL releases.
④ Broad version support: it supports all MySQL 5.1 Release Candidate versions, that is, from MySQL 5.1.22 RC to the latest, MySQL 5.1.25 RC.
⑤ Character set support: it supports MySQL character sets including GBK, GB2312, UTF-8, Latin1, and BIG5 (other character sets have not been tested).
⑥ System compatibility: it is available in i386 and x86_64 builds, supporting 32-bit (i386) and 64-bit (x86_64) CPUs and Linux systems.
⑦ Suited to distributed setups: it fits MySQL slave-based distributed architectures well, because there is no word dictionary to maintain and therefore no dictionary synchronization problem.
2. Disadvantages:
① mysqlcft Chinese full-text indexing works only with MyISAM tables, because MySQL supports FULLTEXT indexes only on MyISAM tables.
② MySQL must not be a statically compiled installation; otherwise the mysqlcft plug-in cannot be installed.
③ Index files based on the "three-byte cross-segmentation algorithm" are slightly larger than those of Hylanda (海量), ft-hightman, and other plug-ins based on Chinese word-segmentation algorithms, but not by much. In my tests, the .MYI index file of a table with a mysqlcft full-text index is 2 to 5 times the size of the .MYD data file.
II. The core idea of mysqlcft: the "three-byte cross-segmentation algorithm"
Note: For ease of explanation, this article uses the serial numbers 0~7 to label individual bytes; each number stands for one English letter, one digit, or half of a Chinese character.
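To make the numbering concrete, here is a small Python sketch. It assumes the text is stored in GBK (where an ASCII character takes one byte and a Chinese character takes two) and that the 8-byte sample string is "1台x光机"; the helper name byte_positions is made up for illustration and is not part of mysqlcft.

```python
def byte_positions(text, encoding="gbk"):
    """Return (character, [byte serial numbers]) pairs for text in the given encoding."""
    layout = []
    offset = 0
    for ch in text:
        width = len(ch.encode(encoding))  # 1 byte for ASCII, 2 for a GBK Chinese character
        layout.append((ch, list(range(offset, offset + width))))
        offset += width
    return layout


# Assumed sample string: one digit, one letter, and three Chinese characters.
layout = byte_positions("1台x光机")
for ch, numbers in layout:
    print(ch, numbers)
```

Under these assumptions the digit and the letter each occupy one serial number, while every Chinese character occupies two, which is why a single number can stand for "half" a Chinese character.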
1. Split Chinese sentences into overlapping three-byte pieces and build the full-text index:
For example, the four characters "全文索引" ("full-text index"), or the string "1台x光机" ("one X-ray machine"), each 8 bytes long in a double-byte encoding such as GBK, are cross-split into 6 pieces, and an inverted index is built from them:
012 123 234 345 456 567
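The splitting step above can be sketched in a few lines of Python. This is only an illustration of the idea, not mysqlcft's actual source; the function names split_three_bytes and build_index and the document IDs are hypothetical.

```python
from collections import defaultdict


def split_three_bytes(data):
    """Return every overlapping 3-byte window of data, in order."""
    return [data[i:i + 3] for i in range(len(data) - 2)]


def build_index(docs):
    """Build an inverted index mapping each 3-byte piece to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, data in docs.items():
        for piece in split_three_bytes(data):
            index[piece].add(doc_id)
    return index


# The digits 0~7 stand in for the 8 bytes of the sample text.
pieces = split_three_bytes(b"01234567")
print(pieces)  # [b'012', b'123', b'234', b'345', b'456', b'567']

# Two hypothetical documents: an 8-byte one and a 4-byte one.
index = build_index({1: b"01234567", 2: b"0123"})
print(sorted(index[b"012"]))  # [1, 2]
print(sorted(index[b"567"]))  # [1]
```

An n-byte string yields n - 2 pieces, so the 8-byte example produces 6. Because the windows overlap, the index stores nearly one entry per byte, which is consistent with the larger index files noted under the disadvantages.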
2. Split the search keyword into three-byte pieces in the same way and look them up in the full-text index:
Example ①: Searching for the keyword "文索" (the middle two characters of "全文索引", "full-text index"), which covers serial numbers "2~5", splits it into:
234 345
These pieces match entries in the full-text index.
Example ②: Searching for the keyword "x光机" ("X-ray machine"), which covers serial numbers "3~7", splits it into:
345 456 567
These pieces also match entries in the full-text index.
Example ③: Searching for the two keywords "1台 光机" (separated by a space), which cover serial numbers "0~2" and "4~7", splits them into:
012 456 567
In this way, multiple-keyword searches also match entries in the full-text index.
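The three search examples can be reproduced with a short Python sketch. It assumes keywords are separated by whitespace and that a document is a candidate match when every three-byte piece of every keyword appears among the document's pieces; whether mysqlcft applies further checks is not stated here, and the names query_pieces and matches are hypothetical.

```python
def split_three_bytes(data):
    """Return every overlapping 3-byte window of data, in order."""
    return [data[i:i + 3] for i in range(len(data) - 2)]


def query_pieces(query):
    """Split a query on whitespace, then cut each keyword into 3-byte pieces."""
    pieces = []
    for keyword in query.split():
        pieces.extend(split_three_bytes(keyword))
    return pieces


def matches(query, document):
    """Return True if every piece of the query occurs among the document's pieces."""
    indexed = set(split_three_bytes(document))
    pieces = query_pieces(query)
    # Keywords shorter than 3 bytes yield no pieces; this sketch treats that as no match.
    return bool(pieces) and all(piece in indexed for piece in pieces)


document = b"01234567"  # the digits stand in for the 8 bytes of the indexed text

print(query_pieces(b"2345"))      # [b'234', b'345']
print(query_pieces(b"34567"))     # [b'345', b'456', b'567']
print(query_pieces(b"012 4567"))  # [b'012', b'456', b'567']
print(matches(b"012 4567", document))  # True
```

Splitting the query exactly as the indexed text was split is what lets the pieces line up with the index, and it needs no word dictionary on either side.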