mysqlcft: a Chinese full-text indexing plug-in for MySQL (1/5)

Source: Internet
Author: User

I. Features of mysqlcft, the MySQL Chinese full-text indexing plug-in
1. Advantages:
① Very high precision: Chinese text is split with the "three-byte cross-segmentation algorithm" and no Chinese word-segmentation dictionary is used, so search accuracy is far higher than with Chinese word-segmentation algorithms and can match that of LIKE '%...%';
② Fast queries: queries run 3 to 50 times faster than a LIKE '%...%' search; test results are given at the end of the article;
③ Standard plug-in: developed as a standard MySQL 5.1 full-text indexing plug-in, it does not modify the MySQL source code, does not affect other MySQL functions, and can quickly follow new MySQL releases;
④ Broad version support: supports all MySQL 5.1 Release Candidate versions, that is, from MySQL 5.1.22 RC up to MySQL 5.1.25 RC, the latest at the time of writing;
⑤ Character set support: supports MySQL character sets including GBK, GB2312, UTF-8, Latin1 and BIG5 (other character sets have not been tested);
⑥ System compatibility: available in i386 and x86_64 builds, supporting 32-bit (i386) and 64-bit (x86_64) CPUs and Linux systems;
⑦ Suited to distributed setups: a good fit for distributed architectures built on MySQL slaves, since there is no dictionary to maintain and therefore no dictionary synchronization problem.

2. Disadvantages:
① mysqlcft Chinese full-text indexing applies only to MyISAM tables, because MySQL 5.1 supports FULLTEXT indexes only on MyISAM tables;
② MySQL must not be a statically compiled installation; otherwise the mysqlcft plug-in cannot be installed;
③ Index files built with the "three-byte cross-segmentation algorithm" are slightly larger than those built by Hylanda (海量), ft-hightman and other solutions based on Chinese word-segmentation algorithms, but not by much. In my tests, the .MYI index file of a mysqlcft full-text index is 2 to 5 times the size of the .MYD data file.


II. The core idea of mysqlcft: the "three-byte cross-segmentation algorithm"

Note: to keep the explanation simple, this article uses the digits 0~7 as byte positions. Each position stands for one English letter, one digit, or half of a Chinese character (that is, one byte of a two-byte character).
1. Split the Chinese text into three-byte segments and build the full-text index:
For example, the four-character phrase "Full-text index", or the string "1 X-ray machines", occupies 8 bytes (positions 0~7) in the original Chinese and is split into 6 overlapping segments, on which the inverted index is built:
012 123 234 345 456 567
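
As a rough illustration of the splitting step above, here is a minimal Python sketch. The function names `three_byte_segments` and `build_inverted_index` are hypothetical and are not part of mysqlcft; the real plug-in runs inside MySQL, so this only models the sliding three-byte window and a toy inverted index.

```python
from __future__ import annotations


def three_byte_segments(data: bytes) -> list[bytes]:
    """Return every overlapping 3-byte window of data, in order."""
    return [data[i:i + 3] for i in range(len(data) - 2)]


def build_inverted_index(rows: dict[int, bytes]) -> dict[bytes, set[int]]:
    """Map each 3-byte segment to the ids of the rows that contain it."""
    index: dict[bytes, set[int]] = {}
    for row_id, data in rows.items():
        for segment in three_byte_segments(data):
            index.setdefault(segment, set()).add(row_id)
    return index


# The digits 0-7 stand in for the eight bytes of the example string.
segments = three_byte_segments(b"01234567")
print([s.decode() for s in segments])
# ['012', '123', '234', '345', '456', '567']
```

An 8-byte input yields 8 - 2 = 6 windows, which agrees with the six segments listed above.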

2. Split the search keyword into three-byte segments and look up the matching entries in the full-text index:
Example ①: searching for the keyword "document", which occupies positions 2~5, splits it into:
234 345
These segments match entries in the full-text index.

Example ②: searching for the keyword "x-ray machine", which occupies positions 3~7, splits it into:
345 456 567
These segments also match entries in the full-text index.

Example ③: searching for the two keywords "1 optical machines", which occupy positions 0~2 and 4~7, splits them into:
012 456 567
So a multi-keyword search also matches entries in the full-text index.
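
The three examples can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not mysqlcft's actual code: the names `query_segments` and `matches` are hypothetical, keywords are assumed to be separated by spaces, and matching is simplified to checking that every query segment occurs somewhere in the indexed text, without regard to position.

```python
from __future__ import annotations


def three_byte_segments(data: bytes) -> list[bytes]:
    """Return every overlapping 3-byte window of data, in order."""
    return [data[i:i + 3] for i in range(len(data) - 2)]


def query_segments(query: bytes) -> list[bytes]:
    """Split a query on whitespace, then cut each keyword into 3-byte windows."""
    segments: list[bytes] = []
    for keyword in query.split():
        segments.extend(three_byte_segments(keyword))
    return segments


def matches(indexed_text: bytes, query: bytes) -> bool:
    """True if every query segment is also a segment of indexed_text.

    Simplification: segment positions are ignored, and a query whose
    keywords are all shorter than 3 bytes matches nothing.
    """
    indexed = set(three_byte_segments(indexed_text))
    wanted = query_segments(query)
    return bool(wanted) and all(seg in indexed for seg in wanted)


# Digits stand in for byte positions, as in the examples above.
for q in (b"2345", b"34567", b"012 4567"):
    print(q.decode(), "->", [s.decode() for s in query_segments(q)],
          matches(b"01234567", q))
# 2345 -> ['234', '345'] True
# 34567 -> ['345', '456', '567'] True
# 012 4567 -> ['012', '456', '567'] True
```

Because the keyword is cut with the same window as the indexed text, any keyword of three or more bytes that appears in the text produces segments that are already in the index.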
