MySQL full-text index plug-in mysqlcft 1.0.0 installation instructions

Source: Internet
Author: User

When MySQL handles many concurrent connections and a table holds a large number of records, searching with SELECT ... WHERE ... LIKE '%...%' is inefficient. A pattern that starts with the wildcard "%" or "_" cannot use an index, so every query requires a full table scan, which puts heavy pressure on the database. MySQL provides full-text indexing to address this problem: it improves performance (because MySQL indexes the fields to optimize the search) and also returns higher-quality results. However, MySQL has so far not supported Chinese full-text indexing correctly.
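The difference between a prefix pattern and a leading-wildcard pattern can be sketched in Python. This is only an analogy, not MySQL code: a sorted list stands in for an ordinary index, and the function names and sample rows are made up for illustration.

```python
import bisect

# A sorted list stands in for an ordinary index on a text column (analogy only).
titles = sorted(["apple pie", "banana bread", "grape jam", "pineapple cake"])

def prefix_search(index, prefix):
    """Like LIKE 'prefix%': binary search narrows the range, so few rows are touched."""
    lo = bisect.bisect_left(index, prefix)
    hi = bisect.bisect_left(index, prefix + "\uffff")
    return index[lo:hi]

def substring_search(index, needle):
    """Like LIKE '%needle%': sort order does not help, so every row is examined."""
    return [row for row in index if needle in row]

print(prefix_search(titles, "apple"))     # ['apple pie']
print(substring_search(titles, "apple"))  # ['apple pie', 'pineapple cake']
```

The substring version has to visit every row no matter how the data is sorted, which is the full table scan described above.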

An important difference between Chinese and Western languages such as English is that Western text is written as words separated by spaces, whereas Chinese text is written as characters. A Chinese word consists of one or more characters, and there are no spaces between words. When you run a full-text search on a field containing Chinese, you do not get correct results, because Chinese has nothing like the English space to mark word boundaries, so Chinese words cannot be split apart and indexed.

The MySQL 5.1 Reference Manual says the following:
12.7. Full-Text Search (http://dev.mysql.com/doc/refman/5.1/zh/functions.html)
● MySQL supports full-text indexing and searching. A full-text index in MySQL is an index of type FULLTEXT. FULLTEXT indexes can be used only with MyISAM tables. They can be created from CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later using ALTER TABLE or CREATE INDEX. For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index than to load data into a table that already has a FULLTEXT index.

● The FULLTEXT parser determines where words start and end by looking for certain delimiter characters, such as " " (space), "," (comma), and "." (period). If words are not separated by delimiters (as in Chinese, for example), the FULLTEXT parser cannot determine where a word begins or ends. To add words or other indexed terms in such a language to a FULLTEXT index, you must preprocess them so that they are separated by some arbitrary delimiter.

● Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the FULLTEXT parser cannot determine where words begin and end in these and other such languages.
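The behavior the manual describes can be illustrated with a small Python sketch. It is not MySQL's parser: the function `split_on_delimiters` is a made-up name, it handles only whitespace, commas and periods, and the Chinese string is my own rendering of the bus stop name "west exit of Tsinghua East Road".

```python
import re

def split_on_delimiters(text):
    """Split on whitespace, commas and periods, as a delimiter-based parser would."""
    return [tok for tok in re.split(r"[\s,.]+", text) if tok]

english = "MySQL supports full-text search."
chinese = "清华东路西口"  # assumed rendering of "west exit of Tsinghua East Road"

print(split_on_delimiters(english))  # ['MySQL', 'supports', 'full-text', 'search']
print(split_on_delimiters(chinese))  # ['清华东路西口']
```

The English sentence falls apart into words, while the Chinese phrase comes back as a single unbroken token, so none of the words inside it can be looked up on their own.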

There are two Chinese full-text index solutions for MySQL developed in China: the first is Massive Technology's MySQL5.0.37--LinuxX86-Chinese+, and the second is mysql-5.1.11-ft-hightman, developed by hightman. Both rely on Chinese word segmentation to split Chinese sentences into words. However, both have drawbacks. First, 64-bit operating systems are not supported. Second, they modify the MySQL source code and support only one MySQL version, so it is hard to keep up with new releases. Third, the segmentation dictionary can never be large and comprehensive enough. For specialized database content (for example, bus stop names such as "east exit of the Summer Palace Road" and "west exit of Tsinghua East Road", or real estate names such as "Lai Zhen xiangger" and "bihai yuntian"), full-text search based on Chinese word segmentation often returns nothing, and even an extended segmentation dictionary will not cover everything.
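The dictionary problem can be seen in a minimal sketch of greedy dictionary segmentation (forward maximum matching). This is a generic textbook technique, not the code of either product above. The function name, the tiny dictionary and the Chinese strings (my assumed renderings of "west exit of Tsinghua East Road" and "bihai yuntian") are all for illustration only.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

dictionary = {"清华", "东路", "西口"}  # hypothetical, deliberately tiny

print(forward_max_match("清华东路西口", dictionary))  # ['清华', '东路', '西口']
print(forward_max_match("碧海云天", dictionary))      # ['碧', '海', '云', '天']
```

The name that is missing from the dictionary breaks down into single characters, so a search for the whole name has no indexed word to match.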

Because I needed precise full-text queries, I borrowed the idea of the binary cross-splitting algorithm and, using my own "three-byte cross-splitting algorithm", wrote this MySQL full-text index plug-in, mysqlcft 1.0.0. Because development was rushed, there are bound to be undiscovered problems, which will be fixed in later versions. For MySQL tables with millions of records, its full-text search is adequate.
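The general idea of cross-splitting can be sketched as overlapping windows that advance one character at a time, with no dictionary involved. The sketch below shows only that general idea on characters with a window of two. It is not the plug-in's own three-byte algorithm, whose details are not given here, and the names `cross_split` and `to_indexable` are hypothetical.

```python
def cross_split(text, n=2):
    """Return overlapping n-character windows, advancing one character at a time."""
    if len(text) <= n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def to_indexable(text, n=2):
    """Join the windows with spaces so a delimiter-based parser sees separate terms."""
    return " ".join(cross_split(text, n))

print(cross_split("清华东路西口"))  # ['清华', '华东', '东路', '路西', '西口']
print(to_indexable("清华东路"))     # 清华 华东 东路
```

Because every adjacent pair of characters becomes a term, any phrase can be found by splitting the query the same way, at the cost of a larger index.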

