Mysql full-text index solution _ MySQL

Source: Internet
Author: User
Mysql full-text index solution (to) MySQL supports full-text indexing and searching since 3.23.23.
Full-text index is used to create an index, and full-text search is used to query the index.
LIKE uses Regular Expression for query.
MySQL full-text index is an index type: FULLTEXT.
The full-text index can only be used in the char, varchar, and text fields of the MyISAM table.
Full-Text index indexes can be generated in create table, alter table, and create index.
Create table...
Create table article (
Id INT AUTO_INCREMENT not null primary key,
Title VARCHAR (200 ),
Body TEXT,
FULLTEXT (title, body)
) TYPE = MYISAM;
Alter table...
Create index...
It takes a long time to reverse a large amount of data to a table with a full-text index. we recommend that you remove the full-text index and then reverse the data, and then add the full-text index.
Full-text search syntax:
MATCH (col1, col2,...) AGAINST (expr [search_modifier])
Three search methods:
IN BOOLEAN MODE
IN NATURAL LANGUAGE MODE
In natural language mode with query expansion/WITH QUERY EXPANSION

IN NATURAL LANGUAGE MODE
Expr is the string to be searched.
No special characters.
Apply Stopwords.
Remove more than half of rows. for example, if every row has mysql, no row can be found when mysql is used, this is useful when the number of rows is invincible, because it is meaningless to find all the rows. at this time, mysql is almost treated as stopword; but when there are only two rows, it cannot be found by any ghost, because each word has more than 50% characters. to avoid this situation, use in boolean mode.
The preset search method.
SELECT *
FROM article
Where match (title, body)
AGAINST ('XXX' in natural language mode );
The default search is case-insensitive. to perform case-insensitive searches, the character set of columne should be changed from utf8 to utf8_bin.
The default MATCH... AGAINST is sorted by correlation, from high to low.
MATCH... AGAINST can be used with all MySQL syntaxes, such as JOIN or other filtering conditions.
-- Count
Select count (*)
FROM article
Where match (title, body)
AGAINST ('XXX' in natural language mode );
-- Count
Select count (IF (MATCH (title, body) AGAINST ('XXX' in natural language mode), 1, NULL) AS count
FROM article
When there are many matching records, the first type of count is slow, because MATCH... AGAINST will first sort by correlation.
When the number of matching records is small, the second type of count is slow, because the second type of count will scan all the data.
The fields in MATCH (title, body) must be exactly the same as those in FULLTEXT (title, body). If you only need to query the title or body field, you have to create another FULLTEXT (title) or FULLTEXT (body). because of this, the MATCH () field cannot span across tables, but the other two search methods seem to be acceptable.
SELECT id, MATCH (title, body) AGAINST ('XXX' in natural language mode) as score
FROM article;
In this way, the related values can be obtained, and because there is no WHERE and order by, it will not be sorted.
SELECT id, MATCH (title, body) AGAINST ('XXX' in natural language mode) as score
FROM article
Where match (title, body)
AGAINST ('XXX' in natural language mode );
Sorting and correlation are obtained. Although MATCH... AGAINST is used twice, MySQL knows that the two matches... AGAINST are the same, so only one time is used.
SELECT id, MATCH (title, body) AGAINST ('XXX' in natural language mode) as score
FROM article
Order by score desc;
Why not use it like this?
How to break MySQL FULLTEXT:
The combination of letters, numbers, and bottom line is considered a word, and the bottom line is not broken.
Characters that can be broken: white space, comma (,), and dot (.). However, if you do not need these words, such as Chinese, you must manually disconnect them.
You can create a plug-in to replace the built-in broken word parser.
A single quotation mark is accepted. for example, aaa 'BBB is regarded as a word, but aaa' bbb is a word.
The single quotation marks at the beginning or end of the prefix will be removed, such as 'AAA or aaa '.
During full-text search, stopword and strings with less than four characters are ignored.
You can override the built-in stopword list.
You can modify the setting of at least four characters.

IN BOOLEAN MODE
Expr has special characters to assist in special search syntax.
SELECT *
FROM article
Where match (title, body)
AGAINST ('+ mysql-yoursql' in boolean mode );
Be sure to have msysql and do not have yoursql.
In boolean mode features:
No more than 50% rows are excluded.
It is not automatically sorted in reverse order of relevance.
You can search for fields without FULLTEXT index, but it will be very slow.
Restrict the longest and shortest strings.
Apply Stopwords.
Search syntax:
+: Yes.
-: No, but this "no" indicates that a specified string cannot be contained in a row, therefore, you must use the-yoursql syntax instead of any row.
(Nothing) default usage, indicating dispensable. if there is a prefix, the backend is compared.
>: Increase the relevance of the word.
<: Reduce the correlation.
(): The condition can be nested.
+ Aaa + (> bbb ~ : Turn the relevance from positive to negative, indicating that having this word will reduce the relevance, but unlike "-", it will only be later.
*: Tens of thousands of characters. unlike other syntaxes, this should be followed by strings.
": Enclose a sentence with double quotation marks to indicate that it must be completely consistent and cannot be split.

IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
You can also use with query expansion.
Derivative version of in natural language mode.
Search with in natural language mode to obtain the most relevant fields, add them to the original expr, and query them again.
One of the amazing features: You can use the database to find mysql or oracle. the first query uses databae to get some results and extract strings from these results. at this time, the chance of getting mysql and oracle is quite high, finally, use the database and the obtained strings for a query.
Magic function 2: If you cannot spell out the correct string, you can use the "similar" error string for the first time to query the string. you can use the correct string to obtain the desired result.
Because this query method will increase "noise", we recommend that you streamline the query string for the first time.
For Stopwords, see http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html.
Full-text search restrictions:
It can only be used on MyISAM tables.
Support UTF-8.
Chinese support questions:
MySQL does not break text: MySQL built-in words are based on blank, comma, and Dot. the built-in mechanism's idiot solution is to insert blank words into the text when saving the text, but there are still the following restrictions.
The query string must contain at least four characters. therefore, no text can be found in the first, second, or third Fields. you must change ft_min_word_len from the default 4 to 1.
Although the same table can have fields of different character sets, the fields in the same FULLTEXT index must be of the same character set and collation.
The fields in match must be exactly the same as those in fulltext. in boolean mode can be different. even fields without FULLTEXT index are used, but the speed is very slow.
AGAINST must be a string, not a variable or domain name.
Full-text search restricts index hint.
MySQL full-text search settings:
Most of the parameters are startup parameters, that is, MySQL must be restarted after modification.
Some parameter modifications must re-generate the index file.
Mysql> show variables like 'FT % ';

Ft_boolean_syntax +-> <()~ *: "" & |
Ft_min_word_len 4
Ft_max_word_len 84
Ft_query_expansion_limit 20 ft_stopword_file (built-in)

Ft_min_word_len: the shortest index string. the default value is 4. The index file must be rebuilt after modification.
Ft_max_word_len: The longest index string. the default value varies by version.
[Mysqld]
Ft_min_word_len = 1
Ft_stopword_file: indicates the stopword file path. if it is left blank, stopword filtering is disabled. after modification, MySQL must be restarted and indexes must be re-created. stopword file content can be separated by commas, however, the bottom line and single quotes are valid string characters.
50% threshold: the configuration file is stored in storage/myisam/ftdefs. h. change # define GWS_IN_USE GWS_PROB to # define GWS_IN_USE GWS_FREQ and re-compile MySQL. because near-low threshold affects data accuracy, we do not recommend this, if in boolean mode is available, the 50% limit can be avoided.
Ft_boolean_syntax: the query character that changes in boolean mode. you do not need to restart MySQL or re-create indexes.
Modify the identification of string characters. for example, "-" is a valid character of the string:
Method 1: modify the true_word_char () and misc_word_char () of storage/myisam/ftdefs. h, re-compile MySQL, and re-create the index.
Method 2: modify the character set File, use the character set in the FULLTEXT index field, and re-create the index.
Re-indexing:
This is required for every table with FULLTEXT index.
Mysql> repair table tbl_name QUICK;
Note that if myisamchk is used, the above setting value will be returned to the default value, because myisamchk is not set by MySQL.
Solution 1: Add the modified value to the parameter of myisamchk.
Shell> myisamchk -- recover -- ft_min_word_len = 1 tbl_name.MYI
Solution 2: Set both sides.
[Mysqld]
Ft_min_word_len = 1
[Myisamchk]
Ft_min_word_len = 1
Solution 3: replace myisamchk syntax with repair table, analyze table, optimize table, and alter table because these syntaxes are executed by MySQL.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.