MySQL Full text search tips _php Tutorial

Source: Internet
Author: User
Tags create index mysql tutorial
MySQL Reference Manual for version 4.1.0-alpha.


6.8 MySQL Full Text Search

When 3.23.23, MySQL began to support full-text indexing and searching. Full-text indexing is an fulltext type index in MySQL. The fulltext index is used for the MyISAM table and can be created on a CHAR, VARCHAR, or TEXT column using ALTER table or create index at or after create table. For large databases, loading the data into a table that does not have a fulltext index, and then creating an index with ALTER TABLE (or CREATE index), it will be very fast. Loading the data into a table that already has a Fulltext index will be very slow.

Full-Text search is done through the MATCH () function.

Mysql> CREATE TABLE articles (
Title VARCHAR (200),
-Body TEXT,
Fulltext (Title,body)
Query OK, 0 rows Affected (0.00 sec)

Mysql> INSERT into articles VALUES
(NULL, ' MySQL Tutorial ', ' DBMS stands for DataBase ... '),
--(NULL, ' How to use MySQL efficiently ', ' after you went through a ... '),
(NULL, ' Optimising MySQL ', ' in this tutorial we'll show ... '),
--(NULL, ' 1001 MySQL Tricks ', ' 1. Never run mysqld as root. 2 ... '),
(NULL, ' MySQL vs. Yoursql ', ' in the following database comparison ... '),
(NULL, ' MySQL Security ', ' when configured properly, MySQL ... ');
Query OK, 6 rows Affected (0.00 sec)
Records:6 duplicates:0 warnings:0

Mysql> SELECT * from articles
-WHERE MATCH (title,body) against (' database ');
| ID | Title | Body |
| 5 | MySQL vs. Yoursql | In the following database comparison ... |
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
2 rows in Set (0.00 sec)

The function MATCH () executes a natural language search for a string against a set of text (a set of columns contained in one or more columns in a fulltext index). The search string is given as a parameter of against (). The search is performed in a way that ignores the case of letters. For each record row in the table, MATCH () returns a correlation value. That is, the similarity scale between the search string and the text of the column specified in the MATCH () list in the record line.

When MATCH () is used in a WHERE clause (see the example above), the returned record rows are automatically sorted by relevance from the high to the bottom. The correlation value is a non-negative floating point number. 0 relevance means not similar. Correlation is calculated based on the number of words in the record line, the number of unique words in the row, the total number of words in the set, and the number of documents (record lines) that contain a special word.

It can also perform a search of a logical pattern. This is described in the following sections.

The previous example shows some basic instructions for using the function MATCH (). Record rows are returned in descending order of similarity.

The next example shows how to retrieve an explicit similarity value. If there is no where and there is no ORDER BY clause, the return row is unordered.

Mysql> SELECT Id,match (title,body) against (' Tutorial ') from articles;
| ID | MATCH (Title,body) against (' Tutorial ') |
| 1 | 0.64840710366884 |
| 2 | 0 |
| 3 | 0.66266459031789 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
6 rows in Set (0.00 sec)

The following example is a little more complicated. The query returns similarity and still returns the record line in the order in which the similarity decreases. In order to complete this result, you should specify MATCH () two times. This does not cause additional overhead because the MySQL optimizer will notice the same MATCH () call two times and only invoke the full-text search code once.

mysql> SELECT ID, body, MATCH (title,body) against
(' Security implications of running MySQL as Root ') as score
, from articles WHERE MATCH (title,body) against
(' Security implications of running MySQL as Root ');
| ID | Body | Score |
| 4 | 1. Never run mysqld as root. 2 ..... | 1.5055546709332 |
| 6 | When configured properly, MySQL ... | 1.31140957288 |
2 rows in Set (0.00 sec)

MySQL uses a very simple parser to separate the text into words. A "word" is any sequence of characters consisting of text, data, "'" and "_". Any word that appears on the Stopword list, or too short (3 characters or less), is ignored.

Each appropriate word in a set and query is measured according to its importance in the set and query. In this way, a word that appears in multiple documents will have a lower weight (possibly even a 0 weight) because it has a lower semantic value in this particular set. Otherwise, if the word is less, it will get a higher weight. Then, the weights of the words are combined to calculate the similarity of the records rows.

Such a technical work can work well with a large set (in fact, it will be carefully harmonized with it). For very small tables, word classifications are not sufficient to fully reflect their semantic values, and sometimes this pattern can produce strange results.

Mysql> SELECT * from articles WHERE MATCH (title,body) against (' MySQL ');
Empty Set (0.00 sec)

In the above example, the search term MySQL does not get any results because the word appears in more than half of the record lines. Similarly, it is effectively treated as a stopword (that is, a 0 semantic value word). This is the ideal behavior--a natural language query should not return each dandy from a 1GB table (second row).

It is rarely possible to find related documents in a word that matches half of the record lines in the table. In fact, it may find many unrelated documents. As we all know, this happens often when we try to search for something on the Internet through search engines. For this reason, in this particular data set, such a row is set to a low semantic value.

When 4.0.1, MySQL can also use the in BOOLEAN MODE modifier to perform a logical full-text search.

Mysql> SELECT * from articles WHERE MATCH (title,body)
-Against (' +mysql-yoursql ' in BOOLEAN MODE);
| ID | Title | Body |
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
| 2 | How to use MySQL efficiently | After you went through a ... |
| 3 | Optimising MySQL | In this tutorial we'll show ... |
| 4 | 1001 MySQL Tricks | 1. Never run mysqld as root. 2 ..... |
| 6 | MySQL Security | When configured properly, MySQL ... |

This query returns all record lines containing the word MySQL (note: The threshold for 50% is not used), but it does not contain the word yoursql. Note that a search for a logical pattern does not automatically sort the record rows in descending order of similar values. You can see from the above results that the highest similarity value (the one that contains MySQL two times) is listed last, not the first. A logical full-text search can work even without a fulltext index, but it is slower.

Logical full-Text search supports the following operators:

A leading plus sign indicates that the word must appear in each returned row of records.

A leading minus sign indicates that the word must not be present in each returned record line.

The default (when neither the plus sign nor the minus sign is specified) the word is arbitrary, but the record line containing it is arranged a bit higher. This imitation does not MATCH in the BOOLEAN MODE modifier () ... The behavior of against ().

< >
These two operators are used to change the base value of a word's similarity value. The < operator reduces the base value of the,> operator to increase it. See the example below.

( )
Parentheses are used to group words in sub-expressions.

A leading negative number acts like a negation operator, and the base value of the word that causes the line similarity is negative. It is useful for marking a noise word. A record containing such a word will be ranked lower, but will not be completely excluded, as this can use the-operator.

An asterisk is a truncation operator. Do not want the other operators, it should be appended to a word, not added in front.

The phrase, enclosed in double quotes, matches only the line of records that contains the phrase (literally, as if it were typed).
Here are some examples:

Apple Banana
Find a record line that contains at least one of the words above
+apple +juice
... Two words are included
+apple Macintosh
... Contains the word "apple", but if it contains "Macintosh", it will be arranged a bit higher
... Contains "Apple" but does not contain "Macintosh"
+apple + (>pie ... Include "Apple" and "pie," or "Apple" and "strudel" (in any order), but Apple pie is ranked higher than Apple strudel.
... Includes "Apple", "apples", "applesauce" and "applets"
"Some words"
... Can contain "Some words of wisdom", but not "some noise words"
6.8.1 limitations of the full text
All parameters of the match () function must be from a column from the same table, and must be part of the same Fulltext index, unless MATCH () is in BOOLEAN MODE.

The match () column list must match exactly the list of columns defined in one of the table's Fulltext indexes, unless match () is in BOOLEAN MODE.

The argument to against () must be a constant string.
6.8.2 Fine-tuned MySQL full text Search
Unfortunately, full-text search still has very few user-tunable parameters, although some additions are high on TODO. If you have a MySQL source release (see Chapter 2.3 Installing a MySQL source release), you can play more control over full-text search.

Note that full-text search is the best search effect and has been carefully adjusted. Modifying the behavior of default values, in most cases, will only make the search result worse. Do not modify the source code of MySQL unless you know what you are doing!

The minimum length of the indexed word is specified by the MySQL variable Ft_min_word_len. View Chapter Show VARIABLES. Change it to the value you want, and rebuild your fulltext index. (this variable is supported only starting with MySQL 4.0)

The Stopword list can be read from the file specified by the Ft_stopword_file variable. View Chapter Show VARIABLES. After you have modified the Stopword list, rebuild your fulltext index. (this variable is only supported starting from MySQL 4.0.10)

50% The threshold selection is determined by the particular measurement mode selected. To disable it, modify the following line in the ' myisam/ftdefs.h ' file:
#define Gws_in_use Gws_prob

#define Gws_in_use Gws_freq

Then recompile MySQL. In this case, you do not need to rebuild the index. Note: Using this will severely reduce the ability of MySQL to provide enough similarity values for MATCH (). If you do need to search for such common words, it is best to use the search in BOOLEAN MODE instead, which does not follow the 50% threshold.

Sometimes, search engine maintainers want to change the operators that are used for logical full-text searches. These are defined by the variable ft_boolean_syntax. View Chapter Show VARIABLES. However, this variable is read-only and its value is set in ' myisam/ft_static.c '.
For these changes, you are asked to rebuild your Fulltext index, and for a MyISAM table, the easiest way to rebuild the index file is as follows:

mysql> REPAIR TABLE tbl_name QUICK;

6.8.3 Full-Text Search TODO
Make all operations on the fulltext index faster
Proximity (Proximity) operator
Support for "Always-index words". They can be any string that the user wants to treat as a word, such as "C + +," "As/400," "TCP/IP," and so on.
Support for full-text search in the MERGE table
Support for multibyte characters
Establish a Stopword list in the language of the data
Stemming (of course, data-dependent languages)
Generic user-suppliable UDF Preparser.
Make the pattern more flexible (by adding some adjustable parameters to the fulltext in Create/alter TABLE) true techarticle Mysqlreferencemanualforversion4.1.0-alpha.----------------------------------------- ---------------------------------------6.8MySQL Full-text search to 3.23.23, ...

  • Related Article

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.