Full-text search using MySQL built-in functions

Source: Internet
Author: User

MATCH (col1,col2,...) AGAINST (expr [IN BOOLEAN MODE | WITH QUERY EXPANSION])

MySQL supports full-text indexing and searching. A full-text index in MySQL is an index of type FULLTEXT. FULLTEXT indexes can be used only with MyISAM tables (InnoDB tables also support them as of MySQL 5.6). They can be created for CHAR, VARCHAR, or TEXT columns as part of a CREATE TABLE statement, or added later using ALTER TABLE or CREATE INDEX. For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index than to load the data into a table that already has a FULLTEXT index.

Full-text searching is performed using the MATCH() function.

mysql> CREATE TABLE articles (
    ->   id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    ->   title VARCHAR(200),
    ->   body TEXT,
    ->   FULLTEXT (title,body)
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title,body) VALUES
    -> ('MySQL Tutorial','DBMS stands for DataBase ...'),
    -> ('How To Use MySQL Well','After you went through a ...'),
    -> ('Optimizing MySQL','In this tutorial we will show ...'),
    -> ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
    -> ('MySQL vs. YourSQL','In the following database comparison ...'),
    -> ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('database');
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

MATCH() performs a natural language search for a string against a text collection. A collection is the set of one or more columns included in a FULLTEXT index. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value, that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.

By default, the search is performed in a case-insensitive fashion. However, you can perform a case-sensitive full-text search by using a binary collation for the indexed columns. For example, a column that uses the latin1 character set can be assigned the latin1_bin collation to make full-text searches on it case sensitive.
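The difference between the default comparison and a binary collation can be sketched in Python. This is a simplified model, not MySQL code: it assumes a case-insensitive collation behaves like comparing with Python's str.casefold() and a binary collation like exact string equality, and the function name word_matches is hypothetical. It uses the body of row 1 of the articles table, which matched the query 'database' only because of the word DataBase.

```python
import re


def word_matches(query_word: str, row_text: str, binary_collation: bool = False) -> bool:
    """Hypothetical model: does query_word occur as a whole word in row_text?

    binary_collation=False approximates a case-insensitive collation
    (assumed here to behave like casefold()); True approximates a binary
    collation such as latin1_bin, where characters must match exactly.
    """
    words = re.findall(r"\w+", row_text)
    if binary_collation:
        return query_word in words
    target = query_word.casefold()
    return any(word.casefold() == target for word in words)


row_1_body = "DBMS stands for DataBase ..."
print(word_matches("database", row_1_body))                         # True
print(word_matches("database", row_1_body, binary_collation=True))  # False
```

Under the binary model, row 1 would drop out of the 'database' result unless the query used the exact spelling DataBase.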

When MATCH() is used in a WHERE clause, as in the articles query shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are nonnegative floating-point numbers, and zero relevance means no similarity. Relevance is computed from the number of words in the row, the number of unique words in that row, the total number of words in the collection, and the number of documents (rows) that contain a particular word.

For natural language full-text searches, the columns named in the MATCH() function must be the same columns included in some FULLTEXT index in your table. For the articles query, note that the columns named in MATCH() (title and body) are the same as those named in the definition of the articles table's FULLTEXT index. To search title or body separately, you would need to create a separate FULLTEXT index for each column.

You can also perform a boolean search or a search with query expansion.

The articles query shown earlier is a basic illustration of MATCH() returning rows in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. The returned rows are not ordered, because the SELECT statement includes neither a WHERE nor an ORDER BY clause:

mysql> SELECT id, MATCH (title,body) AGAINST ('Tutorial')
    -> FROM articles;
+----+-----------------------------------------+
| id | MATCH (title,body) AGAINST ('Tutorial') |
+----+-----------------------------------------+
|  1 |                        0.65545833110809 |
|  2 |                                       0 |
|  3 |                        0.66266459226608 |
|  4 |                                       0 |
|  5 |                                       0 |
|  6 |                                       0 |
+----+-----------------------------------------+
6 rows in set (0.00 sec)

The following example is more complex. The query returns the relevance values and also sorts the rows in order of decreasing relevance. To get this result, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.

mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
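The idea behind writing MATCH() twice can be illustrated with a small Python analogue: each row's relevance is computed once, and that single value is used both to drop zero-relevance rows (the WHERE clause) and to order the result. The function names and the call counter are hypothetical; the two nonzero scores are the ones in the output above, and the other rows are given zero because the WHERE clause did not return them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Scores from the query above; rows absent from its result had zero relevance.
SCORES: Dict[int, float] = {
    1: 0.0, 2: 0.0, 3: 0.0,
    4: 1.5219271183014,
    5: 0.0,
    6: 1.3114095926285,
}

calls = 0


def relevance(row_id: int) -> float:
    """Stand-in for MATCH(title,body) AGAINST(...); counts its invocations."""
    global calls
    calls += 1
    return SCORES[row_id]


def ranked_matches(row_ids: Iterable[int],
                   score: Callable[[int], float]) -> List[Tuple[int, float]]:
    """Evaluate score once per row, keep score > 0, sort by score descending."""
    scored = [(row_id, score(row_id)) for row_id in row_ids]  # one call per row
    hits = [(row_id, s) for row_id, s in scored if s > 0]      # like WHERE MATCH(...)
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


result = ranked_matches(range(1, 7), relevance)
print(result)  # [(4, 1.5219271183014), (6, 1.3114095926285)]
print(calls)   # 6
```

The score function runs six times, once per row, even though its value feeds both the filter and the sort.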


The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' is parsed as aaa'bbb.
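The word rules above can be approximated with two regular expressions. This is a hedged sketch, not MySQL's parser: it assumes Python's \w class (Unicode letters, digits, and underscore) is close enough to MySQL's notion of a word character, and the function name fulltext_words is hypothetical.

```python
import re
from typing import List

# A run of word characters and apostrophes; it is split further below.
_RUN = re.compile(r"[\w']+")
# Two or more apostrophes in a row separate words.
_MULTI_APOSTROPHE = re.compile(r"'{2,}")


def fulltext_words(text: str) -> List[str]:
    """Split text into words following the FULLTEXT rules described above (approximation)."""
    words = []
    for run in _RUN.findall(text):
        for piece in _MULTI_APOSTROPHE.split(run):
            word = piece.strip("'")  # leading/trailing apostrophes are dropped
            if word:
                words.append(word)
    return words


print(fulltext_words("aaa'bbb"))    # ["aaa'bbb"]
print(fulltext_words("aaa''bbb"))   # ['aaa', 'bbb']
print(fulltext_words("'aaa'bbb'"))  # ["aaa'bbb"]
```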

The FULLTEXT parser determines where words start and end by looking for certain delimiter characters, such as space, comma, and period. If words are not separated by delimiters (as in Chinese, for example), the FULLTEXT parser cannot determine where a word begins or ends. To add words or other indexed terms in such a language to a FULLTEXT index, you must preprocess them so that they are separated by a delimiter the parser recognizes.

Some words are ignored in full-text searches:

◆ Any word that is too short is ignored. By default, the minimum length of words found by full-text searches is four characters.

◆ Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overridden by a user-defined list.
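The two rules can be sketched as a filter over already-tokenized words. The minimum length of 4 is the default mentioned above (for MyISAM it is set by the ft_min_word_len system variable); the two-word stopword set is a hypothetical stand-in for MySQL's much longer built-in list, and the function name is illustrative only.

```python
from typing import Iterable, List, Set

MIN_WORD_LEN = 4  # default minimum word length
STOPWORDS: Set[str] = {"the", "some"}  # hypothetical, tiny stand-in list


def indexable(words: Iterable[str],
              min_len: int = MIN_WORD_LEN,
              stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Drop words that are too short or that appear in the stopword list."""
    return [w for w in words
            if len(w) >= min_len and w.lower() not in stopwords]


print(indexable(["DBMS", "stands", "for", "DataBase", "some", "the", "as"]))
# ['DBMS', 'stands', 'DataBase']
```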

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. As a result, a word that is present in many documents has a lower weight (possibly even zero), because it has lower semantic value in this particular collection. Conversely, a rare word receives a higher weight. The weights of the words are then combined to compute the relevance of the row.
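To see why frequent words weigh less, here is a sketch of a collection-wide weight in the probabilistic inverse-document-frequency form log((N - nf) / nf), where N is the number of rows and nf the number of rows containing the word. Treat the formula as an assumption for illustration: MySQL also factors in per-row word counts, so this is not its full relevance calculation. Clamping the weight at zero for words in half or more of the rows mirrors the 50% behavior this article describes for small tables.

```python
import math


def global_weight(n_rows: int, n_rows_with_word: int) -> float:
    """Probabilistic IDF, log((N - nf) / nf), clamped at 0 (assumed form)."""
    nf = n_rows_with_word
    if nf <= 0 or 2 * nf >= n_rows:
        # Absent words, and words in at least half the rows, get no weight.
        return 0.0
    return math.log((n_rows - nf) / nf)


# Six rows, as in the articles table.
for nf in (1, 2, 3, 6):
    print(nf, round(global_weight(6, nf), 3))
# 1 1.609
# 2 0.693
# 3 0.0
# 6 0.0
```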

This technique works best with large collections (in fact, it was carefully tuned for them). For very small tables, word distribution does not adequately reflect semantic value, and this model may sometimes produce odd results. For example, although the word "MySQL" is present in every row of the articles table, a search for the word produces no results:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)

The search result is empty because the word "MySQL" is present in at least 50% of the rows, so it is effectively treated as a stopword. For large data sets this is the most desirable behavior: a natural language query should not return every second row from a 1 GB table. For small data sets, it may be less desirable.

A word that matches half of the rows in a table is unlikely to help locate relevant documents; in fact, it is far more likely to turn up plenty of irrelevant ones. Anyone who has searched the Internet with a search engine has seen this happen. For this reason, rows containing such a word are assigned a low semantic value for that particular data set. A given word may reach the 50% threshold in one data set but not in another.

The 50% threshold has an important consequence when you first try full-text search to see how it works: if you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows, so no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use boolean-mode searches.
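The 50% effect can be reproduced with a short Python model over the six rows of the articles table. It is an approximation, not MySQL's implementation: words are lowercased runs of \w characters, and a word counts as suppressed when it appears in at least half of the rows. The helper names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set, Tuple

# The six (title, body) rows inserted into the articles table.
ARTICLES: List[Tuple[str, str]] = [
    ("MySQL Tutorial", "DBMS stands for DataBase ..."),
    ("How To Use MySQL Well", "After you went through a ..."),
    ("Optimizing MySQL", "In this tutorial we will show ..."),
    ("1001 MySQL Tricks", "1. Never run mysqld as root. 2. ..."),
    ("MySQL vs. YourSQL", "In the following database comparison ..."),
    ("MySQL Security", "When configured properly, MySQL ..."),
]


def row_words(title: str, body: str) -> Set[str]:
    """Distinct lowercased words in one row (simplified tokenizer)."""
    return set(re.findall(r"\w+", f"{title} {body}".lower()))


def fifty_percent_words(rows: List[Tuple[str, str]]) -> Set[str]:
    """Words present in at least half of the rows, i.e. treated like stopwords."""
    docs = [row_words(title, body) for title, body in rows]
    doc_freq = Counter(word for doc in docs for word in doc)
    return {word for word, nf in doc_freq.items() if 2 * nf >= len(docs)}


print(fifty_percent_words(ARTICLES))  # {'mysql'}
two_rows = ARTICLES[:2]
every_word = set().union(*(row_words(t, b) for t, b in two_rows))
print(fifty_percent_words(two_rows) == every_word)  # True
```

With only two rows every word is suppressed; with three rows, words that occur in a single row fall below the threshold again.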
