MySQL notes: Full Text Search
Full text search
MySQL supports several basic database engines, but not all engines Support full text search. The two most commonly used engines are MyISAM and InnoDB. The former supports full text search, and the latter does not.
Understanding full text search
In the previous study, we all know that there are two ways to match text. One is to use the like keyword for matching, and the other is to use a regular expression for matching.
Although regular expressions can be used to write a complex matching pattern for finding the required rows. However, there are several important restrictions:
1. Performance: When wildcards and regular expressions are used to match all rows in the table, these searches seldom use table indexes. Therefore, they are quite slow.
2. explicit control: In the matching process, we seldom clearly know what we want to match or what we don't.
3. Results with priority: when matching with a regular expression using a wildcard, only results containing the matching condition are returned, the matching degree is not differentiated (the difference between multiple matches and a single match is the difference between matching before and after ).
All the limitations mentioned above can be solved by using full text search.
Use full text search
To perform a full text search, you must index the columns to be searched and reindex them as data changes.
After the table is properly designed, MySQL automatically performs all indexes and re-indexing.
After the index, select can be used with match () and against () to actually execute the index.
How can I enable full text search support when creating a table ???
As follows:
In this way, fulltext is used to create an index for the specified row. If you want to specify multiple columns, you can add them to fulltext.
After the index is defined, MySQL automatically maintains the index. Indexes are updated as they are being added, updated, or deleted.
Full text search
After the table is created, the table has the full text search capability. Next we will try the full text search.
Before the index, I added the following content to the test_text table:
When we want to use full text search to obtain data that contains MySQL content.
As follows:
select * from test_text where match(content) against('MySQL');
In the preceding statement, match (content) indicates that MySQL searches for the specified column. Note: The value passed to match () must be the same as that in fulltext. Against ('mysql') specifies the word MySQL as the search text.
We can see from the results that two rows contain MySQL strings.
The above search can also be completed using the Like we have learned, as follows:
The results of the two methods are the same. The order is also the same. In theory, using full text search will sort the output results by priority. However, we can see from the results that, in the first statement, the position where MySQL appears lags behind that of MySQL In the second statement. Why did MySQL appear first ??, It is a little different from MySQL.
To verify the problem, we can check the priority of the result. Before viewing the priority, I added a row of data starting with MySQL. Next let's take a look at the priority.
select id,content,match(content) against('MySQL') from test_text ;
It can be seen from the results that the row id = 7 has a higher priority than the row id = 8, because the priority is not related to the location where it appears, it is also related to factors such as text length.
After adding a row of data starting with MySQL, let's take a look at the results of the above two methods.
From the above two figures, we can see that, indeed, the full text search is used to sort the searched content, but the wildcard is not. As for the full text search, sorting the searched content is determined by many factors, such as the location of the appearance, the number of occurrences, and the length of the text.
The higher the level, the more advanced the appearance is. When we use search engines (such as Baidu and google) in our daily life to search for things, I believe that this logic is also used to produce results. Of course, apart from the fact that Baidu has charged other people's money to put unmatched content in front.
Use query Extension: with query expansion
The preceding section describes common full-text search.
However, we often have more strict requirements in our lives. For example, I want to search for MySQL strings, and you still want MySQL-related content (even if there is no MySQL word ), what should I do ??
This requires the query extension.
Usage: In against (keyword with query expansion)
From the results, we can see that when we use the query extension, we get more related content, even if the content does not include the MySQL words.
I still don't know how MySQL gets related keywords Based on MySQL.
Boolean text search
MySQL supports full text search in another form, called boolean mode. In Boolean mode, you can provide the following details:
Words to be matched; words to be excluded, arrangement prompts (specify words that are more important than other words, more important word levels are higher) expression grouping.
Note: It can be used even if there is no fulltext index, but the speed is quite slow.
To show the function of in boolean mode, a simple example is provided.
Select * from test_text where match (content) against ('mysql-interesting' in boolean mode); // return data that must contain MySQL but cannot contain interesting content.
For the full-text boolean operators, see the table below (from: MySQL required knowledge)
Examples of full-text boolean operators are similar to the preceding example.
Full text search instructions
1. MySQL has a built-in non-word list. These words are always ignored when indexing full text data. If necessary, you can overwrite this list.
2. Many words appear frequently and are useless for searching (because too many results are returned ). Therefore, MySQL defines a 50% rule. If a word appears in more than 50% rows, it is ignored as a non-word. 50% the rule is not used in boolean mode.
3. If the number of rows in the table is less than three, no results are returned for full text search. Therefore, if each word does not appear, it must be at least 50%.
4. Ignore single quotes in words. For example, the don't index is dont.
Summary
This is the end of the full text search process. One of my feelings is: it took me 10 minutes to read this section of MySQL, but I followed this section to write a blog, it took two hours. Although it took so much time, there are still some gains, which are more than just reading a book, and the use of match/against should be more familiar. That's why I have been following this book to learn about MySQL. So far, I have followed this book with 18 chapters. It has taken nearly two weeks before and after, I used to think that I still know some about the database. Now it seems that I only know the "add, delete, modify, and query" steps in the database.