This chapter will learn how to use the full-text search feature of MySQL for advanced data querying and selection.
18.1 Understanding Full-Text Search
Not all engines support full-text search as described in Chapter 21st, MySQL supports several basic database engines. Not all engines support the full text search described in this book. The two most commonly used engines are MyISAM and InnoDB, which support full-text search and are not supported by the latter. This is why, although most of the sample tables created in this book use InnoDB, there is a sample table (Productnotes table) that uses MyISAM. If you need full text search in your app, you should keep this in mind. The 8th chapter introduces the LIKE keyword, which uses the wildcard operator to match text (and some text). You can use like to find rows that contain special or partial values, regardless of where they are located within the column. In the 9th chapter, the text-based search is a further introduction to match the column values of the regular expression. Using regular expressions, you can write very complex matching patterns that find the rows you want.
Although these search mechanisms are useful, there are several important limitations.
- Performance--wildcard wildcards regular expression matching usually requires MySQL to attempt to match all rows in the table (and these searches rarely use the table index). As a result, these searches can be time-consuming because of the increasing number of rows being searched.
- Explicit control-using a wildcard wildcards regular to match the expression, it is difficult (and not always) to explicitly control what matches and what does not. For example, specifying a word must match, a word must not match, and a word can match only if the first word does match or it can be mismatched.
- The result of intelligence-although search based on wildcard wildcards regular expressions provides a very flexible search, none of them provide a way to intelligently select results. For example, a search for a particular word will return all rows that contain the word, without distinguishing between rows containing a single match and rows that contain multiple matches, which are arranged according to a possible better match. Similarly, a search for a particular word will not find rows that do not contain the word but contain other related words.
All these restrictions and more restrictions can be resolved with full-text search. When using full-text search, MySQL does not need to view each row separately, and does not need to parse and process each word separately. MySQL creates an index of the words in the specified column, and the search can be made for those words. In this way, MySQL can quickly and efficiently determine which words match (which lines contain them), which words do not match, how often they match, and so on.
18.2 using full-text search
For a full-text search, you must index the columns that are searched and re-index them as the data changes. After the table column is properly designed, MySQL automatically makes all indexes and re-indexes. After the index, select can be used with match () and against () to actually perform the search.
18.2.1 enable full-text search support
You typically enable full-text search when you create a table. The CREATE table statement (described in the 21st chapter) accepts the FULLTEXT clause, which gives a comma-delimited list of indexed columns. The following create statement demonstrates the use of the FULLTEXT clause:
The 21st chapter examines the CREATE TABLE statement in detail. Now, just know that the CREATE TABLE statement definition table Productnotes lists the columns it contains. There is a column named Note_text in these columns, and for full-text search, MySQL indexes it according to the instructions of clause fulltext (NOTE_TEXT). Here the fulltext indexes a single column, and you can specify multiple columns if you want. After the definition, MySQL automatically maintains the index. The index is automatically updated when rows are added, updated, or deleted. You can specify fulltext when you create a table, or specify later (in which case all existing data must be indexed immediately).
It takes time not to update the index with Fulltext when importing data, although not many, but it takes time after all. If you are importing data to a new table, you should not enable the Fulltext index at this time. You should first import all the data and then modify the table to define fulltext. This helps to import data faster (and the total time of the index data is less than the total time required to index each row separately).
18.2.2 for full-text search
After the index, a full-text search is performed using the two function match () and against (), where match () specifies the column being searched, and against () specifies the search expression to use. Here's an example:
This SELECT statement retrieves a single column note_text. A full-text search is executed because of the WHERE clause. Match (note_text) instructs MySQL to search for the specified column, against (' Rabbit ') to specify the word rabbit as the search text. Since there are two lines containing the word rabbit, these two rows are returned.
Use the full Match () description
The value passed to Match () must be the same as in the Fulltext () definition. If more than one column is specified, they must be listed (and in the correct order).
Search is case insensitive unless binary is used (not described in this chapter), full-text search is not case-sensitive.
The fact is that the search just now can be done simply with the LIKE clause, as follows:
This SELECT statement also retrieves two rows, but in a different order (although this is not always the case). None of the two SELECT statements above contain an ORDER BY clause. The latter (using like) returns data in a not particularly useful order. The former (using full-text search) returns data that is sorted with a good degree of text-matching output analysis of input and output analysis 16416518.2 using full-text search 123. Two rows contain the word rabbit, but the line containing the word rabbit as the 3rd Word is higher than the line as the 20th Word. This is important. An important part of full-text search is sorting the results. Rows with higher levels are returned first (because the rows are likely to be the rows you really want). For a demonstration of how sorting works, consider the following example:
Here, match () and against () are used in the select instead of the WHERE clause. This causes all rows to be returned (because there is no WHERE clause). Match () and against () are used to create a computed column (alias rank), which contains the level values computed by the full-text search. The rank is calculated by MySQL based on the number of rows morphemes, the number of unique words, the total number of morphemes for the entire index, and the number of rows that contain the word. As you can see, the row level that does not contain the word rabbit is 0 (so it is not selected by the WHERE clause in the previous example). The two rows that do contain the word rabbit have a rank value for each row, and the level value of the preceding line of the text morphemes higher than the rank value of the line following the word. This example helps explain how a full-text search excludes rows (excluding those with a level of 0) and how to sort the results (sorted in descending order by rank).
Sort multiple Search Items if you specify more than one search term, those rows that contain most matching words will have a rank value that is higher than those that contain fewer words (or only one match).
As you can see, the full-text search provides functionality that simple like search cannot provide. Moreover, because the data is indexed, the full-text search is fairly fast.
18.2.3 using Query extensions
Query extensions are used to try to relax the range of full-text search results returned. Consider the following scenario. You want to find out all the notes that refer to anvils. Only one comment contains the word anvils, but you also want to find out all the other lines that might be relevant to your search, even if they do not contain word analysis 16718.2 using full-text search 125anvils. This is also a task for query extensions. When using the query extension, MySQL scans the data and index two times to complete the search:
- First, a basic full-text search is performed to find all the rows that match the search criteria;
- Second, MySQL examines these matching rows and selects all the useful words (we will briefly explain how MySQL determines what is useful and what is useless).
- Second, MySQL is again full-text search, this time not only using the original conditions, but also use all the useful words.
With query extensions, you can find results that may be relevant, even if they do not accurately contain the word you are looking for.
Version query extensions that are only available for MySQL version 4.1.1 or more advanced are introduced in MySQL 4.1.1 and therefore cannot be used in previous versions.
Here is an example of a simple full-text search, without a query extension:
Only one row contains the word anvils, so only one row is returned. The following is the same search, this time using the query extension:
7 rows were returned this time. The first line contains the word anvils, so the highest rank. The second line has nothing to do with anvils, but it is also retrieved because it contains the two words in the first row (customer and recommend). Line 3rd also contains these two identical words, but they are positioned farther and farther apart in the text, and therefore contain this line, but rank third. The third line does not really involve anvils (by their product name). As you can see, the query extension greatly increases the number of rows returned, but doing so increases the number of rows that you actually do not want.
The more rows, the more rows in the table (the more text in those lines), the better the results returned using the query extension.
18.2.4 Boolean Text Search
MySQL supports a different form of full-text search, called a Boolean (Booleanmode) method. In a Boolean way, you can provide details about the following:
- the words to match;
- The word to repel (if a line contains the word, the row is not returned, even if it contains other specified words);
- Arrange hints (specify that certain words are more important than others, and that more important words are of higher rank);
- Expression grouping;
- Some other content.
Even if you do not have a fulltext index, you can use a Boolean method that differs from the full-text search syntax used so far in that it can be used even if the Fulltext index is not defined. But this is a very slow operation (its performance will decrease as the amount of data increases). To demonstrate the role of in BOOLEAN mode, give a simple example:
This full-text search retrieves all the lines that contain the word heavy (there are two lines). It uses the keyword in Boolean mode, but does not actually specify a Boolean operator, so the result is the same as the result without specifying a Boolean.
The behavior difference in Boolean mode Although the result of this example is the same as not in Boolean mode, there is an important difference in its behavior (even if this particular example is not shown). We will point out in section 18.2.5.
To match a row that contains heavy but does not contain any words that begin with rope, use the following query:
Only one row is returned this time. This time still matches the word heavy, but-rope* explicitly instructs MySQL to exclude rows that contain rope* (any word that starts with rope, including ropes), which is why the first line in the previous example is excluded.
Code changes required in MySQL 4.x if you are using mysql4.x, the above example may not return any rows. This is an error in the * operator processing. To use this example in MySQL 4.x, use-ropes instead of-rope* (exclude ropes rather than exclude any word that starts with rope).
We have seen two full-text search Boolean operators-and *,-exclude a word, and * is a truncation operator (which can be imagined as a wildcard for the ending).
Here are a few examples of how some operators work:
This search matches lines containing the words rabbit and bait.
There is no operator specified, and this search matches the line containing at least one word in rabbit and bait.
This search matches the phrase rabbit bait instead of matching two words rabbit and bait.
Match rabbit and carrot to increase the rank of the former and lower the latter.
This search matches the word safe and combination, reducing the latter's rank.
Arrange without sorting in Boolean mode, the returned rows are not sorted in descending order of rank values.
18.2.5 How to use full text search
Before concluding this chapter, give some important notes about full-text search.
- When indexing full-text data, short words are ignored and excluded from the index. Short words are defined as words that have 3 or 3 characters (this number can be changed if needed).
- MySQL has a list of built-in non-use words (Stopword) that are always ignored when indexing full-text data. If you want, you can override this list (see the MySQL documentation to learn how to do this).
- Many words appear at a high frequency, and searching for them is useless (returning too many results). Therefore, MySQL prescribes a 50% rule that if a word appears in more than 50% rows, it is ignored as a non-word. 50% rules are not used in Booleanmode.
- If the number of rows in the table is less than 3 rows, the full-text search does not return a result (because each word either does not appear or at least appears in the 50% row).
- Ignores single quotes in the word. For example, the Don ' t index is dont.
- Languages that do not have word separators, including Japanese and Chinese, do not properly return full-text search results.
- As mentioned earlier, full-text search is supported only in the MyISAM database engine.
No proximity operator Proximity search is a feature of many full-text search support that can search for adjacent words (in the same sentence, in the same paragraph or in a specific number of words, and so on). MySQL Full text Search does not now support proximity operators, but future versions have plans to support this operator.
This chapter describes why you use full-text search and how to use the match () and against () functions of MySQL for full-text searches. We also learned about query extensions (which can increase the chance of finding relevant matches) and how to use Boolean methods for finer-grained lookup control.
MySQL must know-18th chapter-Full Text Search