MYSQL 3: full text search _ MySQL

Source: Internet
Author: User
MYSQL 3: full text search bitsCN.com


MYSQL 3: full text search

1. understand full text search

1. MyISAM supports full text search, but InnoDB does not.

2. when using full text search, MySQL does not need to view each row separately, and does not need to analyze and process each word separately. MySQL creates an index for each word in a specified column and searches for these words. In this way, MySQL can quickly and effectively determine which words match, which words do not match, their matching frequency, and so on.

2. use full text search

1. to perform a full text search, you must index the columns to be searched and reindex as data changes. After the table columns are properly designed, MySQL automatically performs all indexes and re-indexing.

After the index, SELECT can be used with Match () and Against () for actual search.

2. enable full text search when creating a table.

[SQL]

Create table productnotes

(

Note_id int not nullauto_increment,

Note_text text null,

Primary key (note_id ),

Fulltext (note_text)

) Engine = MyISAM;

After definition, MySQL automatically maintains the index. When a row is added, updated, or deleted, the index is automatically updated.

3. do not use FULLTEXT when importing data.

4. perform full text search

Match () specifies the column to be searched, and Against () specifies the search expression to be used.

[SQL]

Mysql> select * from productnotes

-> WhereMatch (note_text) Against ('designe ');

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

| 6 | LimsLink isdesigned to interface output from chromatography data sy

Stems (CDSs) to LIMS. |

| 5 | This line ofproprietary reagents, containers, and automation tools

Is designed for genomics and drug discovery research. |

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

2 rows in set (0.03 sec)

5. the value passed to the Match () must be the same as that in the FULLTEXT () definition. If multiple columns are specified, they must be listed (and in the correct order ).

6. unless BINARY is used, full text search is case insensitive.

[SQL]

Mysql> select * from productnotes

-> Where BINARYMatch (note_text) Against ('line ');

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

| 5 | This line ofproprietary reagents, containers, and automation tools

Is designed for genomics and drug discovery research. |

+ --------- + ---------------------------------------------------------------------

------------------------------------------------------ +

1 row in set (0.05 sec)

7. Sorting results is an important part of full text search. Rows with a higher level are returned first.

The level is calculated by MySQL based on the number of words in the row, the number of unique words, the total number of words in the entire index, and the number of rows containing the word. In text, the level value of the row before the word is higher than the level value of the row after the word is backed up.

[SQL]

Mysql> select note_id, Match (note_text) Against ('This line') as rank, note_text

-> Fromproductnotes

-> WhereMatch (note_text) Against ('This line ');

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

| Note_id | rank | note_text

|

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

| 5 | 0.81339610830754 | This line of proprietary reagents,. containers,

Nd automation tools is designed. for genomics and drugdiscovery. research. |

| 7 | 0.76517958501676 | specificities include both alpha-beta and beta-

Beta. This line from chromatography. data systems (CDSs) and toLIMS. |

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

2 rows in set (0.00 sec)

8. query extension

When using the query extension, MySQL scans data and indexes twice to complete the search.

First, perform a basic full text search to find all rows that match the search condition;

Secondly, MySQL checks the matching rows and selects all useful words;

Again, MySQL re-performs a full text search. this time, not only the original conditions are used, but also all useful words are used.

The query extension can be used to find possible results, even if they do not exactly contain the words to be searched.

The more rows in the table, the better the result returned by using the query extension.

The query extension function is introduced in MySQL4.1.1.

[SQL]

Mysql> select note_id, Match (note_text) Against ('This line') as rank, note_text

-> Fromproductnotes

-> Where Match (note_text) Against ('This line' with query expansion );

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

| Note_id | rank | note_text

|

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

| 5 | 0.81339610830754 | This line of proprietary reagents,. containers,

Nd automation tools is designed. for genomics and drugdiscovery. research. |

| 7 | 0.76517958501676 | specificities include both alpha-beta and beta-

Beta. This line from chromatography. data systems (CDSs) and toLIMS. |

| 3 | 0 | Human S-100. monoclonal. and polyclonal specifici

Ties include both alpha-beta and beta-beta isoforms. |

| 6 | 0 | LimsLink is. designed to interfaceoutput. from c

Hromatography. data systems (CDSs) and to LIMS. |

| 1 | 0 | PepTool allows users tostore, manage. analyze,

Nd visualize protein data. |

+ --------- + ------------------ + --------------------------------------------------

---------------------------------------------------------------------------- +

5 rows in set (0.00 sec)

9. boolean mode)

In Boolean mode, you can provide details about the following:

The word to be matched;

Words to be excluded;

Arrange the prompt. (specify some words that are more important than others)

Expression grouping;

Other content.

[SQL]

Mysql> select note_id, note_text

-> Fromproductnotes

-> WhereMatch (note_text) Against ('line' in boolean mode );

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| 5 | This line ofproprietary reagents,. containers, and automation tools

Is designed. for genomicsand drug discovery. research. |

| 7 | specificitiesinclude both alpha-beta and beta-beta. This line fro

M chromatography. data systems (CDSs) and to LIMS. |

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

2 rows in set (0.00 sec)

You can use Boolean text search even if the FULLTEXT index is not available. But it is very slow.

Mysql> select note_id, note_text/* matches line and does not contain systems */

-> Fromproductnotes

-> WhereMatch (note_text) Against ('line-systems * 'in boolean mode );

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| 5 | This line ofproprietary reagents,. containers, and automation tools

Is designed. forgenomics and drug discovery. research. |

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

1 row in set (0.00 sec)

Mysql> select note_id, note_text/* match line and systems */

-> Fromproductnotes

-> WhereMatch (note_text) Against ('+ line + system' in boolean mode );

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------- +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------- +

| 7 | specificitiesinclude both alpha-beta and beta-beta. This line fro

M chromatography. data systems (CDSs) and to LIMS. |

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------- +

1 row in set (0.00 sec)

Mysql> select note_id, note_text/* MATCH line or systems */

-> Fromproductnotes

-> WhereMatch (note_text) Against ('line system' in boolean mode );

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| 5 | This line ofproprietary reagents,. containers, and automation tools

Is designed. forgenomics and drug discovery. research. |

| 6 | LimsLink is. designed to interface output. from chromatography. data

Systems (CDSs) and toLIMS. |

| 7 | specificitiesinclude both alpha-beta and beta-beta. This line fro

M chromatography. data systems (CDSs) and to LIMS. |

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

3 rows in set (0.00 sec)

Mysql> select note_id, note_text/* matching phrase */

-> Fromproductnotes

-> WhereMatch (note_text) Against ('"This line"' in boolean mode );

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| Note_id | note_text

|

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

| 5 | This line ofproprietary reagents,. containers, and automation tools

Is designed. forgenomics and drug discovery. research. |

| 7 | specificitiesinclude both alpha-beta and beta-beta. This line fro

M chromatography. data systems (CDSs) and to LIMS. |

+ --------- + ---------------------------------------------------------------------

--------------------------------------------------------- +

2 rows in set (0.00 sec)

10. Instructions for use

L when indexing full text data, short words are ignored and excluded from the index. Short words are defined as words with three or less characters on the face (this number can be updated if needed ).

L MySQL has a built-in stopword list, which is always ignored when indexing full text data. If necessary, you can overwrite this list.

L MySQL sets a 50% rule. if a word appears in more than 50% rows, it is ignored as a non-word. 50% rules are not used in boolean mode.

L if the number of rows in the table is less than three rows, no results will be returned for full text search (because each word or does not appear, or at least appears in 50% rows ).

L ignore single quotes in words. For example, the don't index is dont.

L a language that does not have a word separator cannot properly return the full text search result.

BitsCN.com

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.