An In-Depth Look at Chinese Full-Text Search in MySQL 5.7

Source: Internet
Author: User

Introduction

MySQL has supported full-text search for a long time, but in practice only for languages such as English. The reason is that the built-in parser has always used whitespace as the word delimiter. That does not work for Chinese, which has no spaces between words and needs its own segmentation. Starting with MySQL 5.7 (5.7.6), MySQL ships a built-in ngram full-text parser plugin that supports Chinese word segmentation and works with both the MyISAM and InnoDB engines.

Before using the ngram parser for Chinese search, set its token size in the MySQL configuration file, for example:

[mysqld]
ngram_token_size=2

This sets the token size to 2. Keep in mind that the larger the token size, the larger the index, so choose a value that suits your own situation.
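To make the effect of the token size concrete, here is a minimal Python sketch of n-gram tokenization. It is only an illustration of the idea, not MySQL's implementation: the function name ngram_tokens is hypothetical, stopword handling is ignored, and the Chinese string is an illustrative phrase meaning "database management".

```python
def ngram_tokens(text, n=2):
    """Split text into overlapping n-character tokens.

    Assumption for this sketch: whitespace is dropped and tokens never
    span it, which mirrors the documented behaviour of the ngram parser
    ("ab cd" yields "ab" and "cd").
    """
    tokens = []
    for chunk in text.split():
        for i in range(len(chunk) - n + 1):
            tokens.append(chunk[i:i + n])
    return tokens


if __name__ == "__main__":
    print(ngram_tokens("abcd"))            # tokens of size 2
    print(ngram_tokens("ab cd"))           # whitespace splits the text first
    print(ngram_tokens("数据库管理"))        # illustrative Chinese phrase
    print(ngram_tokens("数据库管理", n=3))   # same phrase with a token size of 3
```

With a token size of 2, a search term needs at least two characters to produce any token at all, which is one reason to pick the size with your typical search terms in mind.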

Sample table structure:

CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),  -- length was lost in the source; 200 is an assumed value
  body TEXT,
  FULLTEXT (title, body) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4;

Sample data: six rows. The titles and bodies are Chinese text, shown here in English translation.

mysql> SELECT * FROM articles\G
*************************** 1. row ***************************
   id: 1
title: Database Management
 body: In this tutorial I will show you how to manage the database
*************************** 2. row ***************************
   id: 2
title: Database Application Development
 body: Learning to develop database applications
*************************** 3. row ***************************
   id: 3
title: MySQL Complete Manual
 body: Learn everything about MySQL
*************************** 4. row ***************************
   id: 4
title: Database and Transaction Processing
 body: A systematic introduction to database transactions
*************************** 5. row ***************************
   id: 5
title: NoSQL Essence
 body: Learn about various unstructured databases
*************************** 6. row ***************************
   id: 6
title: SQL Language in Detail
 body: Learn in detail how to use various kinds of SQL
6 rows in set (0.00 sec)

Explicitly specify which table's full-text index to inspect:

mysql> SET GLOBAL innodb_ft_aux_table = "new_feature/articles";
Query OK, 0 rows Affected (0.00 sec)

The system table then shows exactly how the data in articles has been tokenized.

mysql> SELECT * FROM information_schema.INNODB_FT_INDEX_CACHE LIMIT 20,10;
+------------------+--------------+-------------+-----------+--------+----------+
| WORD             | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+------------------+--------------+-------------+-----------+--------+----------+
|                  |            2 |           2 |         1 |      2 |          |
| Xi m             |            4 |           4 |         1 |      4 |          |
| Learned the      |            6 |           6 |         1 |      6 |          |
| Learning to open |            3 |           3 |         1 |      3 |          |
| Learning Numbers |            5 |           5 |         1 |      5 |       37 |
| Understanding    |            6 |           7 |         2 |      6 |          |
| Understanding    |            6 |           7 |         2 |      7 |          |
| Business         |            5 |           5 |         1 |      5 |          |
| Business         |            5 |           5 |         1 |      5 |          |
| Ho Tube          |            2 |           2 |         1 |      2 |          |
+------------------+--------------+-------------+-----------+--------+----------+
10 rows in set (0.00 sec)

Here you can see that, with the token length set to 2, every token consists of exactly two characters (the WORD values above are English renderings of two-character Chinese tokens). The data also records each token's position, document ID, and so on.
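The shape of that system table can be imitated with a small inverted index. The following Python sketch is a simplified model under stated assumptions, not InnoDB's actual data structure: the names ngram_postings, build_index and summarize are hypothetical, document IDs are taken as given, and positions are plain character offsets rather than whatever InnoDB stores in POSITION.

```python
import re
from collections import defaultdict


def ngram_postings(text, n=2):
    """Yield (token, character_offset) for every n-gram in text."""
    for chunk in re.finditer(r"\S+", text):
        word, start = chunk.group(), chunk.start()
        for i in range(len(word) - n + 1):
            yield word[i:i + n], start + i


def build_index(docs, n=2):
    """Map each token to a list of (doc_id, offset) postings."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for token, offset in ngram_postings(docs[doc_id], n):
            index[token].append((doc_id, offset))
    return index


def summarize(index, token):
    """Return first/last document ID and document count for a token."""
    doc_ids = sorted({doc_id for doc_id, _ in index.get(token, [])})
    if not doc_ids:
        return None
    return {
        "first_doc_id": doc_ids[0],
        "last_doc_id": doc_ids[-1],
        "doc_count": len(doc_ids),
    }


if __name__ == "__main__":
    # Hypothetical two-document corpus, for illustration only.
    index = build_index({1: "abcd", 2: "bcx"})
    print(index["bc"])
    print(summarize(index, "bc"))
```

Each token maps to one posting per occurrence, which is why the same word can appear on several rows of the system table with different document IDs.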

Next comes a series of search demonstrations, using the same syntax as ordinary English full-text search.

First, searching in natural language mode:

1. Get the number of matching rows:

mysql> SELECT COUNT(*) FROM articles
    -> WHERE MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
|        4 |
+----------+
1 row in set (0.05 sec)

2. Get the relevance score of each row:

mysql> SELECT id, MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE) AS score
    -> FROM articles;
+----+----------------------+
| id | score                |
+----+----------------------+
|  1 |  0.12403252720832825 |
|  2 |  0.12403252720832825 |
|  3 |                    0 |
|  4 |  0.12403252720832825 |
|  5 | 0.062016263604164124 |
|  6 |                    0 |
+----+----------------------+
6 rows in set (0.00 sec)
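With the ngram parser, a natural language search term is itself split into n-grams and treated as a union of those tokens, so a row can match when it shares only some of them. The Python sketch below models just that matching rule. The names ngram_set and natural_match_count are hypothetical, and the sketch does not attempt to reproduce MySQL's relevance scores.

```python
def ngram_set(text, n=2):
    """Return the set of n-character tokens in text (whitespace dropped)."""
    return {
        chunk[i:i + n]
        for chunk in text.split()
        for i in range(len(chunk) - n + 1)
    }


def natural_match_count(query, doc, n=2):
    """Count distinct query n-grams that also occur in the document.

    A count above zero means the document would be a candidate match in
    this simplified model; ranking is deliberately not modelled.
    """
    return len(ngram_set(query, n) & ngram_set(doc, n))


if __name__ == "__main__":
    # The search string "abc" becomes the tokens "ab" and "bc".
    for doc in ("ab", "abc", "xyz"):
        print(doc, natural_match_count("abc", doc))
```

In this model both "ab" and "abc" are matched by the search string "abc", because sharing a single token is enough.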

Second, searching in Boolean mode, which is somewhat more complex than natural language mode:

1. Match records that contain both "database" and "management":

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    -> AGAINST ('+database +management' IN BOOLEAN MODE);
+----+---------------------+-------------------------------------------------------------+
| id | title               | body                                                        |
+----+---------------------+-------------------------------------------------------------+
|  1 | Database Management | In this tutorial I will show you how to manage the database |
+----+---------------------+-------------------------------------------------------------+
1 row in set (0.00 sec)

2. Match records that contain "database" but not "management":

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    -> AGAINST ('+database -management' IN BOOLEAN MODE);
+----+-------------------------------------+----------------------------------------------------+
| id | title                               | body                                               |
+----+-------------------------------------+----------------------------------------------------+
|  2 | Database Application Development    | Learning to develop database applications          |
|  4 | Database and Transaction Processing | A systematic introduction to database transactions |
|  5 | NoSQL Essence                       | Learn about various unstructured databases         |
+----+-------------------------------------+----------------------------------------------------+
3 rows in set (0.00 sec)

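3. Match records that must contain MySQL, while adjusting how much "database" contributes to the relevance score. Note that the > operator used here raises a term's contribution; < is the operator that lowers it:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
    -> AGAINST ('>database +MySQL' IN BOOLEAN MODE);
+----+-----------------------+------------------------------+
| id | title                 | body                         |
+----+-----------------------+------------------------------+
|  3 | MySQL Complete Manual | Learn everything about MySQL |
+----+-----------------------+------------------------------+
1 row in set (0.00 sec)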
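In Boolean mode the ngram parser turns each search term into an n-gram phrase search: all of the term's tokens must occur, in order, in the row. The Python sketch below models that rule together with the + (required) and - (excluded) operators. It is a simplified illustration with hypothetical names (phrase_match, boolean_match); the > and < weighting operators and scoring are not modelled, and the Chinese strings are illustrative phrases.

```python
def ngram_tokens(text, n=2):
    """Split text into overlapping n-character tokens (whitespace dropped)."""
    return [
        chunk[i:i + n]
        for chunk in text.split()
        for i in range(len(chunk) - n + 1)
    ]


def phrase_match(term, doc, n=2):
    """True if the term's n-grams occur as a consecutive run in the doc."""
    needle, haystack = ngram_tokens(term, n), ngram_tokens(doc, n)
    if not needle:
        return False  # a term shorter than n produces no tokens
    width = len(needle)
    return any(
        haystack[i:i + width] == needle
        for i in range(len(haystack) - width + 1)
    )


def boolean_match(doc, required=(), excluded=(), n=2):
    """Model '+term' (must match) and '-term' (must not match)."""
    return (
        all(phrase_match(t, doc, n) for t in required)
        and not any(phrase_match(t, doc, n) for t in excluded)
    )


if __name__ == "__main__":
    print(phrase_match("abc", "ab"))   # only one of the two tokens is present
    print(phrase_match("abc", "abc"))  # both tokens present, in order
    # "database" required, "management" excluded, on an illustrative title
    print(boolean_match("数据库应用开发", required=["数据库"], excluded=["管理"]))
```

This is the difference from natural language mode: a row containing only "ab" is not matched by the Boolean search term "abc", because the whole token sequence is required.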

Third, query expansion mode: for example, a search for "database" will also bring back rows about MySQL, Oracle, DB2, and so on:

mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body)
    -> AGAINST ('database' WITH QUERY EXPANSION);
+----+-------------------------------------+-------------------------------------------------------------+
| id | title                               | body                                                        |
+----+-------------------------------------+-------------------------------------------------------------+
|  1 | Database Management                 | In this tutorial I will show you how to manage the database |
|  4 | Database and Transaction Processing | A systematic introduction to database transactions          |
|  2 | Database Application Development    | Learning to develop database applications                   |
|  5 | NoSQL Essence                       | Learn about various unstructured databases                  |
|  6 | SQL Language in Detail              | Learn in detail how to use various kinds of SQL             |
|  3 | MySQL Complete Manual               | Learn everything about MySQL                                |
+----+-------------------------------------+-------------------------------------------------------------+
6 rows in set (0.01 sec)

Of course, this is only a functional demonstration; if you are interested, you can run more detailed performance tests yourself. N-gram is a commonly used segmentation algorithm for Chinese search and is already widely used on the Internet, so its integration into MySQL should presumably work without major problems.

Summary

That is all for this article. I hope it is of some help in your study or work; if you have questions, feel free to leave a message.
