MySQL Full-Text Search Ngram Plugin

Last Update:2018-02-09 Source: Internet

Author: User

Creating a full-text index in MySQL 5.7

InnoDB's default full-text parser is well suited to Latin-based languages, because they separate words with whitespace. Chinese, Japanese, and Korean (CJK) have no such delimiter, and a word can consist of several characters, so these languages need to be handled differently. MySQL 5.7.6 introduced a new full-text parser plugin for them: the n-gram parser.

What is N-gram?

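In a full-text index, an n-gram is a contiguous sequence of n characters taken from a piece of text. For example, tokenizing the four-character string 信息系统 ("information system") with n = 2 yields 信息, 息系, and 系统; with n = 1 it yields the four single characters, and with n = 4 it yields the whole string as one token.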
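To make the definition concrete, here is a minimal Python sketch of n-gram tokenization. It only illustrates the idea; it is not the parser's actual implementation, and it ignores details such as whitespace and punctuation handling. The function name `ngrams` is made up for this example.

```python
def ngrams(text, n):
    """Return every contiguous run of n characters in text, left to right."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A text shorter than n produces no tokens at all.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Tokenize the same four-character string with n = 1 .. 4.
for n in range(1, 5):
    print(n, ngrams("信息系统", n))
```

A string of length L produces L - n + 1 tokens, so the four-character string above gives 4, 3, 2, and 1 tokens for n = 1 to 4.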

How do I use N-gram Parser in InnoDB?

The n-gram parser is built in and loaded by default, so it can be used directly. We only need to add WITH PARSER ngram to the DDL statement that creates the full-text index.
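As a sketch of what that DDL can look like, the statements below are kept as Python strings so they can be pasted into any MySQL client or run through a driver. The table, column, and index names are hypothetical. The helper assumes a DB-API cursor whose driver uses `%s` placeholders, as PyMySQL and MySQL Connector/Python do.

```python
# Hypothetical table with a full-text index that uses the n-gram parser.
CREATE_ARTICLES = """
CREATE TABLE articles (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    title VARCHAR(100),
    body TEXT,
    FULLTEXT INDEX ft_title_body (title, body) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4
"""

# Alternative for a table that was created without the index.
ADD_INDEX = (
    "ALTER TABLE articles "
    "ADD FULLTEXT INDEX ft_title_body (title, body) WITH PARSER ngram"
)

# The MATCH column list has to name the same columns as the index.
SEARCH = (
    "SELECT id, title FROM articles "
    "WHERE MATCH (title, body) AGAINST (%s IN NATURAL LANGUAGE MODE)"
)


def create_and_search(cursor, term):
    """Create the table, then run one full-text query via a DB-API cursor."""
    cursor.execute(CREATE_ARTICLES)
    cursor.execute(SEARCH, (term,))
    return cursor.fetchall()
```

Apart from the WITH PARSER ngram clause, the index is created and queried like any other InnoDB full-text index.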

We introduced a new global variable called ngram_token_size. It determines n in n-gram, that is, the token size in characters. Its default value is 2 (bigrams), and its legal range is 1 to 10. A natural question is how to choose ngram_token_size in a real application. We recommend 2, but you can choose any legal value by following one simple rule: set it to the size of the smallest word you want to be able to search for. If you want to search for single characters, set it to 1. The smaller ngram_token_size is, the less space the full-text index takes up. In general, queries for terms that are exactly ngram_token_size characters long are faster, while queries for longer words or phrases are slower.
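The sizing rule can be illustrated with a simplified model in Python. This is only a sketch under stated assumptions: it treats a search term as findable when every n-gram of the term appears among the n-grams of the document, and it ignores wildcards and the differences between search modes. The function names are hypothetical.

```python
def ngrams(text, n):
    """Return every contiguous run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def term_is_findable(document, term, token_size):
    """Simplified model: a term shorter than token_size yields no tokens,
    so it has nothing to match against the index."""
    term_tokens = ngrams(term, token_size)
    if not term_tokens:
        return False
    indexed = set(ngrams(document, token_size))
    return all(token in indexed for token in term_tokens)


# One-character and two-character terms under three token sizes.
for token_size in (1, 2, 3):
    for term in ("系", "系统"):
        print(token_size, term, term_is_findable("信息系统", term, token_size))
```

In this model the single character 系 is only findable with a token size of 1, and the two-character word 系统 stops being findable once the token size reaches 3. Note that ngram_token_size is read-only at runtime; it is set at server startup, on the command line or in the configuration file.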

N-gram tokenization

The n-gram parser differs from the default full-text parser in the following ways:

Token size check: Because ngram_token_size takes over this role, innodb_ft_min_token_size and innodb_ft_max_token_size do not apply to the n-gram parser.
Stopword handling: Normally, each new token is looked up in the stopword table, and if it exactly matches a stopword it is not added to the full-text index. The n-gram parser instead checks whether the token contains any word from the stopword table, and skips the token if it does. The reason is that CJK text contains many meaningless characters, words, and punctuation marks. For example, if we add '的' to the stopword table, then with the default token size of 2 the sentence '信息的系统' ends up indexed as '信息' and '系统'; the tokens '息的' and '的系' are filtered out because they contain the stopword.
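The difference between the two stopword rules can be sketched in Python. This models the behavior described above, not the server's implementation, and the function names are made up for this example.

```python
def ngrams(text, n):
    """Return every contiguous run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def filter_exact(tokens, stopwords):
    """Default-parser rule: drop a token only if it equals a stopword."""
    return [tok for tok in tokens if tok not in stopwords]


def filter_contains(tokens, stopwords):
    """N-gram rule: drop a token if it contains any stopword."""
    return [tok for tok in tokens
            if not any(stop in tok for stop in stopwords)]


bigrams = ngrams("信息的系统", 2)
print(filter_exact(bigrams, {"的"}))     # no bigram equals the stopword
print(filter_contains(bigrams, {"的"}))  # bigrams containing it are dropped
```

With exact matching all four bigrams would stay in the index, because none of them equals '的'. The contains rule removes the two bigrams that straddle the stopword.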

We can query INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE and INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE to see which tokens are in the full-text index. This is a very useful debugging tool. If a document contains a word but does not show up in the query results as expected, the word may be missing from the full-text index for some reason: for example, it contains a stopword, or it is shorter than ngram_token_size. In that case we can check these two tables to confirm.
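A minimal sketch of that check follows, again with the SQL held in Python strings. The schema and table name 'test/articles' are hypothetical, and the helpers assume a DB-API cursor and an account allowed to set global variables. The server variable innodb_ft_aux_table has to name the table before the INFORMATION_SCHEMA tables return rows for it.

```python
# Point the full-text INFORMATION_SCHEMA tables at one table
# ('schema/table'); 'test/articles' is a hypothetical name.
SET_AUX_TABLE = "SET GLOBAL innodb_ft_aux_table = 'test/articles'"

# Tokens of recently inserted rows that are still in the index cache.
LIST_CACHED_TOKENS = (
    "SELECT word, doc_id, position "
    "FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE "
    "ORDER BY doc_id, position"
)


def list_indexed_tokens(cursor):
    """Return the cached index tokens in document order."""
    cursor.execute(SET_AUX_TABLE)
    cursor.execute(LIST_CACHED_TOKENS)
    return [row[0] for row in cursor.fetchall()]


def is_indexed(cursor, token):
    """True if the token appears in the index cache."""
    return token in list_indexed_tokens(cursor)
```

INNODB_FT_INDEX_CACHE covers newly inserted rows, while INNODB_FT_INDEX_TABLE covers tokens already written to the on-disk index, so both may need to be checked.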

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
