How to write a MySQL full-text index plug-in

Source: Internet
Author: User

1. Introduction

The full-text index plug-in extends the full-text retrieval feature of MyISAM. With full-text search, we can perform word segmentation on documents, images, videos, and other data types and build an index for fast lookup.

At the time of writing, MySQL supports full-text retrieval only in the MyISAM storage engine. For InnoDB, it is planned for versions later than 6.0.

However, full-text indexing also has the following restrictions:

1. Only MyISAM is supported.

2. Chinese characters are not supported.

3. If a table uses multiple character sets, all columns in a FULLTEXT index must use the same character set and collation.

4. The MATCH() column list must exactly match the column list of some FULLTEXT index defined in the table, unless MATCH() is used in boolean mode.

5. The argument to AGAINST() must be a constant string.

 

So what role does a full-text plugin play? MyISAM's built-in word segmentation routine splits the column data into words and stores them in the full-text index; it also segments the strings that appear in queries. A full-text plugin can take over this function completely.

For example, you can use a plugin to perform word segmentation and search on multimedia data. You can use your own algorithm for word segmentation, or even change the full-text search syntax.
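
As one illustration of a custom segmentation algorithm, the following minimal sketch splits UTF-8 text into overlapping two-character tokens (bigrams), a common workaround for languages such as Chinese that have no spaces between words. It is plain C and does not use the MySQL plugin API: `emit_bigrams` and the `token_cb` callback type are hypothetical names introduced only for this example, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: receives one token (not NUL-terminated). */
typedef void (*token_cb)(const char *tok, size_t len, void *ctx);

/* Byte length of the UTF-8 character that starts with lead byte c
   (assumes valid UTF-8 input). */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Emit every pair of adjacent characters in doc[0..len) through cb
   (cb may be NULL to only count). Returns the number of bigrams. */
int emit_bigrams(const char *doc, size_t len, token_cb cb, void *ctx)
{
    int count = 0;
    size_t pos = 0, first, second;

    assert(doc != NULL || len == 0);
    while (pos < len) {
        first = utf8_len((unsigned char)doc[pos]);
        if (pos + first >= len)
            break;                  /* no following character */
        second = utf8_len((unsigned char)doc[pos + first]);
        if (pos + first + second > len)
            break;                  /* truncated final character */
        if (cb != NULL)
            cb(doc + pos, first + second, ctx);
        count++;
        pos += first;               /* slide forward by one character */
    }
    return count;
}
```

Inside a real plugin, the callback would hand each bigram to the server instead of to a user-supplied function. Overlapping bigrams enlarge the index but let a query match any two-character sequence without a dictionary.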

 

 

2. How to write a Full Text Plugin

A plugin mainly consists of the init, deinit, and parse functions. init() is called once before each SQL statement is executed, and deinit() is called once after execution. The parse() function performs the parsing (word segmentation) during SQL execution.

 

1) Declare the plug-in

The struct st_mysql_ftparser is used to declare a full text plug-in.

struct st_mysql_ftparser
{
    int interface_version;
    int (*parse)(MYSQL_FTPARSER_PARAM *param);
    int (*init)(MYSQL_FTPARSER_PARAM *param);
    int (*deinit)(MYSQL_FTPARSER_PARAM *param);
};

| Field | Type | Description |
|---|---|---|
| interface_version | int | Version number of the plugin interface |
| parse | int (*parse)(MYSQL_FTPARSER_PARAM *param) | Parsing (word segmentation) function, function pointer |
| init | int (*init)(MYSQL_FTPARSER_PARAM *param) | Initialization function, function pointer |
| deinit | int (*deinit)(MYSQL_FTPARSER_PARAM *param) | Cleanup function, function pointer |
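
To make the roles of the three callbacks concrete, here is a minimal, self-contained sketch. It deliberately does not include the real MySQL header: `demo_param` and `demo_ftparser` are simplified stand-ins that mirror only the shape of the declaration above, and the `demo_*` functions, the call counter, and the `interface_version` value are hypothetical. In a real plugin the types come from MySQL's plugin header and the server, not `demo_run`, drives the calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for MYSQL_FTPARSER_PARAM: only the field used in this sketch. */
typedef struct demo_param {
    void *ftparser_state;   /* plugin-private state shared between callbacks */
} demo_param;

/* Stand-in for struct st_mysql_ftparser. */
struct demo_ftparser {
    int interface_version;
    int (*parse)(demo_param *param);
    int (*init)(demo_param *param);
    int (*deinit)(demo_param *param);
};

/* init: allocate per-statement state; a non-zero return signals failure. */
static int demo_init(demo_param *param)
{
    int *calls = malloc(sizeof *calls);
    if (calls == NULL)
        return 1;
    *calls = 0;
    param->ftparser_state = calls;
    return 0;
}

/* parse: a real parser would tokenize the document here; this one only
   counts how often it was called. */
static int demo_parse(demo_param *param)
{
    int *calls = param->ftparser_state;
    assert(calls != NULL);
    (*calls)++;
    return 0;
}

/* deinit: release whatever init allocated. */
static int demo_deinit(demo_param *param)
{
    free(param->ftparser_state);
    param->ftparser_state = NULL;
    return 0;
}

/* The descriptor ties the three callbacks together. */
struct demo_ftparser demo_descriptor = {
    1,              /* interface_version: placeholder value */
    demo_parse,
    demo_init,
    demo_deinit
};

/* Plays the server's role: init, then parse_calls parse calls, then deinit.
   Returns the number of parse calls the plugin saw, or -1 on failure. */
int demo_run(int parse_calls)
{
    demo_param param = { NULL };
    int seen, i;

    if (demo_descriptor.init(&param) != 0)
        return -1;
    for (i = 0; i < parse_calls; i++)
        demo_descriptor.parse(&param);
    seen = *(int *)param.ftparser_state;
    demo_descriptor.deinit(&param);
    return seen;
}
```

Keeping the counter behind `ftparser_state` rather than in a global variable means each statement gets its own state, which matters if several statements use the parser at the same time.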

 

All three functions take the same parameter type, MYSQL_FTPARSER_PARAM. MySQL initializes this struct, but we can replace its function pointers with custom functions. The struct is as follows:

 

| Field | Type | Description |
|---|---|---|
| mysql_parse | int (*mysql_parse)(struct st_mysql_ftparser_param *, char *doc, int doc_len) | MySQL's built-in full-text word segmentation function (the default parser). |
| mysql_add_word | int (*mysql_add_word)(struct st_mysql_ftparser_param *, char *word, int word_len, MYSQL_FTPARSER_BOOLEAN_INFO *) | Processes each word produced by segmentation. The words are usually added to a tree or list, which is then used to insert, update, and delete records in the full-text index. |
| ftparser_state | void * | Pointer through which the plugin can keep extra memory it allocates, for passing state between the different API calls. |
| mysql_ftparam | void * | Used internally by MySQL to pass information to mysql_parse and mysql_add_word. You do not need to modify it when writing a plugin. |
| cs | struct charset_info_st * | Character set of the document. |
| doc | char * | The document to be parsed. For example, the document may be a URL; the parser can read the file that the URL points to and analyze its content. |
| length | int | Length of the document. It is needed because doc may not end with \0; keep this in mind when writing a plugin. |
| flags | int | Currently there is only one option, MYSQL_FTFLAGS_NEED_COPY, which tells mysql_add_word that it must make a copy of the word. The built-in mysql_parse() does not need to set this flag because it uses the doc pointer, which remains valid after the function returns. |
| mode | enum enum_ftparser_mode | The operation type; one of the values listed below. |

The operation types are:

MYSQL_FTPARSER_SIMPLE_MODE: the parser returns only the words that should be indexed, excluding stop words and filtered words.

MYSQL_FTPARSER_WITH_STOPWORDS: used for word matching in Boolean queries. In this case all words, including stop words, must be returned.

MYSQL_FTPARSER_FULL_BOOLEAN_INFO: used to parse a Boolean query string that contains Boolean operators. In this case the MYSQL_FTPARSER_BOOLEAN_INFO argument of mysql_add_word must be filled in.
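
The interplay of doc, length, and mysql_add_word can be sketched as follows. This is an illustration under stated assumptions, not the server's parser: `demo_param` is a cut-down stand-in for MYSQL_FTPARSER_PARAM, its `mysql_add_word` omits the fourth MYSQL_FTPARSER_BOOLEAN_INFO argument of the real prototype, and `count_word` and `demo_count_words` are hypothetical helpers that play the server's role so the example can run on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Simplified stand-in for MYSQL_FTPARSER_PARAM (boolean info omitted). */
typedef struct demo_param {
    int (*mysql_add_word)(struct demo_param *param, char *word, int word_len);
    void *mysql_ftparam;    /* owned by the caller (the server in real life) */
    char *doc;              /* document text, NOT necessarily NUL-terminated */
    int length;             /* number of valid bytes in doc */
} demo_param;

/* Parser callback: split doc on whitespace and hand each word to
   mysql_add_word. Only the first `length` bytes are ever read. */
int demo_parse(demo_param *param)
{
    char *end, *start, *p;

    assert(param != NULL && param->mysql_add_word != NULL);
    p = param->doc;
    end = param->doc + param->length;
    while (p < end) {
        while (p < end && isspace((unsigned char)*p))
            p++;                            /* skip separators */
        start = p;
        while (p < end && !isspace((unsigned char)*p))
            p++;                            /* scan one word */
        if (p > start &&
            param->mysql_add_word(param, start, (int)(p - start)) != 0)
            return 1;                       /* propagate failure */
    }
    return 0;
}

/* Test double for the server side: counts the words it receives. */
static int count_word(demo_param *param, char *word, int word_len)
{
    (void)word;
    (void)word_len;
    ++*(int *)param->mysql_ftparam;
    return 0;
}

/* Run demo_parse over the first len bytes of text; returns the word count,
   or -1 if parsing failed. */
int demo_count_words(const char *text, int len)
{
    int count = 0;
    demo_param param;

    param.mysql_add_word = count_word;
    param.mysql_ftparam = &count;
    param.doc = (char *)text;   /* demo_parse never writes through doc */
    param.length = len;
    return demo_parse(&param) == 0 ? count : -1;
}
```

Because every loop is bounded by `end` rather than by a terminating NUL, passing a shorter length simply makes the parser see a shorter document.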


 

 

When the mode is MYSQL_FTPARSER_FULL_BOOLEAN_INFO, we need to set the last parameter of mysql_add_word. Let's look at the prototype of the function again:

int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                      char *word, int word_len,
                      MYSQL_FTPARSER_BOOLEAN_INFO *);

 

The last parameter is a pointer to MYSQL_FTPARSER_BOOLEAN_INFO, that is, struct st_mysql_ftparser_boolean_info, whose fields are as follows:

| Field | Type | Description |
|---|---|---|
| type | enum enum_ft_token_type | The token type; one of the values listed below. |
| yesno | int | Supports the Boolean operators: > 0 means the word must be matched (the + operator); < 0 means the word must not be matched (the - operator); = 0 means the word may be matched, and a match increases the relevance. |
| weight_adjust | int | Adjusts the importance of the word: > 0 corresponds to the > operator; < 0 corresponds to the < operator. |
| wasign | char | Sign of the word's importance: a non-zero value marks a noise word, which reduces the relevance; corresponds to the ~ operator. |
| trunc | char | If non-zero, the word is treated as a prefix and all words with this prefix are matched; corresponds to the * operator. |
| prev | char | Can be ignored. |
| quot | char * | Corresponds to the double quotation mark (") operator. |

The token types are:

FT_TOKEN_EOF: no need to set.

FT_TOKEN_WORD: an ordinary word.

FT_TOKEN_STOPWORD: a stopword, which is ignored when the index is built.

FT_TOKEN_LEFT_PAREN: marks the start of a subexpression.

FT_TOKEN_RIGHT_PAREN: marks the end of a subexpression.
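
The mapping from Boolean operators to these fields can be sketched as below. This is only an illustration of the table, not MySQL's own Boolean query parser: `demo_bool_info` is a stand-in containing just the operator fields, `demo_parse_term` is a hypothetical helper, the term is assumed to be a NUL-terminated string, and parentheses and quoted phrases are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for MYSQL_FTPARSER_BOOLEAN_INFO: only the operator fields. */
typedef struct demo_bool_info {
    int yesno;          /* >0: required (+), <0: excluded (-), 0: optional */
    int weight_adjust;  /* raised by '>', lowered by '<' */
    char wasign;        /* non-zero for '~' */
    char trunc;         /* non-zero for a trailing '*' */
} demo_bool_info;

/* Fill info from the operators around one query term such as "+apple" or
   "app*". Stores the start of the bare word in *word and returns its
   length in bytes. */
size_t demo_parse_term(const char *term, demo_bool_info *info,
                       const char **word)
{
    size_t len;

    assert(term != NULL && info != NULL && word != NULL);
    memset(info, 0, sizeof *info);
    for (;; term++) {                       /* leading operators */
        if (*term == '+')      info->yesno = 1;
        else if (*term == '-') info->yesno = -1;
        else if (*term == '>') info->weight_adjust++;
        else if (*term == '<') info->weight_adjust--;
        else if (*term == '~') info->wasign = 1;
        else break;                         /* first non-operator byte */
    }
    len = strlen(term);
    if (len > 0 && term[len - 1] == '*') {  /* trailing prefix operator */
        info->trunc = 1;
        len--;
    }
    *word = term;
    return len;
}
```

A plugin running in MYSQL_FTPARSER_FULL_BOOLEAN_INFO mode would copy values like these into the real structure before calling mysql_add_word for the bare word.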

 
