1. Introduction
A full-text parser plugin extends the full-text search feature of MyISAM. Full-text search splits data into words and builds an index over them for fast lookup; with a custom parser, this can be applied to documents, images, videos, and other data types.
At the time of writing, MySQL supports full-text search only in the MyISAM storage engine. InnoDB support is planned for versions later than 6.0.
However, full-text indexing also has the following restrictions:
1. Only MyISAM is supported.
2. Chinese text is not supported.
3. Even if a table uses multiple character sets, all columns in a FULLTEXT index must use the same character set and collation.
4. The MATCH() column list must exactly match the column list of some FULLTEXT index defined on the table, unless MATCH() is used IN BOOLEAN MODE.
5. The argument to AGAINST() must be a constant string.
So what role does a Full Text Plugin play? MyISAM's built-in word segmentation routine splits column data into words and stores them in the full-text index. The same routine also splits the strings that appear in a query. A Full Text Plugin can take over this function completely.
For example, a plugin can segment and search multimedia data, apply your own word segmentation algorithm, or even change the full-text search syntax.
2. How to write a Full Text Plugin
A plugin mainly consists of the init, deinit, and parse functions. init() is called once before each SQL statement executes, and deinit() is called once after it finishes. parse() does the actual parsing while the statement executes.
1) Declare the plug-in
The struct st_mysql_ftparser is used to declare a full text plug-in.
struct st_mysql_ftparser
{
  int interface_version;
  int (*parse)(MYSQL_FTPARSER_PARAM *param);
  int (*init)(MYSQL_FTPARSER_PARAM *param);
  int (*deinit)(MYSQL_FTPARSER_PARAM *param);
};
Field | Type | Description
interface_version | int | Version number
parse | int (*parse)(MYSQL_FTPARSER_PARAM *param); | Parsing function (function pointer)
init | int (*init)(MYSQL_FTPARSER_PARAM *param); | Initialization function (function pointer)
deinit | int (*deinit)(MYSQL_FTPARSER_PARAM *param); | Cleanup function (function pointer)
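The call order described above (init once, parse during execution, deinit once) can be sketched as follows. This is only an illustration under stated assumptions, not MySQL code: the types are local stand-ins that mirror the declarations in this article so the sketch compiles without the MySQL headers, the parameter struct is reduced to the single ftparser_state field, and demo_init, demo_parse, demo_deinit, demo_state, and run_statement are hypothetical names. The number of parse calls is just a parameter of the simulation.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for MYSQL_FTPARSER_PARAM, reduced to the one field used here. */
typedef struct st_mysql_ftparser_param {
    void *ftparser_state;   /* plugin-private state shared by the callbacks */
} MYSQL_FTPARSER_PARAM;

/* Stand-in mirroring the descriptor struct shown in this article. */
struct st_mysql_ftparser {
    int interface_version;
    int (*parse)(MYSQL_FTPARSER_PARAM *param);
    int (*init)(MYSQL_FTPARSER_PARAM *param);
    int (*deinit)(MYSQL_FTPARSER_PARAM *param);
};

/* Hypothetical per-statement state: counts how often parse() ran. */
typedef struct { int parse_calls; } demo_state;

static int demo_init(MYSQL_FTPARSER_PARAM *param)
{
    demo_state *st;
    assert(param != NULL);
    st = (demo_state *)calloc(1, sizeof *st);
    if (st == NULL)
        return 1;                      /* non-zero reports failure */
    param->ftparser_state = st;
    return 0;
}

static int demo_parse(MYSQL_FTPARSER_PARAM *param)
{
    demo_state *st;
    assert(param != NULL && param->ftparser_state != NULL);
    st = (demo_state *)param->ftparser_state;
    st->parse_calls++;                 /* a real parser would tokenize here */
    return 0;
}

static int demo_deinit(MYSQL_FTPARSER_PARAM *param)
{
    assert(param != NULL);
    free(param->ftparser_state);
    param->ftparser_state = NULL;
    return 0;
}

/* The version value 1 is a placeholder; a real plugin takes it from the headers. */
static struct st_mysql_ftparser demo_descriptor = {
    1, demo_parse, demo_init, demo_deinit
};

/* Simulates the server side for one statement.
   Returns the number of parse() calls, or -1 if init() failed. */
int run_statement(int documents)
{
    MYSQL_FTPARSER_PARAM param = { NULL };
    int calls, i;
    if (demo_descriptor.init(&param) != 0)
        return -1;
    for (i = 0; i < documents; i++)
        demo_descriptor.parse(&param);
    calls = ((demo_state *)param.ftparser_state)->parse_calls;
    demo_descriptor.deinit(&param);
    return calls;
}
```

Calling run_statement(3) walks through init, three parse calls, and deinit. Keeping the counter behind ftparser_state rather than in a global means each statement gets its own state.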
All three functions take the same parameter type, MYSQL_FTPARSER_PARAM. MySQL initializes this struct, but a plugin can also replace its function pointers with custom functions. The struct has the following fields:
Field | Type | Description
mysql_parse | int (*mysql_parse)(struct st_mysql_ftparser_param *, char *doc, int doc_len) | The word segmentation function built into MySQL; used by default.
mysql_add_word | int (*mysql_add_word)(struct st_mysql_ftparser_param *, char *word, int word_len, MYSQL_FTPARSER_BOOLEAN_INFO *) | Processes each word produced by segmentation. The words are usually added to a tree or list, and are used when inserting, updating, and deleting records in the full-text index.
ftparser_state | void * | A pointer the plugin can use to attach its own allocated memory, so that state can be passed between the different API calls.
mysql_ftparam | void * | Used internally by MySQL to pass information to mysql_parse and mysql_add_word. A plugin does not need to modify it.
cs | struct charset_info_st * | Character set of the document.
doc | char * | The document to be parsed. For example, the document can be a URL; the parser can then read the file the URL refers to and analyze that.
length | int | The length of the document. It is needed because doc may not end with \0. Keep this in mind when writing a plugin.
flags | int | Currently there is only one flag: MYSQL_FTFLAGS_NEED_COPY. It tells mysql_add_word that it must keep its own copy of the word. The built-in mysql_parse() does not need this flag because the words it passes point into doc, and doc remains valid after the function returns.
mode | enum enum_ftparser_mode | The operation mode. MYSQL_FTPARSER_SIMPLE_MODE: the parser returns only the words to be indexed, excluding stopwords and filtered words. MYSQL_FTPARSER_WITH_STOPWORDS: used for word matching in Boolean queries; all words, including stopwords, must be returned. MYSQL_FTPARSER_FULL_BOOLEAN_INFO: used to parse a Boolean query string that contains Boolean operators; in this mode the MYSQL_FTPARSER_BOOLEAN_INFO argument of mysql_add_word must be filled in.
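To make the roles of doc, length, and mysql_add_word concrete, here is a minimal sketch of a parse callback that splits the document on whitespace. It is not the built-in parser. The struct is a reduced local stand-in so the code compiles without the MySQL headers, and whitespace_parse, mock_add_word, word_log, and tokenize_demo are hypothetical names. The mock plays the part of the server and simply records each word. Note that the loop is bounded by length and never looks for a terminating \0.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct st_mysql_ftparser_boolean_info MYSQL_FTPARSER_BOOLEAN_INFO;

/* Reduced stand-in for MYSQL_FTPARSER_PARAM (only the fields used here). */
typedef struct st_mysql_ftparser_param {
    int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                          char *word, int word_len,
                          MYSQL_FTPARSER_BOOLEAN_INFO *boolean_info);
    void *mysql_ftparam;   /* server-side data; used only by the mock below */
    char *doc;
    int length;
} MYSQL_FTPARSER_PARAM;

/* Hypothetical parse callback: hands every whitespace-delimited word
   to mysql_add_word, using length as the only bound. */
static int whitespace_parse(MYSQL_FTPARSER_PARAM *param)
{
    char *p, *end;
    assert(param != NULL && param->doc != NULL && param->length >= 0);
    p = param->doc;
    end = param->doc + param->length;
    while (p < end) {
        char *start;
        while (p < end && isspace((unsigned char)*p)) p++;
        start = p;
        while (p < end && !isspace((unsigned char)*p)) p++;
        if (p > start) {
            /* NULL only because the mock ignores it; a real plugin
               would pass a filled-in boolean info struct. */
            int rc = param->mysql_add_word(param, start, (int)(p - start), NULL);
            if (rc != 0) return rc;
        }
    }
    return 0;
}

/* Mock of the server side: records the words, comma-separated. */
typedef struct { char buf[128]; int count; } word_log;

static int mock_add_word(MYSQL_FTPARSER_PARAM *param, char *word, int word_len,
                         MYSQL_FTPARSER_BOOLEAN_INFO *info)
{
    word_log *log = (word_log *)param->mysql_ftparam;
    size_t used = strlen(log->buf);
    (void)info;
    if (used + (size_t)word_len + 2 > sizeof log->buf)
        return 1;                          /* log is full */
    if (log->count > 0)
        log->buf[used++] = ',';
    memcpy(log->buf + used, word, (size_t)word_len);
    log->buf[used + (size_t)word_len] = '\0';
    log->count++;
    return 0;
}

/* Parses the first len bytes of text. Writes the recorded words to out
   and returns the word count, or -1 on error. */
int tokenize_demo(const char *text, int len, char *out, size_t out_size)
{
    char doc[128];
    word_log log;
    MYSQL_FTPARSER_PARAM param;
    if (len < 0 || (size_t)len > sizeof doc) return -1;
    memcpy(doc, text, (size_t)len);        /* deliberately not \0-terminated */
    memset(&log, 0, sizeof log);
    param.mysql_add_word = mock_add_word;
    param.mysql_ftparam = &log;
    param.doc = doc;
    param.length = len;
    if (whitespace_parse(&param) != 0) return -1;
    if (strlen(log.buf) + 1 > out_size) return -1;
    strcpy(out, log.buf);
    return log.count;
}
```

For instance, tokenize_demo("full text", 4, out, sizeof out) only sees the first four bytes, so the parser reports the single word "full" even though more text follows in memory.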
When the mode is MYSQL_FTPARSER_FULL_BOOLEAN_INFO, the last parameter of mysql_add_word must be filled in. Here is the prototype of that function again:
int (*mysql_add_word)(struct st_mysql_ftparser_param *,
                      char *word, int word_len,
                      MYSQL_FTPARSER_BOOLEAN_INFO *);
The last parameter is of type MYSQL_FTPARSER_BOOLEAN_INFO, that is, struct st_mysql_ftparser_boolean_info, which has the following fields:
Field | Type | Description
type | enum enum_ft_token_type | The token type. FT_TOKEN_EOF: no need to set. FT_TOKEN_WORD: an ordinary word. FT_TOKEN_STOPWORD: a stopword, which is ignored when the index is built. FT_TOKEN_LEFT_PAREN: the start of a subexpression. FT_TOKEN_RIGHT_PAREN: the end of a subexpression.
yesno | int | Supports the Boolean operators. > 0: the word must be present (the + operator). < 0: the word must not be present (the - operator). = 0: the word is optional, and its presence increases relevance.
weight_adjust | int | Adjusts the importance of the word. > 0 corresponds to the > operator; < 0 corresponds to the < operator.
wasign | char | Sign of the word's importance. Non-zero marks a noise word, which lowers relevance; corresponds to the ~ operator.
trunc | char | If non-zero, the word is treated as a prefix, and all words that start with it match; corresponds to the * operator.
prev | char | Ignore.
quot | char * | Corresponds to the double quotation mark operator.
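The correspondence between the Boolean operators and the fields in the table above can be sketched as follows. This is an illustration only and not how MySQL's own Boolean parser works: the enum and struct are local stand-ins that mirror the table so the code compiles without the MySQL headers, and classify_term is a hypothetical helper. It reads the operators attached to a single term, fills in the info struct accordingly, and reports where the bare word starts and how long it is.

```c
#include <assert.h>
#include <string.h>

/* Local stand-ins mirroring the token types and fields described above. */
enum enum_ft_token_type {
    FT_TOKEN_EOF, FT_TOKEN_WORD, FT_TOKEN_LEFT_PAREN,
    FT_TOKEN_RIGHT_PAREN, FT_TOKEN_STOPWORD
};

typedef struct st_mysql_ftparser_boolean_info {
    enum enum_ft_token_type type;
    int yesno;
    int weight_adjust;
    char wasign;
    char trunc;
    char prev;
    char *quot;
} MYSQL_FTPARSER_BOOLEAN_INFO;

/* Hypothetical helper: interprets the operators around one term.
   Stores the start of the bare word in *word and returns its length. */
int classify_term(const char *term, int len,
                  MYSQL_FTPARSER_BOOLEAN_INFO *info, const char **word)
{
    const char *p, *end;
    assert(term != NULL && info != NULL && word != NULL && len >= 0);
    p = term;
    end = term + len;

    memset(info, 0, sizeof *info);
    info->quot = NULL;
    info->type = FT_TOKEN_WORD;

    /* Leading operators. */
    for (; p < end; p++) {
        if (*p == '+')      info->yesno = 1;        /* must be present     */
        else if (*p == '-') info->yesno = -1;       /* must not be present */
        else if (*p == '>') info->weight_adjust++;  /* raise importance    */
        else if (*p == '<') info->weight_adjust--;  /* lower importance    */
        else if (*p == '~') info->wasign = 1;       /* noise word          */
        else break;
    }

    /* Trailing '*' marks the word as a prefix. */
    if (end > p && end[-1] == '*') {
        info->trunc = 1;
        end--;
    }

    *word = p;
    return (int)(end - p);
}
```

With this helper, the term "+apple" yields yesno = 1 and the word "apple", while "-app*" yields yesno = -1, trunc = 1, and the word "app". A plugin running in MYSQL_FTPARSER_FULL_BOOLEAN_INFO mode would pass the word together with such a filled-in struct to mysql_add_word.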