MySQL Full-text index

Source: Internet
Author: User

I. How to set it up



Click {Full-Text Search} at the end of the column's row to set a full-text index. The name of this option may differ between versions.

II. Requirements
1. The table must use the MyISAM storage engine. InnoDB, the default engine, does not support full-text indexes before MySQL 5.6 (InnoDB supports them from MySQL 5.6 onward).
2. Column types: CHAR, VARCHAR, and TEXT.

III. Configuration
Add the following to the my.ini configuration file:
# Minimum length of keywords indexed by MySQL full-text search
[mysqld]
ft_min_word_len = 1
Save the file, restart MySQL, and run:

SHOW VARIABLES LIKE 'ft_min_word_len';


Check whether ft_min_word_len has taken effect. If it has not:
1. Confirm that my.ini is configured correctly, and make sure you edited the my.ini that MySQL actually reads, not a copy in another location.
2. Verify that MySQL was actually restarted; if in doubt, restart the computer.
Look up other related settings yourself (e.g. on Baidu).
Note: after changing this setting, full-text indexes that already exist must be rebuilt before the new value takes effect.
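To see what ft_min_word_len changes, here is a minimal Python sketch of the length filter alone. It is a simplified model, not MySQL's parser: the helper indexed_words is hypothetical, it splits only at whitespace and commas, and it lowercases words because matching is case-insensitive by default. Words shorter than the limit are never indexed, so no query can find them.

```python
import re


def indexed_words(text: str, ft_min_word_len: int = 4) -> set[str]:
    """Hypothetical model: split at whitespace/commas, lowercase, and keep
    only words at least ft_min_word_len characters long."""
    words = [w.lower() for w in re.split(r"[\s,]+", text) if w]
    return {w for w in words if len(w) >= ft_min_word_len}


row = "a bc I know 1 23"  # same words as the sample row used below
# With the default of 4, only "know" is indexed; with 1, every word is.
print(sorted(indexed_words(row)))                     # ['know']
print(sorted(indexed_words(row, ft_min_word_len=1)))  # ['1', '23', 'a', 'bc', 'i', 'know']
```

This is why, with the default setting, a search for a or bc can never succeed no matter how the rows are arranged.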

IV. SQL syntax
First, create the temp table:

CREATE TABLE IF NOT EXISTS `temp` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `char` char(50) NOT NULL,        -- length assumed; the original value was lost
  `varchar` varchar(50) NOT NULL,  -- length assumed; the original value was lost
  `text` text NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `char` (`char`),
  FULLTEXT KEY `varchar` (`varchar`),
  FULLTEXT KEY `text` (`text`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2;
INSERT INTO `temp` (`id`, `char`, `varchar`, `text`) VALUES
(1, 'A bc I know 1', 'A bc I know 1', 'A bc I know 1 23');


Search the char column for the value a:

SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a');


But the query returns no results!
At this point you may think: I followed every step, so what did I miss or get wrong?
Don't worry. This is how it goes; mistakes always happen. Stay calm; worrying won't solve the problem.

In natural language mode, a word that appears in 50% or more of the rows is treated as a stopword and ignored. With a single row in the table, every word appears in 100% of the rows, so nothing can match.
To bypass the 50% threshold, search IN BOOLEAN MODE:

SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a' IN BOOLEAN MODE);


This returns the row, but we do not recommend this approach here.
Look up an introduction to the full-text search modes yourself (e.g. on Baidu).
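The 50% rule and the Boolean-mode workaround can be modelled in a few lines. This is a minimal Python sketch under simplifying assumptions: each row is already a set of lowercase words, relevance ranking is ignored, and the helper matches is hypothetical rather than anything MySQL exposes.

```python
def matches(rows: list[set[str]], word: str, boolean_mode: bool = False) -> list[int]:
    """Hypothetical model: return indexes of rows containing `word`.

    In natural language mode (boolean_mode=False) a word present in 50% or
    more of the rows is ignored; IN BOOLEAN MODE skips that check.
    """
    hits = [i for i, row in enumerate(rows) if word in row]
    if not boolean_mode and len(hits) * 2 >= len(rows):
        return []  # too common: treated like a stopword
    return hits


# One already-tokenized row, like the single row in the temp table
rows = [{"a", "bc", "i", "know", "1", "23"}]
print(matches(rows, "a"))                     # []
print(matches(rows, "a", boolean_mode=True))  # [0]
```

Adding rows that do not contain a lowers its share below one half, which is exactly what the filler rows below do.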

Instead, add a few filler rows so that no search word reaches the 50% threshold:

INSERT INTO `temp` (`id`, `char`, `varchar`, `text`)
VALUES
  (NULL, '7', '7', '7'),
  (NULL, '7', '7', '7'),
  (NULL, 'A,BC, I, know, 1,23', 'A,BC, I, know, 1,23', 'A,BC, I, know, 1,23'),
  (NULL, 'x', 'x', 'x');


The following queries now return data:

SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('bc');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('I');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('know');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('1');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('23');


The following queries return nothing, because a full-text search matches only whole indexed words, never fragments of them:

SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('b');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('c');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('kn');  -- fragment of "know"
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('ow');  -- fragment of "know"
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('2');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('3');


To search for several words, separate them with a space or a comma; a row matches if it contains any of them:

SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a x');
SELECT * FROM `temp` WHERE MATCH (`char`) AGAINST ('a,x');

Each of the queries above returns three rows: the two rows containing a and the row containing x.
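The results in this section follow from two rules, sketched here in Python under simplifying assumptions: text is split into words at spaces and commas, a query word matches only an identical stored word (never a fragment), and a row is returned if it contains any query word. The 50% threshold and relevance ranking are left out, and the helpers tokenize and search are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Split at whitespace and commas; lowercase for case-insensitive matching."""
    return {w.lower() for w in re.split(r"[\s,]+", text) if w}


def search(rows: list[str], query: str) -> list[int]:
    """Return ids (1-based, like the AUTO_INCREMENT column) of rows that
    contain at least one query word as a whole word."""
    words = tokenize(query)
    return [i for i, row in enumerate(rows, start=1) if words & tokenize(row)]


# Contents of the char column after the inserts above
rows = ["A bc I know 1", "7", "7", "A,BC, I, know, 1,23", "x"]
print(search(rows, "bc"))   # [1, 4]
print(search(rows, "b"))    # [] -> "b" is only a fragment of "bc"
print(search(rows, "a x"))  # [1, 4, 5]
print(search(rows, "a,x"))  # [1, 4, 5]
```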

V. Word segmentation
By now you may have noticed that the values stored in our columns were already split into words (separated by spaces or commas); raw, unsegmented text cannot simply be inserted as-is.
The workflow for using a full-text index:
1. Receive data, segment it into words, then store it.
2. Receive the search input, segment it into words, then query.
This raises an important question: how do we segment the data into words?
Usually we use a mature, free word segmenter; of course, you can write your own if you are able. Here we recommend the SCWS segmenter.
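SCWS does the real work in this setup. To illustrate the idea behind the workflow above, here is a minimal Python sketch of a different and much simpler technique, forward maximum matching against a small dictionary. It is not SCWS's algorithm; the functions fmm_segment and prepare and the dictionary are hypothetical, and unspaced English stands in for Chinese text. The same function would be applied to stored data and to search input, and its output is joined with spaces so MySQL's built-in parser can split it.

```python
def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest
    dictionary word that fits; fall back to a single character."""
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def prepare(text: str, dictionary: set[str]) -> str:
    """Segment, then join with spaces before storing or querying."""
    return " ".join(fmm_segment(text, dictionary))


dictionary = {"full", "text", "fulltext", "index", "search"}
print(prepare("fulltextindexsearch", dictionary))  # fulltext index search
```

Greedy longest-match is easy to follow but can mis-split ambiguous text, which is one reason to prefer a mature segmenter such as SCWS.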

First, download:
1. php_scws.dll (make sure it matches your PHP version)
2. The XDB dictionary file
3. The rule set files


Install SCWS:
1. Create a folder. Its location does not matter, but preferably the path should not contain Chinese characters.
2. Extract the rule set files and put the .xdb file(s) and the three .ini files into D:\scws.
3. Copy php_scws.dll into the ext folder of your PHP directory.
4. Add the following lines to the end of php.ini:
[scws]

; Check that extension_dir in php.ini is set correctly; otherwise, set extension_dir to empty
; and specify php_scws.dll by its absolute path instead.

extension = php_scws.dll
scws.default.charset = utf8
scws.default.fpath = "D:\scws"
5. Restart your server.
Test it:

<?php
$str = "Test Chinese word segmentation";
$so = scws_new();
$so->send_text($str);
// get_result() returns the words in batches; keep calling it until it returns false
while ($temp = $so->get_result()) {
    var_dump($temp);
}
$so->close();


If the installation fails, refer to the official SCWS documentation.
--------------------------------------------------------------------------------
This allows us to use the full-text indexing technique.

************************************************************

http://blog.csdn.net/bbirdsky/article/details/45368897

MySQL has supported full-text indexing and searching since version 3.23.23. Building a full-text index creates the index; a full-text search queries it.

LIKE, by contrast, queries by wildcard pattern matching (% and _) and does not use a full-text index.
A MySQL full-text index is an index of type FULLTEXT.
Full-text indexes can be created only on CHAR, VARCHAR, and TEXT columns of MyISAM tables (InnoDB also supports them from MySQL 5.6 onward).
A full-text index can be created with CREATE TABLE, ALTER TABLE, or CREATE INDEX.

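CREATE TABLE article (
  id INT NOT NULL PRIMARY KEY,
  title VARCHAR(200),   -- length assumed; the original value was lost
  body TEXT,
  FULLTEXT (title, body)
) ENGINE = MyISAM;      -- older MySQL versions wrote TYPE = MyISAM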

Importing large amounts of data into a table that has a full-text index is slow. It is better to drop the full-text index first, import the data, and then add the full-text index back after the import.
Syntax for full-text search:

MATCH (col1, col2, ...) AGAINST (expr [search_modifier])

search_modifier selects one of three search modes:

IN NATURAL LANGUAGE MODE
IN BOOLEAN MODE
IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION (or simply WITH QUERY EXPANSION)

IN NATURAL LANGUAGE MODE
expr is the string to search for.
It has no special operator characters.
Stopwords are applied.
Words that appear in 50% or more of the rows are dropped. For example, if every row contains the word MySQL, searching for MySQL finds nothing. With many rows this is very useful, because finding every row would be meaningless, so MySQL treats such a word almost as a stopword. With only two rows, however, nothing can be found at all, because every word appears in at least 50% of the rows. To avoid this, use IN BOOLEAN MODE.
This is the default search mode.

SELECT * FROM article WHERE MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE);

By default the search is case-insensitive. To make it case-sensitive, change the column's collation to a binary one, for example from utf8_general_ci to utf8_bin.
When MATCH ... AGAINST is used in WHERE, results are by default sorted by relevance, from highest to lowest.
MATCH ... AGAINST can be combined with the rest of MySQL's syntax, such as JOINs or additional filter conditions.

-- First way to count
SELECT COUNT(*) FROM article
WHERE MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE);

-- Second way to count
SELECT COUNT(IF(MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE), 1, NULL)) AS count
FROM article;

When many rows match, the first count is slower, because MATCH ... AGAINST first sorts the rows by relevance.
When few rows match, the second count is slower, because it scans every row.
The columns in MATCH (title, body) must be exactly the same as those in FULLTEXT (title, body). To search only title or only body, you must build a separate FULLTEXT (title) or FULLTEXT (body) index. For the same reason, the columns in MATCH () cannot span tables, although the other two search modes seem to allow it.

SELECT id, MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE) AS score FROM article;

This returns the relevance values; because there is no WHERE or ORDER BY, the rows are not sorted.

SELECT id, MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE) AS score
FROM article
WHERE MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE);

This sorts the rows and also returns the relevance. Although MATCH ... AGAINST appears twice, MySQL recognizes that the two expressions are identical and evaluates it only once.

SELECT id, MATCH (title, body) AGAINST ('xxx' IN NATURAL LANGUAGE MODE) AS score FROM article ORDER BY score DESC;

Why not use it like this?
How MySQL FULLTEXT splits text into words:
A run of letters, digits, and underscores is treated as one word and is not split.
Text is split at whitespace, commas (,), and periods (.). Languages that do not use these delimiters, such as Chinese, must be segmented into words manually.
You can write your own parser to replace the built-in one.
A single apostrophe inside a word is accepted: aaa'bbb is one word, but aaa''bbb is two words.
Apostrophes at the start or end of a word are removed, e.g. 'aaa or aaa'.
In a full-text search, stopwords and words shorter than four characters are ignored.
The built-in stopword list can be overridden.
The four-character minimum can be changed.
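The word-splitting rules above can be approximated with one regular expression plus a filter. This is a minimal Python sketch, not MySQL's actual parser: the name myisam_like_tokens and the stopword list in the example are hypothetical, and Python's \w (letters, digits, underscore) stands in for MySQL's notion of a word character.

```python
import re

# A run of word characters, optionally joined by single apostrophes:
# aaa'bbb stays one word, while aaa''bbb yields two. Leading and trailing
# apostrophes are never part of a match, so they are effectively stripped.
WORD = re.compile(r"\w+(?:'\w+)*")


def myisam_like_tokens(text: str, min_len: int = 4,
                       stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Simplified model of the rules above: split, lowercase, then drop
    words shorter than min_len and words in the stopword list."""
    words = (m.group().lower() for m in WORD.finditer(text))
    return [w for w in words if len(w) >= min_len and w not in stopwords]


print(myisam_like_tokens("aaa'bbb, aaa''bbb. 'ccc' my_sql", min_len=1))
# ["aaa'bbb", 'aaa', 'bbb', 'ccc', 'my_sql']
print(myisam_like_tokens("all about full-text search in MySQL",
                         stopwords=frozenset({"about"})))
# ['full', 'text', 'search', 'mysql']
```

Note that a run of Chinese characters without spaces also counts as a single \w+ match here, mirroring why unsegmented Chinese text is indexed as one long word.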

IN BOOLEAN MODE
expr may contain special operator characters that make up a search syntax.
SELECT *
FROM article
WHERE MATCH (title, body)
AGAINST ('+mysql -yoursql' IN BOOLEAN MODE);
Rows must contain mysql and must not contain yoursql.
Characteristics of IN BOOLEAN MODE:
The 50% threshold is not applied.
Results are not automatically sorted by relevance in descending order.
Columns without a FULLTEXT index can be searched, but very slowly.
The minimum and maximum word lengths still apply.
Stopwords still apply.
Operators:
+: the word must be present.
-: the word must not be present. This only excludes rows that the other terms match, so a query consisting of -yoursql alone finds no rows; it must be combined with other terms.
(no operator): the default; the word is optional, but rows containing it rank above rows without it.
>: increases the word's contribution to relevance.
<: decreases the word's contribution to relevance.
(): groups terms into subexpressions, which can be nested.
+aaa +(>bbb <ccc) // finds rows with aaa and bbb, or aaa and ccc, ranking aaa&bbb rows above aaa&ccc rows
~: makes the word's contribution to relevance negative; rows containing it rank lower but, unlike with -, are not excluded.
*: the truncation (wildcard) operator; unlike the other operators, it is appended to the end of a word.
" ": a phrase enclosed in double quotes must match exactly as written and is not split into separate words.

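The filtering part of the Boolean operators can be sketched as a small evaluator. This is a minimal Python model under stated assumptions: it supports only +word, -word, plain optional words, and a trailing * prefix match; >, <, ~, ( ) and quoted phrases are not modelled, relevance is ignored, and boolean_match is a hypothetical name.

```python
def boolean_match(row_words: set[str], query: str) -> bool:
    """Simplified model of IN BOOLEAN MODE filtering (no ranking)."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term[0] == "+":
            required.append(term[1:])
        elif term[0] == "-":
            excluded.append(term[1:])
        else:
            optional.append(term)

    def has(term: str) -> bool:
        if term.endswith("*"):  # truncation: prefix match
            return any(w.startswith(term[:-1]) for w in row_words)
        return term in row_words

    if any(has(t) for t in excluded) or not all(has(t) for t in required):
        return False
    # With no required words, at least one optional word must be present,
    # so a query made only of -terms matches nothing.
    return bool(required) or any(has(t) for t in optional)


rows = [{"mysql", "index"}, {"mysql", "yoursql"}, {"postgres"}]
print([boolean_match(r, "+mysql -yoursql") for r in rows])  # [True, False, False]
print([boolean_match(r, "-yoursql") for r in rows])         # [False, False, False]
print([boolean_match(r, "my*") for r in rows])              # [True, True, False]
```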
IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
This can also be written simply as WITH QUERY EXPANSION.
It is a variant of IN NATURAL LANGUAGE MODE.
It first searches IN NATURAL LANGUAGE MODE, adds words from the most relevant rows to the original expr, and then searches again.
Useful property one: a search for database can also find rows about MySQL or Oracle. The first query with database returns some rows; the words extracted from them are quite likely to include MySQL and Oracle; the final query then uses database together with those words.
Useful property two: when you cannot spell a word correctly, a first query with a similar, misspelled word may return rows containing the correct spelling, and the second pass then uses it to find the desired results.
Because this type of query can greatly increase noise, keep the first query string as concise as possible.
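The two-pass process above can be sketched in Python. This is a simplified model under stated assumptions: relevance is crudely approximated by the number of query words a row contains (MySQL's real weighting differs), rows are sets of lowercase words, and rank and expanded_search are hypothetical names. The top_rows parameter plays roughly the role of ft_query_expansion_limit, the number of top matches used for expansion.

```python
from collections import Counter


def rank(rows: list[set[str]], terms: set[str]) -> list[int]:
    """Order rows by how many query terms they contain (a crude stand-in
    for relevance); rows containing no term are dropped."""
    scored = [(len(terms & row), i) for i, row in enumerate(rows)]
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


def expanded_search(rows: list[set[str]], terms: set[str],
                    top_rows: int = 2, extra_terms: int = 3) -> list[int]:
    """Search once, add the most frequent words from the best rows to the
    query, then search again. Ties among equally frequent words are broken
    arbitrarily."""
    first = rank(rows, terms)[:top_rows]
    counts = Counter(w for i in first for w in rows[i] - terms)
    expansion = {w for w, _ in counts.most_common(extra_terms)}
    return rank(rows, terms | expansion)


rows = [
    {"database", "mysql", "oracle"},
    {"database", "mysql", "tuning"},
    {"mysql", "replication"},
    {"oracle", "licensing"},
    {"gardening", "tips"},
]
print(rank(rows, {"database"}))             # [0, 1]
print(expanded_search(rows, {"database"}))  # [0, 1, 2, 3]
```

Rows 2 and 3 never mention database but are found through mysql and oracle; the tuning word picked up along the way is the kind of noise the advice above warns about.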
For the stopword list, see http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html.
Limitations of full-text search:
It works only on MyISAM tables (InnoDB supports full-text indexes from MySQL 5.6 onward).
UTF-8 is supported.
Issues with Chinese:
MySQL does not segment Chinese text. The built-in parser splits words only at whitespace, commas, and periods, so the naive workaround is to insert spaces between words when storing the text; the following restrictions still apply.
Words shorter than four characters are ignored by default, so a three-character string such as 123 cannot be found in the text; you must change ft_min_word_len from its default of 4 to 1.
Although one table can have columns with different character sets, all columns in a single FULLTEXT index must use the same character set and collation.
The columns in MATCH must exactly match those in the FULLTEXT index; IN BOOLEAN MODE allows different columns, even columns without a FULLTEXT index, but it is very slow.
The argument to AGAINST must be a string that is constant during the query; for example, it cannot be a column name.
Index hints are more limited for full-text searches.
MySQL full-text search settings:
Most of the parameters are startup options, so MySQL must be restarted after changing them.
Changing some of them also requires rebuilding the index files.
mysql> SHOW VARIABLES LIKE 'ft%';

ft_boolean_syntax          + -><()~*:""&|
ft_min_word_len            4
ft_max_word_len
ft_query_expansion_limit
ft_stopword_file           (built-in)

ft_min_word_len: the minimum length of an indexed word; the default is 4. The index files must be rebuilt after changing it.
ft_max_word_len: the maximum length of an indexed word; the default varies by version (other details omitted).
[mysqld]
ft_min_word_len=1
ft_stopword_file: the path of the stopword file. Setting it to an empty string disables stopword filtering. After changing it, restart MySQL and rebuild the indexes. In the stopword file, stopwords can be separated by newlines, spaces, or commas; underscores and single quotes count as valid word characters.
50% threshold: in storage/myisam/ftdefs.h, change #define GWS_IN_USE GWS_PROB to #define GWS_IN_USE GWS_FREQ, then recompile MySQL. Because lowering the threshold hurts the accuracy of results, this is not recommended; IN BOOLEAN MODE avoids the 50% limit instead.
ft_boolean_syntax: changes the operator characters used IN BOOLEAN MODE; no restart or index rebuild is needed.
To change which characters count as word characters, for example to treat "-" as part of a word:
Method one: modify true_word_char() and misc_word_char() in storage/myisam/ftdefs.h, recompile MySQL, and then rebuild the indexes.
Method two: modify a character set file, use that character set for the FULLTEXT-indexed columns, and then rebuild the indexes.
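The stopword file format described under ft_stopword_file can be sketched as a small parser. This is a minimal Python model, assuming any character other than letters, digits, underscore, and apostrophe separates stopwords; parse_stopword_file and load_stopwords are hypothetical names, and an empty path models ft_stopword_file = '' (filtering disabled).

```python
import re


def parse_stopword_file(contents: str) -> set[str]:
    """Words are runs of letters, digits, underscores and apostrophes;
    anything else (newline, space, comma, ...) separates them."""
    return {w.lower() for w in re.findall(r"[\w']+", contents)}


def load_stopwords(path: str) -> set[str]:
    """An empty path models ft_stopword_file = '': no stopword filtering."""
    if path == "":
        return set()
    with open(path, encoding="utf-8") as f:
        return parse_stopword_file(f.read())


print(sorted(parse_stopword_file("the,a an\nisn't\nmy_word")))
# ['a', 'an', "isn't", 'my_word', 'the']
```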
Rebuilding the index:
This must be done for every table that has a FULLTEXT index.
mysql> REPAIR TABLE tbl_name QUICK;
Note that if you rebuild the indexes with myisamchk, the settings above revert to their default values, because myisamchk does not use the server's settings.
Solution one: pass the modified values to myisamchk as options.
shell> myisamchk --recover --ft_min_word_len=1 tbl_name.MYI
Solution two: set the values in both option groups.
[mysqld]
ft_min_word_len=1
[myisamchk]
ft_min_word_len=1
Solution three: instead of myisamchk, use REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE, because these statements are executed by the MySQL server itself.

II. Full-text index

An ordinary index on a text column only speeds up searches for strings at the start of the column's content. If a column holds a long passage made up of many words, an ordinary index is of no use. Such searches usually take the form LIKE '%word%', which is expensive for MySQL; with large amounts of data, response times become very long.

This is where a full-text index comes in. When such an index is built, MySQL creates a list of all the words that appear in the text, and queries look up the matching rows through that list. A full-text index can be created together with the table, or added later with ALTER TABLE tablename ADD FULLTEXT (column1, column2). Once the index exists, a SELECT query can retrieve rows that contain one or more given words.
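The "list of all the words" is an inverted index. Here is a minimal Python sketch contrasting it with a LIKE '%word%' scan; it is a conceptual model only (MySQL's on-disk structure is different), and words, build_index, and like_scan are hypothetical names.

```python
import re
from collections import defaultdict


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def build_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: word -> ids of the rows that contain it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for w in words(text):
            index[w].add(row_id)
    return dict(index)


def like_scan(rows: dict[int, str], needle: str) -> set[int]:
    """LIKE '%needle%': examine every row's full text for a substring."""
    return {rid for rid, text in rows.items() if needle.lower() in text.lower()}


rows = {1: "MySQL full-text index", 2: "Indexing large tables", 3: "Backup tips"}
index = build_index(rows)
print(sorted(index.get("index", set())))  # [1]     one lookup, whole words only
print(sorted(like_scan(rows, "index")))   # [1, 2]  every row scanned, substrings match
```

The lookup cost does not grow with the number of rows, while the scan touches every row; the trade-off is that the index only answers whole-word questions.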

The basic syntax of such a query is:


SELECT * FROM tablename WHERE MATCH (column1, column2) AGAINST ('word1 word2 word3');

This finds all rows whose column1 or column2 contains word1, word2, or word3.

Note: InnoDB tables do not support full-text indexes before MySQL 5.6.
