This article introduces MySQL MATCH ... AGAINST full-text search and a MySQL full-text search plug-in for Chinese. Refer to it if you need it.
For large data sets, it is much faster to load the data into a table that has no FULLTEXT index and then create the index with ALTER TABLE (or CREATE INDEX). Loading data into a table that already has a FULLTEXT index is much slower.
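1. Prerequisites for using MySQL full-text search (FULLTEXT)
The table type must be MyISAM. This applies to the MySQL 4.x and 5.1 versions covered here; InnoDB gained FULLTEXT support in MySQL 5.6.
The column type for full-text search must be CHAR, VARCHAR, or TEXT.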
2. Advanced configuration for full-text search
By default MySQL only indexes words that are at least 4 characters long, so change this setting first to support Chinese words.
* Unix users need to edit my.cnf. This file is usually located at /etc/my.cnf. If it is not there, locate it first with find / -name 'my.cnf'.
Add the following to the [mysqld] section:
The code is as follows:
ft_min_word_len = 2
Other settings include:
The code is as follows:
ft_wordlist_charset = gbk
ft_wordlist_file = /home/soft/mysql/share/mysql/wordlist-gbk.txt
ft_stopword_file = /home/soft/mysql/share/mysql/stopwords-gbk.txt
A brief explanation:
ft_wordlist_charset is the character set of the dictionary. UTF-8, gbk, gb2312, and big5 are currently supported.
ft_wordlist_file is the word list (dictionary) file. Each line contains one word and its word frequency, separated by tabs or spaces; the frequency is used for disambiguation.
ft_stopword_file is the list of words that are excluded from the index, one per line.
ft_min_word_len is the minimum length of a word that is added to the index. The default value is 4; change it to 2 to support Chinese words. After changing it, restart the server and rebuild any existing FULLTEXT indexes (for example with REPAIR TABLE tbl_name QUICK).
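To make the effect of ft_min_word_len and the stopword list more concrete, here is a minimal Python sketch of index-time filtering. It is a toy model, not MySQL's actual parser: the function name indexable_words and the four-word stopword set are assumptions made for illustration only.

```python
# Toy model of index-time word filtering; not MySQL's real parser.
import re

# Hypothetical stopword sample; MySQL's built-in list is much longer.
STOPWORDS = {"and", "or", "to", "the"}


def indexable_words(text, min_word_len=4, stopwords=STOPWORDS):
    """Return the words that survive the minimum-length and stopword filters."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stopwords]


sample = "Go to the box and start boxing"
print(indexable_words(sample))                  # default minimum length of 4
print(indexable_words(sample, min_word_len=2))  # after lowering the minimum to 2
```

With the default length of 4, short words such as "box" never reach the index, which is why lowering the value matters for short Chinese words.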
3. Create a full-text index
When creating a table, mark the columns with the FULLTEXT keyword. For an existing table, create the index with ALTER TABLE (or CREATE INDEX).
The code is as follows:
CREATE FULLTEXT INDEX index_name ON table_name (column_name);
4. Use full-text search
Use the MATCH function in the WHERE clause of the SELECT statement; the words to search for are given in AGAINST. IN BOOLEAN MODE only checks whether the keyword is present. You do not need to care about its position or whether it appears at the start.
The code is as follows:
SELECT * FROM articles WHERE MATCH (tags) AGAINST ('travel' IN BOOLEAN MODE);
5. For details, see the official MySQL documentation.
http://dev.mysql.com/doc/refman/5.1/zh/functions.html#fulltext-search
This page is for MySQL 5.1, but it can also be used as a reference for 4.x. I mostly use MySQL 4.1.
MySQL has supported full-text indexing for a long time. In the versions covered here, FULLTEXT is an index type that applies only to MyISAM tables, and the indexed columns may only be of type CHAR, VARCHAR, or TEXT. A FULLTEXT index can be defined when the table is created, or added afterwards with ALTER TABLE or CREATE INDEX. The result is the same either way, but the efficiency differs greatly: experiments show that for large tables, loading the data first and then defining the full-text index is much faster than inserting a large amount of data into a table that already has one.
Why is this? The principle is simple. In the first case the index is built in a single operation: sorting and comparison are done in memory, and the result is then written to disk. In the second case the index on disk is read, compared, and written again for each inserted row, which is naturally much slower.
MySQL performs full-text queries through the MATCH() and AGAINST() functions. The column list in MATCH() must be the same as the column list in the FULLTEXT index definition. In boolean mode, however, you may name only some of the indexed columns and do not need to list them all. AGAINST() specifies the string to search for and the mode in which the full-text search is performed. The sections below describe the three search modes supported by FULLTEXT.
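The difference between building the index once after loading and maintaining it during loading can be sketched with an analogy in Python. This is only an illustration under simplifying assumptions: a sorted Python list stands in for the full-text index, and the function names are made up for this example. It says nothing about how MyISAM actually stores its index.

```python
# Analogy only: a sorted list stands in for the full-text index.
import bisect
import random


def build_after_load(words):
    """Load all rows first, then sort once in memory (index created last)."""
    index = list(words)
    index.sort()
    return index


def build_while_loading(words):
    """Keep the index sorted after every single insert (index created first)."""
    index = []
    for word in words:
        # Each insert searches the existing index and shifts entries to make room.
        bisect.insort(index, word)
    return index


random.seed(0)
words = [f"word{random.randrange(10**6):06d}" for _ in range(5000)]

# Both strategies end with the same index; only the amount of work differs.
assert build_after_load(words) == build_while_loading(words)
```

Both functions produce the same result, which mirrors the point above: the final index is identical, and only the cost of getting there changes.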
Google's Chinese word segmentation technology is provided by an American company named Basis Technology (http://www.basistech.com). Baidu uses word segmentation technology developed in-house, and Zhongsou uses word segmentation technology provided by the Chinese company Hylanda (http://www.hylanda.com). Industry commentators currently regard Hylanda's technology as the best Chinese word segmentation technology in China. Its segmentation accuracy exceeds 99%, which keeps the error rate of the search results very low.
Hylanda: http://www.hylanda.com/server/
Download MySQL 5.0.37 Linux x86 Chinese+.
You do not need to install MySQL beforehand. After downloading, run the following:
The code is as follows:
groupadd mysql
useradd -g mysql mysql
cd /usr/local
gunzip < /root/mysql-chplus-5.0.37-linux-i686.tar.gz | tar xvf -
ln -s /usr/local/mysql-chplus-5.0.37 /usr/local/mysql
cd mysql
scripts/mysql_install_db --user=mysql
chown -R mysql data
chown -R mysql .
/usr/local/mysql/bin/mysqld_safe --user=mysql &
Test:
CREATE TABLE test (testid INT(4) NOT NULL, testtitle VARCHAR(256), testbody VARCHAR(256), FULLTEXT (testtitle, testbody));
INSERT INTO test VALUES (NULL, '', ''), (NULL, 'Hello you', 'Hello you');
SELECT * FROM test WHERE MATCH (testtitle, testbody) AGAINST ('' IN BOOLEAN MODE);
MySQL full-text search has three modes:
1. Natural language search. This is the default full-text search mode in MySQL. SQL example:
The code is as follows:
SELECT id, title FROM post WHERE MATCH (content) AGAINST ('search keyword')
Or declare natural language search explicitly:
The code is as follows:
SELECT id, title FROM post WHERE MATCH (content) AGAINST ('search keyword' IN NATURAL LANGUAGE MODE)
Because natural language search is the default mode, you can omit the "IN NATURAL LANGUAGE MODE" clause.
The natural language search mode has the following features:
1. Stopwords are ignored. Words that appear very frequently in English, such as and/or/to, are considered to carry no real meaning. Searching for these words returns no results.
2. If a word appears in 50% or more of the rows, it is also treated as a stopword. Therefore, if the table contains only one row, no full-text search will return a result.
3. Every result has a relevance value, and the results are automatically sorted by relevance from high to low.
4. Only whole words are matched; partial matches are not considered. For example, searching for box does not match boxing.
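The rules above can be sketched as a small Python toy model. It is an illustration of the rules as described in this article, not MySQL's implementation: natural_search, the three-word stopword set, and the sample rows are all hypothetical, and relevance ranking is left out for brevity.

```python
# Toy model of the natural language mode rules; not MySQL's implementation.
import re

STOPWORDS = {"and", "or", "to"}  # hypothetical sample, not MySQL's real list


def natural_search(rows, term, min_word_len=4):
    """Return the indexes of rows matching `term` under the toy rules."""
    term = term.lower()
    # Rule 1 plus the minimum word length: such words are never searchable.
    if term in STOPWORDS or len(term) < min_word_len:
        return []
    # Rule 4: only whole words count, so split each row into a set of words.
    tokenized = [set(re.findall(r"\w+", row.lower())) for row in rows]
    hits = [i for i, words in enumerate(tokenized) if term in words]
    # Rule 2: a word found in half of the rows or more acts like a stopword.
    if len(hits) * 2 >= len(rows):
        return []
    return hits


rows = [
    "boxing gloves for sale",
    "cardboard boxes shop",
    "travel guide to Rome",
    "travel tips",
    "cooking recipes",
]
print(natural_search(rows, "travel"))             # found in 2 of 5 rows
print(natural_search(rows, "cook"))               # "cooking" is not a whole-word match
print(natural_search(["travel tips"], "travel"))  # a one-row table hits the 50% rule
```

The last call shows why a table with a single row never returns a result in this mode: the word is present in 100% of the rows.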
2. Boolean search. In this mode the 50% rule of natural language search does not apply: even a word that appears in 50% or more of the rows is searched and returned. In addition, partial matches of a word can be searched, for example with the * wildcard at the end of a word. SQL example:
The code is as follows:
SELECT id, title FROM post WHERE MATCH (content) AGAINST ('search keyword' IN BOOLEAN MODE)
3. Natural language search with query expansion.
The code is as follows:
SELECT id, title FROM post WHERE MATCH (content) AGAINST ('search keyword' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION)
I do not fully understand this mode yet.
In actual use, I also found the following details:
• In Boolean search you must specify the sort order of the results yourself. The results are not automatically sorted by relevance as they are in natural language search.
• Even Boolean search does not find words shorter than 4 characters, because MySQL has a system variable, ft_min_word_len, that specifies the minimum word length accepted for full-text search. Its default value is 4.
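Since Boolean search does not sort by relevance on its own, one option is to select the MATCH ... AGAINST value as a column and order by it. The following Python sketch only builds such a statement as a string. The helper name boolean_search_sql is hypothetical, the post table and content column are taken from the examples above, and the %s placeholder assumes a MySQL driver that uses the "format" parameter style.

```python
def boolean_search_sql(table, column, select_cols=("id", "title")):
    """Build a boolean-mode query that sorts by relevance explicitly.

    Table and column names are interpolated directly, so they must come from
    trusted code; the search string stays a %s placeholder for the driver.
    """
    match = f"MATCH ({column}) AGAINST (%s IN BOOLEAN MODE)"
    cols = ", ".join(select_cols)
    return (
        f"SELECT {cols}, {match} AS score "
        f"FROM {table} WHERE {match} "
        "ORDER BY score DESC"
    )


sql = boolean_search_sql("post", "content")
# The same search string is bound twice, once per %s placeholder.
params = ("+travel -hotel",) * 2
print(sql)
```

The search string uses the boolean operators + (word must be present) and - (word must be absent). Passing it as a parameter instead of concatenating it into the statement keeps user input out of the SQL text.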