For large databases, it is very quick to load the data into a table with no Fulltext index and then use ALTER table (or CREATE index) to create the index. It will be very slow to load data into a table that already has a fulltext index.
1. Prerequisites for using MySQL Full-text search fulltext
The type of the table must be MyISAM
The field type that establishes Full-text search must be Char,varchar,text
2. Establish the advance configuration of Full-text search
Because MySQL's default configuration is to index the length of the word is 4, so to support the Chinese word, first change this.
*unix users to modify MY.CNF, general this file in/etc/my.cnf, if not found, first look find/-name ' my.cnf '
Add in [mysqld] location:
The code is as follows |
Copy Code |
Ft_min_word_len = 2
|
Other attributes also have
The code is as follows |
Copy Code |
Ft_wordlist_charset = GBK Ft_wordlist_file =/home/soft/mysql/share/mysql/wordlist-gbk.txt Ft_stopword_file =/home/soft/mysql/share/mysql/stopwords-gbk.txt
|
A little explanation:
Ft_wordlist_charset represents the dictionary's character set, currently supported well (UTF-8, GBK, gb2312, Big5)
Ft_wordlist_file is a thesaurus file, each line including a word and its frequency (with a number of tabs or spaces separated, disambiguation dedicated)
Ft_stopword_file means to filter out the not indexed thesaurus, one line.
Ft_min_word_len the minimum length of the word added to the index, the default is 4, in order to support the Chinese word is changed to 2
3. Establishment of Full-text Search
Using the FULLTEXT keyword in a table, an existing table creates an index with ALTER table (or CREATE INDEX)
The code is as follows |
Copy Code |
CREATE Fulltext INDEX index_name on table_name (COLUM_NAME); |
4. Using Full-text Search
In the WHERE clause of SELECT, with the match function, the indexed keyword is identified with against, in BOOLEAN mode is only the keyword, not the location, the starting position.
The code is as follows |
Copy Code |
SELECT * from articles WHERE MATCH (tags) against (' travel ' in BOOLEAN MODE); |
5. Detailed description Please parameter MySQL official website
Http://dev.mysql.com/doc/refman/5.1/zh/functions.html#fulltext-search
This is MySQL 5.1, but 4. X can also be used as a reference, basic one. I use MySQL 4.1.
MySQL supports Full-text indexing (full-text) for a long time now, Fulltext is an index type that applies only to MyISAM tables, and there are restrictions on the types of data that define indexed columns, only the following three combinations of char, varchar, and text. Fulltext can be defined together while creating a table, or after the table is created, by the statement ALTER TABLE or CREATE INDEX to append the index, in short, the effect is the same, but the efficiency of the two is very different, a large number of experiments prove that For a large number of tables, loading the data before defining the Full-text index is much faster than inserting large amounts of data into a table that has already been defined for full-text indexing. Must ask: What is this question? In fact, the reason is very simple, the former only need to do a one-time operation on your index list, sorting comparison is done in memory, and then write to the hard disk, the latter will be a hard drive to read the index table and then compare the final write, naturally this speed will be very slow. MySQL is the function of the match () and against () to implement its Full-text indexing query. The field names in match () are consistent with the fields defined in Fulltext, and if you are searching in Boolean mode, you can also allow only one field in Fulltext to be included, not all listed. Against () is defined as the string to search for and search queries that require the database to perform full-text indexing in which mode. The following is an example of the 3 search pattern supported by Fulltext.
Google's Chinese word segmentation technology is based on the United States called Basis Technology (http:// Www.basistech.com Company to provide Chinese word segmentation technology, Baidu is using the company's own development of the word segmentation technology, in search of the use of the domestic mass Technology (http://www.hylanda.com) to provide the word segmentation technology. Industry review of the vast technology of the word segmentation technology is currently considered the best Chinese word segmentation technology, the accuracy of the word segmentation is more than 99%, which also makes the search results in the search result error rate is very low.
Massive http://www.hylanda.com/server/
Download mysql5.0.37--linuxx86-chinese+
You do not need to install MySQL in advance and then execute it sequentially
code is as follows |
copy code |
groupadd mysql useradd-g mysql mysql cd/usr/local Gunzip </root/mys Ql-chplus-5.0.37-linux-i686.tar.gz|tar XVF- ln-s/usr/local/mysql-chplus-5.0.37/usr/local/mysql CD MySQL scritps/mysql_install_db--user=mysql chown-r mysql data chown-r MySQL. /usr/local/mysql/bin/mysqld_s Afe--user=mysql can test: CREATE TABLE Test (TestID int (4) NOT NULL, Testtitle varchar (256), TES Tbody varchar (256), Fulltext (Testtitle,testbody)); INSERT INTO test values -> (null, ' How are you ', ' How are you? '), -> (null, ' Good hello ', ' good Hello '); Select * FROM Test where mat CH (testtitle,testbody) against (' Hello ' in boolean mode); |
MySQL Full-text search has three different modes:
One, natural language search. This is the MySQL default Full-text search method, sql Example:
[Code=plain]
The code is as follows |
Copy Code |
Select Id,title from Post WHERE MATCH (content) against (' search keyword ') |
Or explicitly declare the use of natural language search methods
[Code=plain]
The code is as follows |
Copy Code |
Select Id,title from Post WHERE MATCH (content) against (' search keyword ' in NATURAL LANGUAGE MODE) |
Because natural language search is the default mode, you can omit the "in NATURAL LANGUAGE mode" section of the declarative pattern.
Features of natural language search patterns:
1. Ignoring the word stop (Stopword), the frequent occurrence of and/or/to in English is considered to be without the actual search meaning, search these will not get any results.
2. If the probability of a word appearing frequently in a dataset is more than 50%, it is also considered a stop word, so if there is only one row of data in the database, no result can be obtained regardless of how you search for Full-text.
3. Search results have a correlation degree of data, the return results automatically by correlation degree from high to low arrangement.
4. Only for independent words to retrieve, regardless of the word's local matching, such as search box, it will not be boxing as a retrieval target.
Second, Boolean lookup. This finding is characterized by the absence of 50% rules in the natural lookup pattern, which, even if the probability of a word appearing frequently in the dataset is more than 50%, is retrieved and returned as a search target and the local match of the word is retrieved as a target. SQL Example
[Code=plain]
The code is as follows |
Copy Code |
Select Id,title from Post WHERE MATCH (content) against (' search keyword ' in BOOLEAN MODE) |
Third, the tape query expansion of natural language lookup. [Code=plain]
The code is as follows |
Copy Code |
Select Id,title from Post WHERE MATCH (content) against (' search keyword ' in BOOLEAN MODE with expansion) |
This model is not understood for the time being.
The following details were also found in my actual use:
• Boolean lookup must specify how the returned result should be sorted, and it will not automatically sort the results by relevance as the natural language lookup does.
• Even a boolean lookup, a word that is less than or equal to 3, will not be retrieved because MySQL has a system variable Ft_min_word_len specifies the minimum word length acceptable for full-text retrieval, and the default value is 4.