To implement the fuzzy search function of the search engine, use the SQL keyword like to implement pattern matching.
Now we have a database named T1 with 10 thousand records. The title field is a text field, the length is 255, and the body is a comment field. See the following SQL statement:
Select * from T1 where title like '% thunder %'
In this case, all records with 'thunder 'contained in the title field in Table 1 are returned. Since title is a text field and has a small length, the search speed is acceptable.
Now we want to return all records whose title or body field contains 'thunder:
Select * from T1 where title like '% thunder %' or body like '% thunder %' the speed is many times slower. This speed is used for Web search, will the user be patient?
How to solve the problem of speed in fuzzy search is the topic of full-text index that we will introduce today.
A full-text index can be used as an example to illustrate the problem:
Give you a book you haven't read. Specify a sentence in the book for you to search, and you cannot view the directory (index ), so you can only search for one page by page. Think about it. You don't know the content of this book. Will this search speed be faster?
If you are allowed to view the directory (INDEX), you may find the index faster (if the index is highly correlated with the index ). In addition, if you like this book very much and read it every day, you will be familiar with the content of the book for a while. At this time, let you find another article, can you immediately know the approximate location of this sentence? Search nearby Based on the approximate location, and then quickly locate the search result. In fact, you have already created a full-text index for this book in your mind.
SQL full-text index is used to pre-index database records by words, so as to speed up fuzzy search. These indexed words have a space between every two words in English, which may be different in Chinese. This involves the Chinese Word Segmentation technology. When we use the database engine, the engine's full-text index actually uses word segmentation technology. Of course, this cannot be seen on the surface.
The following uses ms SQL Server 2000 as an example to describe how to create a full-text index.
1. Start full-text indexing. Choose Microsoft SQL Server> SQL Server group> (local) (Windows NT)> Support Services> full-text search> right-click menu element> start;
2. Create a full-text directory for the database. Choose Microsoft SQL Server> SQL Server group> (local) (Windows NT)> database> full-text directory> right-side menu> new full-text directory;
3. Create a full-text index for the table to create a full-text index. Choose Microsoft SQL Server> SQL Server group> (local) (Windows NT)> database> Your Database Name> table> double-click> select the data table to create a full-text index in the list on the Right> right> full-text index table> on the table define full-text index, follow the Wizard to create a full-text index field and full-text directory. Note: Your data table must have a primary key. Otherwise, the following error occurs: "The selected table does not have a unique Single Column index on columns that do not allow null ".
4. Fill in the full-text directory selected when the full-text index was created in the previous step. Choose Microsoft SQL Server> SQL Server group> (local) (Windows NT)> database> full-text directory> double-click> select full-text directory from the list on the Right> right-built menu> start full filling. Note that filling takes time.
In this way, we can use contains to retrieve the full-text index of the data table:
select * from T1 where title like '% thunder %' or contains (body, '% thunder %') Try the query speed, the first query speed is still very slow. After the query is stored in the memory, the query speed is faster. It is not acceptable. However, the problem of slow query speed for the first time still needs to be solved, and the query speed in the future will be far from the baidu search speed. That is to say, there are still many improvements.