As we all know, fuzzy search using like is very slow, including like 'aaa % ', like' % aaa', like '% AAA % ', like '% A %' and fuzzy queries that use '_' for single-character matching. There are many Article We will talk about how to improve like queries. we mentioned that like 'aaa % 'can use indexes, while like' % aaa' can use reverse function indexes to improve query efficiency. However, in general, it is impossible to determine which kind of like query is used by the client. Is it hard to say that if is used to determine all these cases?
My head hurt this thing countless times. Recently, a customer "unreasonably" requested that the fuzzy query of user addresses be too slow. Query records under 100,000 of user records requires that records be queried within 5 seconds.
If you want to break your head, you still cannot find a solution. Some colleagues found this Lucene book and showed it to me, saying it could be solved. I read it three or three times and never figured out how to use it.
I suddenly thought that Oracle also had a full-text index, which was previously mentioned by others. A chat with a friend on the Internet seems to be able to solve the problem, but he forgot how to use it.
Get up at in the middle of the night and go to Google to check the information. I got some help after several hours of research. The next day, I did not have time to study it during the day, and continued at night. I finally completed the full-text index, solving the problem of slow fuzzy query speed. Among the hundreds of thousands of user data entries, fuzzy search of user addresses can be found within 2 seconds.
-------------------------------------------------------------------------
The following describes how to create a full-text index. (I used a graphical interface to create a full-text index on the Internet. I created a full-text index one night. The next day I got up, I was not finished. However, you can run the SQL command in about 15 minutes ):
Perform full-text search for the address field in the cmng_custominfo table:
1. Create a word segmentation item in oracle9201:
Begin
Ctx_ddl.create_preference ('sms _ address_lexer ', 'Chinese _ lexer ');
-- Ctx_ddl.create_preference ('My _ lexer ', 'Chinese _ vgram_lexer'); No
End;
2. Create a full-text search:
Create Index Inx_custominfo_addr_docs On Cmng_custominfo (address) Indextype is ctxsys. context parameters ('lexer sms_address_lexer ');
3. When querying, use:
Select * From cmng_custominfo where Contains (address, 'golden Newtown ')> 1 ;
4. Regular synchronization and optimization are required:
Synchronization: update the full-text search index based on the text content of the new record.
Begin
Ctx_ddl.sync_index ( 'Your _ minminfo_addr_docs' );
End;
Optimization: Clear junk content in full-text search indexes based on deleted records
Begin
Ctx_ddl.optimize_index ( 'Prop _ minminfo_addr_docs ', 'save' );
End;
-------------------------------------------------------------------------
5. Use job to do the work in Step 4:
1) This function must be completed using the Oracle job function.
Because Oracle9i does not enable the job function by default, you must first add the job configuration parameters of the Oracle database instance:
Job_queue_processes = 5
Restart the Oracle Database Service and listener service.
2) synchronization and Optimization
-- Synchronize Sync:
Variable jobno number;
Begin
Dbms_job.submit (: jobno, 'ctx _ DDL. sync_index (''your _ minminfo_addr_docs ''); ', sysdate, 'sysdate + (1/24/4 )');
Commit;
End;
-- Optimization
Variable jobno number;
Begin
Dbms_job.submit (: jobno, 'ctx _ DDL. optimize_index (''inx _ minminfo_addr_docs '', ''full''); ', sysdate, 'sysdate + 1 ');
Commit;
End;
The sysdate + (1/24/4) of the first job is synchronized every 15 minutes. The sysdate + 1 of the second job is fully optimized every one day. The specific time interval can be determined based on the application's needs.
6. Index Reconstruction
Re-indexing will delete the original index and regenerate the index, which takes a long time.
The index reconstruction syntax is as follows:
Alter index inx_custominfo_addr_docs rebuild;
According to the experiences of some users on the Internet, the reindexing speed of Oracle is also relatively fast, as described by one user:
The establishment and maintenance of full-text search in Oracle is much faster than that in ms SQL Server. It takes 20 minutes to create an index for a table recorded in 0.65 million, and only 1 minute for synchronization.
Therefore, you can also consider using the job method to regularly rebuild indexes.
For example:
0. Use sys to change the ctxsys Password
1. Authorization
Grant the execution permission of ctx_ddl to the user who wants to use the full-text index under the ctxsys user, for example:
Grant execute on ctx_ddl to csdba;
2. Create a word divider:
Begin
Ctx_ddl.create_preference ('My _ lexer ', 'Chinese _ lexer ');
End;
Log on to csdba
3. Create a full-text search:
Create index it_remark_index on itskills (remark) indextype is ctxsys. context parameters ('lexer my_lexer ')
4. Modify SQL statements
5. Synchronization
Begin
Ctx_ddl.sync_index ('It _ remark_index ');
End;
6. Optimization
Begin
Ctx_ddl.optimize_index ('It _ remark_index ', 'quick ');
End;
7. troubleshooting
(1) ORA-29855: Execute odciindexcreate routineProgramError
A ORA-20000: Oracle Text
Error:
DRG-10509: Invalid text column: ID
ORA-06512:
In "ctxsys. Drue", line 160
ORA-06512:
In "ctxsys. textindexmethods", line 364
View the program source of the error stack?
Cause: Oracle only supports full-text indexing on varchar, varchar2, clob, and long types.
(2)
8. Delete
// Delete the index
Drop index it_remark_index Delete Index
// If an index cannot be deleted, add force after the index is deleted.
For example, drop index it_remark_index force forcibly deletes an index.
// Delete the word Divider
Begin
Ctx_ddl.drop_preference ('My _ lexer ');
End;