Full-text indexing in Oracle

Source: Internet
Author: User

Recently, one-click search is required for the entire site. Normally, Baidu or Google can be used for intra-site search on the Internet. Well ~ However, the search is inaccurate. But unfortunately, I am using a Business System on the Intranet. First, it is not generated in html, and second, it uses a large amount of ajax to dynamically load data, then it is not the data structure of the content field in the content table that cms agrees. So there are several problems.

1. Network crawling

2. Search implementation

3. Search speed

Then, the following implementation ideas are designed based on the available resources.

1. First, write a crawler that crawls the database based on the configured data table, fields, display names, and data storage format. The purpose is to crawl data from a business table that can be searched with one click to a data table in a predefined format.

2. To achieve high-performance search (of course, the like stream is not enough), the full-text Oracle index function is used here. Record the configuration process.

alter user ctxsys account unlock; grant execute on ctx_ddl to testuser;  
First, enable the full-text index function of oracle. In the PL/SQL create SQL window, run the preceding command. It is mainly to enable the ctxsys user (full-text index user), and then grant the ctx_ddl execution permission to the current user. (Able to perform full-text indexing)

Then create a word divider. The reason why full-text indexing can improve performance is that it avoids the use of like fuzzy matching. It allows you to create a word divider to perform word segmentation indexes based on the content of the index field. For example, Jinan City, Shandong Province will be split into Shandong Province, Shandong Province, Jinan City, and Jinan city and then indexed. (Of course, the effect varies depending on the word divider you use. If you use chinese_lexer, it may only be divided into Shandong Province and Jinan city.) In this way, when you perform a query, enter Shandong or Shandong province, this record is retrieved from both Jinan and Jinan. However, if you enter "Jinan Province", it is a pity that you cannot find it. This is different from like. However, this is generally not the case, unless it is "brainless ?" Haha.

The word divider is created as follows:

exec ctx_ddl.create_preference ('test_lexer', 'chinese_vgram_lexer');  
Here we use a comprehensive Chinese Word divider ~ You can test the word divider by yourself. The above can meet most of your needs.

NOTE: If PL/SQL is used for execution, remember to execute it in the new command window.

If necessary, you can also create filter phrases, which are the types of word segmentation that you do not want to appear when you create an index by using the system word segmentation tool. In this case, you can create a stoplist, as shown below:

exec ctx_ddl.create_stoplist('test_stoplist');  
Then, add a filter word to it,

Ctx_ddl.add_stopword ('test _ stoplist', 'Company ');
Ctx_ddl.add_stopword ('test _ stoplist', 'subauthorization ');

In this way, when creating an index, the "test Company" will not be broken down into two words: "test" and "company. Of course, when searching, you must enter "test Company" to find the result.

Then let's look at the statement for creating an index:

create index TEST_INDEX on TABLE_NAME(COL_NAME) indextype is CTXSYS.CONTEXT parameters('lexer test_lexer stoplist test_stoplist'); 
Of course, if the filter phrase is not used, the stoplist statement can be removed.

In this way, the index is created. The following describes how to query indexes.

Select * from TABLE_NAME where contains (COL_NAME, 'test')> 0;
The above method is used. It is actually very simple. 'Test' is the keyword we entered for the query. You can also perform word segmentation query on the test, just like Baidu.

However, if a new data insertion or data modification occurs, the index will not change? Bug? Of course not. Because full-text index requires a large amount of resources, you must manually execute the Index Update script during design. (Of course, automatic update is allowed. You can also add a parameter. However, this will seriously affect the performance, and only update the index without optimization, so you should manually update it.) The specific code is as follows:

Exec ctx_ddl.sync_index ('test _ Index') -- synchronize the index, that is, update the index exec ctx_ddl.optimize_index ('test _ Index', 'full') -- Optimize the index, similar to rebuilding the index, improves the query performance.
But when and how? It is estimated that you can write a job. I define execution at every morning.

This is almost the end. At present, the query speed through full-text indexing is quite advantageous, which is much faster than like. Please refer to it for your reference.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.