How to create a full-text index in Oracle

Last Update:2018-07-02 Source: Internet

Author: User

Tags lexer


This article describes how to create an Oracle full-text index, step by step: checking the required user and role, granting permissions, choosing a lexer, creating the index, and querying it.

Step 1 Check and set database roles
First, check whether the CTXSYS user and the CTXAPP role exist in the database. If they do not, the interMedia Text feature (now called Oracle Text) was not installed when the database was created, and you must install it before continuing. By default, the ctxsys user is locked, so unlock it first.
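
The checks above can be sketched as the following statements, run as a user with DBA privileges. This is a minimal sketch that assumes you can read the dba_users and dba_roles dictionary views; the unlock statement leaves the ctxsys password unchanged.

```sql
-- Does the CTXSYS user exist, and is the account locked?
select username, account_status
  from dba_users
 where username = 'CTXSYS';

-- Does the CTXAPP role exist?
select role
  from dba_roles
 where role = 'CTXAPP';

-- Unlock CTXSYS if account_status shows it as locked
alter user ctxsys account unlock;
```

If either query returns no rows, the text feature is missing and must be installed before the remaining steps will work.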

Step 2 Grant permissions
As the ctxsys user, grant execute permission on the ctx_ddl package to the user who will create the full-text index, for example:

grant execute on ctx_ddl to pomoho;
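
As a hedged addition to the grant above: many setups also give the user the CTXAPP role mentioned in Step 1, and it can help to confirm that the privilege is visible. The sketch below assumes the example user pomoho from this article; the role grant is an assumption about your environment, not something the original steps require.

```sql
-- Run as a DBA: give the example user the CTXAPP role (optional, assumption)
grant ctxapp to pomoho;

-- Run as pomoho: confirm the execute privilege on ctx_ddl is visible
select grantee, table_name, privilege
  from all_tab_privs
 where table_name = 'CTX_DDL'
   and grantee = 'POMOHO';
```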

Step 3 Set the lexical analyzer (lexer)
The full-text retrieval mechanism of Oracle is simple in outline. Oracle's patented lexical analyzer (lexer) finds all the units of meaning (Oracle calls them terms) in the text and records them in a group of tables whose names start with DR$, together with the position, occurrence count, and hash value of each term. During retrieval, Oracle looks up the query terms in these tables, calculates how often they occur, and uses an algorithm to compute a score for each document, the so-called 'matching rate'. The lexer is the core of this mechanism and determines the efficiency of full-text retrieval. Oracle provides different lexers for different languages; three of them are commonly used:
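
To make the mechanism concrete, the sketch below looks at the terms stored for an index and at the score returned by a query. It assumes a CONTEXT index named myindex on mytable(mycolumn), the placeholder names used in the index example later in this step, and that the token table follows the usual DR$ plus index name plus $I naming. The search word 'oracle' is arbitrary.

```sql
-- Terms (tokens) recorded for the index; token_count is the
-- document count stored in each token row
select token_text, token_count
  from dr$myindex$i
 order by token_count desc;

-- score(1) returns the relevance score computed for the
-- contains call that carries the same label (1)
select score(1) as relevance, mycolumn
  from mytable
 where contains(mycolumn, 'oracle', 1) > 0
 order by score(1) desc;
```

Looking at the token table is also a quick way to see how a given lexer has split your text.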

basic_lexer: for English. It splits sentences into words on spaces and punctuation, and automatically discards very frequent words that carry no retrieval value, such as if and is, as noise words (stopwords). It is highly efficient. However, it has serious problems with Chinese: because it only recognizes spaces and punctuation, and a Chinese sentence generally contains no spaces, it treats the whole sentence as a single term, so retrieval effectively stops working. Taking the sentence '中国人民站起来了' ('The Chinese people have stood up') as an example, basic_lexer produces only one term, the whole sentence. A search for '中国' ('China') then returns nothing.

chinese_vgram_lexer: a dedicated Chinese analyzer that supports all Chinese character sets (ZHS16CGB231280, ZHS16GBK, ZHT32EUC, ZHT16BIG5, ZHT32TRIS, ZHT16MSWIN950, ZHT16HKSCS, UTF8). It analyzes Chinese sentences character by character. The sentence '中国人民站起来了' is split into the following terms: '中', '中国', '国人', '人民', '民站', '站起', '起来', '来了', '了'. This method is simple to implement and catches every possible match, but its efficiency is unsatisfactory.

chinese_lexer: a newer Chinese analyzer that supports only the utf8 character set. As shown above, chinese_vgram_lexer has no knowledge of common Chinese words, so its units are purely mechanical: terms such as '民站' and '站起' never occur on their own in Chinese, so they are meaningless and only hurt efficiency. The biggest improvement of chinese_lexer is that it recognizes most common Chinese vocabulary, so it segments sentences more sensibly. The two meaningless terms above no longer appear, which greatly improves efficiency. However, it supports only utf8. If your database uses the zhs16gbk character set, you are limited to the cruder chinese_vgram_lexer. (This restriction reflects older releases; later Oracle Text releases document support for more character sets, so check the reference for your version.)
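
Because the choice between the two Chinese lexers depends on the database character set, it is worth checking that first. A minimal sketch using the standard nls_database_parameters view:

```sql
-- Show the database character set, for example UTF8 or ZHS16GBK
select value as db_charset
  from nls_database_parameters
 where parameter = 'NLS_CHARACTERSET';
```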

If no settings are made, Oracle uses the basic_lexer analyzer by default. To specify which lexer to use, perform the following operations:

1. Create a preference under the current user (for example, execute the following statement under the pomoho user)

exec ctx_ddl.create_preference('my_lexer', 'chinese_vgram_lexer');

2. Specify the lexer used when creating an oracle full-text index:

create index myindex on mytable(mycolumn) indextype is ctxsys.context parameters('lexer my_lexer');

In this way, chinese_vgram_lexer is used as the analyzer.
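
To confirm that the preference exists, or to switch it to another lexer, something like the following can be used. This is a sketch under the assumption that the preference is named my_lexer as above and that the database character set allows chinese_lexer. An index that has already been built keeps the lexer it was created with, so it would need to be recreated or rebuilt afterwards.

```sql
-- List the preferences owned by the current user
select pre_name, pre_class, pre_object
  from ctx_user_preferences;

-- Switch lexers: drop the preference, then create it again
begin
  ctx_ddl.drop_preference('my_lexer');
  ctx_ddl.create_preference('my_lexer', 'chinese_lexer');
end;
/
```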

Step 4 Create an index

Use the following syntax to create a full-text index:
CREATE INDEX [schema.]index ON [schema.]table(column) INDEXTYPE IS ctxsys.context [ONLINE] [LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')] [, PARTITION [partition] [PARAMETERS('paramstring')]])]] [PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE];
Example:
create index ctx_idx_menuname on pubmenu(menuname) indextype is ctxsys.context parameters('lexer my_lexer');
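
One practical point the syntax does not show: by default a CONTEXT index is not updated as rows change, so new or modified rows become searchable only after the index is synchronized. The sketch below uses the ctx_idx_menuname index from the example; how often to run it, and whether to optimize at all, are assumptions that depend on your workload.

```sql
-- Index the rows inserted or updated since the last synchronization
begin
  ctx_ddl.sync_index('ctx_idx_menuname');
end;
/

-- Optionally defragment the index after many synchronizations
begin
  ctx_ddl.optimize_index('ctx_idx_menuname', 'FULL');
end;
/

-- Rows that failed to index, if any
select err_index_name, err_timestamp, err_text
  from ctx_user_index_errors
 order by err_timestamp desc;
```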

Step 5 Use Indexes

A full-text index is easy to use. Query it with the contains operator, for example:

select * from pubmenu where contains(menuname, 'upload image') > 0;

Full-text index types
An index created with Oracle Text is called a domain index. There are four index types:

CONTEXT

CTXCAT

CTXRULE

CTXXPATH
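
The steps in this article all use the CONTEXT type. For comparison, here is a minimal sketch of a CTXCAT index, which is queried with catsearch instead of contains. The table auction and its column title are hypothetical names chosen for illustration, and the search word is arbitrary.

```sql
-- Hypothetical table auction(title): create a CTXCAT index on it
create index auction_titlex on auction(title) indextype is ctxsys.ctxcat;

-- CTXCAT indexes are queried with catsearch; the third argument is an
-- optional structured condition, left null here
select title
  from auction
 where catsearch(title, 'camera', null) > 0;
```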

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
