How to make better use of Oracle Full-text Search

Source: Internet
Author: User

There are several ways to search text in an Oracle database without using the Oracle Text feature. For example, you can use the standard INSTR function or the LIKE operator:

-- Both return rows whose thetext contains 'Oracle' (case-sensitive under default settings)
SELECT * FROM mytext WHERE INSTR(thetext, 'Oracle') > 0;
SELECT * FROM mytext WHERE thetext LIKE '%Oracle%';

In many cases INSTR and LIKE are perfectly adequate, especially when searching a small table. However, locating text this way forces Oracle to read and test every row, which results in a full table scan that is expensive in resources, and the search capabilities are very limited. When searching large amounts of text data, it is therefore better to use the full-text search functionality that Oracle provides (Oracle Text).
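The difference can be sketched with a toy Python model (not Oracle code). The table contents and the functions substring_scan, build_term_index and term_search are hypothetical illustrations: the scan reads every row the way INSTR does, while the term index maps each word to the rows and positions where it occurs, which is roughly the idea behind a full-text index. Oracle's real index tables and scoring algorithm are not reproduced here.

```python
from collections import defaultdict
import re

# Hypothetical rows standing in for mytext(thetext).
ROWS = {
    1: "Oracle Text builds an index of terms",
    2: "Oracles of ancient Greece",
    3: "Notes on Oracle tuning and Oracle indexes",
}

def substring_scan(rows, needle):
    """Like INSTR(thetext, needle) > 0: read every row and match raw substrings."""
    examined, hits = 0, []
    for row_id, text in rows.items():
        examined += 1  # every row is read, as in a full table scan
        if needle in text:
            hits.append(row_id)
    return hits, examined

def build_term_index(rows):
    """Map each lowercased word to {row_id: [word positions]} (an inverted index)."""
    index = defaultdict(dict)
    for row_id, text in rows.items():
        for pos, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(row_id, []).append(pos)
    return index

def term_search(index, term):
    """Look up one term; rank rows by occurrence count (a crude stand-in for a score)."""
    postings = index.get(term.lower(), {})
    return sorted(((rid, len(p)) for rid, p in postings.items()),
                  key=lambda item: (-item[1], item[0]))

index = build_term_index(ROWS)
print(substring_scan(ROWS, "Oracle"))  # ([1, 2, 3], 3): row 2 matches only via "Oracles"
print(term_search(index, "oracle"))    # [(3, 2), (1, 1)]
```

The scan examines all three rows and also matches "Oracles", a different word; the term lookup consults only the entry for oracle and ranks row 3 first because the word occurs there twice.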

Setting up full-text search

Step one: check and set up the database user and role. First, check whether the database contains the CTXSYS user and the CTXAPP role. If they are missing, the database was created without the interMedia (Oracle Text) component installed, and you must add that component to the database before continuing.

The CTXSYS user is locked in a default installation, so unlock it first (for example, with ALTER USER ctxsys ACCOUNT UNLOCK).

Step two: connected as the CTXSYS user, grant EXECUTE on the CTX_DDL package to the user who will create full-text indexes, for example:

grant execute on ctx_ddl to pomoho;

Step three: set the lexical analyzer (lexer)

The mechanism Oracle uses to implement full-text search is actually quite simple. Oracle's lexical analyzer (lexer) finds all the meaningful units in a document (Oracle calls them terms) and records them in a set of tables whose names begin with DR$, together with each term's positions, number of occurrences, hash value, and so on. At query time, Oracle looks up the corresponding terms in these tables, counts how often they occur, and uses an algorithm to compute a score for each document, the so-called "match rate" (relevance). The lexer is the core of this mechanism and determines how efficient full-text search is. Oracle provides different lexers for different languages, and three of them are commonly used:

basic_lexer: for English. It splits sentences into English words at spaces and punctuation, and it can automatically treat very common words that have no search value, such as "if" and "is", as noise (stopwords), so it processes text efficiently. However, this lexer has serious problems with Chinese. Because it recognizes only spaces and punctuation, and Chinese text normally has no spaces between words, it treats a whole sentence as a single term, which effectively makes the text unsearchable. For example, basic_lexer turns the sentence '中国人民站起来了' ("the Chinese people have stood up") into just one term, the entire sentence, so a search for '中国' ("China") will not find it.
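A minimal Python model of this behavior, assuming a lexer that splits only at non-word characters and drops words from a small hypothetical stopword list (in Oracle Text, stopwords actually come from a separate stoplist setting on the index). basic_lex is an illustrative name, not an Oracle API. Because Python's regular expressions treat Chinese characters as word characters, the Chinese sentence comes back as one term, just as described above.

```python
import re

# Hypothetical mini stoplist; Oracle's default English stoplist is far larger.
STOPWORDS = {"a", "if", "is", "the"}

def basic_lex(text):
    """Split at whitespace/punctuation, lowercase, and drop stopwords."""
    tokens = re.split(r"\W+", text.lower())
    return [t for t in tokens if t and t not in STOPWORDS]

print(basic_lex("The index is ready, if the data is loaded."))
# ['index', 'ready', 'data', 'loaded']
print(basic_lex("中国人民站起来了"))
# ['中国人民站起来了']: one term, so a search for 中国 finds nothing
```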

chinese_vgram_lexer: a dedicated Chinese lexer that supports all the Chinese character sets (ZHS16CGB231280, ZHS16GBK, ZHT32EUC, ZHT16BIG5, ZHT32TRIS, ZHT16HKSCS, UTF8). It breaks Chinese sentences into units character by character. The sentence '中国人民站起来了' ("the Chinese people have stood up") is analyzed into the following terms: '中', '中国', '国人', '人民', '民站', '站起', '起来', '来了', '了'. As you can see, this method is simple to implement and misses nothing, but its efficiency is unsatisfactory.
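The overlapping-unit idea can be sketched in Python as plain two-character grams. This is a simplification, not Oracle's algorithm: the term list above also contains the single characters '中' and '了', which this sketch leaves out, and vgram_match only checks that every gram of the query is present, not that the grams are adjacent. Both function names are hypothetical.

```python
def bigrams(text):
    """Overlapping two-character units (a simplified vgram-style lexer)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]

def vgram_match(document, query):
    """True if every bigram of the query occurs among the document's bigrams.
    Adjacency is ignored, so this can over-match compared with phrase search."""
    doc_terms = set(bigrams(document))
    return all(term in doc_terms for term in bigrams(query))

sentence = "中国人民站起来了"
print(bigrams(sentence))
# ['中国', '国人', '人民', '民站', '站起', '起来', '来了']
print(vgram_match(sentence, "中国"), vgram_match(sentence, "中华"))  # True False
```

Every pair of neighboring characters becomes a term, which is why a search for 中国 now succeeds, and also why meaningless units such as 民站 end up in the index.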

chinese_lexer: a newer Chinese lexer that supports only the UTF8 character set. As shown above, chinese_vgram_lexer does not know common Chinese vocabulary, so its units are produced purely mechanically. Units such as '民站' ("people station") and '站起' ("stand up") in the example above do not occur on their own in Chinese, so terms like these are meaningless and only reduce efficiency. The biggest improvement in chinese_lexer is that it recognizes most common Chinese words, so it can segment sentences more sensibly: pointless units like those two no longer appear, which greatly improves efficiency. However, because it supports only UTF8, a database that uses the ZHS16GBK character set is limited to the cruder chinese_vgram_lexer.
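The article does not say how chinese_lexer segments text, so the sketch below swaps in a classic dictionary-based technique, forward maximum matching, to show why a vocabulary-aware lexer avoids units like '民站'. The four-word lexicon and the function name fmm_segment are hypothetical; a real dictionary-based lexer uses a far larger vocabulary.

```python
# Hypothetical mini-lexicon; a real lexer ships a large dictionary.
LEXICON = {"中国", "人民", "站起来", "起来"}
MAX_WORD = max(len(w) for w in LEXICON)

def fmm_segment(text, lexicon=LEXICON, max_word=MAX_WORD):
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_word, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

print(fmm_segment("中国人民站起来了"))  # ['中国', '人民', '站起来', '了']
```

Compared with the seven overlapping units from the vgram sketch, this produces four terms that are all real words (plus the particle 了), so the index stays smaller.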

If you do not configure anything, Oracle uses basic_lexer by default. To specify a different lexer, do the following:

First, create a preference as the current user (for example, run the following statement as the POMOHO user):

exec ctx_ddl.create_preference('my_lexer', 'chinese_vgram_lexer');

Second, when you create the full-text index, specify the lexer to use:

CREATE INDEX myindex ON mytable(mycolumn) INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer my_lexer');

This creates a full-text index that uses chinese_vgram_lexer as its lexer.

Step four: create the index

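Create a full-text index with the following syntax. This form, with the LOCAL clause, creates a locally partitioned index on a partitioned table; for an ordinary table, omit the LOCAL clause: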

CREATE INDEX [schema.]index ON [schema.]table(column) INDEXTYPE IS ctxsys.context [ONLINE]
LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE];

Example:

CREATE INDEX ctx_idx_menuname ON pubmenu(menuname)
  INDEXTYPE IS ctxsys.context PARAMETERS ('lexer my_lexer');
