Oracle Full-Text Search Research (Complete): Understanding the Principles of Full-Text Indexing


2010-10-15 10:13:51 | Category: Database

Reference (Baidu Wenku):

http://wenku.baidu.com/view/c53e9e36a32d7375a417801a.html

1. Preparation

1.1 Checking and setting database users and roles

First check that the database has the CTXSYS user and the CTXAPP role. If they are missing, the database was created without the interMedia (Oracle Text) option, and you must install that option before continuing.

In a default installation the CTXSYS account is locked and its password is expired, so it has to be enabled first: log in to Enterprise Manager (EM) as SYS, then unlock the CTXSYS account and set a new password for it.
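The EM screen is not reproduced here, so the same check and unlock can be sketched in SQL*Plus. This is a minimal sketch, assuming a session connected as SYS with SYSDBA privileges; the password is a placeholder and not a value from the original article.

```sql
-- Run as SYS (for example: CONNECT / AS SYSDBA).
-- 1. Confirm that the Oracle Text user and role exist.
SELECT username, account_status FROM dba_users WHERE username = 'CTXSYS';
SELECT role FROM dba_roles WHERE role = 'CTXAPP';

-- 2. Unlock CTXSYS and give it a new password.
--    "ChangeMe_123" is a placeholder; choose your own.
ALTER USER ctxsys IDENTIFIED BY "ChangeMe_123" ACCOUNT UNLOCK;
```

If the first two queries return no rows, Oracle Text is not installed and unlocking the account will not help.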

1.2 Granting privileges

The test user is the previously created user FOO, and the examples use the table T_DOCNEWS in that schema.

Log in as SYS (as SYSDBA) and grant the RESOURCE and CONNECT roles to FOO:

GRANT resource, connect TO foo;

Then log in as CTXSYS and grant the Oracle Text role and package privileges to FOO:

GRANT ctxapp TO foo;
GRANT EXECUTE ON ctxsys.ctx_cls TO foo;
GRANT EXECUTE ON ctxsys.ctx_ddl TO foo;
GRANT EXECUTE ON ctxsys.ctx_doc TO foo;
GRANT EXECUTE ON ctxsys.ctx_output TO foo;
GRANT EXECUTE ON ctxsys.ctx_query TO foo;
GRANT EXECUTE ON ctxsys.ctx_report TO foo;
GRANT EXECUTE ON ctxsys.ctx_thes TO foo;
GRANT EXECUTE ON ctxsys.ctx_ulexer TO foo;

View the system default Oracle Text preferences:

SELECT pre_name, pre_object FROM ctx_preferences;

2. Oracle Text indexing principles

An Oracle Text index converts the text into tokens. For example, www.taobao.com is split into the tokens www, taobao and com.

Oracle 10g supports four types of Oracle Text index: CONTEXT, CTXCAT, CTXRULE and CTXXPATH.

2.1 CONTEXT index

Oracle Text converts every word into a token. A CONTEXT index is an inverted index: each token maps to the documents in which it occurs. For the word dog, for example, the entry might list DOC1, DOC3 and DOC5, meaning that dog appears in those three documents. After the index is built, the system automatically generates the tables DR$MYINDEX$I, DR$MYINDEX$K, DR$MYINDEX$R and DR$MYINDEX$N, together with the index DR$MYINDEX$X on the $I table (assuming the base table is MYTABLE and the index is named MYINDEX). A CONTEXT index is not synchronized automatically after DML; call CTX_DDL.SYNC_INDEX to synchronize it manually.

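Example:

-- The row text was lost in the source listing. The values below follow the Oracle Text
-- quick-tour example, which matches the query results shown further down.
CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));
INSERT INTO docs VALUES (1, '<HTML>California is a state in the US.</HTML>');
INSERT INTO docs VALUES (2, '<HTML>Paris is a city in France.</HTML>');
INSERT INTO docs VALUES (3, '<HTML>France is in Europe.</HTML>');
COMMIT;

-- Create a CONTEXT index
CREATE INDEX idx_docs ON docs (text)
  INDEXTYPE IS ctxsys.context PARAMETERS
  ('filter ctxsys.null_filter section group ctxsys.html_section_group');

-- Query
-- Display the text column 40 characters wide.
COLUMN text FORMAT a40
SELECT id, text FROM docs WHERE CONTAINS(text, 'France') > 0;

        ID TEXT
---------- ----------------------------------------
         3 <HTML>France is in Europe.</HTML>
         2 <HTML>Paris is a city in France.</HTML>

-- Insert more data
INSERT INTO docs VALUES (4, '<HTML>Los Angeles is a city in California.</HTML>');
INSERT INTO docs VALUES (5, '<HTML>Mexico City is big.</HTML>');
COMMIT;

-- The newly inserted rows are not found yet.
SELECT id, text FROM docs WHERE CONTAINS(text, 'city') > 0;

        ID TEXT
---------- ----------------------------------------
         2 <HTML>Paris is a city in France.</HTML>

-- Synchronize the index
BEGIN
  ctx_ddl.sync_index('idx_docs', '2M');  -- synchronize using 2 MB of memory
END;
/

-- Query again
COLUMN text FORMAT a50
-- The new rows are now found.
SELECT id, text FROM docs WHERE CONTAINS(text, 'city') > 0;

        ID TEXT
---------- --------------------------------------------------
         5 <HTML>Mexico City is big.</HTML>
         4 <HTML>Los Angeles is a city in California.</HTML>
         2 <HTML>Paris is a city in France.</HTML>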

-- OR operator
SELECT id, text FROM docs WHERE CONTAINS(text, 'city OR state') > 0;

-- AND operator
SELECT id, text FROM docs WHERE CONTAINS(text, 'city AND state') > 0;

-- Note: two words with no operator between them form a phrase query
-- (the words must be adjacent and in that order), which is stricter than AND.
SELECT id, text FROM docs WHERE CONTAINS(text, 'city state') > 0;

-- SCORE returns the relevance score; the higher the score, the more relevant the match.
SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'oracle', 1) > 0;

A CONTEXT index is not synchronized automatically, so it must be synchronized manually after DML. The query operator for a CONTEXT index is CONTAINS.
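To look at the inverted index and the pending-synchronization queue for the example above, queries along the following lines may help. This is a sketch, assuming the index IDX_DOCS from the example and a session connected as its owner. The SYNC (ON COMMIT) rebuild at the end is an alternative to manual synchronization that is not part of the original walkthrough; it assumes Oracle 10g or later.

```sql
-- Tokens stored in the inverted index (the $I table of IDX_DOCS).
SELECT token_text, token_count
  FROM dr$idx_docs$i
 ORDER BY token_text;

-- Rows changed by DML that are still waiting for CTX_DDL.SYNC_INDEX.
SELECT pnd_index_name, pnd_rowid, pnd_timestamp
  FROM ctx_user_pending;

-- Optional (assumption: Oracle 10g or later): synchronize at commit time
-- instead of calling CTX_DDL.SYNC_INDEX by hand.
ALTER INDEX idx_docs REBUILD PARAMETERS ('replace metadata sync (on commit)');
```

Synchronizing on every commit keeps queries current but tends to fragment the index faster, so it is a trade-off rather than a default choice.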

2.2 CTXCAT index

Used for mixed queries that combine a text column with other columns.

A CTXCAT index can include an index set, which holds the structured columns that are frequently queried together with the text column. For example, when searching by product name you often also filter or sort by production date, price or description; those columns can be added to the index set. Oracle evaluates such mixed queries through the CATSEARCH operator, which improves the efficiency of full-text search. In transactional applications that need near-real-time results, the lack of automatic synchronization in a CONTEXT index is clearly a problem; a CTXCAT index is synchronized automatically.

Example:

-- VARCHAR2(100) is an assumed size; the original size was lost in the source listing.
CREATE TABLE auction (item_id NUMBER, title VARCHAR2(100), category_id NUMBER, price NUMBER, bid_close DATE);

-- The prices were lost in the source listing and are restored from the query output below.
INSERT INTO auction VALUES (1, 'Nikon Camera', 1, 400, DATE '2002-10-24');
INSERT INTO auction VALUES (2, 'Olympus Camera', 1, 300, DATE '2002-10-25');
INSERT INTO auction VALUES (3, 'Pentax Camera', 1, 200, DATE '2002-10-26');
INSERT INTO auction VALUES (4, 'Canon Camera', 1, 250, DATE '2002-10-27');
COMMIT;

-- Decide on your query criteria first (very important).
-- Here, every query searches the title column for the item description.

-- Create an index set
BEGIN
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');  /* sub-index A */
END;
/

-- Create the index
CREATE INDEX auction_titlex ON auction (title) INDEXTYPE IS ctxsys.ctxcat
  PARAMETERS ('index set auction_iset');

COLUMN title FORMAT a40
SELECT title, price FROM auction WHERE CATSEARCH(title, 'Camera', 'order by price') > 0;

TITLE                PRICE
--------------- ----------
Pentax Camera          200
Canon Camera           250
Olympus Camera         300
Nikon Camera           400

INSERT INTO auction VALUES (5, 'Aigo Camera', 1, 10, DATE '2002-10-27');
INSERT INTO auction VALUES (6, 'Len Camera', 1, 23, DATE '2002-10-27');
COMMIT;

-- Test whether the index is synchronized automatically.
-- The price threshold was lost in the source; 100 is an assumed value consistent with the output below.
SELECT title, price FROM auction WHERE CATSEARCH(title, 'Camera', 'price <= 100') > 0;

TITLE                PRICE
--------------- ----------
Aigo Camera             10
Len Camera              23

Add multiple sub-indexes to the index set:

BEGIN
  ctx_ddl.drop_index_set('auction_iset');
  ctx_ddl.create_index_set('auction_iset');
  ctx_ddl.add_index('auction_iset', 'price');             /* sub-index A */
  ctx_ddl.add_index('auction_iset', 'price, bid_close');  /* sub-index B */
END;
/

DROP INDEX auction_titlex;

CREATE INDEX auction_titlex ON auction (title) INDEXTYPE IS ctxsys.ctxcat
  PARAMETERS ('index set auction_iset');

-- The price value was lost in the source; 200 is an assumed value.
SELECT * FROM auction WHERE CATSEARCH(title, 'Camera', 'price = 200 order by bid_close') > 0;

SELECT * FROM auction WHERE CATSEARCH(title, 'Camera', 'order by price, bid_close') > 0;

After any DML operation the CTXCAT index is synchronized automatically; no manual step is needed. The query operator for a CTXCAT index is CATSEARCH.

Syntax:

CATSEARCH(
  [schema.]column,
  text_query       VARCHAR2,
  structured_query VARCHAR2
) RETURN NUMBER;

Examples:

CATSEARCH(text, 'dog', 'foo > 15')
CATSEARCH(text, 'dog', 'bar = ''SMITH''')
CATSEARCH(text, 'dog', 'foo between 1 and 15')
CATSEARCH(text, 'dog', 'foo = 1 and abc = 123')

2.3 CTXRULE index

The function of a classification application is to perform some action based on document content.

These actions can include assigning a category ID to a document or sending the document to a user.

The result is classification of a document.

Example:

CREATE TABLE queries (query_id NUMBER, query_string VARCHAR2(80));
INSERT INTO queries VALUES (1, 'Oracle');
INSERT INTO queries VALUES (2, 'Larry or Ellison');
INSERT INTO queries VALUES (3, 'Oracle and Text');
INSERT INTO queries VALUES (4, 'market share');
COMMIT;

CREATE INDEX queryx ON queries (query_string) INDEXTYPE IS ctxsys.ctxrule;

COLUMN query_string FORMAT a35
SELECT query_id, query_string FROM queries
 WHERE MATCHES(query_string,
       'Oracle announced that its market share in databases increased over the last year.') > 0;

  QUERY_ID QUERY_STRING
---------- -----------------------------------
         1 Oracle
         4 market share

In short, a CTXRULE index is built on a column of stored queries, and MATCHES returns the queries that a given piece of text satisfies.

2.4 CTXXPATH index

Create this index when you need to speed up existsNode() queries on an XMLType column.
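A minimal sketch of how such an index might be created and used. The table PO_DOCS, its column DOC, the index name and the XPath expression are hypothetical names chosen for illustration; none of them appear in the original article.

```sql
-- Hypothetical table with an XMLType column.
CREATE TABLE po_docs (id NUMBER PRIMARY KEY, doc XMLTYPE);

INSERT INTO po_docs
VALUES (1, XMLTYPE('<purchaseOrder><item>camera</item></purchaseOrder>'));
COMMIT;

-- CTXXPATH index on the XMLType column.
CREATE INDEX po_docs_xpath_idx ON po_docs (doc) INDEXTYPE IS ctxsys.ctxxpath;

-- existsNode() returns 1 when the XPath matches a node in the document.
SELECT id FROM po_docs WHERE existsNode(doc, '/purchaseOrder/item') = 1;
```

Whether the optimizer actually uses the index for a given XPath depends on the expression and the Oracle version, so it is worth checking the execution plan.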

3. Internal processing of the index

3.1 Datastore attributes

The datastore stage is responsible for reading the data from its source (web pages, large objects in the database, or the local file system) and passing it on to the next stage of indexing. The datastore types are DIRECT_DATASTORE, MULTI_COLUMN_DATASTORE, DETAIL_DATASTORE, FILE_DATASTORE, URL_DATASTORE, USER_DATASTORE and NESTED_DATASTORE.

3.1.1 DIRECT_DATASTORE

Used for text stored directly in a single database column. It has no attributes.

Supported column types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE and XMLType.

Example:

CREATE TABLE mytable (id NUMBER PRIMARY KEY, docs CLOB);
INSERT INTO mytable VALUES (111555, 'this text will be indexed');
INSERT INTO mytable VALUES (111556, 'this is a direct_datastore example');
COMMIT;

-- Create the index with the direct datastore
CREATE INDEX myindex ON mytable (docs)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.default_datastore');

SELECT * FROM mytable WHERE CONTAINS(docs, 'text') > 0;

3.1.2 MULTI_COLUMN_DATASTORE

Used to index text that is spread across multiple columns.

The column list is limited in length (500 bytes according to the Oracle Text Reference).

NUMBER and DATE columns are supported; they are converted to text before indexing.

RAW and BLOB columns are concatenated directly as binary data.

LONG, LONG RAW, NCHAR, NCLOB and nested table columns are not supported.

-- VARCHAR2(200) is an assumed size; the original size was lost in the source listing.
CREATE TABLE mytable1 (id NUMBER PRIMARY KEY, doc1 VARCHAR2(200), doc2 CLOB, doc3 CLOB);

INSERT INTO mytable1 VALUES (1, 'this text is indexed', 'following example creates a multi-column', 'denotes that bar column');
INSERT INTO mytable1 VALUES (2, 'this is a direct_datastore example', 'use this datastore when your text is stored in more than one column', 'the system concatenates the text columns');
COMMIT;

-- Create a multi-column datastore preference
BEGIN
  ctx_ddl.create_preference('my_multi', 'multi_column_datastore');
  ctx_ddl.set_attribute('my_multi', 'columns', 'doc1, doc2, doc3');
END;
/

-- Create the index
CREATE INDEX idx_mytable ON mytable1 (doc1) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore my_multi');

SELECT * FROM mytable1 WHERE CONTAINS(doc1, 'direct datastore') > 0;
SELECT * FROM mytable1 WHERE CONTAINS(doc1, 'example creates') > 0;

Note: search terms must be meaningful English words, not stopwords. For example,

SELECT * FROM mytable1 WHERE CONTAINS(doc1, 'more than one column') > 0;

finds the second record, but searching for more on its own returns nothing, because more is a common word that is treated as a stopword and is not indexed.
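The words that are "not meaningful" in the note above correspond to what Oracle Text calls stopwords. The sketch below shows one way to list them and, if every word must be searchable, to build the index with the predefined empty stoplist instead. It assumes the preference MY_MULTI and the index IDX_MYTABLE from the example; recreating the index this way is an illustration, not a step from the original article.

```sql
-- Words in the default stoplist: they are not indexed,
-- so they cannot be found when searched on their own.
SELECT spw_word
  FROM ctx_stopwords
 WHERE spw_stoplist = 'DEFAULT_STOPLIST'
 ORDER BY spw_word;

-- Hypothetical variant: recreate the index with no stopwords at all.
DROP INDEX idx_mytable;
CREATE INDEX idx_mytable ON mytable1 (doc1) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore my_multi stoplist ctxsys.empty_stoplist');
```

An empty stoplist makes the index larger, because very frequent words are then stored as tokens too.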

-- Update only a non-indexed column (doc2) and check whether the change can be found
UPDATE mytable1 SET doc2 = 'adladlhadad this datastore when your text is stored test' WHERE id = 2;

BEGIN
  ctx_ddl.sync_index('idx_mytable');
END;
/

SELECT * FROM mytable1 WHERE CONTAINS(doc1, 'adladlhadad') > 0;  -- no rows

-- Now update the indexed column (doc1)
UPDATE mytable1 SET doc1 = 'this is a direct_datastore example' WHERE id = 2;

BEGIN
  ctx_ddl.sync_index('idx_mytable');  -- synchronize the index
END;
/

SELECT * FROM mytable1 WHERE CONTAINS(doc1, 'adladlhadad') > 0;  -- the doc2 change is now found

A multi-column full-text index can be created on any one of the columns, but queries must name the same column that the index was created on. Oracle treats the indexed data as changed only when the column named in the index is modified. If you modify only the other columns, synchronizing the index does not pick up those changes.

In other words, synchronization takes effect only for rows whose indexed column has been updated. After changing any other column, write the indexed column again as well, then synchronize the index to see the change.
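One way to apply this rule is to rewrite the indexed column in the same statement that changes the other column. This is a minimal sketch, assuming the table MYTABLE1 and index IDX_MYTABLE from the example, and assuming that setting the indexed column to its current value is enough to queue the row for re-indexing. The word kiwi is an arbitrary test value.

```sql
-- Change a non-indexed column and "touch" the indexed column in the same statement.
UPDATE mytable1
   SET doc2 = 'kiwi text for the second column',
       doc1 = doc1            -- same value, but counts as an update of the indexed column
 WHERE id = 2;
COMMIT;

BEGIN
  ctx_ddl.sync_index('idx_mytable');
END;
/

-- The new doc2 text should now be searchable through the indexed column.
SELECT id FROM mytable1 WHERE CONTAINS(doc1, 'kiwi') > 0;
```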

3.1.3 DETAIL_DATASTORE

Used for master-detail tables (original: Use the DETAIL_DATASTORE type for text stored directly in the database in detail tables, with the indexed text column located in the master table).

Because the text that is actually indexed comes from the column in the detail table, it does not matter which column of the master table the index is created on. Once chosen, however, that column must be the one named in the query condition.

The content of the indexed column in the master table is not itself included in the index.

The DETAIL_DATASTORE attributes are binary, detail_table, detail_key, detail_lineno and detail_text, as set in the example below.

Example:

-- Create the master table.
-- VARCHAR2(30) and VARCHAR2(50) are assumed sizes; the original sizes were lost in the source listing.
CREATE TABLE my_master
(article_id NUMBER PRIMARY KEY, author VARCHAR2(30), title VARCHAR2(50), body VARCHAR2(1));

-- Create the detail table.
CREATE TABLE my_detail
(article_id NUMBER, seq NUMBER, text VARCHAR2(4000),
 CONSTRAINT fr_id FOREIGN KEY (article_id) REFERENCES my_master (article_id));

-- Sample data
INSERT INTO my_master VALUES (1, 'Tom', 'expert in and on', 1);
INSERT INTO my_master VALUES (2, 'Tom', 'Expert Oracle Database architecture', 2);
COMMIT;

INSERT INTO my_detail VALUES (1, 1, 'Oracle will find the undo information for this transaction either in the cached undo segment blocks (most likely) or on disk');
INSERT INTO my_detail VALUES (1, 2, 'if they have been flushed (more likely for very large transactions).');
INSERT INTO my_detail VALUES (1, 3, 'LGWR is writing to a different device, then there is no contention for redo logs');
INSERT INTO my_detail VALUES (2, 1, 'Many other databases treat the log files as');
INSERT INTO my_detail VALUES (2, 2, 'for those systems, the act of rolling back can be disastrous');
COMMIT;

-- Create a detail datastore preference
BEGIN
  ctx_ddl.create_preference('my_detail_pref', 'detail_datastore');
  ctx_ddl.set_attribute('my_detail_pref', 'binary', 'true');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_table', 'my_detail');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_key', 'article_id');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_lineno', 'seq');
  ctx_ddl.set_attribute('my_detail_pref', 'detail_text', 'text');
END;
/

-- Create the index
CREATE INDEX myindex123 ON my_master (body) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore my_detail_pref');

SELECT * FROM my_master WHERE CONTAINS(body, 'databases') > 0;

-- Update only the detail table and check whether the change can be found
UPDATE my_detail SET text = 'undo is generated as a result of the DELETE, blocks are modified, and redo is sent over to the redo log buffer'
 WHERE article_id = 2 AND seq = 1;

BEGIN
  ctx_ddl.sync_index('myindex123', '2M');  -- synchronize the index
END;
/

SELECT * FROM my_master WHERE CONTAINS(body, 'result of the DELETE') > 0;  -- the update is not found

-- After updating the detail table, also update the indexed column in the master table
UPDATE my_master SET body = 3 WHERE body = 2;

BEGIN
  ctx_ddl.sync_index('myindex123', '2M');
END;
/

SELECT * FROM my_master WHERE CONTAINS(body, 'result of the DELETE') > 0;  -- the row is now found

If you update the text in the detail table, you must also update the indexed column in the master table so that Oracle knows the indexed data has changed (this can be automated with a trigger).
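A trigger along the following lines is one way to do that. It is a sketch, assuming the MY_MASTER and MY_DETAIL tables from the example; the trigger name is hypothetical, and the technique relies on the assumption that setting the indexed column to its current value is enough to queue the master row for re-indexing.

```sql
-- Whenever a detail row changes, "touch" the indexed column of its master row
-- so that the next CTX_DDL.SYNC_INDEX re-indexes that article.
CREATE OR REPLACE TRIGGER my_detail_touch_master
AFTER INSERT OR UPDATE OR DELETE ON my_detail
FOR EACH ROW
BEGIN
  UPDATE my_master
     SET body = body
   WHERE article_id = NVL(:NEW.article_id, :OLD.article_id);
END;
/
```

The index still has to be synchronized afterwards (for example with CTX_DDL.SYNC_INDEX); the trigger only makes sure the changed article is in the pending queue.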

3.1.4 FILE_DATASTORE

Used to index files on the database server's local file system (original: The FILE_DATASTORE type is used for text stored in files accessed through the local file system.)

Multiple paths can be specified: on UNIX separate them with colons (path1:path2:pathN); on Windows separate them with semicolons (path1;path2;pathN).

CREATE TABLE mytable3 (id NUMBER PRIMARY KEY, docs VARCHAR2(2000));
INSERT INTO mytable3 VALUES (111555, '1.txt');
INSERT INTO mytable3 VALUES (111556, '1.doc');
COMMIT;

-- Create a file datastore preference
BEGIN
  ctx_ddl.create_preference('common_dir2', 'file_datastore');
  ctx_ddl.set_attribute('common_dir2', 'path', 'D:\search');
END;
/

-- Create the index
CREATE INDEX myindex3 ON mytable3 (docs) INDEXTYPE IS ctxsys.context PARAMETERS ('datastore common_dir2');

SELECT * FROM mytable3 WHERE CONTAINS(docs, 'word') > 0;  -- query

-- So far this has been tested with .doc and .txt files.

3.1.5 URL_DATASTORE

Used to index content on the Internet; only the corresponding URL is stored in the database.

Example:

CREATE TABLE urls (id NUMBER PRIMARY KEY, docs VARCHAR2(2000));
INSERT INTO urls VALUES (111555, 'http://context.us.oracle.com');
INSERT INTO urls VALUES (111556, 'http://www.sun.com');
INSERT INTO urls VALUES (111557, 'http://www.itpub.net');
INSERT INTO urls VALUES (111558, 'http://www.ixdba.com');
COMMIT;

-- Create a URL datastore preference
BEGIN
  ctx_ddl.create_preference('url_pref', 'url_datastore');
  ctx_ddl.set_attribute('url_pref', 'timeout', '300');
END;
/

-- Create the index
CREATE INDEX datastores_text ON urls (docs) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore url_pref');

SELECT * FROM urls WHERE CONTAINS(docs, 'Aix') > 0;

If a URL does not exist, Oracle does not raise an error; the document simply cannot be found by queries.

Oracle stores only the URL of the indexed document. If the document itself changes, you must update the indexed column (the URL column) to tell Oracle that the indexed data has changed.

3.1.6 USER_DATASTORE

Use the USER_DATASTORE type to define stored procedures that synthesize documents during indexing. For example, a user procedure might synthesize author, date, and text columns into one document, so that the author and date information become part of the indexed text.
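The following sketch shows what such a procedure and preference might look like. It reuses the MY_MASTER table from section 3.1.3; the procedure name, preference name and index name are hypothetical. Depending on the Oracle version, the procedure may have to be owned by CTXSYS; this sketch assumes it can live in the index owner's schema.

```sql
-- Called once per row during indexing; appends the synthesized document to tlob.
CREATE OR REPLACE PROCEDURE my_master_doc (rid IN ROWID, tlob IN OUT NOCOPY CLOB) IS
  v_doc VARCHAR2(4000);
BEGIN
  -- Combine author and title into one document for this row.
  SELECT author || ' ' || title
    INTO v_doc
    FROM my_master
   WHERE rowid = rid;
  DBMS_LOB.WRITEAPPEND(tlob, LENGTH(v_doc), v_doc);
END;
/

BEGIN
  ctx_ddl.create_preference('my_user_ds', 'user_datastore');
  ctx_ddl.set_attribute('my_user_ds', 'procedure', 'my_master_doc');
  ctx_ddl.set_attribute('my_user_ds', 'output_type', 'CLOB');
END;
/

-- The index is created on title, but the indexed text is whatever the procedure returns.
CREATE INDEX my_master_user_idx ON my_master (title) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore my_user_ds');

-- The author name is now searchable through the title column.
SELECT article_id FROM my_master WHERE CONTAINS(title, 'Tom') > 0;
```

As with the other datastores, a row is re-indexed only when the indexed column (here title) is updated, so a change to author alone would not be picked up.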
