Oracle full-text search: indexing, stored procedures, Java method calls, highlighting, and Ajax full-text search with server-side paging (original)
Research on Oracle Full-text Search (FULL)
2010-10-15 10:13:51 | Category: Database
Reference (Baidu Wenku document):
http://wenku.baidu.com/view/c53e9e36a32d7375a417801a.html
1. Preparation process
1.1 Checking and setting database roles
First check that the database has the CTXSYS user and the CTXAPP role. If either is missing, the database was created without the interMedia (Oracle Text) feature installed, and you must modify the database to install it.
In a default installation the CTXSYS user is locked and its password is expired, so log in to EM as the SYS user and change the status and password of the CTXSYS user, as shown in the figure:
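The same check and unlock can also be done from SQL*Plus instead of EM. This is a minimal sketch, assuming a session connected as SYS (or another DBA account); the password is a placeholder, not a value from this article.

```sql
-- Is the CTXSYS user present, and what state is the account in?
SELECT username, account_status
  FROM dba_users
 WHERE username = 'CTXSYS';

-- Is the CTXAPP role present?
SELECT role
  FROM dba_roles
 WHERE role = 'CTXAPP';

-- Unlock the account and set a new password (placeholder value, choose your own)
ALTER USER ctxsys ACCOUNT UNLOCK;
ALTER USER ctxsys IDENTIFIED BY "Change_me_1";
```

If the first two queries return no rows, the Oracle Text feature is not installed and unlocking will not help.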
1.2 Granting privileges
The test user is the previously created foo user, and the examples use the t_docnews table in that schema.
Log in as the SYS user (as DBA) and grant the RESOURCE and CONNECT roles to foo:
grant resource, connect to foo;
Then log in as the CTXSYS user and grant privileges to foo:
grant ctxapp to foo;
grant execute on ctxsys.ctx_cls to foo;
grant execute on ctxsys.ctx_ddl to foo;
grant execute on ctxsys.ctx_doc to foo;
grant execute on ctxsys.ctx_output to foo;
grant execute on ctxsys.ctx_query to foo;
grant execute on ctxsys.ctx_report to foo;
grant execute on ctxsys.ctx_thes to foo;
grant execute on ctxsys.ctx_ulexer to foo;
View the system's default Oracle Text preferences:
select pre_name, pre_object from ctx_preferences;
2. How Oracle Text indexing works
An Oracle Text index converts all the text into tokens; for example, www.taobao.com is converted into the tokens www, taobao, and com.
Oracle 10g supports four index types: CONTEXT, CTXCAT, CTXRULE, and CTXXPATH.
2.1 CONTEXT index
Oracle Text converts every word into a token. A CONTEXT index is an inverted index: each token maps to the documents that contain it. For example, the word dog might have the following entry:
DOG DOC1 DOC3 DOC5
This means that dog appears in documents DOC1, DOC3, and DOC5. After the index is built, the system automatically generates the tables DR$MYINDEX$I, DR$MYINDEX$K, DR$MYINDEX$R, and DR$MYINDEX$X, which together with MYTABLE makes five tables (assuming the table is MYTABLE and the index is MYINDEX). A CONTEXT index is not synchronized automatically after DML; you must call CTX_DDL.SYNC_INDEX to synchronize it manually.
Example:
create table docs (id number primary key, text varchar2(200));
insert into docs values (1, '<HTML>California is a state in the US.</HTML>');
insert into docs values (2, '<HTML>Paris is a city in France.</HTML>');
insert into docs values (3, '<HTML>France is in Europe.</HTML>');
commit;
-- Create a CONTEXT index
create index idx_docs on docs (text)
indextype is ctxsys.context parameters
('filter ctxsys.null_filter section group ctxsys.html_section_group');
-- Query
-- Display the text column 40 characters wide
column text format a40
select id, text from docs where contains (text, 'France') > 0;
ID TEXT
---------- ----------------------------------------
3 <HTML>France is in Europe.</HTML>
2 <HTML>Paris is a city in France.</HTML>
-- Insert more data
insert into docs values (4, '<HTML>Los Angeles is a city in California.</HTML>');
insert into docs values (5, '<HTML>Mexico City is big.</HTML>');
commit;
select id, text from docs where contains (text, 'city') > 0; -- the newly inserted rows are not found
ID TEXT
---------- ----------------------------------------
2 <HTML>Paris is a city in France.</HTML>
-- Synchronize the index
begin
ctx_ddl.sync_index('idx_docs', '2M'); -- synchronize using 2 MB of memory
end;
/
-- Query again
column text format a50
select id, text from docs where contains (text, 'city') > 0; -- the new rows are now found
ID TEXT
---------- --------------------------------------------------
5 <HTML>Mexico City is big.</HTML>
4 <HTML>Los Angeles is a city in California.</HTML>
2 <HTML>Paris is a city in France.</HTML>
-- OR operator
select id, text from docs where contains (text, 'city or state') > 0;
-- AND operator
select id, text from docs where contains (text, 'city and state') > 0;
-- Note: with no operator between the words, CONTAINS treats them as the phrase "city state", not as an AND
select id, text from docs where contains (text, 'city state') > 0;
-- SCORE returns the relevance score; the higher the score, the more relevant the row
select score(1), id, text from docs where contains (text, 'oracle', 1) > 0;
A CONTEXT index is not synchronized automatically, so it must be synchronized manually after DML. The query operator for a CONTEXT index is CONTAINS.
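The behaviour described above can be observed directly. The sketch below assumes the docs table and idx_docs index from the example, and that the token table follows the DR$<index>$I naming mentioned earlier. The second table (docs2, idx_docs2) is hypothetical and only illustrates the sync (on commit) index parameter, which Oracle 10g offers as an alternative to calling CTX_DDL.SYNC_INDEX by hand.

```sql
-- Tokens stored in the inverted index, with the document count recorded in each index row
SELECT token_text, token_count
  FROM dr$idx_docs$i
 ORDER BY token_text;

-- Rows changed by DML that are still waiting for the next synchronization
SELECT pnd_index_name, pnd_rowid
  FROM ctx_user_pending
 WHERE pnd_index_name = 'IDX_DOCS';

-- Alternative: synchronize at commit time instead of calling SYNC_INDEX manually.
-- docs2 and idx_docs2 are hypothetical names used only for this sketch.
CREATE TABLE docs2 (id NUMBER PRIMARY KEY, text VARCHAR2(200));

CREATE INDEX idx_docs2 ON docs2 (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('sync (on commit)');
```

Running the second query between the commit and the SYNC_INDEX call in the example should list the two newly inserted rows. Synchronizing on every commit makes commits slower and tends to fragment the index, so it is a trade-off rather than a default choice.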
2.2 CTXCAT index
Used for mixed queries over multiple columns.
A CTXCAT index uses an index set, to which you add the columns that are frequently queried together with the text column. For example, if queries on a product name usually also filter or sort by production date, price, or description, you can add those columns to the index set. Oracle wraps these queries in the CATSEARCH operator, which improves the efficiency of full-text search. For transactional applications with real-time requirements, the fact that a CONTEXT index is not synchronized automatically is clearly a problem; a CTXCAT index is synchronized automatically.
Example:
-- The title size is garbled in the source; 100 is an assumed value
create table auction (item_id number, title varchar2(100), category_id number, price number, bid_close date);
insert into auction values (1, 'Nikon Camera', 1, 400, '24-oct-2002');
insert into auction values (2, 'Olympus Camera', 1, 300, '25-oct-2002');
insert into auction values (3, 'Pentax Camera', 1, 200, '26-oct-2002');
insert into auction values (4, 'Canon Camera', 1, 250, '27-oct-2002');
commit;
-- Decide on your query criteria first (very important)
-- Here, every query searches the title column for the item description
-- Create an index set
begin
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset', 'price'); /* sub-index A */
end;
/
-- Create the index
create index auction_titlex on auction (title) indextype is ctxsys.ctxcat
parameters ('index set auction_iset');
column title format a40
select title, price from auction where catsearch (title, 'Camera', 'order by price') > 0;
TITLE PRICE
--------------- ----------
Pentax Camera 200
Canon Camera 250
Olympus Camera 300
Nikon Camera 400
insert into auction values (5, 'Aigo Camera', 1, 10, '27-oct-2002');
insert into auction values (6, 'Len Camera', 1, 23, '27-oct-2002');
commit;
-- Test whether the index is synchronized automatically
-- (the price bound is missing in the source; 100 is an assumed value, and any bound from 23 to 199 returns these two rows)
select title, price from auction where catsearch (title, 'Camera',
'price <= 100') > 0;
TITLE PRICE
--------------- ----------
Aigo Camera 10
Len Camera 23
Add multiple sub-indexes to the index set:
begin
ctx_ddl.drop_index_set('auction_iset');
ctx_ddl.create_index_set('auction_iset');
ctx_ddl.add_index('auction_iset', 'price'); /* sub-index A */
ctx_ddl.add_index('auction_iset', 'price, bid_close'); /* sub-index B */
end;
/
drop index auction_titlex;
create index auction_titlex on auction (title) indextype is ctxsys.ctxcat
parameters ('index set auction_iset');
-- (the price value is garbled in the source; 200 is an assumed value)
select * from auction where catsearch (title, 'Camera', 'price = 200 order by bid_close') > 0;
select * from auction where catsearch (title, 'Camera', 'order by price, bid_close') > 0;
After any DML operation, a CTXCAT index is synchronized automatically, so no manual step is needed. The query operator for a CTXCAT index is CATSEARCH.
Syntax:
CATSEARCH(
[schema.]column,
text_query VARCHAR2,
structured_query VARCHAR2)
RETURN NUMBER;
Examples:
catsearch (text, 'dog', 'foo > 15')
catsearch (text, 'dog', 'bar = ''SMITH''')
catsearch (text, 'dog', 'foo between 1 and 15')
catsearch (text, 'dog', 'foo = 1 and abc = 123')
2.3 CTXRULE index
The function of a classification application is to perform some action based on document content.
These actions can include assigning a category ID to a document or sending the document to a user.
The result is classification of a document.
Example:
create table queries (query_id number, query_string varchar2(80));
insert into queries values (1, 'Oracle');
insert into queries values (2, 'Larry or Ellison');
insert into queries values (3, 'Oracle and Text');
insert into queries values (4, 'market share');
commit;
create index queryx on queries (query_string) indextype is ctxsys.ctxrule;
column query_string format a35
select query_id, query_string from queries
where matches (query_string,
'Oracle announced that its market share in databases
increased over the last year.') > 0;
QUERY_ID QUERY_STRING
---------- -----------------------------------
1 Oracle
4 market share
A CTXRULE index is built on a column of stored queries, and MATCHES returns the queries that match a given piece of text.
2.4 CTXXPATH index
Create this index when you need to speed up existsNode() queries on an XMLType column.
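As a rough illustration only, a CTXXPATH index might be created and used as below. The table, column, and index names are hypothetical, and the sketch assumes a release such as Oracle 10g where the CTXXPATH index type is available.

```sql
-- Hypothetical table with an XMLType column
CREATE TABLE xml_docs (id NUMBER PRIMARY KEY, doc XMLTYPE);

INSERT INTO xml_docs
VALUES (1, XMLTYPE('<book><title>Oracle Text</title><year>2010</year></book>'));
COMMIT;

-- CTXXPATH index on the XMLType column
CREATE INDEX xml_docs_xpath_idx ON xml_docs (doc)
  INDEXTYPE IS ctxsys.ctxxpath;

-- existsNode() returns 1 when the XPath expression matches the document
SELECT id
  FROM xml_docs
 WHERE existsNode(doc, '/book[title="Oracle Text"]') = 1;
```

Whether the optimizer actually uses the index for a given existsNode() predicate depends on the XPath expression and on statistics, so check the execution plan before relying on it.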
3. Internal processing of the index
3.1 Datastore attributes
The datastore stage is responsible for reading the data from wherever it is stored (for example web pages, large database objects, or the local file system) and passing it on to the next stage. The datastore types are DIRECT_DATASTORE, MULTI_COLUMN_DATASTORE, DETAIL_DATASTORE, FILE_DATASTORE, URL_DATASTORE, USER_DATASTORE, and NESTED_DATASTORE.
3.1.1 DIRECT_DATASTORE
Indexes data stored in a single database column. It has no attributes.
Supported column types: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, or XMLType.
Example:
create table mytable (id number primary key, docs clob);
insert into mytable values (111555, 'This text will be indexed');
insert into mytable values (111556, 'It is a direct_datastore example');
commit;
-- Create an index that uses the direct datastore
create index myindex on mytable (docs)
indextype is ctxsys.context
parameters ('datastore ctxsys.default_datastore');
select * from mytable where contains (docs, 'text') > 0;
3.1.2 MULTI_COLUMN_DATASTORE
Used to index data spread across multiple columns.
The column list is limited to 500 bytes.
NUMBER and DATE columns are supported; they are converted to text before indexing.
RAW and BLOB columns are concatenated directly as binary data.
LONG, LONG RAW, NCHAR, NCLOB, and nested table columns are not supported.
-- The doc1 size is missing in the source; 200 is an assumed value
create table mytable1 (id number primary key, doc1 varchar2(200), doc2 clob, doc3 clob);
insert into mytable1 values (1, 'This text is indexed', 'following example creates a multi-column', 'denotes that bar column');
insert into mytable1 values (2, 'This is a direct_datastore example', 'Use this datastore when your text is stored in more than one column', 'The system concatenates the text columns');
commit;
-- Create a MULTI_COLUMN_DATASTORE preference
begin
ctx_ddl.create_preference('my_multi', 'multi_column_datastore');
ctx_ddl.set_attribute('my_multi', 'columns', 'doc1, doc2, doc3');
end;
/
-- Create the index
create index idx_mytable on mytable1 (doc1) indextype is ctxsys.context
parameters ('datastore my_multi');
select * from mytable1 where contains (doc1, 'direct datastore') > 0;
select * from mytable1 where contains (doc1, 'example creates') > 0;
Note: search terms must be meaningful English words. For example,
select * from mytable1 where contains (doc1, 'more than one column') > 0;
finds the second record, but searching for more on its own returns nothing, because more is not indexed as a meaningful word in that sentence (it behaves as a stopword).
-- Update only a non-indexed column (doc2) and see whether the change can be found
update mytable1 set doc2 = 'Adladlhadad this datastore when your the text is stored test' where id = 2;
begin
ctx_ddl.sync_index('idx_mytable');
end;
/
select * from mytable1 where contains (doc1, 'Adladlhadad') > 0; -- no rows
update mytable1 set doc1 = 'This is a direct_datastore example' where id = 2; -- update the indexed column
begin
ctx_ddl.sync_index('idx_mytable'); -- synchronize the index
end;
/
select * from mytable1 where contains (doc1, 'Adladlhadad') > 0; -- now finds the change made to doc2
A multi-column full-text index can be created on any one of the columns, but the column named in the query must be the column the index was created on. Oracle considers the indexed data changed only when the indexed column itself is modified; if you modify other columns without touching the indexed column, synchronizing the index does not pick up the changes.
In other words, synchronization takes effect only when the indexed column is updated, so after changing the other columns you must also rewrite the indexed column.
With multiple columns, you can index any column, update the other columns together with the indexed column, and then synchronize the index once to see the effect.
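One way to apply this, sketched under the assumption that the mytable1 table and idx_mytable index from the example exist: change the other column and rewrite the indexed column to itself in the same UPDATE, so that a single synchronization picks up the new text. The sample text is made up for illustration.

```sql
-- Change doc2 and rewrite doc1 (the indexed column) to itself in one statement,
-- so that the row is queued for re-indexing.
UPDATE mytable1
   SET doc2 = 'updated text stored in the second column',
       doc1 = doc1
 WHERE id = 2;
COMMIT;

-- One synchronization is then enough
BEGIN
  ctx_ddl.sync_index('idx_mytable');
END;
/

SELECT id FROM mytable1 WHERE contains(doc1, 'updated') > 0;
```

This avoids the two-step update and double synchronization used in the walkthrough, at the cost of every writer having to remember to touch doc1.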
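3.1.3 DETAIL_DATASTORE
Used for master-detail table queries (original: Use the DETAIL_DATASTORE type for text stored directly in the database in detail tables, with the indexed text column located in the master table).
Because the text that is actually indexed comes from a column of the detail table, it does not matter which column of the master table you create the index on; once chosen, however, that column must be the one named in the query condition.
The content of the indexed column in the master table is not included in the index.
DETAIL_DATASTORE attribute definitions:
binary: TRUE concatenates the detail rows without adding a newline after each row; FALSE adds a newline after each row
detail_table: name of the detail table
detail_key: name of the foreign key column(s) in the detail table
detail_lineno: name of the sequence column in the detail table
detail_text: name of the text column in the detail table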
Example:
-- Create the master table (the author and title sizes are missing in the source; 100 is an assumed value)
create table my_master
(article_id number primary key, author varchar2(100), title varchar2(100), body varchar2(1));
-- Create the detail table
create table my_detail
(article_id number, seq number, text varchar2(4000),
constraint fr_id foreign key (article_id) references my_master (article_id));
-- Sample data
insert into my_master values (1, 'Tom', 'expert in and on', 1);
insert into my_master values (2, 'Tom', 'Expert Oracle Database architecture', 2);
commit;
insert into my_detail values (1, 1, 'Oracle will find the undo information for this transaction either in the cached undo segment blocks (most likely) or on disk');
insert into my_detail values (1, 2, 'if they have been flushed (more likely for very large transactions).');
insert into my_detail values (1, 3, 'LGWR is writing to a different device, then there is no contention for redo logs');
insert into my_detail values (2, 1, 'Many other databases treat the log files as');
insert into my_detail values (2, 2, 'for those systems, the act of rolling back can be disastrous');
commit;
-- Create the DETAIL_DATASTORE preference
begin
ctx_ddl.create_preference('my_detail_pref', 'detail_datastore');
ctx_ddl.set_attribute('my_detail_pref', 'binary', 'true');
ctx_ddl.set_attribute('my_detail_pref', 'detail_table', 'my_detail');
ctx_ddl.set_attribute('my_detail_pref', 'detail_key', 'article_id');
ctx_ddl.set_attribute('my_detail_pref', 'detail_lineno', 'seq');
ctx_ddl.set_attribute('my_detail_pref', 'detail_text', 'text');
end;
/
-- Create the index
create index myindex123 on my_master (body) indextype is ctxsys.context
parameters ('datastore my_detail_pref');
select * from my_master where contains (body, 'databases') > 0;
-- Update only the detail table and see whether the change can be found
update my_detail set text = 'undo is generated as a result of the DELETE, blocks are modified, and redo is sent over to the redo log buffer' where article_id = 2 and seq = 1;
begin
ctx_ddl.sync_index('myindex123', '2M'); -- synchronize the index
end;
/
select * from my_master where contains (body, 'result of the DELETE') > 0; -- the update is not found
-- After updating the detail table, update the master table as well
update my_master set body = 3 where body = 2;
begin
ctx_ddl.sync_index('myindex123', '2M');
end;
/
select * from my_master where contains (body, 'result of the DELETE') > 0; -- now finds the data
If you update the indexed text in the detail table, you must also update the indexed column of the master table so that Oracle knows the indexed data has changed (this can be done with a trigger).
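The trigger approach mentioned above could look roughly like this. It is a sketch, not the author's implementation: the trigger name is hypothetical, and it assumes the my_master and my_detail tables and the myindex123 index from the example.

```sql
-- Hypothetical trigger: whenever a detail row changes, rewrite the indexed
-- master column to itself so the master row is queued for re-indexing.
CREATE OR REPLACE TRIGGER my_detail_touch_master
AFTER INSERT OR UPDATE OR DELETE ON my_detail
FOR EACH ROW
BEGIN
  UPDATE my_master
     SET body = body
   WHERE article_id = NVL(:NEW.article_id, :OLD.article_id);
END;
/

-- With the trigger in place, a detail-only change is picked up by the next sync
UPDATE my_detail
   SET text = 'redo is sent over to the redo log buffer'
 WHERE article_id = 2 AND seq = 1;
COMMIT;

BEGIN
  ctx_ddl.sync_index('myindex123', '2M');
END;
/

SELECT article_id FROM my_master WHERE contains(body, 'buffer') > 0;
```

The sketch does not handle an UPDATE that moves a detail row to a different article_id; in that case the old master row would also need to be touched.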
3.1.4 FILE_DATASTORE
Used to index files on the local server (original: The FILE_DATASTORE type is used for text stored in files accessed through the local file system.)
Multiple paths: on UNIX separate the paths with colons, as in path1:path2:pathn; on Windows separate them with semicolons.
create table mytable3 (id number primary key, docs varchar2(2000));
insert into mytable3 values (111555, '1.txt');
insert into mytable3 values (111556, '1.doc');
commit;
-- Create the FILE_DATASTORE preference
begin
ctx_ddl.create_preference('common_dir2', 'file_datastore');
ctx_ddl.set_attribute('common_dir2', 'path', 'D:\search');
end;
/
-- Create the index
create index myindex3 on mytable3 (docs) indextype is ctxsys.context parameters ('datastore common_dir2');
select * from mytable3 where contains (docs, 'word') > 0; -- query
-- So far tested with .doc and .txt files
3.1.5 URL_DATASTORE
Used to index content on the Internet; only the corresponding URLs are stored in the database.
Example:
create table urls (id number primary key, docs varchar2(2000));
insert into urls values (111555, 'http://context.us.oracle.com');
insert into urls values (111556, 'http://www.sun.com');
insert into urls values (111557, 'http://www.itpub.net');
insert into urls values (111558, 'http://www.ixdba.com');
commit;
-- Create the URL_DATASTORE preference
begin
ctx_ddl.create_preference('url_pref', 'url_datastore');
ctx_ddl.set_attribute('url_pref', 'timeout', '300');
end;
/
-- Create the index
create index datastores_text on urls (docs) indextype is ctxsys.context parameters
('datastore url_pref');
select * from urls where contains (docs, 'Aix') > 0;
If a URL does not exist, Oracle does not raise an error; the query simply finds no data for it.
Oracle stores only the URL of the indexed document. If the document itself changes, you must modify the indexed column (the URL column) to tell Oracle that the indexed data has changed.
3.1.6 USER_DATASTORE
Use the USER_DATASTORE type to define stored procedures that synthesize documents during indexing. For example, a user procedure might synthesize author, date, and text columns into one document so that the author and date information is part of the indexed text.
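The author, date, and text case described above might be sketched as follows. Everything named here (the articles table, the articles_doc procedure, the articles_ds preference, the articles_idx index) is hypothetical. The sketch assumes Oracle 10g or later, where the procedure may live in the indexing user's own schema; older releases may require it to be owned by CTXSYS.

```sql
-- Hypothetical table used only for this sketch
CREATE TABLE articles (
  id       NUMBER PRIMARY KEY,
  author   VARCHAR2(100),
  pub_date DATE,
  body     VARCHAR2(4000)
);

-- Called once per row during indexing: rid identifies the row,
-- tlob receives the synthesized document text.
CREATE OR REPLACE PROCEDURE articles_doc (rid IN ROWID, tlob IN OUT NOCOPY CLOB) IS
  v_author articles.author%TYPE;
  v_date   articles.pub_date%TYPE;
  v_body   articles.body%TYPE;
  v_doc    VARCHAR2(4500);
BEGIN
  SELECT author, pub_date, body
    INTO v_author, v_date, v_body
    FROM articles
   WHERE rowid = rid;
  -- Combine author, date, and text into one document
  v_doc := v_author || ' ' || TO_CHAR(v_date, 'YYYY-MM-DD') || ' ' || v_body;
  DBMS_LOB.WRITEAPPEND(tlob, LENGTH(v_doc), v_doc);
END;
/

-- Point a USER_DATASTORE preference at the procedure and build the index
BEGIN
  ctx_ddl.create_preference('articles_ds', 'user_datastore');
  ctx_ddl.set_attribute('articles_ds', 'procedure', 'articles_doc');
END;
/

CREATE INDEX articles_idx ON articles (body)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore articles_ds');
```

With this in place, a query such as contains(body, 'Tom') > 0 should also match rows whose author is Tom, because the author name is part of the indexed document. As with the other datastores, a change to author or pub_date alone is only re-indexed once the indexed column (body) is updated.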
3.1