This article describes how to customize the dictionary used by the Oracle Text chinese_lexer.
Initialize data
create table test2 (str1 varchar2(2000), str2 varchar2(2000));
insert into test2 values ('geological map', 'china and ZHUSHAN tornado drool geological map');
insert into test2 values ('image', 'figure');
commit;
Create the lexer preference and the full-text index (note that the custom dictionary only works with chinese_lexer)
exec ctx_ddl.create_preference('my_lexer1', 'CHINESE_LEXER');
exec ctx_ddl.create_preference('dataquery', 'MULTI_COLUMN_DATASTORE');
exec ctx_ddl.set_attribute('dataquery', 'columns', 'str1,str2');
create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');
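Before looking at the tokens, it can help to confirm that the preferences and the index were created cleanly. The following is a minimal sketch, assuming the standard Oracle Text dictionary views CTX_USER_PREFERENCES and CTX_USER_INDEX_ERRORS are available to the indexing user; it is not part of the original walkthrough.

```sql
-- List the preferences created above (expect my_lexer1 and dataquery).
SELECT pre_name, pre_class, pre_object
FROM   ctx_user_preferences
WHERE  pre_name IN ('MY_LEXER1', 'DATAQUERY');

-- Any rows here indicate documents that failed during indexing.
SELECT err_index_name, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'TEST2_IDX';
```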
Looking at the generated token table, we can see that there is no token for "geological map": the term has been split into "geological" and "Diagram".
ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT       TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------- ---------- ----------- ---------- ----------- ----------------
STR1                      0           1          2           2 0090010301900102
STR2                      0           1          2           2 0090050B01900402
geological                0           1          1           1 0090020C
feedback                  0           1          1           1 008808
and                       0           1          1           1 008807
saliva                    0           1          1           1 00880D
stream                    0           1          1           1 00880C
tornado                   0           1          1           1 00880B
Mountain                  0           1          1           1 00880A
Diagram                   0           1          2           2 0090030C018805
picture                   0           2          2           1 008802
China                     0           1          1           1 008806
pianshan                  0           1          1           1 008809

13 rows selected.
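On a real index the token table is far too large to read in full, so a targeted lookup is more practical. A minimal sketch, assuming the index name test2_idx from above; the literal 'geological map' stands in for the actual Chinese term in the test data.

```sql
-- Check whether a specific term was indexed as a single token.
-- 'geological map' is a placeholder for the real (Chinese) term.
SELECT token_text, token_type, token_count
FROM   dr$test2_idx$i
WHERE  token_text = 'geological map';
```

With the default dictionary this lookup should return no rows, because the term is stored as two separate tokens.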
Start using the custom dictionary
C:\Users\fengjun>ctxlc -zht -ocs zhs16gbk > zhs16gbk_102.txt
C:\Users\fengjun>zhs16gbk_102.txt
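Searching the exported word list shows no entry for "geological map".
Append "geological map" to the end of the file.
Then compile the word list. The three generated files ending in d, k, and i (drold.dat, drolk.dat, and droli.dat) make up the custom dictionary.
This step always failed at first: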
C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error

After adding the -n parameter, the files were generated successfully:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: Done writing all terms
DRG-52115: Writing new terms in lexicon to files
DRG-52114: Writing lexicon to files

C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F

 Directory of C:\Users\fengjun

2014/09/24       2,250,471 drold.dat
2014/09/24         391,326 droli.dat
2014/09/24          89,282 drolk.dat
2014/09/24         298,206 drolt.dat
               4 File(s)      3,029,285 bytes
               0 Dir(s)  113,255,260,160 bytes free
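Back up the $ORACLE_HOME\ctx\data\zhlx directory.
Then copy the generated files into $ORACLE_HOME\ctx\data\zhlx and rename them to match the existing dictionary files.
Only the files ending in d, k, and i need to be copied.
Remember to back up the original files before overwriting them.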
ctx@STARTREK> drop index test2_idx force;

Index dropped.

ctx@STARTREK> create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Index created.

ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT       TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------- ---------- ----------- ---------- ----------- ----------------
STR1                      0           1          2           2 0090010201900102
STR2                      0           1          2           2 0090040A01900402
geological map            0           1          1           1 0090020B
feedback                  0           1          1           1 008807
and                       0           1          1           1 008806
saliva                    0           1          1           1 00880C
stream                    0           1          1           1 00880B
tornado                   0           1          1           1 00880A
Mountain                  0           1          1           1 008809
Diagram                   0           2          2           1 008805
picture                   0           2          2           1 008802
China                     0           1          1           1 008805
pianshan                  0           1          1           1 008808

13 rows selected.
You can see that the token "geological map" now exists.
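The rebuilt index can also be exercised with a CONTAINS query. This is a minimal sketch against the test2 table from the setup step; as before, 'geological map' is a placeholder for the actual Chinese term, and the query is not taken from the original walkthrough.

```sql
-- The index is declared on str1, but the MULTI_COLUMN_DATASTORE
-- preference makes it cover both str1 and str2.
SELECT str1, str2
FROM   test2
WHERE  CONTAINS(str1, 'geological map') > 0;
```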
This completes the user-defined dictionary. For retrieval over large data volumes, a user-defined dictionary can make a significant difference.
The following is a simple test.
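The timing, execution plan, and statistics shown in the test output have the shape of SQL*Plus timing and autotrace output. A minimal sketch of the session settings that produce this kind of output, assuming the session has the privileges autotrace requires:

```sql
-- Print the elapsed time after each statement.
SET TIMING ON
-- Show the query result, the execution plan, and run-time statistics.
SET AUTOTRACE ON
```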
Before adding the keyword
SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:01.54

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
       3528  recursive calls
          0  db block gets
      10214  consistent gets
       1173  physical reads
       2824  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        263  sorts (memory)
          0  sorts (disk)
          1  rows processed
After adding the keyword
SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
        643  recursive calls
          0  db block gets
       2438  consistent gets
         34  physical reads
          0  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
          1  rows processed
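To put the two runs side by side, the ratios can be worked out directly from the figures reported above. A minimal sketch using only those numbers (the column aliases are illustrative):

```sql
-- Before/after ratios taken from the two autotrace outputs.
SELECT ROUND(1.54 / 0.28, 1)  AS elapsed_ratio,
       ROUND(10214 / 2438, 1) AS consistent_gets_ratio,
       ROUND(1173 / 34, 1)    AS physical_reads_ratio
FROM   dual;
```

Both runs return the same count and use the same plan, so the difference lies in how much index data is read. The sharp drop in physical reads may partly reflect a warmer buffer cache on the second run, so these ratios are best read as indicative rather than exact.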