This article describes how to customize the dictionary of chinese_lexer.
Initialize data
create table test2 (str1 varchar2(2000), str2 varchar2(2000));
insert into test2 values (…);
insert into test2 values ('image', 'fig');
commit;
Create the lexer preference and the full-text index (note that the custom dictionary only works with CHINESE_LEXER):
exec ctx_ddl.create_preference('my_lexer1', 'CHINESE_LEXER');
exec ctx_ddl.create_preference('dataquery', 'MULTI_COLUMN_DATASTORE');
exec ctx_ddl.set_attribute('dataquery', 'columns', 'str1,str2');
create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');
Looking at the generated token table, we can see that it contains no token for "geological map".
SQL> select * from dr$test2_idx$i;
TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
STR1 0 1 2 20090010301900102
STR2 0 1 2 20090050b01900402
geology 0 1 10090020c
feedback 0 1 1008808
and 0 1 1008807
saliva 0 1 1 100880d
stream 0 1 1 100880c
tornado 0 1 1 100880b
mountains 0 1 100880a
figure 0 1 2 20090030c018805
… 0 2 1008802
China 0 1 1 1008806
Lushan 0 1 1 1008809
13 rows selected.
Start using the custom dictionary
C:\Users\fengjun>ctxlc -zht -ocs zhs16gbk > zhs16gbk_102.txt
C:\Users\fengjun>zhs16gbk_102.txt
Searching the exported file shows that it has no entry for "geological map".
Add "geological map" as a new entry at the end of the file.
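The manual edit above can also be scripted. The following is a minimal Python sketch, not part of the original procedure. It assumes the exported file is plain text with one term per line and that Oracle's ZHS16GBK character set corresponds to Python's "gbk" codec; the helper name append_term is hypothetical.

```python
from pathlib import Path


def append_term(lexicon_path, term, encoding="gbk"):
    """Append `term` on its own line unless it is already listed.

    Assumption: the file exported by ctxlc holds one term per line.
    Returns True if the term was added, False if it was already present.
    """
    path = Path(lexicon_path)
    text = path.read_text(encoding=encoding)
    if term in text.splitlines():
        return False
    with path.open("a", encoding=encoding) as fh:
        # Make sure the new term starts on a fresh line.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(term + "\n")
    return True
```

Calling append_term("zhs16gbk_102.txt", term) with the new keyword leaves the file unchanged if the keyword is already listed, so the script can be run repeatedly.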
Next, generate the custom dictionary. Of the generated files, the three whose names end in D, K, and I are the ones that are used.
Without the -n parameter, an error always occurs at this step:
C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error
With the -n parameter added, the files are generated successfully:
C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
...
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: done writing all terms
DRG-52115: writing new terms in lexicon to files
DRG-52114: Writing lexicon to files
C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F
 Directory of C:\Users\fengjun
            2,250,471 drold.dat
2014/09/24    391,326 droli.dat
2014/09/24     89,282 drolk.dat
2014/09/24    298,206 drolt.dat
 4 File(s) 3,029,285 bytes
 0 Dir(s) 113,255,260,160 bytes free
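For repeated rebuilds, the compile command can be assembled programmatically. This is a small Python sketch that uses only the flags appearing in the transcript above (-zht, -ics, -n, -i). The function names are hypothetical, and run_ctxlc assumes ctxlc is on the PATH.

```python
import subprocess


def ctxlc_compile_args(input_file, charset="zhs16gbk", new_lexicon=True):
    """Build the ctxlc argument list in the same order as the transcript.

    new_lexicon=True adds -n, which is the variant that succeeded above.
    """
    args = ["ctxlc", "-zht", "-ics", charset]
    if new_lexicon:
        args.append("-n")
    args += ["-i", input_file]
    return args


def run_ctxlc(args):
    """Run ctxlc and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(args, check=True)
```

Keeping the argument construction separate from the call makes it easy to log or inspect the exact command before it runs.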
Back up $ORACLE_HOME\CTX\data\zhlx.
Then copy the files generated above into $ORACLE_HOME\CTX\data\zhlx and rename them.
Only the files ending in D, K, and I need to be copied.
Remember to back up the original files first.
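The backup-then-copy sequence can be sketched in Python as follows. This is an illustration under assumptions, not the author's script: the function name install_lexicon is hypothetical, and because the article does not state the target file names, the caller must supply them in rename_map.

```python
import shutil
from pathlib import Path


def install_lexicon(generated_dir, zhlx_dir, backup_dir, rename_map):
    """Back up zhlx_dir, then copy the generated files into it.

    rename_map maps a generated file name (for example "drold.dat") to the
    name it must have inside zhlx_dir. The target names are NOT given in the
    article, so they are a caller-supplied assumption here.
    """
    generated_dir = Path(generated_dir)
    zhlx_dir = Path(zhlx_dir)
    backup_dir = Path(backup_dir)

    # copytree refuses to overwrite an existing directory, which protects
    # an earlier backup from being clobbered.
    shutil.copytree(zhlx_dir, backup_dir)

    installed = []
    for src_name, dst_name in rename_map.items():
        dst = zhlx_dir / dst_name
        shutil.copy2(generated_dir / src_name, dst)
        installed.append(dst)
    return installed
```

Taking the backup inside the same function means the originals are always saved before anything is overwritten.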
SQL> drop index test2_idx force;
Index dropped.
SQL> create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');
Index created.
SQL> select * from dr$test2_idx$i;
TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
STR1 0 1 2 20090010201900102
STR2 0 1 2 20090040a01900402
geological map 0 1 1 10090020b
feedback 0 1 1008807
and 0 1 1008806
saliva 0 1 100880c
stream 0 1 100880b
tornado 0 1 1 100880a
Mountain 0 1 1 1008809
fig 0 2 2 1008805
picture 0 2 1008802
China 0 1 1 1008805
Lushan 0 1 1 1008808
13 rows selected.
You can see that the token "geological map" now exists.
The custom dictionary is now complete. For retrieval over large data volumes, a custom dictionary makes a significant difference.
The following is a simple test.
Before adding a keyword
SQL> select count(0) from data_query t where contains(mdtitile, 'xxxx million geological map') > 0;
  COUNT(0)
----------
      7072
Elapsed: 00:00:01.54
Execution Plan
----------------------------------------------------------
Plan hash value: 670767155
-----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |       4 (0) | 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |             |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |       4 (0) | 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)
Statistics
----------------------------------------------------------
       3528  recursive calls
          0  db block gets
      10214  consistent gets
       1173  physical reads
       2824  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        263  sorts (memory)
          0  sorts (disk)
          1  rows processed
After adding the keyword
SQL> select count(0) from data_query t where contains(mdtitile, 'xxxx million geological map') > 0;
  COUNT(0)
----------
      7072
Elapsed: 00:00:00.28
Execution Plan
----------------------------------------------------------
Plan hash value: 670767155
-----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |       4 (0) | 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |             |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |       4 (0) | 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)
Statistics
----------------------------------------------------------
        643  recursive calls
          0  db block gets
       2438  consistent gets
         34  physical reads
          0  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
          1  rows processed
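To put the two runs side by side, the following Python sketch computes before/after ratios from the elapsed times and statistics reported above. The numbers are copied from the two transcripts; the dictionary keys and the function name improvement_ratios are only illustrative.

```python
# Figures copied from the two autotrace runs above.
before = {
    "elapsed seconds": 1.54,
    "recursive calls": 3528,
    "consistent gets": 10214,
    "physical reads": 1173,
    "sorts (memory)": 263,
}
after = {
    "elapsed seconds": 0.28,
    "recursive calls": 643,
    "consistent gets": 2438,
    "physical reads": 34,
    "sorts (memory)": 34,
}


def improvement_ratios(before, after):
    """Return before/after for every metric with a non-zero 'after' value."""
    return {name: before[name] / after[name] for name in before if after.get(name)}


ratios = improvement_ratios(before, after)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Both runs return the same count (7072) with the same plan, so the difference lies in the work done per query. Part of the drop in physical reads may also reflect a warmer buffer cache on the second run, since each query was measured only once.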