Full-text index: customizing the chinese_lexer dictionary

Source: Internet
Author: User
This article describes how to customize the dictionary of chinese_lexer.

Initialize data

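The sample values below are English renderings of the original Chinese strings.

create table test2 (str1 varchar2(2000), str2 varchar2(2000));
insert into test2 values ('geological map', 'china and ZHUSHAN tornado drool geological map');
insert into test2 values ('image', 'figure');
commit;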

Create the lexer preference and the full-text index (note that the custom dictionary only works with chinese_lexer)

exec ctx_ddl.create_preference('my_lexer1', 'CHINESE_LEXER');
exec ctx_ddl.create_preference('dataquery', 'MULTI_COLUMN_DATASTORE');
exec ctx_ddl.set_attribute('dataquery', 'columns', 'str1,str2');
create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Looking at the generated token table, we can see that "geological map" is not indexed as a single token.

Ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT       TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------- ---------- ----------- ---------- ----------- ----------------
STR1                      0           1          2           2 0090010301900102
STR2                      0           1          2           2 0090050B01900402
geological                0           1          1           1 0090020C
feedback                  0           1          1           1 008808
and                       0           1          1           1 008807
saliva                    0           1          1           1 00880D
stream                    0           1          1           1 00880C
tornado                   0           1          1           1 00880B
Mountain                  0           1          1           1 00880A
Diagram                   0           1          2           2 0090030C018805
picture                   0           2          2           1 008802
China                     0           1          1           1 008806
pianshan                  0           1          1           1 008809

13 rows selected.

Start using the custom dictionary

Export the current lexicon to a text file, then open the file:

C:\Users\fengjun>ctxlc -zht -ocs zhs16gbk > zhs16gbk_102.txt
C:\Users\fengjun>zhs16gbk_102.txt

Searching the exported file shows that it contains no entry for "geological map".

Add "geological map" as a new entry at the end of the file.
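Editing the file by hand works, but the file has to stay in the character set that ctxlc will read back (zhs16gbk here). The step can be sketched in Python as follows. This is only a sketch under assumptions that the article does not confirm: that the exported file lists one term per line, that Python's "gbk" codec is a suitable stand-in for ZHS16GBK, and that the term is the Chinese phrase 地质图 ("geological map"). The function name append_term is hypothetical.

```python
from pathlib import Path


def append_term(dict_path, term, encoding="gbk"):
    """Append `term` on its own line unless the file already lists it.

    Returns True if the term was added, False if it was already present.
    Assumes one term per line (an assumption about the ctxlc dump format).
    """
    path = Path(dict_path)
    text = path.read_text(encoding=encoding)
    if term in text.splitlines():
        return False
    # If the last line has no trailing newline, start a fresh line first.
    separator = "" if (not text or text.endswith("\n")) else "\n"
    with path.open("a", encoding=encoding) as handle:
        handle.write(separator + term + "\n")
    return True
```

With the file name used in this article, the call would be append_term("zhs16gbk_102.txt", "地质图"). Running it a second time leaves the file unchanged, so the entry is not duplicated.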

Next, generate the custom dictionary. Of the files produced, the three whose names end in d, k, and i are the ones used.

An error always occurs at this step:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error

After adding the -n parameter, the files are generated successfully:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
,,,,,,,,,,,,,,,,,,,,,,,, ,,,,,,,,,,,,,,,,,,,,,,,,,
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: Done writing all terms
DRG-52115: Writing new terms in lexicon to files
DRG-52114: Writing lexicon to files

C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F

 Directory of C:\Users\fengjun

2014/09/24       2,250,471 drold.dat
2014/09/24         391,326 droli.dat
2014/09/24          89,282 drolk.dat
2014/09/24         298,206 drolt.dat
               4 File(s)      3,029,285 bytes
               0 Dir(s)  113,255,260,160 bytes free

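Back up the directory $ORACLE_HOME\ctx\data\zhlx.

Then copy the generated files into $ORACLE_HOME\ctx\data\zhlx and rename them so that they replace the existing lexicon files.

Only the files ending in d, k, and i (drold.dat, drolk.dat, droli.dat) need to be copied.

Remember to back up the original files first.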
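The backup-then-copy procedure can be scripted so that the backup is never skipped. The Python sketch below is one possible way to do it, not the article's own method. The function name install_lexicon and its parameters are hypothetical, and the target naming pattern (for example drold.dat installed as droldZHS.dat) is an assumption, since the article says only that the files are renamed. Check the names of the files already present in the zhlx directory and adjust the suffix to match.

```python
import shutil
from pathlib import Path

# Stems of the generated files that the article says to copy
# (drolt.dat is produced as well but is not copied).
GENERATED_STEMS = ("drold", "drolk", "droli")


def install_lexicon(generated_dir, lexicon_dir, backup_dir, suffix="ZHS"):
    """Back up `lexicon_dir`, then copy the generated files into it.

    Each <stem>.dat is installed as <stem><suffix>.dat. The default
    suffix is an ASSUMPTION; match it to the existing file names.
    Returns the list of installed paths.
    """
    generated = Path(generated_dir)
    lexicon = Path(lexicon_dir)
    sources = [generated / f"{stem}.dat" for stem in GENERATED_STEMS]
    missing = [str(src) for src in sources if not src.is_file()]
    if missing:
        raise FileNotFoundError("missing generated files: " + ", ".join(missing))
    # copytree raises FileExistsError if the backup directory already
    # exists, so an earlier backup is never overwritten by a second run.
    shutil.copytree(lexicon, backup_dir)
    installed = []
    for src in sources:
        target = lexicon / f"{src.stem}{suffix}.dat"
        shutil.copy2(src, target)
        installed.append(target)
    return installed
```

Because the backup is taken before any file is replaced, restoring the original lexicon only requires copying the backup directory back.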

Drop and re-create the index, then look at the token table again:

Ctx@STARTREK> drop index test2_idx force;

Index dropped.

Ctx@STARTREK> create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Index created.

Ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT       TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------- ---------- ----------- ---------- ----------- ----------------
STR1                      0           1          2           2 0090010201900102
STR2                      0           1          2           2 0090040A01900402
geological map            0           1          1           1 0090020B
feedback                  0           1          1           1 008807
and                       0           1          1           1 008806
saliva                    0           1          1           1 00880C
stream                    0           1          1           1 00880B
tornado                   0           1          1           1 00880A
mountains                 0           1          1           1 008809
figure                    0           2          2           1 008805
picture                   0           2          2           1 008802
China                     0           1          1           1 008805
Lushan                    0           1          1           1 008808

13 rows selected.

You can see that "geological map" now exists as a single token.

This completes the user-defined dictionary. For retrieval over large data volumes, a user-defined dictionary makes a real difference.

The following is a simple test.

Before adding the keyword

SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:01.54

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

-----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |     1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |     1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |   164 | 13940 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
       3528  recursive calls
          0  db block gets
      10214  consistent gets
       1173  physical reads
       2824  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        263  sorts (memory)
          0  sorts (disk)
          1  rows processed

After adding the keyword

SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

-----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |     1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |     1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |   164 | 13940 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
        643  recursive calls
          0  db block gets
       2438  consistent gets
         34  physical reads
          0  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
          1  rows processed
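The size of the improvement can be read off the two runs above. As a small worked calculation, the Python sketch below divides each "before" figure by the matching "after" figure. The numbers are copied from the output in this article; the dictionary keys and the function name improvement_ratios are made up for illustration.

```python
# Figures copied from the two autotrace runs in this article.
before = {
    "elapsed seconds": 1.54,
    "recursive calls": 3528,
    "consistent gets": 10214,
    "physical reads": 1173,
    "sorts (memory)": 263,
}
after = {
    "elapsed seconds": 0.28,
    "recursive calls": 643,
    "consistent gets": 2438,
    "physical reads": 34,
    "sorts (memory)": 34,
}


def improvement_ratios(before, after):
    """Return before/after for every metric present in `before`."""
    return {name: before[name] / after[name] for name in before}


for name, ratio in improvement_ratios(before, after).items():
    print(f"{name}: {before[name]} -> {after[name]} (ratio {ratio:.1f})")
```

The elapsed time, for example, drops from 1.54 s to 0.28 s, a factor of 5.5, while the query returns the same count of 7072 in both runs.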
