Full-text index: customizing the chinese_lexer dictionary

Source: Internet
Author: User

This article describes how to customize the dictionary of chinese_lexer.

 

Initialize data

 

create table test2 (str1 varchar2(2000), str2 varchar2(2000));
insert into test2 values ('…', '…');
insert into test2 values ('image', 'fig');
commit;


 

Create the lexer preference and the full-text index (note that the custom dictionary only works for CHINESE_LEXER)

exec ctx_ddl.create_preference('my_lexer1', 'CHINESE_LEXER');
exec ctx_ddl.create_preference('dataquery', 'MULTI_COLUMN_DATASTORE');
exec ctx_ddl.set_attribute('dataquery', 'columns', 'str1,str2');
create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');


 

Looking at the generated token table, we can see that it contains no token for "geological map".

 

SQL> select * from dr$test2_idx$i;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------------------------------------------------------------
STR1 0 1 2 20090010301900102
STR2 0 1 2 20090050b01900402
geology 0 1 10090020c
feedback 0 1 1008808
and 0 1 1008807
saliva 0 1 1 100880d
stream 0 1 1 100880c
tornado 0 1 1 100880b
mountains 0 1 100880a
figure 0 1 2 20090030c018805
 0 2 1008802
China 0 1 1 1008806
Lushan 0 1 1 1008809

13 rows selected.
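Why a missing dictionary entry leaves the phrase split into "geology" and "figure" can be illustrated with a toy dictionary-based segmenter. The sketch below uses greedy forward maximum matching, which is an assumption chosen for illustration and not necessarily what CHINESE_LEXER does internally. The function name `segment`, the word list, and the sample string are all hypothetical.

```python
def segment(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical word list and sample text, for illustration only.
base_dictionary = {"中国", "地质", "图"}
sample = "中国地质图"

print(segment(sample, base_dictionary))                # phrase is split
print(segment(sample, base_dictionary | {"地质图"}))   # phrase kept whole
```

With the extra entry in the word list, the longest match wins and the phrase is indexed as one token, which is the effect the custom dictionary is meant to achieve.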


Start using the custom dictionary


Export the existing dictionary to a text file, then open the file:

C:\Users\fengjun>ctxlc -zht -ocs zhs16gbk > zhs16gbk_102.txt
C:\Users\fengjun>zhs16gbk_102.txt


 

Searching the exported file turns up no entry for "geological map".

 


 

Append "geological map" to the end of the file.
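If the edit needs to be repeatable, appending the term can be scripted. The sketch below is a minimal example under two assumptions that the article does not confirm: that the exported file holds one term per line, and that Python's `gbk` codec reads the ZHS16GBK output correctly. The helper name `append_term` is hypothetical.

```python
from pathlib import Path


def append_term(path, term, encoding="gbk"):
    """Append `term` as a new last line unless an identical line exists.

    Assumes one term per line (an assumption about the exported file).
    Returns True if the file was changed, False otherwise.
    """
    file = Path(path)
    text = file.read_text(encoding=encoding)
    if term in text.splitlines():
        return False
    # Make sure the new term starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with file.open("a", encoding=encoding) as handle:
        handle.write(f"{separator}{term}\n")
    return True
```

Calling `append_term("zhs16gbk_102.txt", "地质图")` would add the term once; a second call leaves the file unchanged. Keep a copy of the exported file before modifying it.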

 

Generate the custom dictionary. Three of the generated files are needed: those whose names end in D, K, and I.

 

Without the -n parameter, this step always fails:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error

With the -n parameter added, the files are generated successfully:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: done writing all terms
DRG-52115: writing new terms in lexicon to files
DRG-52114: Writing lexicon to files

C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F

 Directory of C:\Users\fengjun

                  2,250,471 drold.dat
2014/09/24          391,326 droli.dat
2014/09/24           89,282 drolk.dat
2014/09/24          298,206 drolt.dat
               4 File(s)      3,029,285 bytes
               0 Dir(s)  113,255,260,160 bytes free


 

Back up $ORACLE_HOME\ctx\data\zhlx, then copy the files generated above into $ORACLE_HOME\ctx\data\zhlx and rename them.

Only the files ending in D, K, and I need to be copied.

Remember to back up the original files first.

 

 

 

SQL> drop index test2_idx force;

Index dropped.

SQL> create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Index created.

SQL> select * from dr$test2_idx$i;

TOKEN_TEXT TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
---------------------------------------------------------------------
STR1 0 1 2 20090010201900102
STR2 0 1 2 20090040a01900402
geological map 0 1 1 10090020b
feedback 0 1 1008807
and 0 1 1008806
saliva 0 1 100880c
stream 0 1 100880b
tornado 0 1 1 100880a
Mountain 0 1 1 1008809
fig 0 2 2 1008805
picture 0 2 1008802
China 0 1 1 1008805
Lushan 0 1 1 1008808

13 rows selected.


 

You can see that the token "geological map" now exists.

 

 

This completes the custom dictionary. For retrieval over large data volumes, a custom dictionary makes a real difference.


The following is a simple test.

 

Before adding a keyword

SQL> select count(0) from data_query t where contains(mdtitile, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:01.54

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
       3528  recursive calls
          0  db block gets
      10214  consistent gets
       1173  physical reads
       2824  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        263  sorts (memory)
          0  sorts (disk)
          1  rows processed

 

 

 

After adding keywords

 

SQL> select count(0) from data_query t where contains(mdtitile, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
        643  recursive calls
          0  db block gets
       2438  consistent gets
         34  physical reads
          0  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
          1  rows processed
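To put the two runs side by side, the short sketch below computes how many times lower each figure is after the keyword was added. The numbers are copied from the two autotrace outputs above; the helper name `reduction_factor` is hypothetical. Each query was run only once here, so part of the difference may come from caching and not from the dictionary alone.

```python
# Figures copied from the two autotrace runs: (before, after).
metrics = {
    "elapsed seconds": (1.54, 0.28),
    "recursive calls": (3528, 643),
    "consistent gets": (10214, 2438),
    "physical reads": (1173, 34),
    "sorts (memory)": (263, 34),
}


def reduction_factor(before, after):
    """How many times smaller the 'after' value is than the 'before' value."""
    return before / after


for name, (before, after) in metrics.items():
    factor = reduction_factor(before, after)
    print(f"{name:16} {before:>8} -> {after:>8}  ({factor:.1f}x lower)")
```

Both runs return the same count (7072) with the same execution plan, so the comparison covers the same query and the same result set.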

 

 

 

 
