Full-text index: customizing the chinese_lexer dictionary

Last Update: 2018-06-07 Source: Internet

Author: User

This article describes how to customize the dictionary of the chinese_lexer lexer.

Initialize data

The sample strings were Chinese in the original article and appear here in translation.

create table test2 (str1 varchar2(2000), str2 varchar2(2000));
insert into test2 values ('geological map', 'china and ZHUSHAN tornado drool geological map');
insert into test2 values ('image', 'figure');
commit;

Create the lexer preference and the full-text index (note that the custom dictionary only works with chinese_lexer)

exec ctx_ddl.create_preference('my_lexer1', 'CHINESE_LEXER');
exec ctx_ddl.create_preference('dataquery', 'MULTI_COLUMN_DATASTORE');
exec ctx_ddl.set_attribute('dataquery', 'columns', 'str1,str2');
create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Looking at the generated token table, we can see that there is no token for "geological map".

ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT      TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
--------------- ---------- ----------- ---------- ----------- ----------------
STR1                     0           1          2           2 0090010301900102
STR2                     0           1          2           2 0090050B01900402
geological               0           1          1           1 0090020C
feedback                 0           1          1           1 008808
and                      0           1          1           1 008807
saliva                   0           1          1           1 00880D
stream                   0           1          1           1 00880C
tornado                  0           1          1           1 00880B
Mountain                 0           1          1           1 00880A
Diagram                  0           1          2           2 0090030C018805
picture                  0           2          2           1 008802
China                    0           1          1           1 008806
pianshan                 0           1          1           1 008809

13 rows selected.

Start using the custom dictionary

C:\Users\fengjun>ctxlc -zht -ocs zhs16GBK > zhs16gbk_102.txt
C:\Users\fengjun>zhs16gbk_102.txt

Searching the exported file shows that it contains no entry for "geological map".

Append "geological map" to the end of the file.

Then generate the custom dictionary. Of the generated files, the three whose names end in d, k, and i are the ones used.

Without the -n parameter, this step always fails:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -i zhs16gbk_102.txt
DRG-52107: ctxkbtc internal error

With the -n parameter added, the files are generated successfully:

C:\Users\fengjun>ctxlc -zht -ics zhs16gbk -n -i zhs16gbk_102.txt
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,
DRG-52118: Writing index file for terms
DRG-52117: Writing index file for IDs
DRG-52116: Done writing all terms
DRG-52115: Writing new terms in lexicon to files
DRG-52114: Writing lexicon to files

C:\Users\fengjun>dir dr*
 Volume in drive C is Windows8_OS
 Volume Serial Number is 6C5D-2B1F

 Directory of C:\Users\fengjun

                 2,250,471 drold.dat
2014/09/24         391,326 droli.dat
2014/09/24          89,282 drolk.dat
2014/09/24         298,206 drolt.dat
               4 File(s)      3,029,285 bytes
               0 Dir(s)  113,255,260,160 bytes free

Back up $ORACLE_HOME\ctx\data\zhlx.

Then copy the files generated above into $ORACLE_HOME\ctx\data\zhlx and rename them.

Only the files ending in d, k, and i need to be copied.

Remember to back up the original files first.
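The backup-then-copy step can be scripted so that the backup is never skipped. The following Python sketch is one possible way to do it, not the author's procedure. The function name install_lexicon and its parameters are hypothetical, and the article does not say what the files must be renamed to, so the target names are left to the caller. The demo runs in a throwaway directory and uses placeholder file names only.

```python
import shutil
import tempfile
from pathlib import Path


def install_lexicon(generated_dir, lexicon_dir, backup_dir, rename_map):
    """Back up lexicon_dir, then copy generated files into it under new names.

    rename_map maps a generated file name to the name it should have in
    lexicon_dir. The article does not list the target names, so the caller
    must supply them.
    """
    generated_dir = Path(generated_dir)
    lexicon_dir = Path(lexicon_dir)
    # copytree raises FileExistsError if backup_dir already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(lexicon_dir, backup_dir)
    installed = []
    for source_name, target_name in rename_map.items():
        target = lexicon_dir / target_name
        shutil.copy2(generated_dir / source_name, target)
        installed.append(target.name)
    return installed


# Demo in a throwaway directory; every file name below is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "generated").mkdir()
    (root / "zhlx").mkdir()
    (root / "generated" / "drold.dat").write_text("new")
    (root / "zhlx" / "existing_d.dat").write_text("old")

    installed = install_lexicon(
        root / "generated",
        root / "zhlx",
        root / "zhlx_backup",
        {"drold.dat": "existing_d.dat"},
    )
    backup_text = (root / "zhlx_backup" / "existing_d.dat").read_text()
    live_text = (root / "zhlx" / "existing_d.dat").read_text()
    print(installed, backup_text, live_text)
```

Taking the backup first and refusing to reuse an existing backup directory means a failed or repeated run cannot silently destroy the original dictionary files.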

ctx@STARTREK> drop index test2_idx force;

Index dropped.

ctx@STARTREK> create index test2_idx on test2(str1) indextype is ctxsys.context parameters('datastore dataquery lexer my_lexer1');

Index created.

ctx@STARTREK> select * from DR$TEST2_IDX$I;

TOKEN_TEXT      TOKEN_TYPE TOKEN_FIRST TOKEN_LAST TOKEN_COUNT TOKEN_INFO
--------------- ---------- ----------- ---------- ----------- ----------------
STR1                     0           1          2           2 0090010201900102
STR2                     0           1          2           2 0090040A01900402
geological map           0           1          1           1 0090020B
feedback                 0           1          1           1 008807
and                      0           1          1           1 008806
saliva                   0           1          1           1 00880C
stream                   0           1          1           1 00880B
tornado                  0           1          1           1 00880A
mountains                0           1          1           1 008809
figure                   0           2          2           1 008805
picture                  0           2          2           1 008802
China                    0           1          1           1 008805
Lushan                   0           1          1           1 008808

13 rows selected.

You can see that the token "geological map" now exists in the index.
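To see why one extra dictionary entry changes the token list, here is a toy dictionary-based segmenter in Python. It uses forward maximum matching, a common textbook approach. This is only an illustration and is not claimed to be the algorithm chinese_lexer uses internally. The function name, the two tiny dictionaries, and the sample string "中国地质图" ("China geological map") are all assumptions made for the example.

```python
def segment(text, dictionary):
    """Split text by forward maximum matching against a set of known words."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary
        # matches; a single character is always accepted as a fallback token.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionaries: the second one adds the custom entry.
base_dictionary = {"中国", "地质"}
custom_dictionary = base_dictionary | {"地质图"}

before = segment("中国地质图", base_dictionary)
after = segment("中国地质图", custom_dictionary)
print(before)
print(after)
```

Without the custom entry the phrase is split into "地质" plus a single leftover character, which mirrors the separate tokens in the first token table. With the entry present, the whole term becomes one token.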

The custom dictionary is now complete. For retrieval over large data volumes, a custom dictionary makes a significant difference.

The following is a simple test.

Before adding a keyword

SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:01.54

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
       3528  recursive calls
          0  db block gets
      10214  consistent gets
       1173  physical reads
       2824  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        263  sorts (memory)
          0  sorts (disk)
          1  rows processed

After adding keywords

SQL> select count(0) from data_query t where contains(MDTITILE, 'xxxx million geological map') > 0;

  COUNT(0)
----------
      7072

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 670767155

----------------------------------------------------------------------------------
| Id  | Operation        | Name           | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                |    1 |    85 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                |    1 |    85 |            |          |
|*  2 |   DOMAIN INDEX   | DATA_QUERY_IDX |  164 | 13940 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MDTITILE",'xxxx million geological map')>0)

Statistics
----------------------------------------------------------
        643  recursive calls
          0  db block gets
       2438  consistent gets
         34  physical reads
          0  redo size
        535  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
          1  rows processed
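The two runs return the same count and use the same plan, so the difference is in the work done. A short Python sketch makes the comparison explicit. The numbers are copied from the two statistics listings above, and the metric labels are just dictionary keys chosen for this example. A single pair of runs is not a benchmark: the second run may also have benefited from a warm buffer cache and from less parsing, so the ratios are indicative only.

```python
# Figures copied from the two autotrace runs above.
before = {
    "elapsed seconds": 1.54,
    "recursive calls": 3528,
    "consistent gets": 10214,
    "physical reads": 1173,
    "sorts (memory)": 263,
}
after = {
    "elapsed seconds": 0.28,
    "recursive calls": 643,
    "consistent gets": 2438,
    "physical reads": 34,
    "sorts (memory)": 34,
}

# Ratio of the "before" value to the "after" value for each metric.
ratios = {name: before[name] / after[name] for name in before}

for name, ratio in ratios.items():
    print(f"{name}: {before[name]} -> {after[name]} ({ratio:.1f}x lower)")
```

Elapsed time and recursive calls both drop by roughly a factor of 5.5, and consistent gets by roughly a factor of 4.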

This article is an English version of an article originally published in Chinese on aliyun.com.