Oracle Full-Text Search for Chinese Text

Source: Internet
Author: User
Tags: lexer

Oracle has supported full-text retrieval since release 7.3: the ConText option of the Oracle server lets you run text-based queries. Specifically, it offers wildcard search, fuzzy matching, relevance ranking, proximity search, term weighting, and word-meaning expansion. The feature is called ConText in Oracle 8.0.x, interMedia Text in Oracle8i, and Oracle Text from Oracle9i onward. The following example walks through setting up Oracle full-text search.

Assign User Permissions

Grant connect, resource to scott;
Grant ctxapp to scott;
Alter user scott default role all;
-- Object permission
Grant execute on ctx_ddl to scott;

Unlock ctxsys user
Alter user ctxsys account unlock identified by ctxsys;

Log on as the application user and create the lexer and word list preferences
SQL> conn scott/tiger;

BEGIN
  -- Lexer preference based on the dictionary-driven Chinese lexer
  ctx_ddl.create_preference('main_lexer', 'CHINESE_LEXER');
  -- Word list preference with prefix and substring indexing enabled
  ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_INDEX', 'TRUE');
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_MIN_LENGTH', 1);
  ctx_ddl.set_attribute('mywordlist', 'PREFIX_MAX_LENGTH', 5);
  ctx_ddl.set_attribute('mywordlist', 'SUBSTRING_INDEX', 'YES');
END;
/
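To confirm that the two preferences were stored as intended, the Oracle Text dictionary views can be queried. This is a minimal sketch, assuming it runs as the same user that created the preferences (scott in this example):

```sql
-- List the preferences owned by the current user and the object each one is based on
SELECT pre_name, pre_class, pre_object
  FROM ctx_user_preferences;

-- Show the attribute values that were set on the word list preference
SELECT prv_preference, prv_attribute, prv_value
  FROM ctx_user_preference_values
 WHERE UPPER(prv_preference) = 'MYWORDLIST';
```

If the PL/SQL block has to be run again, the existing preferences most likely need to be removed first, for example with ctx_ddl.drop_preference('main_lexer'), because create_preference raises an error when the name is already in use.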
A note on the lexical analyzer (lexer), which splits text into indexable tokens. Three lexers are relevant here:
1) basic_lexer: intended for English. It splits sentences into words at spaces and punctuation and automatically treats frequent words that carry no retrieval value, such as "if" and "is", as stopwords, so processing is efficient. Because it recognizes only spaces and punctuation as delimiters, and a Chinese sentence usually contains no spaces, it treats the whole sentence as a single token, which in practice makes the text unsearchable.
2) chinese_vgram_lexer: a lexer dedicated to Chinese that supports all Chinese character sets (ZHS16CGB231280, ZHS16GBK, ZHT32EUC, ZHT16BIG5, ZHT32TRIS, ZHT16MSWIN950, ZHT16HKSCS, UTF8). Its segmentation is mechanical, however: every combination of adjacent characters is indexed as a word, which takes up space and is inefficient.
3) chinese_lexer: a newer Chinese lexer that supports only the UTF8 character set. Its biggest improvement is that it recognizes most commonly used Chinese words and therefore analyzes sentences more efficiently.
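Since the choice between the two Chinese lexers depends on the database character set, it may help to check that setting first. The sketch below is illustrative only; the preference name vgram_lexer is a hypothetical name chosen for this example, not something the walkthrough itself defines:

```sql
-- Check the database character set before choosing a Chinese lexer
SELECT value
  FROM nls_database_parameters
 WHERE parameter = 'NLS_CHARACTERSET';

-- Assumption: the result is one of the Chinese character sets listed above
-- rather than UTF8, so the v-gram lexer is used as the fallback.
-- 'vgram_lexer' is a hypothetical preference name.
BEGIN
  ctx_ddl.create_preference('vgram_lexer', 'CHINESE_VGRAM_LEXER');
END;
/
```

In that case the LEXER clause of the index parameters would name vgram_lexer in place of main_lexer.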

Create sample data
Create table docs (id number, name varchar2(200), address varchar2(2000));
Insert into docs values (1, 'John Smith', 'Room 403, No. 37, ShiFan Residential Quarter, BaoShan District');
Insert into docs values (2, 'Noah Abelard', 'Room 201, No. 34, Lane 125, XiKang Road (South), HongKou District');
Insert into docs values (3, 'Michael Core', 'Room 42, Zhongzhou Road, Nanyang City, Henan Prov.');
Insert into docs values (4, 'Thomas Matthew', 'Hongyuan Hotel, Jingzhou City, Hubei Prov.');
Insert into docs values (5, 'Joseph', 'Special Steel Corp, No. 272, Bayi Road, Nanyang City, Henan Prov.');
Insert into docs values (6, 'Lauren', 'Room 702, 7th Building, Hengda Garden, East District, Zhongshan');
Insert into docs values (7, 'Kevin Victoria', 'Room 601, No. 34 Long Chang Li, Xiamen, Fujian');
Insert into docs values (8, 'Michael', 'Cheng Nuo Ban, Gong Jiao Zong Gong Si, Xiamen, Fujian');
Insert into docs values (9, 'Timothy Catherine', 'No. 204, Entrance A, Building NO. 1, The 2nd Dormitory of the NO. 4 State-owned Textile Factory, 53 Kaiping Road, Qingdao, Shandong');
Insert into docs values (10, 'Zhou Wangcai', 'Room 601, No. 34 Long Chang Li, Xiamen, Fujian, China 361012');
Insert into docs values (11, 'Sebastian Jared', 'Cheng Nuo Ban, Gong Jiao Zong Gong SiXiamen, Fujian, China 100');
Insert into docs values (12, 'Jenna', 'No. 204, A, Building NO. 1, The 2nd Dormitory of the NO. 4 State-owned Textile Factory, 53 Kaiping Road, Qingdao, Shandong, China 266042');
Insert into docs values (13, 'Catherine', 'Room403, No. 37, SiFanResidentialQuarter, BaoShanDistrict');
Insert into docs values (14, 'Sebastian Cole', '1 Team CaiQi ChuanXiBei Mining Area JiangYou City SiChuan Province China');
Insert into docs values (15, 'Timothy Jared', 'Room 201, No. 34, Lane 125, XiKang Road (South), HongKou District');
Insert into docs values (16, 'Zhang San', '8B, Building 1, North Xiaoying Asian Games Garden, Chaoyang District, Beijing, China');
Insert into docs values (17, 'Lee 4', '9th Floor, 1500 Century Avenue, Shanghai');
Insert into docs values (18, 'Wang 5', 'No. 515 Century Avenue, Chengdu Economic and Technological Development Zone, Sichuan');
Commit;

Create a CONTEXT index on the text column, specifying the datastore, filter, lexer, and word list
Create index idx_docs_address on docs (address)
indextype is ctxsys.context parameters
('DATASTORE CTXSYS.DIRECT_DATASTORE FILTER CTXSYS.INSO_FILTER LEXER main_lexer WORDLIST mywordlist');
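
Once the index exists, it can be checked and queried with the CONTAINS operator. The statements below are a sketch rather than part of the original walkthrough: the search terms are taken from the sample rows, and the column alias hit_score is a name chosen for illustration.

```sql
-- Rows that could not be indexed, if any, are recorded in this view
SELECT err_index_name, err_textkey, err_text
  FROM ctx_user_index_errors;

-- By default a CONTEXT index is not maintained on DML;
-- synchronize it after inserting or updating rows
BEGIN
  ctx_ddl.sync_index('idx_docs_address');
END;
/

-- Simple term query, ordered by relevance score (label 1 ties SCORE to CONTAINS)
SELECT id, name, address, SCORE(1) AS hit_score
  FROM docs
 WHERE CONTAINS(address, 'Xiamen', 1) > 0
 ORDER BY SCORE(1) DESC;

-- Wildcard search: tokens starting with "Nan"
SELECT id, address
  FROM docs
 WHERE CONTAINS(address, 'Nan%') > 0;

-- Proximity search: both terms within 3 tokens of each other
SELECT id, address
  FROM docs
 WHERE CONTAINS(address, 'NEAR((Nanyang, Henan), 3)') > 0;

-- Term weighting: matches on Xiamen count three times as much as Qingdao
SELECT id, address, SCORE(1) AS hit_score
  FROM docs
 WHERE CONTAINS(address, 'Xiamen*3 OR Qingdao', 1) > 0
 ORDER BY SCORE(1) DESC;
```

How the addresses are split into tokens depends on the lexer preference chosen earlier, so the rows returned may differ between chinese_lexer and chinese_vgram_lexer.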
