Research on Oracle full-text retrieval (part 2)

Source: Internet
Author: User

3.2 Filter attributes

The filter converts documents in various file formats into plain text in the database character set. The other components in the indexing pipeline can process only plain text and cannot read formats such as Microsoft Word or Excel. The filter types are CHARSET_FILTER, AUTO_FILTER (called INSO_FILTER in older releases), NULL_FILTER, MAIL_FILTER, USER_FILTER, and PROCEDURE_FILTER.

3.2.1 CHARSET_FILTER

Use CHARSET_FILTER to convert documents from a non-database character set to the character set used by the database.

Example:

create table hdocs (
  id   number primary key,
  fmt  varchar2(10),
  cset varchar2(20),
  text varchar2(80)
);

begin
  ctx_ddl.create_preference('cs_filter', 'CHARSET_FILTER');
  ctx_ddl.set_attribute('cs_filter', 'charset', 'UTF8');
end;
/

insert into hdocs values (1, 'text', 'WE8ISO8859P1', '/docs/iso.txt');
insert into hdocs values (2, 'text', 'UTF8', '/docs/utf8.txt');
commit;

-- The cset column tells the filter which character set each file uses
create index hdocsx on hdocs (text) indextype is ctxsys.context
  parameters ('datastore ctxsys.file_datastore
               filter cs_filter
               format column fmt
               charset column cset');

3.2.2 NULL_FILTER

NULL_FILTER performs no filtering and is the default for text columns.

Oracle does not recommend AUTO_FILTER for HTML, XML, and plain-text documents. For these, use NULL_FILTER together with a suitable section group type.

-- Create an index that uses the null filter
create index myindex on docs (htmlfile) indextype is ctxsys.context
  parameters ('filter ctxsys.null_filter section group ctxsys.html_section_group');

The default filter depends on the data type of the indexed column and on the datastore type. For VARCHAR2, CHAR, and CLOB columns, Oracle selects NULL_FILTER automatically. If the datastore is FILE_DATASTORE, Oracle selects AUTO_FILTER as the default.

3.2.3 AUTO_FILTER

AUTO_FILTER is a general-purpose filter that handles most document formats, including PDF and Microsoft Word. It also recognizes plain-text, HTML, XHTML, SGML, and XML documents automatically.

create table my_filter (id number, docs varchar2(1000));

insert into my_filter values (1, 'Expert Oracle Database ubunturetries');
insert into my_filter values (2, '1.txt');
insert into my_filter values (3, '2.doc');
commit;

-- Create the file datastore preference
begin
  ctx_ddl.create_preference('test_filter', 'file_datastore');
  ctx_ddl.set_attribute('test_filter', 'path', '/opt/tmp');
end;
/

-- Create the index with the auto filter
create index idx_m_filter on my_filter (docs) indextype is ctxsys.context
  parameters ('datastore test_filter filter ctxsys.auto_filter');

-- Check the error view for documents that failed to index
select * from ctx_user_index_errors;

select * from my_filter where contains(docs, 'oracle') > 0;

AUTO_FILTER identifies most document formats automatically. You can also state each document's type explicitly through a format column, whose allowed values are TEXT, BINARY, and IGNORE. Documents marked BINARY are filtered with AUTO_FILTER, documents marked TEXT are not filtered (as with NULL_FILTER), and documents marked IGNORE are not indexed.

create table hdocs (
  id   number primary key,
  fmt  varchar2(10),
  text varchar2(80)
);

insert into hdocs values (1, 'binary', '/docs/myword.doc');
insert into hdocs values (2, 'text', '/docs/index.html');
insert into hdocs values (3, 'ignore', '/docs/1.txt');
commit;

-- The fmt column decides per row whether the document is filtered
create index hdocsx on hdocs (text) indextype is ctxsys.context
  parameters ('datastore ctxsys.file_datastore
               filter ctxsys.auto_filter
               format column fmt');

3.2.4 MAIL_FILTER

Use MAIL_FILTER to convert RFC-822 and RFC-2045 messages into indexable text.

Restrictions:

The document must be US-ASCII.

Lines must not be longer than 1024 bytes.

The document must be syntactically valid with regard to RFC-822.
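
A minimal sketch of a MAIL_FILTER setup follows. The table emails, its columns, the preference name my_mail_filter, and the index name are hypothetical and chosen only for illustration. The INDEX_FIELDS attribute takes a colon-separated list of header fields to keep in the filtered output; verify it against the documentation for your release.

```sql
-- Hypothetical table holding raw RFC-822 messages
create table emails (
  id      number primary key,
  message clob
);

begin
  -- Preference name is an assumption, not from the original article
  ctx_ddl.create_preference('my_mail_filter', 'MAIL_FILTER');
  -- Keep the From and Subject headers in the indexed text
  ctx_ddl.set_attribute('my_mail_filter', 'INDEX_FIELDS', 'FROM:SUBJECT');
end;
/

create index emails_idx on emails (message) indextype is ctxsys.context
  parameters ('filter my_mail_filter');
```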

 

 

3.2.5 USER_FILTER

Use the USER_FILTER type to specify an external filter executable for filtering the documents in a column.
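
As a hedged sketch, a USER_FILTER preference names the executable through its COMMAND attribute. The script upcase.pl, the table user_docs, and the preference and index names are hypothetical. The sketch assumes the executable is installed in $ORACLE_HOME/ctx/bin and accepts an input file name and an output file name as its two arguments.

```sql
-- Hypothetical table whose documents are filtered by an external program
create table user_docs (
  id   number primary key,
  docs clob
);

begin
  ctx_ddl.create_preference('user_filter_pref', 'USER_FILTER');
  -- upcase.pl is a placeholder; it must exist in $ORACLE_HOME/ctx/bin
  ctx_ddl.set_attribute('user_filter_pref', 'COMMAND', 'upcase.pl');
end;
/

create index user_docs_idx on user_docs (docs) indextype is ctxsys.context
  parameters ('filter user_filter_pref');
```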

 

3.2.6 PROCEDURE_FILTER

Use the PROCEDURE_FILTER type to filter your documents with a stored procedure. The stored procedure is called each time a document needs to be filtered.
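
The following is a minimal sketch of a PROCEDURE_FILTER setup, not a definitive implementation. The procedure my_clob_filter, the table proc_docs, and the preference and index names are hypothetical. The sketch assumes CLOB input and output with no ROWID, format, or charset parameters, so the procedure takes exactly two arguments. Some releases require the procedure to be owned by CTXSYS, so check the ownership rules for your version.

```sql
-- Hypothetical filter procedure: replaces tab characters with spaces
create or replace procedure my_clob_filter (
  input  in            clob,
  output in out nocopy clob
) is
begin
  output := replace(input, chr(9), ' ');
end;
/

-- Hypothetical table to index
create table proc_docs (
  id   number primary key,
  docs clob
);

begin
  ctx_ddl.create_preference('my_proc_filter', 'PROCEDURE_FILTER');
  ctx_ddl.set_attribute('my_proc_filter', 'PROCEDURE', 'my_clob_filter');
  ctx_ddl.set_attribute('my_proc_filter', 'INPUT_TYPE', 'CLOB');
  ctx_ddl.set_attribute('my_proc_filter', 'OUTPUT_TYPE', 'CLOB');
end;
/

create index proc_docs_idx on proc_docs (docs) indextype is ctxsys.context
  parameters ('filter my_proc_filter');
```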

 

3.2.7 Reference script

-- Create an index that uses the null filter
create index myindex on docs (htmlfile) indextype is ctxsys.context
  parameters ('filter ctxsys.null_filter section group ctxsys.html_section_group');

-- Create an index that uses the auto filter
create index idx_m_filter on my_filter (docs) indextype is ctxsys.context
  parameters ('datastore test_filter filter ctxsys.auto_filter');

Filter errors are logged in the view CTX_USER_INDEX_ERRORS.

 
