3.2 Filter attributes
The filter converts documents stored in various formats into plain text. The other components of the indexing pipeline can process only plain text; they cannot interpret formats such as Microsoft Word or Excel. The filter types are CHARSET_FILTER, INSO_FILTER (replaced by AUTO_FILTER in Oracle Database 11g), NULL_FILTER, MAIL_FILTER, USER_FILTER, and PROCEDURE_FILTER.
3.2.1 CHARSET_FILTER
Converts documents from a non-database character set to the database character set. (Original wording: "Use the CHARSET_FILTER to convert documents from a non-database character set to the character set used by the database.")
Example:
CREATE TABLE hdocs (
  id   NUMBER PRIMARY KEY,
  fmt  VARCHAR2(10),
  cset VARCHAR2(20),
  text VARCHAR2(80)
);

-- Charset filter preference: convert documents to UTF8
BEGIN
  CTX_DDL.CREATE_PREFERENCE('cs_filter', 'CHARSET_FILTER');
  CTX_DDL.SET_ATTRIBUTE('cs_filter', 'charset', 'UTF8');
END;
/

INSERT INTO hdocs VALUES (1, 'text', 'WE8ISO8859P1', '/docs/iso.txt');
INSERT INTO hdocs VALUES (2, 'text', 'UTF8', '/docs/utf8.txt');
COMMIT;

-- cset holds each document's source character set
CREATE INDEX hdocsx ON hdocs(text) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.file_datastore
               filter cs_filter
               format column fmt
               charset column cset');
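Conceptually, CHARSET_FILTER decodes each document using the character set named in its cset column and re-encodes it in the target set (UTF8 here). The Python sketch below models only that step, outside the database. The mapping of Oracle character set names to Python codecs is an assumption made for illustration (Oracle's UTF8 matches standard UTF-8 only for characters in the Basic Multilingual Plane), and `convert_document` is a hypothetical helper, not an Oracle API.

```python
# Hypothetical mapping from Oracle character set names to Python codec names.
# Only the two sets used in the hdocs example are covered (an approximation).
ORACLE_TO_PYTHON_CODEC = {
    "WE8ISO8859P1": "latin-1",
    "UTF8": "utf-8",
}


def convert_document(raw: bytes, source_cset: str, target_cset: str = "UTF8") -> bytes:
    """Re-encode raw document bytes from source_cset into target_cset.

    Models what CHARSET_FILTER does per row: decode with the row's charset
    column value, then encode in the database character set.
    """
    src = ORACLE_TO_PYTHON_CODEC[source_cset]
    dst = ORACLE_TO_PYTHON_CODEC[target_cset]
    return raw.decode(src).encode(dst)


# In ISO-8859-1, the final character of "cafe-acute" is the single byte 0xE9 ...
iso_bytes = "caf\u00e9".encode("latin-1")
# ... and after conversion it becomes the two-byte UTF-8 sequence 0xC3 0xA9.
utf8_bytes = convert_document(iso_bytes, "WE8ISO8859P1")
print(iso_bytes, utf8_bytes)
```

Row 2 of hdocs, already in UTF8, passes through unchanged under this model.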
3.2.2 NULL_FILTER
NULL_FILTER performs no filtering: documents are passed to the indexer unchanged. It has no attributes.
Oracle recommends against using AUTO_FILTER for HTML, XML, and plain-text documents. Use NULL_FILTER together with the appropriate section group type instead:
-- Create an index that uses the null filter
CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter ctxsys.null_filter section group ctxsys.html_section_group');
The default filter depends on the column type and the datastore type. For VARCHAR2, CHAR, and CLOB columns, Oracle selects NULL_FILTER automatically. If the datastore is FILE_DATASTORE, Oracle selects AUTO_FILTER.
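The two defaulting rules can be written as a small decision function. This Python sketch covers only the cases stated above; `default_filter` is a hypothetical helper, and it assumes the datastore rule takes precedence, which is what the hdocs example (a VARCHAR2 column holding file names) implies. Combinations this section does not describe return None rather than a guess.

```python
from typing import Optional

# Column types for which NULL_FILTER is the default, per the rule above.
TEXT_COLUMN_TYPES = {"VARCHAR2", "CHAR", "CLOB"}


def default_filter(column_type: str, datastore: str) -> Optional[str]:
    """Return the default filter for the cases described in this section.

    Assumes the FILE_DATASTORE rule overrides the column-type rule.
    Returns None for combinations the section does not cover.
    """
    if datastore.upper() == "FILE_DATASTORE":
        return "AUTO_FILTER"
    if column_type.upper() in TEXT_COLUMN_TYPES:
        return "NULL_FILTER"
    return None


print(default_filter("VARCHAR2", "DIRECT_DATASTORE"))  # text column, direct datastore
print(default_filter("VARCHAR2", "FILE_DATASTORE"))    # column holds file names
```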
3.2.3 AUTO_FILTER
AUTO_FILTER is a universal filter that handles most document formats, including PDF and Microsoft Word. It automatically recognizes plain-text, HTML, XHTML, SGML, and XML documents.
CREATE TABLE my_filter (id NUMBER, docs VARCHAR2(1000));
INSERT INTO my_filter VALUES (1, 'Expert Oracle Database ubunturetries');
INSERT INTO my_filter VALUES (2, '1.txt');
INSERT INTO my_filter VALUES (3, '2.doc');
COMMIT;

-- Create a file datastore preference; file names in DOCS are resolved under /opt/tmp
BEGIN
  CTX_DDL.CREATE_PREFERENCE('test_filter', 'FILE_DATASTORE');
  CTX_DDL.SET_ATTRIBUTE('test_filter', 'path', '/opt/tmp');
END;
/

-- Create an index that uses the auto filter
CREATE INDEX idx_m_filter ON my_filter(docs) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore test_filter filter ctxsys.auto_filter');

-- Error information table: documents that could not be indexed are logged here
SELECT * FROM ctx_user_index_errors;

SELECT * FROM my_filter WHERE CONTAINS(docs, 'oracle') > 0;
AUTO_FILTER recognizes the format of most documents automatically. You can also specify each document's format in a format column, using the values TEXT, BINARY, or IGNORE. Documents marked BINARY are filtered by AUTO_FILTER; documents marked TEXT are not filtered, as with NULL_FILTER; documents marked IGNORE are not indexed.
-- A separate hdocs table from the one in 3.2.1; drop that table first if it exists
CREATE TABLE hdocs (id NUMBER PRIMARY KEY, fmt VARCHAR2(10), text VARCHAR2(80));
INSERT INTO hdocs VALUES (1, 'binary', '/docs/myword.doc');
INSERT INTO hdocs VALUES (2, 'text', '/docs/index.html');
INSERT INTO hdocs VALUES (3, 'ignore', '/docs/1.txt');
COMMIT;

CREATE INDEX hdocsx ON hdocs(text) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore ctxsys.file_datastore filter ctxsys.auto_filter format column fmt');
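The per-row dispatch on the format column can be modelled as a lookup. The Python sketch below applies the three rules stated above to the hdocs rows; `filter_for_format` is a hypothetical helper, and because this section does not say how Oracle treats other values (including NULL), the sketch rejects them instead of guessing.

```python
from typing import Optional


def filter_for_format(fmt: Optional[str]) -> Optional[str]:
    """Map a format-column value to the filtering described above.

    Returns the filter that processes the document, or None if the row
    is skipped. Values other than BINARY/TEXT/IGNORE raise ValueError.
    """
    value = (fmt or "").strip().upper()
    if value == "BINARY":
        return "AUTO_FILTER"
    if value == "TEXT":
        return "NULL_FILTER"  # passed through without filtering
    if value == "IGNORE":
        return None  # not indexed
    raise ValueError(f"unsupported format value: {fmt!r}")


hdocs = [
    (1, "binary", "/docs/myword.doc"),
    (2, "text", "/docs/index.html"),
    (3, "ignore", "/docs/1.txt"),
]
for doc_id, fmt, path in hdocs:
    print(doc_id, path, filter_for_format(fmt) or "not indexed")
```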
3.2.4 MAIL_FILTER
Use MAIL_FILTER to convert RFC-822 and RFC-2045 messages into indexable text.
Restrictions:
Documents must be US-ASCII.
Lines must not be longer than 1024 bytes.
Documents must be syntactically valid with regard to RFC-822.
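Before loading mail into an indexed column, the restrictions can be pre-checked on the client side. The Python sketch below is a hypothetical helper (`check_mail_restrictions` is not part of Oracle Text): it checks the first two restrictions exactly and approximates the third with the standard library's email parser, which is lenient and follows the newer RFC 5322, so a clean parse does not prove RFC-822 validity.

```python
from email import policy
from email.parser import BytesParser
from typing import List

MAX_LINE_BYTES = 1024  # per-line limit from the restrictions above


def check_mail_restrictions(raw: bytes) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems: List[str] = []
    # Restriction 1: the document must be US-ASCII.
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        problems.append("document is not US-ASCII")
    # Restriction 2: no line may exceed 1024 bytes (terminators excluded).
    for number, line in enumerate(raw.splitlines(), start=1):
        if len(line) > MAX_LINE_BYTES:
            problems.append(f"line {number} is longer than {MAX_LINE_BYTES} bytes")
    # Restriction 3 (approximate): report any defects the parser notices.
    message = BytesParser(policy=policy.default).parsebytes(raw)
    for defect in message.defects:
        problems.append(f"parser defect: {type(defect).__name__}")
    return problems


good = b"From: alice@example.com\r\nTo: bob@example.com\r\nSubject: hi\r\n\r\nHello Bob.\r\n"
print(check_mail_restrictions(good))
```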
3.2.5 USER_FILTER
Use the USER_FILTER type to specify an external filter for filtering documents in a column
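An external filter is a separate program that Oracle Text runs for each document. The Python sketch below assumes the calling convention described in the Oracle Text Reference for USER_FILTER, where the program receives an input file name and an output file name as its two arguments and writes plain text to the output file; verify this against the documentation for your release. The script name `textfilter.py`, the crude tag-stripping rule, and the UTF-8 output encoding are all illustrative assumptions.

```python
import re
import sys

# Anything that looks like a markup tag; a deliberately crude, illustrative rule.
TAG_RE = re.compile(r"<[^>]*>")


def filter_text(data: bytes) -> str:
    """Convert raw document bytes to plain text by replacing tags with spaces."""
    text = data.decode("utf-8", errors="replace")
    return TAG_RE.sub(" ", text)


def run_filter(input_path: str, output_path: str) -> None:
    """Read the document from input_path and write plain text to output_path."""
    with open(input_path, "rb") as src:
        data = src.read()
    # Assumed output encoding; match it to your database character set.
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(filter_text(data))


if __name__ == "__main__":
    # Assumed calling convention: textfilter.py INPUT_FILE OUTPUT_FILE
    if len(sys.argv) == 3:
        run_filter(sys.argv[1], sys.argv[2])
    else:
        print("usage: textfilter.py INPUT_FILE OUTPUT_FILE", file=sys.stderr)
```

The program would then be named in the COMMAND attribute of a USER_FILTER preference.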
3.2.6 PROCEDURE_FILTER
Use the PROCEDURE_FILTER type to filter your documents with a stored procedure. The stored procedure is called each time a document needs to be filtered.
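The per-document contract can be modelled with a callback: the indexer hands each document to the filter routine, keeps the plain text it returns, and records failures instead of aborting, much as failed documents appear in CTX_USER_INDEX_ERRORS. The Python sketch below is only such a model; `index_documents` and `ascii_only_filter` are hypothetical names, and a real PROCEDURE_FILTER is a PL/SQL procedure whose input and output types are configured through the preference's attributes.

```python
from typing import Callable, Dict, List, Tuple

# (row key, error message) pairs, loosely mirroring an index error log
ErrorLog = List[Tuple[str, str]]


def index_documents(
    docs: Dict[str, bytes], filter_proc: Callable[[bytes], str]
) -> Tuple[Dict[str, str], ErrorLog]:
    """Call filter_proc once per document; keep the text or log the failure."""
    filtered: Dict[str, str] = {}
    errors: ErrorLog = []
    for key, raw in docs.items():
        try:
            filtered[key] = filter_proc(raw)
        except Exception as exc:  # a failing document is logged, not fatal
            errors.append((key, str(exc)))
    return filtered, errors


def ascii_only_filter(raw: bytes) -> str:
    """Hypothetical filter routine: accept only 7-bit ASCII documents."""
    return raw.decode("ascii")


docs = {"row1": b"plain text", "row2": "caf\u00e9".encode("utf-8")}
texts, errors = index_documents(docs, ascii_only_filter)
print(texts)   # only row1 was filtered successfully
print(errors)  # row2 is recorded with the decode error message
```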
3.2.7 Reference script
-- Create an index that uses the null filter
CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter ctxsys.null_filter section group ctxsys.html_section_group');
-- Create an index that uses the auto filter
CREATE INDEX idx_m_filter ON my_filter(docs) INDEXTYPE IS ctxsys.context
  PARAMETERS ('datastore test_filter filter ctxsys.auto_filter');
Filter error log table: CTX_USER_INDEX_ERRORS