Building a Full-text Search application with Oracle text

Source: Internet
Author: User
Tags file system php and oracle database

Oracle Text is a powerful search technology that is built into all versions of the Oracle database, including free available version (XE). It provides a development API that makes it easy for software developers to implement full-featured content search applications.

Oracle Text can be used to search for structured and unstructured documents and is a complement to the matching of SQL wildcard characters. Oracle Text supports the use of basic Boolean operators (and, or, not, NEAR, and so on) to combine multiple search entries together, and it also has more advanced functionality, such as Soundex and blur search, and result ordering. This technique supports hundreds of file types, including Microsoft Office and PDFs. Oracle Text is suitable for a variety of search-related usage and storage structures. Text application areas include E-commerce, documentation and records management, and problem tracking. Searchable text can reside in a database in a structured form, or it can reside in a local file system or on the Web in an unstructured form.

Oracle Text provides a complete sql-based search API that contains custom query operators, DDL syntax extensions, a set of Pl/sql procedures, and database views. With the Text API, application developers have complete control over indexing, querying, security, presentation, and sometimes the software configuration that is required, especially when developing out-of-the-way customizations. With a software product that needs to be used, you want to make the configuration of the software as simple as possible, even if it means doing more upfront work in product development. Reducing the complexity of your application typically results later in the product lifecycle, especially in support, maintenance, and future product development phases.

Oracle Text also supports document-level authorization, and document-level authorization often makes it difficult to maintain high performance while unifying. With Text, the combination of relational data and unstructured data is also well supported. For authorization, this means that you can combine Full-text search and authorization into a single query. An independent result set and the number of filtering phases required to obtain the final result can be reduced to the greatest extent, simplifying application development. Oracle Text frees application developers from tedious development and allows them to focus on performance optimization.

Oracle Text is also unknown to the programming language and can also perform as well as excel in PHP and Java applications.

Some time ago, I needed to improve the search capabilities of the Enterprise Content Management (ECM) system. I first evaluated the use of Oracle Text. Evaluation Proof Oracle Text is a very viable technique for building application searches: It has advanced search capabilities, supports a large number of different file types, is highly customizable, and is highly scalable. A disadvantage of the original search technology is that you need to run the file content search outside of the database, then run the database metadata search, authorize the results, and finally merge the independent result sets. With Oracle Text, all of these operations can be done in the database. ECM systems have used Oracle databases to store meta data. Because this technology has been introduced, customers will naturally choose to use it, and it will not add any cost to the customer.

A simple way to perform a free text search query in a database is similar to the following:

SELECT * FROM issues
WHERE LOWER (author) like '%word1% ' and LOWER (author) like '%word2% ' ...

With this approach, each column needs to be matched individually with each keyword. In each column, you can match the keyword in any order. However, a relational database is designed so that it does not execute queries as efficiently as above, and using this method produces extremely scalable applications. Of course, you can design your own indexing and search solutions, but that way you might not optimize your resources, especially if you've already paid for the search technology as part of your database.

This article discusses the use of Oracle Text in a virtual problem-tracking application. In this application, users can create problems that include metadata and optional attached files. The application uses Oracle text to implement Full-text search capabilities for metadata and optional attached file content.

The examples presented here have been tested on the Linux Oracle database XE, and these examples should also work well on other Oracle platforms.

Indexing process and search

After Oracle Text is indexed for a searchable data item, the user is able to find content by searching. Indexing is a common way to ensure search performance. The Oracle Text indexing process is modeled on pipelines in which keywords are added to the index after a series of transformations from the data item retrieved from the data store. The indexing process is divided into phases, each of which is handled by a separate entity and can be configured by the application developer.

Oracle Text has different index types for different purposes. For full-text searches of large documents, it is appropriate to use the context index type. The indexing process includes the following phases:

    • Data retrieval: Simply remove data from a data store (such as a Web page, a large database object, or a local file system) and send it to the next stage as data.
    • Filter: Filters are responsible for converting data in various file formats into plain text format. Other components in the index pipeline can only handle plain text data and do not recognize file formats such as Microsoft Word or Excel.
    • Fragment: The fragment device adds metadata about the original data item structure.
    • Lexical analysis: The character stream is divided into several words according to the language of the data item.
    • Index: The last stage adds a keyword to the actual index.

After the index build is complete, the application can perform the search for end user input through ordinary SQL queries.

Install Oracle Text

By default, Oracle Text is installed with Oracle database XE. How to use a different database version, you need to install the Oracle Text feature yourself. With this feature installed, you only need to create an ordinary database user and give the user the Ctxapp role. This allows the user to perform a specific index management process:

CREATE USER ot1 identified by OT1;
GRANT Connect,resource, Ctxapp to OT1;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.