Text Mining Using UIMA and DB2 Intelligent Miner


Gain more value from unstructured information. Learn how a simple text mining application uses the UIMA SDK to build a text analysis engine that looks for names in documents. Another UIMA component then writes the results to a table in a DB2® database. DB2 Intelligent Miner then uses this data to find strong associations between people who are often mentioned in the same document.

Introduction

There is a growing desire to use information technology to derive greater value from an organization's unstructured information. IBM recently introduced the Unstructured Information Management Architecture (UIMA) framework (see Resources), which simplifies the development and deployment of systems that analyze unstructured media objects, such as documents. Such systems can provide functionality such as semantic search and text mining. Text mining is a data mining technique used to extract information from text. This article describes a very simple text mining application in detail.

Overview

The text mining application described in this article is called Preston. It analyzes documents, looks for the names mentioned in them, and uses text mining to find people who are often mentioned together. Although this technique is only one of many useful text mining techniques, it demonstrates the main features of such applications and provides a concrete example of how UIMA is used. It also demonstrates how to combine structured databases and text mining. This article is for people who want to learn how to use the new UIMA technology to connect unstructured and structured information.
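To make the idea of "people who are often mentioned together" concrete, the following is a minimal sketch in Python. It is not the algorithm used by DB2 Intelligent Miner. It only counts how often each pair of names appears in the same document and reports the pair's support, that is, the fraction of documents that mention both names. The function name `pair_associations`, the `min_support` threshold, and the sample names are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_associations(doc_names, min_support=0.5):
    """Return (pair, support) for pairs of names mentioned in the same document.

    doc_names maps a document id to the set of names found in that document.
    Support is the fraction of all documents that mention both names.
    """
    pair_counts = Counter()
    for names in doc_names.values():
        # Sort the names so that each pair has a single canonical ordering.
        for pair in combinations(sorted(set(names)), 2):
            pair_counts[pair] += 1

    total = len(doc_names)
    results = []
    for pair, count in pair_counts.items():
        support = count / total
        if support >= min_support:
            results.append((pair, support))
    # Strongest associations first; ties are broken alphabetically.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical extraction results, invented for this example.
docs = {
    1: {"Alice Adams", "Bob Brown"},
    2: {"Alice Adams", "Bob Brown", "Carol Clark"},
    3: {"Carol Clark"},
    4: {"Alice Adams", "Bob Brown"},
}

for pair, support in pair_associations(docs):
    print(pair, support)
```

With the sample data, only the pair Alice Adams and Bob Brown clears the 0.5 threshold, because the two names appear together in three of the four documents. A real association mining tool would typically report further measures, such as confidence, in addition to support.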

Figure 1 gives an overview of Preston. The program analyzes documents that are stored as text fields in DB2 database tables. Components in the UIMA framework read the documents from the database, analyze them to find names mentioned in certain formats, and then write the results to another database, the extracted information database (EIDB). These components are developed and deployed using the tools in the UIMA SDK, which can be obtained from developerWorks (see Resources). The information in the EIDB must then be processed to prepare it for text mining, which is done using DB2 Intelligent Miner. The entire application can easily run on a laptop computer.

Figure 1. Overview of the Preston text mining application described in this article
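The read, analyze, and write flow in the overview can be sketched in a few lines of Python. This sketch does not use the UIMA SDK or DB2. It uses an in-memory SQLite database as a stand-in for both the document database and the EIDB, and a regular expression as a stand-in for the text analysis engine. The table names (`documents`, `eidb_person_mention`), the column names, the name pattern (a courtesy title followed by capitalized words), and the sample sentences are all assumptions made for illustration, not details taken from Preston.

```python
import re
import sqlite3

# Assumption: a name is a courtesy title followed by one or more capitalized words.
NAME_PATTERN = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def find_names(text):
    """Return the distinct names found in text, in order of first appearance."""
    seen = []
    for match in NAME_PATTERN.finditer(text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def run_pipeline(conn):
    """Read each document, extract names, and store them in the stand-in EIDB table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS eidb_person_mention (doc_id INTEGER, name TEXT)"
    )
    # Step 1: read the documents from their text field.
    rows = conn.execute("SELECT id, body FROM documents ORDER BY id").fetchall()
    for doc_id, body in rows:
        # Steps 2 and 3: analyze the text, then write one row per name found.
        conn.executemany(
            "INSERT INTO eidb_person_mention (doc_id, name) VALUES (?, ?)",
            [(doc_id, name) for name in find_names(body)],
        )
    conn.commit()


# Hypothetical sample documents, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    [
        (1, "Dr. Jane Smith worked with Mr. John Doe on several films."),
        (2, "Mr. John Doe later directed Ms. Ann Lee."),
    ],
)
run_pipeline(conn)
mentions = conn.execute(
    "SELECT doc_id, name FROM eidb_person_mention ORDER BY doc_id, name"
).fetchall()
print(mentions)
```

Storing one row per document and name keeps the extracted information in a plain relational form, so a later mining step can group the rows by document to see which names occur together.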

The documents used as examples in this article are biographies of actors and other people from the Internet Movie Database (IMDB; see Resources). For illustration purposes, I built a structured DB2 database from a subset of the IMDB content and stored the biographies in it as text fields.
