
Text mining using UIMA and DB2 Intelligent Miner

Last Update:2017-02-27 Source: Internet

Author: User

Tags db2


Gain more value from unstructured information. Learn how a simple text mining application uses the UIMA SDK to build a text analysis engine that looks for names in documents. Another UIMA component then writes the results to a table in a DB2® database. DB2 Intelligent Miner then uses this data to find strong associations between people who are often mentioned together in the documents.

Brief introduction

There is a growing desire to use information technology to derive greater value from an organization's unstructured information. IBM recently introduced the Unstructured Information Management Architecture (UIMA) framework (see Resources), which simplifies the development and deployment of systems that analyze unstructured media objects such as documents. Such systems can provide functionality such as semantic search and text mining. Text mining is a data mining technique used to extract information from text. A very simple text mining application is described in detail below.

Overview

The text mining application described in this article is called Preston. It analyzes documents, looks for the names of people mentioned in them, and uses text mining to find people who are often mentioned together. Although this is only one of many useful text mining techniques, it demonstrates the main features of such applications and provides a concrete example of how UIMA is used. It also demonstrates how to combine structured databases with text mining. This article is intended for people who want to learn how to use the new UIMA technology to connect unstructured and structured information.
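To make "people who are often mentioned together" concrete, the following is a minimal sketch of how the strength of such an association can be measured with the usual support and confidence figures from association analysis. It is not the Preston code and does not call DB2 Intelligent Miner; the function name `pair_associations`, the `min_support` threshold, and the sample names are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_associations(doc_names, min_support=0.5):
    """Score pairs of names by how often they appear in the same document.

    doc_names maps a document id to the list of names found in that document.
    Returns {(a, b): (support, confidence_a_to_b, confidence_b_to_a)} for
    every pair whose support is at least min_support.
    """
    n_docs = len(doc_names)
    name_counts = Counter()   # number of documents mentioning each name
    pair_counts = Counter()   # number of documents mentioning both names

    for names in doc_names.values():
        unique = sorted(set(names))          # count each name once per document
        name_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    result = {}
    for (a, b), count in pair_counts.items():
        support = count / n_docs             # share of all documents with both names
        if support >= min_support:
            result[(a, b)] = (
                support,
                count / name_counts[a],      # confidence of "a implies b"
                count / name_counts[b],      # confidence of "b implies a"
            )
    return result


# Hypothetical sample data: document id -> names found in that document.
docs = {
    1: ["Ann Lee", "Bob Ray"],
    2: ["Ann Lee", "Bob Ray", "Cy Dunn"],
    3: ["Ann Lee"],
    4: ["Bob Ray", "Ann Lee"],
}
strong = pair_associations(docs, min_support=0.5)
print(strong)
```

In this sample, only the pair that shares at least half of the documents is kept. A dedicated mining tool performs the same kind of counting at scale and over larger item sets.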

Figure 1 gives an overview of Preston. The program analyzes documents that are stored as text fields in DB2 database tables. Components in the UIMA framework read the documents from the database, analyze them to find names mentioned in certain formats, and then write the results to another database, the extracted information database (EIDB). These components are developed and deployed using the tools in the UIMA SDK, which is available from developerWorks (see Resources). The information in the EIDB is then processed to prepare it for text mining, which is done using DB2 Intelligent Miner. The entire application can easily run on a laptop computer.

Figure 1. Overview of the Preston text mining application described in this article
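The read, analyze, and write flow in the overview can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for DB2, a plain regular expression in place of a UIMA analysis engine, and hypothetical table names (`biography`, `person_mention`) and sample text. The name format assumed here, two or three capitalized words in a row, is a guess and not necessarily the format Preston looks for.

```python
import re
import sqlite3

# Assumed name format: two or three capitalized words in a row.
# This naive pattern also matches capitalized sentence openings,
# so it is only suitable for illustration.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}\b")


def find_names(text):
    """Return the distinct names found in text, sorted alphabetically."""
    return sorted(set(NAME_PATTERN.findall(text)))


# In-memory SQLite database standing in for both the source tables and the EIDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biography (doc_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE person_mention (doc_id INTEGER, name TEXT)")

# Hypothetical sample documents stored as text fields.
conn.executemany(
    "INSERT INTO biography VALUES (?, ?)",
    [
        (1, "she worked with Ann Lee and later with Bob Ray."),
        (2, "a close friend of Bob Ray."),
    ],
)

# Read each document, analyze it, and write one row per (document, name).
rows = conn.execute("SELECT doc_id, body FROM biography ORDER BY doc_id").fetchall()
for doc_id, body in rows:
    conn.executemany(
        "INSERT INTO person_mention VALUES (?, ?)",
        [(doc_id, name) for name in find_names(body)],
    )
conn.commit()

mentions = conn.execute(
    "SELECT doc_id, name FROM person_mention ORDER BY doc_id, name"
).fetchall()
print(mentions)
```

Storing one row per document and name keeps the extracted information in a shape that is easy to group by document later, which is the kind of preparation the mining step needs.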

The documents used as examples in this article are biographies of actors and other people from the Internet Movie Database (IMDB; see Resources). For illustration purposes, I built a structured DB2 database from a subset of the IMDB content, keeping the biographical information in the database as a text field.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
