IBM Accelerator for Machine Data Analytics, Part 2: Speeding up analysis of new log types

Before you start

One of the main strengths of IBM Accelerator for Machine Data Analytics is that its tools are easy to configure and customize. This series of articles and tutorials is for readers who want an introduction to the accelerator, want to speed up their machine data analysis further, and want to gain customized insights.

This tutorial is a concrete example of using IBM Accelerator for Machine Data Analytics to analyze an entirely new log type. It lays the groundwork for Part 3, which describes how to use this new log type, plug-and-play, in indexing and searching.

Goal

In this tutorial, you will learn how to do the following tasks.

Start analyzing a new dataset using the out-of-the-box support in the accelerator.

Identify the missing fields that are required for profiling.

Customize the accelerator to create your own log type for subsequent analysis.

Prerequisites

You should be familiar with BigInsights Text Analytics and AQL (Annotation Query Language). Familiarity with the BigInsights Text Analytics tools is helpful but not required.

System Requirements

In order to run the examples in this tutorial, you need to meet the following criteria.

BigInsights v2.0 is installed.

IBM Accelerator for Machine Data Analytics is installed.

The BigInsights v2.0 Eclipse tools are installed.

A dataset for machine data analysis is available.

Diversity in machine data analysis

In Part 1 of this series, Speeding up machine data analysis, you learned how to work with machine data of some known types, such as Apache web access and WebSphere logs. You also learned how to use the generic log type to work with types that the accelerator does not know much about.

As long as the data is time-series text data, you can analyze it with any of the machine data techniques without writing any new code!

With the generic type, you can extract most of the fields that are common in machine data. Machine data often contains name-value pairs and XML leaf-tag values, and the generic type extracts this information, which is likely to be what interests you most.
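To make the idea of name-value pairs and XML leaf-tag values concrete, here is a minimal Python sketch. It is not the accelerator's implementation, which is rule-based and written in AQL. The pattern, the function names, and the sample strings are all assumptions made up for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: a name, '=', then a double-quoted or bare value.
NAME_VALUE = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^\s,;]+)')


def extract_name_value_pairs(text):
    """Return (name, value) tuples found in a line of machine data."""
    return [(name, value.strip('"')) for name, value in NAME_VALUE.findall(text)]


def extract_xml_leaf_values(xml_text):
    """Return (tag, text) for every XML element that has no child elements."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter() if len(el) == 0]


# Made-up sample inputs, not taken from the tutorial's data.
line = 'user=jdoe status=500 msg="checkout failed"'
fragment = "<order><id>42</id><customer><name>John Doe</name></customer></order>"

print(extract_name_value_pairs(line))
print(extract_xml_leaf_values(fragment))
```

Both functions return plain lists of tuples, so the extracted fields can be inspected or compared against what a real extraction run produces.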

Beyond these techniques, if fields that are specific to a data type are not extracted, the accelerator provides a way to customize existing rules or add new ones.

In this tutorial, you will use e-mail data and learn how to add a new log type to analyze it, including the following.

How to use the Eclipse tools to customize an existing rule or create a new rule.

How to publish the custom rules as a customized application for production use.

The case of the fictitious Sample Outdoors company

In Part 1 of this series, Speeding up machine data analysis, the data scientists at Sample Outdoors used the logs from their entire application stack to learn about the problems reported on Saturday, July 14. They were also able to understand the underlying causes of the problem.

Many customers were affected on July 14, and the customer support center was flooded with e-mails from customers complaining about the problems. Sample Outdoors faces the risk of negative publicity and is also worried about losing existing and potential customers. One way to address this is to offer coupons that let these customers save money on future purchases. Saturday, July 14 was one of the busiest days of the company's largest promotional period, so many customers were affected. Sample Outdoors wants to give priority for these coupons to the customers who contacted the support center by e-mail.

To do this, Sample Outdoors needs a consolidated view of all attempted customer orders and of the affected customers. The company already knows how to analyze the attempted-order information. It now wants to add its customers' e-mails to the analysis so that it has enough information about the customers and the size and details of their orders to offer them suitably discounted coupons.

10 features to accelerate machine data analysis for new log types

Below is an overview of the features and highlights of IBM Accelerator for Machine Data Analytics that you can use to analyze your own data types.

To learn how to prepare e-mail data for analysis, see the Preparing data for a new log type section.

To use the generic type, validate the results, and identify any missing fields, see the Out-of-the-box support section.

To set up the Eclipse environment for customizing the extraction application, see the section on preparing for customization.

To get a quick understanding of the extraction application, see the Extraction application section.

For a quick overview of the Eclipse tools for text analytics, see the Tools overview section.

To extract e-mail-specific fields with new rules and test them, see the Creating your own e-mail log type section.

To review the text analytics rules for e-mail data, see the Understanding the code section.

To learn about the naming conventions that make this new log type plug-and-play in other applications, see the Insider section.

To publish the custom application to the BigInsights cluster, see the Publishing the custom application section.

To run the custom extraction application on the e-mail messages and view the results, see the New log type in practice! section.

Using e-mail at Sample Outdoors

The data scientists at Sample Outdoors want to take advantage of the e-mails that the customer support center has received. They want to identify the e-mails from customers who complained during the outage on Saturday, July 14. They will then use this information to look up order sizes and customer loyalty data, and send the appropriate coupons to these customers by e-mail.
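As a rough illustration of the selection step described above, the following Python sketch picks out the distinct senders whose messages are dated on the outage day. It is not part of the accelerator. The year 2012 is an assumption inferred from the folder names in the sample data, and the record layout, the function name, and the sample dates are hypothetical.

```python
from datetime import date
from email.utils import parsedate_to_datetime

# Assumption: the outage was Saturday, July 14, 2012 (year inferred, not stated).
OUTAGE_DAY = date(2012, 7, 14)


def senders_on_day(messages, day=OUTAGE_DAY):
    """Return distinct sender addresses of messages dated on the given day.

    Each message is a dict with hypothetical keys 'from' and 'date', where
    'date' is an RFC 2822 date string. The day is compared in the sender's
    own UTC offset, and first-seen order is preserved.
    """
    senders = []
    for message in messages:
        sent = parsedate_to_datetime(message["date"])
        if sent.date() == day and message["from"] not in senders:
            senders.append(message["from"])
    return senders


# Made-up records with complete dates, for illustration only.
messages = [
    {"from": "john.doe@gmail.com", "date": "Sat, 14 Jul 2012 08:36:42 -0800"},
    {"from": "mary.jane@yahoo.com", "date": "Sat, 14 Jul 2012 08:59:02 -0700"},
    {"from": "someone@example.com", "date": "Mon, 16 Jul 2012 10:00:00 -0700"},
    {"from": "john.doe@gmail.com", "date": "Sat, 14 Jul 2012 16:06:00 -0800"},
]

print(senders_on_day(messages))
```

The resulting list of addresses is the kind of key that could later be matched against order and loyalty data.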

They collect the e-mails sent to customersupport@sampleoutdoors.com and websupport@sampleoutdoors.com and begin analyzing the data with IBM Accelerator for Machine Data Analytics.

Preparing data for a new log type

A batch of prepared e-mail data from customersupport@sampleoutdoors.com is available in the Downloads section.

Perform the following steps.

Download code_and_data.zip from the Downloads section and unzip it.

You will see a directory named AQL. Put it in a convenient location. The files EMAIL.AQL and EXTRACTOR_EMAIL.AQL are used later in this tutorial.

You will also see a directory named input_batches. It contains a batch file named Batch_inbox, which holds the e-mail data shown in Listing 1. The data represents the e-mail messages received in the customer support inbox at Sample Outdoors.

Listing 1. Email data

Message-ID: <16159836.1075855377439.JavaMail.evans@thyme>
Date: Sat, July 08:36:42 -0800 (PST)
From: john.doe@gmail.com
To: customersupport@sampleoutdoors.com
Subject: FW: Cannot Purchase
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: John Doe
X-To: customersupport
X-cc:
X-bcc:
X-Folder: \customersupport_july2012\Notes Folders\inbox
X-Origin: customersupport
X-FileName: customersupport.nsf

Hi I am still not able to purchase items on Sample Outdoors. I urgently need to get these items. John
-----Original Message-----
From: Doe, John.
Sent: Saturday, July 4:06 PM
To: customersupport@sampleoutdoors.com
Subject: Cannot purchase

Hello, I am having trouble purchasing items on your website. Is there a known issue, any estimate on when it'll be fixed? John, HI.

Message-ID: <13556517.1075852726971.JavaMail.evans@thyme>
Date: Sat, July 08:59:02 -0700 (PDT)
From: mary.jane@yahoo.com
To: websupport@sampleoutdoors.com
Subject: Problem with purchases
Cc: customersupport@sampleoutdoors.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Mary Jane
X-To: websupport
X-cc: customersupport
X-bcc:
X-Folder: \websupport_july2012\Notes Folders\inbox
X-Origin: websupport
X-FileName: websupport.nsf

Hi I am unable to purchase on your website. Help!!! Mary
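To see which e-mail-specific fields a message such as those in Listing 1 carries, you can inspect one with Python's standard email module. This is only an exploratory sketch and not the accelerator's extraction, which is rule-based and written in AQL. The sample message below mirrors the header layout of Listing 1, but its Message-ID and complete date are made-up values, and the function name is hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Illustrative message in the header layout of Listing 1.
# The Message-ID and the full date are invented for this sketch.
RAW = """\
Message-ID: <1.2.JavaMail.example@host>
Date: Sat, 14 Jul 2012 08:59:02 -0700 (PDT)
From: mary.jane@yahoo.com
To: websupport@sampleoutdoors.com
Subject: Problem with purchases
Cc: customersupport@sampleoutdoors.com

Hi, I am unable to purchase on your website. Help!!! Mary
"""


def parse_email_record(raw):
    """Split one raw message into the fields an e-mail log type would need."""
    msg = message_from_string(raw)
    return {
        "message_id": msg["Message-ID"],
        "from": parseaddr(msg["From"])[1],
        "to": parseaddr(msg["To"])[1],
        "cc": parseaddr(msg.get("Cc", ""))[1],  # empty string when no Cc header
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),
    }


record = parse_email_record(RAW)
for field, value in record.items():
    print(f"{field}: {value}")
```

Fields such as the sender, recipients, and subject are specific to e-mail, which is why they are candidates for the new log type.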
