IBM Accelerator for Machine Data Analytics, Part 3: Speed up machine data search

Last Update:2017-02-27 Source: Internet

Author: User


Before you start

About this series

A key strength of IBM Accelerator for Machine Data Analytics is how easily the tool can be configured and customized. This series of articles and tutorials is intended for readers who want to get to know the accelerator, speed up their machine data analysis, and gain customized insights.

About this tutorial

In Part 1 of this series, we explored some well-known logs and some lesser-known ones. In Part 2, a new log type was created to analyze a new kind of data. In this tutorial, you will see how to use the new e-mail log type in a plug-and-play fashion, just as you would use the out-of-the-box log types and the generic type. You will also get a consolidated view of all these logs and the ability to search across them.

If you are not interested in the new log type, you can still learn how to search using the out-of-the-box log types and the generic type.

Goal

In this tutorial, you will learn how to do the following tasks.

Use the out-of-the-box log types in indexes and searches.

Use plug-and-play custom log types in indexes and searches.

Observe how certain facets are discovered automatically for out-of-the-box and custom log types.

Configure the index and search to match the use case.

You will also learn how to use the application chains that come with the accelerator.

Prerequisites

Read Part 1 of this series, Speed up machine data analysis, for an overview of IBM Accelerator for Machine Data Analytics. If you want to learn how to customize the accelerator for a new log type, you can optionally complete Part 2 of this series, Speed up the analysis of new log types.

System Requirements

To run the examples in this tutorial, you need the following.

BigInsights v2.0 is installed.

IBM Accelerator for Machine Data Analytics is installed.

Data sets for machine data analysis are available. For links to download the data, see the Downloads section.

A consolidated view and search across all logs

Machine data comes in many shapes and sizes. Some data follows a known structure or format, while other data uses a fully customizable format. Some types are semi-structured or unstructured, while others are structured.

Bringing all data types together in a consolidated search view has obvious benefits for every kind of analysis. Machine logs can reveal how an application behaves, and combining them with unstructured information such as e-mail supports operational analysis. Combining them further with structured information, such as configuration files or reports from external systems, yields a searchable gold mine of information.

In Part 1 of this series, Speed up machine data analysis, you saw the diversity of machine data across the application tiers. In Part 2, Speed up the analysis of new log types, you saw how easily external information such as e-mail can be added for analysis.

In this tutorial, all of this data is placed in a searchable repository.

The case of the fictitious Sample Outdoors company

Sample Outdoors wants a consolidated view of all its log data. It also wants to start building a searchable gold mine of information by adding e-mail data. The next task for the Sample Outdoors company is to bring all this information together and search it.

10 features for searching any machine data

Read the overview below of the IBM Accelerator for Machine Data Analytics features that can be used to search any machine data.

Use the Import-Extract chain to import data and run extraction.

Create a consolidated, searchable repository of all the logs.

Add a custom log type to the repository, and use the new log type in a plug-and-play fashion in the search.

Prepare the search and observe the automatic discovery of facets.

Observe the chronological view of events, including e-mail.

Perform a search.

Display only the facets that make sense for the use case. Learn how to do this in the section on configuring the user interface for the use case.

Add any missing fields for the custom log type. Learn how to do this in the section on configuring the index to add fields for the custom log type.

Finally, learn how to optimize the size of the index by creating only the necessary facets, in the section on optimization and incorporating the configuration into the index.

View the generated search interface.

At Sample Outdoors Co.

The data scientists at Sample Outdoors have the following machine data from customers.

Customerfrontend application – an application based on Apache web access

Customerbackend application – an application based on IBM WebSphere servers

Customerdbapp application – an Oracle database application

They also have the e-mail sent to customersupport@sampleoutdoors.com and websupport@sampleoutdoors.com.

They want to bring all this information together and start building a consolidated, searchable repository.

Importing and extracting data

In this section, you bring machine data from the application stack into the repository. See Part 1, Speed up machine data analysis, for more information about the applications and their logs.

The Downloads section provides prepared batches of logs from the application stack.

Perform the following steps.

Download data_and_config.zip from the Downloads section and unzip it.

Copy data/input_batches to a single machine in the BigInsights cluster. This tutorial uses the location /opt/ibm/input_batches; you can change it to another location if you prefer. Note the directory structure that contains the batches. The input_batches directory contains the following three batches, which represent the three tiers of the application stack.

batch_webaccess – contains logs from the web access tier.

batch_was – contains logs from the WebSphere application tier.

batch_oradb – contains logs from the Oracle database tier.
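Before running the import, it can help to confirm that the copy produced the expected layout. The following is a minimal sketch, not part of the accelerator: the function name missing_batches is hypothetical, and the script only checks that the three batch directories named above exist under the chosen location. It does not inspect the contents of any batch.

```python
from pathlib import Path

# Batch directory names as listed in this tutorial.
EXPECTED_BATCHES = ("batch_webaccess", "batch_was", "batch_oradb")


def missing_batches(root, expected=EXPECTED_BATCHES):
    """Return the expected batch names that are not directories under root.

    Hypothetical helper for illustration; it is not an accelerator API.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # /opt/ibm/input_batches is the location used in this tutorial;
    # change it if you copied the batches somewhere else.
    missing = missing_batches("/opt/ibm/input_batches")
    if missing:
        print("Missing batch directories:", ", ".join(missing))
    else:
        print("All three batches are in place.")
```

Run the script on the machine that holds the copied batches. An empty result means all three tiers are present in the location you chose.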

You will use the Import-Extract application chain to perform the import and extraction steps. Because the Import application uses the Distributed File Copy application, first make sure that Distributed File Copy is deployed.

In the BigInsights console, click the Applications tab and select the Manage link.

Enter distributed in the edit box. Distributed File Copy is listed under applications.

If the application state is not_deployed, click the Deploy button, as shown in Figure 1.

Figure 1. Deploying the Distributed File Copy application
