Big Data Analytics: Using Hunk with Hadoop and Elastic MapReduce

Jonathan Allen, author


Hunk is a new product from Splunk for exploring and visualizing data held in Hadoop and other NoSQL data stores. Its new version adds support for Amazon Elastic MapReduce.

Using Hunk with Hadoop

Hadoop consists of two main components. The first is the storage component, HDFS, which can be distributed across thousands of nodes, with data replicated among them. The second is the MapReduce component, which is responsible for tracking and managing the jobs known as map-reduce jobs.
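To make the term "map-reduce job" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in plain Python. It does not use Hadoop at all; the log format and the names run_job, status_mapper, and count_reducer are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def run_job(records, mapper, reducer):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def status_mapper(line):
    # Hypothetical log format: "<host> <status>"
    _host, status = line.split()
    return [(status, 1)]


def count_reducer(_key, values):
    return sum(values)


LOG_LINES = ["web1 200", "web2 404", "web1 200", "web3 500"]
STATUS_COUNTS = run_job(LOG_LINES, status_mapper, count_reducer)
print(STATUS_COUNTS)
```

In a real cluster the map and reduce phases run in parallel on the nodes that hold the data, and the framework handles the shuffle over the network; the sketch only shows the shape of the computation.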

Previously, developers used the Splunk Hadoop Connect (SHC) connector. SHC exports data to Hadoop using the familiar push model. That direction works reasonably well, but going the other way can be problematic. When Splunk is used to explore the data, the raw data is pulled into the Splunk server for indexing and processing. As you might imagine, this does not take advantage of Hadoop's computing capabilities.

Hunk solves this problem by providing an adapter that works with the Hadoop MapReduce nodes. Splunk queries are converted into Hadoop MapReduce jobs. These jobs are processed in the Hadoop cluster, and only the results are returned to the Splunk server for analysis and visualization.
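The difference between pulling raw data to the search server and pushing the computation to the cluster can be illustrated with a toy comparison. This is only a sketch under stated assumptions: the NODES dictionary stands in for a cluster, and pull_model and push_down_model are hypothetical names, not part of Hunk or Splunk.

```python
from collections import Counter

# Hypothetical cluster: each node holds one partition of raw events.
NODES = {
    "node1": ["error", "info", "info", "error"],
    "node2": ["info", "warn", "info"],
    "node3": ["error", "info"],
}


def pull_model(nodes):
    """Copy every raw event to the search server, then aggregate there."""
    transferred = [event for events in nodes.values() for event in events]
    return Counter(transferred), len(transferred)


def push_down_model(nodes):
    """Aggregate on each node, ship only per-node summaries, merge them centrally."""
    partials = [Counter(events) for events in nodes.values()]
    shipped_rows = sum(len(partial) for partial in partials)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total, shipped_rows


PULL_RESULT, PULL_ROWS = pull_model(NODES)
PUSH_RESULT, PUSH_ROWS = push_down_model(NODES)
print(PULL_RESULT == PUSH_RESULT, PULL_ROWS, PUSH_ROWS)
```

Both approaches produce the same counts, but the push-down version moves only the per-node summaries across the network. On real data volumes that gap is the reason for running the query where the data lives.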

In this way, Hunk provides an abstraction layer so that users and developers do not need to know how to write Hadoop MapReduce jobs. Hunk can also show a preview of the results before the MapReduce job starts, which reduces the number of wasted searches.

Using Hunk with Elastic MapReduce

Amazon Elastic MapReduce (EMR) can be seen as both a complement to Hadoop and a competitor to it. EMR can run either on top of a Hadoop HDFS cluster or directly against AWS S3. Amazon claims that the advantage of AWS S3 is that it is easier to manage than an HDFS cluster.

When running against Elastic MapReduce, Hunk provides the same abstraction layer and preview capability as it does on Hadoop. From the user's point of view, switching between Hadoop and EMR changes nothing.

Hunk on the cloud

The traditional way to host Hunk in the cloud is to buy a standard license and deploy it to a virtual machine, much as you would for an on-premises installation. The running Hunk instance then has to be configured manually to point to the correct Hadoop or AWS cluster.

With the version released this month, Hunk instances can be provisioned automatically on AWS, including automatic discovery of EMR data sources, so that a Hunk instance can be online within minutes. To take advantage of this, Hunk instances are offered on an hourly basis.

Virtual Indexes

A key concept in Hunk is the "virtual index". These are not indexes in the strict sense, but rather a way for Hunk to represent the Hadoop and EMR clusters it works with. In the Splunk user interface they look like real indexes, even though their data is processed by map-reduce jobs. And because they look like indexes, you can create persistent secondary indexes on top of them. A persistent secondary index is useful when you want to work with a subset of the data and then examine or visualize it further in several ways.
