Hunk/Hadoop: Best Performance Practices

Source: Internet
Author: User

Whether or not Hunk is used, there are many ways of running Hadoop that lead to poor performance. Most of the time, people add more hardware to solve the problem, but sometimes the problem can be solved simply by changing how files are named and laid out.

Run MapReduce tasks [Hunk]

Hunk runs on Hadoop, but that does not guarantee it is used effectively. If a Hunk search runs in verbose mode instead of smart mode, it will not actually use MapReduce. Instead, it pulls all the Hadoop data to the Splunk engine and processes it there.

HDFS storage [Hadoop]

How files are laid out in HDFS matters a great deal when Hadoop is used with Hunk. You should include the timestamp in the file path: Hunk can use the directory structure as a filter, which greatly reduces the volume of data pulled to Splunk.

A timestamp in the file name also works, but less effectively, because Hunk must still read every file name.

For even better performance, include key-value pairs in the file path, for example /2015/3/2/app=webserver/... . When traversing the directory tree, queries can filter on these key-value pairs, further reducing the volume of data pulled to Splunk.
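The layout above can be sketched as a small helper that builds such partitioned paths. This is an illustration only; the root directory, date format, and key names are hypothetical, not anything Hunk mandates.

```python
from datetime import datetime

def partitioned_path(root, ts, **partitions):
    """Build an HDFS path that embeds the timestamp and key=value
    pairs as directory components, so Hunk can prune whole subtrees
    when a search filters on time or on one of the keys."""
    date_part = ts.strftime("%Y/%m/%d")
    kv_part = "/".join(f"{k}={v}" for k, v in sorted(partitions.items()))
    return f"{root}/{date_part}/{kv_part}"

# A search restricted to app=webserver on 2015-03-02 only has to
# descend into this one subtree.
path = partitioned_path("/data/events", datetime(2015, 3, 2), app="webserver")
```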

Timestamp-based VIX/indexes.conf [Hunk]

The file-layout advice above applies to any Hadoop MapReduce job, but for Hunk you also need to modify indexes.conf so that Hunk can recognize the directory structure.
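A minimal sketch of such a virtual index definition follows. The stanza name, provider, and paths are examples, and the `vix.*` attribute names follow the Hunk-era indexes.conf schema; verify them against the documentation for your Splunk/Hunk version.

```ini
# indexes.conf -- virtual index over time-partitioned HDFS data (sketch)
[hunk_webserver]
vix.provider = my_hadoop_provider

# Data laid out as /data/events/YYYY/MM/DD/...
vix.input.1.path = /data/events/...

# Tell Hunk how to extract earliest/latest event time from the path,
# so time-bounded searches skip whole date directories.
vix.input.1.et.regex  = /data/events/(\d+)/(\d+)/(\d+)/
vix.input.1.et.format = yyyyMMdd
vix.input.1.lt.regex  = /data/events/(\d+)/(\d+)/(\d+)/
vix.input.1.lt.format = yyyyMMdd
# Each directory covers one day, so latest = earliest + 86400 s.
vix.input.1.lt.offset = 86400
```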

File Format [Hunk]

Self-describing files such as JSON and CSV are easy for Hunk to read. They are more verbose, but they eliminate costly field-extraction operations.
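For instance, newline-delimited JSON carries its field names with every event, so Hunk can read them directly instead of extracting fields at search time. The field names below are hypothetical examples.

```python
import json

# Each line is one self-describing JSON event: the field names travel
# with the data, so no extraction rules are needed to read them back.
events = [
    {"time": "2015-03-02T10:15:00Z", "app": "webserver", "status": 200},
    {"time": "2015-03-02T10:15:01Z", "app": "webserver", "status": 404},
]
lines = "\n".join(json.dumps(e, sort_keys=True) for e in events)

# Reading a line back recovers the full event with no parsing rules.
first = json.loads(lines.splitlines()[0])
```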

Compression type/File Size [Hadoop]

Avoid very large files that cannot be split, such as a gzip-compressed file hundreds of megabytes in size. (Splittable files, such as LZO-compressed multipart files, are fine.) For a non-splittable file there is a one-to-one mapping between a core and the file, which means only one core can process the large file while the remaining cores sit idle, waiting. In other words, processing non-splittable files takes a long time, so the MapReduce job cannot finish quickly.
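The effect on parallelism can be sketched with simple arithmetic, assuming the common default of a 128 MB block size (block size varies by cluster):

```python
import math

def map_tasks(file_size_bytes, block_size_bytes, splittable):
    """Rough number of map tasks Hadoop can run against one file.
    A non-splittable file (e.g. a plain .gz) gets exactly one mapper,
    no matter how large it is."""
    if not splittable:
        return 1
    return math.ceil(file_size_bytes / block_size_bytes)

GB = 1024 ** 3
MB = 1024 ** 2

# 10 GB gzip file: one mapper grinds through it alone.
gz_mappers = map_tasks(10 * GB, 128 * MB, splittable=False)   # -> 1
# Same 10 GB in a splittable format: 80 mappers can run in parallel.
lzo_mappers = map_tasks(10 * GB, 128 * MB, splittable=True)   # -> 80
```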

Similarly, you should avoid large numbers of tiny files in the tens to hundreds of kilobytes. If files are too small, most of the time is spent starting and managing tasks rather than actually processing data.
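One common remedy is to consolidate small files into batches near the block size before they land in HDFS. A minimal greedy sketch (the target size and file names are illustrative):

```python
def plan_merges(file_sizes, target_bytes):
    """Greedily group small files into batches of roughly target_bytes,
    so each map task gets a worthwhile amount of data instead of
    paying task-startup overhead per tiny file."""
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Three 60 MB files with a 130 MB target: two fit in the first batch.
plan = plan_merges({"a.log": 60, "b.log": 60, "c.log": 60}, 130)
```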

Report acceleration [Hunk]

Hunk can now use Splunk's report acceleration feature to cache search results in HDFS, reducing or eliminating the need to re-read data from the main Hadoop cluster.

Before you enable this feature, make sure that your Hadoop cluster has enough space to store the cache.
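In Splunk, report acceleration can also be set on a saved search in savedsearches.conf; a hedged sketch is below. The search name and summary window are examples, not anything the article specifies.

```ini
# savedsearches.conf -- accelerate one saved report (sketch)
[Webserver error rate]
auto_summarize = 1
# Maintain the summary for searches covering roughly the last 30 days;
# older time ranges fall back to reading the raw Hadoop data.
auto_summarize.dispatch.earliest_time = -30d
```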

Hardware [Hadoop]

Make sure you have the right hardware. Although Hadoop can run on something as small as a dual-core laptop, to use it seriously each node needs at least four CPU cores, at least 12 GB of memory, and two local disks (10K RPM or solid state) to ensure sufficient space for temporary storage.

Search Head Clustering [Hunk]

Search head clustering is a relatively new feature, introduced in Splunk 6.2. As of Splunk 6.3, it is also available for Hunk-based searches.

