Hadoop-zeppelin Usage Experience

Source: Internet
Author: User

1. Overview

When writing Flink, Spark, Hive, and other related jobs, it is exciting to be able to quickly visualize the results of our work in front of us, and built-in trend charts would make it even better. Today I would like to introduce you to a tool that meets these requirements. Having used it for some time, I am sharing my experience below.

2. How it works

First, let's look at the background and purpose of this tool. Zeppelin is currently hosted at the Apache Foundation as an incubating project (not yet a top-level project), and releases can be obtained from its website. It provides a very friendly web UI for running interactive paragraphs and can be used for data analysis and visualization. It connects to different data processing engines, including Flink, Spark, Hive, and so on, and natively supports Scala, shell, Markdown, and more.

2.1 Install

Zeppelin does not depend on a Hadoop cluster environment, so we can deploy it on a separate node. First, obtain the installation package from the following address:

http://zeppelin.incubator.apache.org/download.html

There are two options here: you can download the source and compile it yourself, or download the binary package directly. For convenience, I install directly from the binary package. A few parameters need to be configured to ensure the system starts normally; in particular, make sure the port set by the zeppelin.server.port property is not occupied (the default is 8080). Other properties can be configured as needed. [Configuration link]
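As a sketch, the port can be changed in conf/zeppelin-site.xml. The property name matches Zeppelin's template configuration; the value 18080 below is just an arbitrary example:

```xml
<!-- conf/zeppelin-site.xml: change the web UI port if 8080 is taken.
     18080 is an arbitrary example value. -->
<property>
  <name>zeppelin.server.port</name>
  <value>18080</value>
  <description>Server port for the Zeppelin web UI</description>
</property>
```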

2.2 Start/stop

After you complete the above steps, start the corresponding process. Navigate to the bin folder of the Zeppelin installation directory and start the process using the following command:

./zeppelin-daemon.sh start

If you need to stop, you can stop the process using the following command:

./zeppelin-daemon.sh stop

In addition, by reading the zeppelin-daemon.sh script, you can see that it also supports commands such as restart, reload, and status. The relevant contents are as follows:

case "${1}" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  reload)
    stop
    start
    ;;
  restart)
    stop
    start
    ;;
  status)
    find_zeppelin_process
    ;;
  *)
    echo ${usage}
    ;;
esac
3. How to use

After you start the related process, you can access it in the browser using the following address:

http://<host>:<port> (for example, http://localhost:8080 with the default port)

The interface after startup is as follows:

The interface lists the available interpreter bindings, such as spark, md, sh, and so on. So how do we use these to do some work? When using a data engine such as Flink, Spark, or Hive, you need to configure the corresponding connection information under the Interpreter tab. Here are some configuration examples:

3.1 Flink

You can find the configuration entries for Flink in the Interpreter settings, as shown below. Then specify the corresponding host and port of your Flink cluster.
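As an illustrative sketch (property names may differ across Zeppelin versions, and the host below is a placeholder for a standalone Flink JobManager):

```properties
# Flink interpreter properties (illustrative placeholders)
host = flink-master.example.com   # JobManager host; "local" runs an embedded cluster
port = 6123                       # default JobManager RPC port in standalone mode
```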

3.2 Hive

Here, the Hive configuration needs to point to its Thrift service (HiveServer2) address, as shown below.
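A minimal sketch of the idea, assuming HiveServer2 listens on its default Thrift port 10000; the exact property key varies by Zeppelin version, so treat the key below as illustrative:

```properties
# Hive interpreter property (illustrative key; host is a placeholder)
hive.hiveserver2.url = jdbc:hive2://hive-server.example.com:10000
```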

In addition, the configuration of other interpreters, such as Spark, Kylin, and Phoenix, is similar. After the configuration is complete, remember to click the "Restart" button for that interpreter.

3.3 Use MD and SH

Below, we create a Notebook and demonstrate with the simplest cases, Shell and Markdown, as shown below.
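For instance, two such paragraphs might look like the following (each paragraph starts with the interpreter directive; the contents are hypothetical examples):

```
%sh
echo "Hello from a Zeppelin shell paragraph"

%md
## Hello from a Markdown paragraph
*Rendered as formatted text when the paragraph is run.*
```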

3.4 SQL

Of course, our goal is not just to use Shell and Markdown; we need to be able to use SQL to get the results we want.

3.4.1 Spark SQL

Below, we use Spark SQL to get the desired results, as shown below.
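A paragraph along these lines would do the job (the bank table is the example dataset used in Zeppelin's own tutorial; treat the query as a sketch):

```
%sql
SELECT age, COUNT(1) AS value
FROM bank
WHERE age < 30
GROUP BY age
ORDER BY age
```

After running the paragraph, you can switch the result between table, bar, line, and other chart types in the output toolbar.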

Here, the results can be rendered in different chart forms, so quantities and trends are visible at a glance.

3.4.2 Hive SQL

In addition, you can use Zeppelin's dynamic form syntax to query partitioned data, in the format "${partition_col=20160101,20160102|20160103|20160104|20160105|20160106}", where the value before the comma is the default and the |-separated values are the selectable options. As shown in the following:
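A sketch of how such a form might be used in a Hive paragraph (the table and column names are hypothetical; Zeppelin renders the ${...} expression as a drop-down above the paragraph):

```
%hive.sql
SELECT COUNT(1)
FROM logs
WHERE partition_col = '${partition_col=20160101,20160102|20160103|20160104|20160105|20160106}'
```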

3.5 Video Guide

In addition, the project also provides an introductory video as a quick guide; it can be viewed at: [Entrance]

4. Summary

During use, there are some things to be aware of. When you write Hive SQL, %hql needs to be replaced with the %hive.sql format. Also, when running Scala code, the following exception may appear, as shown below:

Solution: add the following line to the zeppelin-env.sh file:

export ZEPPELIN_MEM="-Xmx4g"

This bug was fixed in version 0.5.6; reference: [ZEPPELIN-305]

5. Concluding remarks

That is all I have to share in this post. If you run into any problems while studying, you can join the discussion group or send me an e-mail, and I will do my best to answer. Best wishes!

