1. Overview
When writing Flink, Spark, Hive, and other related jobs, it is exciting to be able to quickly visualize the results of our work in front of us, and it would be even better if trend charts came with it. Today I would like to introduce just such a tool: it meets the above requirements, and after using it for a while, I would like to share my experience below.
2. How It Works
First, let's look at the background and purpose of this tool. Zeppelin is currently hosted by the Apache Foundation (as an incubating project, not yet a top-level one) and can be found at its official website. It provides a very friendly web UI for running the related commands, and it can be used for data analysis and visualization. On the back end it connects to different data processing engines, including Flink, Spark, Hive, and so on, and it also supports native Scala, shell, Markdown, and more.
2.1 Install
Zeppelin does not depend on a Hadoop cluster environment, so we can deploy it to a separate node. First, obtain the installation package from the following address:
http://zeppelin.incubator.apache.org/download.html
Here there are two options: you can either download the source and compile the installation yourself, or download the binary package directly. For convenience, I install and use the binary package directly. A few parameters need to be configured to ensure the system starts normally; in particular, make sure the port set by the zeppelin.server.port property is not occupied (the default is 8080). Other properties can be configured on demand. [Configuration link]
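For reference, a minimal sketch of changing the port, assuming you unpacked the binary package into a directory named zeppelin (the port value 8090 is just an example):

cd zeppelin
cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
# then edit conf/zeppelin-site.xml and change the port property:
#   <property>
#     <name>zeppelin.server.port</name>
#     <value>8090</value>
#   </property>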
2.2 Start/stop
After you complete the above steps, start the corresponding process. Navigate to the bin directory of the Zeppelin installation and start the process using the following command:
./zeppelin-daemon.sh start
If you need to stop it, use the following command:
./zeppelin-daemon.sh stop
In addition, by reading the contents of the zeppelin-daemon.sh script, you can find that it also supports commands such as restart and status. The relevant contents are as follows:
case "${1}" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  reload)
    stop
    start
    ;;
  restart)
    stop
    start
    ;;
  status)
    find_zeppelin_process
    ;;
  *)
    echo ${USAGE}
esac
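So, for example, both of the following also work:

./zeppelin-daemon.sh restart
./zeppelin-daemon.sh status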
3. How to Use
After you start the related process, you can access it in the browser at the following address:
http://<Your_IP/Host>:<Port>
The interface after startup is as follows:
The interface lists the available interpreter bindings, such as spark, md, sh, and so on. So how do we use these to do some real work? When using a data engine such as Flink, Spark, or Hive, you first need to configure the corresponding connection information, which is done on the Interpreter page. Here are some configuration examples:
3.1 Flink
You can find the configuration entries for Flink, as shown below:
Then specify the corresponding host and port.
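As a rough sketch, the Flink interpreter settings look something like the following (the host value is an example for a standalone cluster; by default host is local, which starts an embedded local cluster):

host    192.168.1.2    # JobManager host of your Flink cluster
port    6123           # JobManager RPC port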
3.2 Hive
Here the Hive configuration needs to point to the HiveServer2 Thrift service address, as shown below:
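As a hedged sketch, assuming HiveServer2 listens on its default port 10000 (the property names below follow the 0.5.x Hive interpreter; verify the exact keys on your Interpreter page):

hive.hiveserver2.url        jdbc:hive2://192.168.1.3:10000   # HiveServer2 JDBC/Thrift address
hive.hiveserver2.user       hive                             # connection user
hive.hiveserver2.password                                    # leave empty if authentication is off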
In addition, the configuration of other plug-ins such as Spark, Kylin, and Phoenix is similar. After the configuration is complete, remember to click the "Restart" button.
3.3 Using MD and SH
Next, we can create a Notebook to use them. We demonstrate with the simplest interpreters, Shell and Markdown, as shown below:
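For reference, paragraphs like the following reproduce the demo (any shell command and any Markdown text will do):

%md
## Hello Zeppelin
This paragraph is rendered as *Markdown*.

%sh
echo "Hello from the shell interpreter"
date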
3.4 SQL
Of course, our goal is not just to use Shell and Markdown; we need to be able to use SQL to get the results we want.
3.4.1 Spark SQL
Below, we use Spark SQL to get the desired results, as shown in the following:
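A paragraph along these lines does the job (the bank table is only an illustration, e.g. the sample table from Zeppelin's own tutorial; substitute any table registered in your Spark context):

%sql
select age, count(1) as cnt
from bank
group by age
order by age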
Here, the results can be rendered in different forms: tables, charts, and trend lines, all at a glance.
3.4.2 Hive SQL
In addition, you can use dynamic forms to query partitioned data, in the format "${partition_col=20160101,20160102|20160103|20160104|20160105|20160106}", where the value before the comma is the default and the pipe-separated values are the options in the drop-down list. As shown in the following:
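For example, a hedged sketch of such a query (the user_log table and dt partition column are hypothetical; the %hive.sql prefix follows the note in the summary below):

%hive.sql
select * from user_log
where dt='${partition_col=20160101,20160102|20160103|20160104|20160105|20160106}'
limit 10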
3.5 Video Guide
In addition, the project also provides a quick-start video for beginners, which can be viewed here: [Entrance]
4. Summary
During use, there are a few things to be aware of. When you write Hive SQL, %hql needs to be replaced with the %hive.sql format. Also, when you run Scala code, the following exception may appear, as shown below:
The solution is to add the following to the zeppelin-env.sh file:
export ZEPPELIN_MEM=-Xmx4g
This bug was fixed in version 0.5.6; for reference, see [ZEPPELIN-305].
5. Concluding remarks
That is all for this blog post. If you run into any problems in the course of your study, you can join the discussion group or send me an e-mail, and I will do my best to answer your questions. Let us encourage each other!