Zeppelin is a web-based interactive query and analysis tool for big data (similar to a Python notebook). It can be used to write Scala and SQL code to query and analyze data and generate reports. Developers can also add data engines to Zeppelin by implementing additional interpreters.
0. Download Zeppelin
Download page: https://zeppelin.incubator.apache.org/download.html
Select the precompiled (binary) version, then unzip it.
1. Modify conf/zeppelin-env.sh to set SPARK_HOME and HADOOP_HOME (copy the template first)
export SPARK_HOME=$SPARK_HOME
export HADOOP_HOME=$HADOOP_CONF_DIR
(Use the values already defined in the bastion machine's .bashrc.)
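Step 1 can be scripted roughly as follows. This is a minimal sketch, not something shipped with Zeppelin: the function name setup_zeppelin_env is hypothetical, and it assumes the template sits next to the target file as conf/zeppelin-env.sh.template.

```shell
# Hypothetical helper (not part of Zeppelin): prepare conf/zeppelin-env.sh.
# $1 is the Zeppelin install directory.
setup_zeppelin_env() {
  env_file="$1/conf/zeppelin-env.sh"
  # Copy the template only if zeppelin-env.sh does not exist yet
  # (assumed template name: zeppelin-env.sh.template).
  [ -f "$env_file" ] || cp "$env_file.template" "$env_file" || return 1
  # Single quotes keep $SPARK_HOME and $HADOOP_CONF_DIR unexpanded here,
  # so they are resolved later, when the file is sourced.
  printf '%s\n' \
    'export SPARK_HOME=$SPARK_HOME' \
    'export HADOOP_HOME=$HADOOP_CONF_DIR' >> "$env_file"
}
```

Calling setup_zeppelin_env /path/to/zeppelin appends the two export lines each time it runs, so it is meant to be run once per install.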
2. Modify conf/zeppelin-site.xml to set the server port (copy the template first)
<property>
<name>zeppelin.server.port</name>
<value>8097</value>
<description>server port.</description>
</property>
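The port change can also be made from the command line. The sketch below is only illustrative: the function name set_zeppelin_port is hypothetical, and it assumes the template is conf/zeppelin-site.xml.template and that the <value> line directly follows the <name> line, as in the property shown above.

```shell
# Hypothetical helper (not part of Zeppelin): set zeppelin.server.port.
# $1 is the Zeppelin install directory, $2 is the port number.
set_zeppelin_port() {
  site_file="$1/conf/zeppelin-site.xml"
  # Copy the template only if zeppelin-site.xml does not exist yet.
  [ -f "$site_file" ] || cp "$site_file.template" "$site_file" || return 1
  # Find the <name> line, move to the next line, and rewrite its <value>.
  # Output goes to a temp file first, which avoids the non-portable sed -i.
  sed '/<name>zeppelin\.server\.port<\/name>/{
n
s|<value>[^<]*</value>|<value>'"$2"'</value>|
}' "$site_file" > "$site_file.tmp" && mv "$site_file.tmp" "$site_file"
}
```

For example, set_zeppelin_port /path/to/zeppelin 8097 would produce the property shown in this step, leaving other properties untouched.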
3. Modify conf/interpreter.json: locate the Spark configuration item and adjust the Spark configuration parameters
4. Modify bin/interpreter.sh
Remove the parameter: --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}"
(The --driver-class-path parameter must be removed; otherwise Spark fails with the error below.)
(Error: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.)
Then add the run queue parameter: --queue [queue name]
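After editing the script by hand, a quick check can confirm that both changes are in place. This is a rough sketch with a hypothetical function name, check_interpreter_sh; it only searches the file for the two flags and does not validate the rest of the script.

```shell
# Hypothetical helper (not part of Zeppelin): verify the edits to interpreter.sh.
# $1 is the path to bin/interpreter.sh. Returns non-zero if an edit is missing.
check_interpreter_sh() {
  # -e is needed because the pattern itself starts with a dash.
  if grep -q -e '--driver-class-path' "$1"; then
    echo "still passes --driver-class-path" >&2
    return 1
  fi
  if ! grep -q -e '--queue' "$1"; then
    echo "no --queue parameter found" >&2
    return 1
  fi
}
```

Note that a commented-out --driver-class-path line would still be reported, since the check is a plain text search.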
5. Start/Stop Zeppelin
Start command: bin/zeppelin-daemon.sh start
Stop command: bin/zeppelin-daemon.sh stop
Restart command: bin/zeppelin-daemon.sh restart
6. Open the page (ip:port) in a browser and run the sample program
E.g. http://172.22.170.128:8097
7. Other
Zeppelin is positioned as an interactive visual analysis environment, and yarn-client mode is well suited to interaction and debugging. Zeppelin currently does not support running in yarn-cluster mode.