Zeppelin is a web-notebook-based tool for interactive query and analysis of big data on Spark (similar to the IPython notebook). You can write Scala and SQL code online to query and analyze data and generate reports. Developers can also add data engines to Zeppelin by implementing additional interpreters.
0. Download Zeppelin
Download address: https://zeppelin.incubator.apache.org/download.html
Select the precompiled binary package.
Extract the archive to obtain the Zeppelin directory structure (bin/, conf/, and so on).
1. Modify conf/zeppelin-env.sh to set SPARK_HOME and HADOOP_HOME (copy it from the template first)
export SPARK_HOME=$SPARK_HOME
export HADOOP_HOME=$HADOOP_CONF_DIR
($SPARK_HOME and $HADOOP_CONF_DIR take the values set in .bashrc on the bastion host.)
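The copy-then-edit step above can be sketched as a small shell function. This is only an illustration: the function name prepare_zeppelin_env and the choice to pass the conf directory as an argument are assumptions, and the template is assumed to be named zeppelin-env.sh.template.

```shell
# Hypothetical helper: prepare conf/zeppelin-env.sh from its template.
# $1 is the Zeppelin conf directory (an assumption; adjust to your install).
prepare_zeppelin_env() {
  conf_dir="$1"
  # Copy the template only if zeppelin-env.sh does not exist yet.
  if [ ! -f "$conf_dir/zeppelin-env.sh" ]; then
    cp "$conf_dir/zeppelin-env.sh.template" "$conf_dir/zeppelin-env.sh"
  fi
  # Append the exports; single quotes keep the variable references literal,
  # so they are resolved when Zeppelin sources the file, not now.
  {
    echo 'export SPARK_HOME=$SPARK_HOME'
    echo 'export HADOOP_HOME=$HADOOP_CONF_DIR'
  } >> "$conf_dir/zeppelin-env.sh"
}
```

It could be called as prepare_zeppelin_env conf from the Zeppelin install directory. Running it twice appends the exports twice, which is harmless but untidy.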
2. Modify conf/zeppelin-site.xml to set the server port (copy it from the template first)
<property>
<name>zeppelin.server.port</name>
<value>8097</value>
<description>server port.</description>
</property>
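To confirm which port is actually configured, the value can be read back from the file. The following is a minimal sketch, assuming the property is laid out with <name> and <value> on separate lines as shown above; the function name zeppelin_port is hypothetical.

```shell
# Hypothetical helper: print the value of zeppelin.server.port.
# $1 is the path to zeppelin-site.xml.
zeppelin_port() {
  awk '
    # Remember that the port property name has been seen.
    /<name>zeppelin.server.port<\/name>/ { found = 1; next }
    # On the next <value> line, strip the tags and print what is left.
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

For the configuration above, zeppelin_port conf/zeppelin-site.xml would print 8097. This is line-based text matching rather than real XML parsing, so it will not handle a property written on a single line.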
3. Modify conf/interpreter.json: locate the spark configuration entries and adjust the Spark configuration parameters
4. Modify bin/interpreter.sh
Remove the parameter: --driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}"
(The --driver-class-path parameter must be removed; otherwise the following error is reported:
"Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.")
Add the run queue parameter: --queue [queue name]
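One way to make both changes in a single pass is a sed substitution that swaps the --driver-class-path option for --queue at the same position in the spark-submit argument list. This is a sketch under assumptions: the option is assumed to appear in bin/interpreter.sh exactly as quoted above, the function name patch_interpreter_args is hypothetical, and the queue name is assumed to contain no "|" or "&" characters.

```shell
# Hypothetical filter: reads interpreter.sh text on stdin and writes the
# edited text to stdout, so the original file is never modified in place.
# $1 is the YARN queue name.
patch_interpreter_args() {
  queue="$1"
  # Replace the --driver-class-path option (matched literally) with --queue.
  sed 's|--driver-class-path "${ZEPPELIN_CLASSPATH_OVERRIDES}:${CLASSPATH}"|--queue '"$queue"'|'
}
```

A cautious way to use it is patch_interpreter_args myqueue < bin/interpreter.sh > interpreter.sh.new, then compare the two files before replacing the original.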
5. Start/Stop Zeppelin
Start command: bin/zeppelin-daemon.sh start
Stop command: bin/zeppelin-daemon.sh stop
Restart command: bin/zeppelin-daemon.sh restart
6. Open the page (ip:port) and run the sample program
E.g. http://172.22.170.128:8097
7. Other
Zeppelin is positioned as an interactive visual analysis environment, and yarn-client mode is well suited to interaction and debugging. Zeppelin currently does not support running in yarn-cluster mode.