Apache Zeppelin Installation and Introduction

Background
Apache Zeppelin provides a web-based notebook, similar to IPython Notebook, for data analysis and visualization. The backend can connect to different data processing engines, including Spark, Hive, and Tajo, and it natively supports Scala, Java, Shell, Markdown, and so on. Its overall presentation and usage closely resemble Databricks Cloud, as seen in that product's demo at the time.
Install on Mac OS
The Zeppelin version currently on GitHub is 0.5.0, with no pre-compiled packages available for download. Installation documentation: http://zeppelin.incubator.apache.org/docs/install/install.html
The other modules build without trouble; a direct mvn install goes through fine.
The only part I was unfamiliar with when installing was the zeppelin-web project, which uses front-end tools such as Node, Grunt, and Bower.
My approach was to read the pom.xml of the zeppelin-web project and walk through this part of the build by hand:
<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>0.0.23</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeVersion>v0.10.18</nodeVersion>
        <npmVersion>1.3.8</npmVersion>
      </configuration>
    </execution>
    <execution>
      <id>npm install</id>
      <goals>
        <goal>npm</goal>
      </goals>
    </execution>
    <execution>
      <id>bower install</id>
      <goals>
        <goal>bower</goal>
      </goals>
      <configuration>
        <arguments>--allow-root install</arguments>
      </configuration>
    </execution>
    <execution>
      <id>grunt build</id>
      <goals>
        <goal>grunt</goal>
      </goals>
      <configuration>
        <arguments>--no-color --force</arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
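If you only want to rebuild this module, Maven can run it in isolation; -pl (project list) is a standard Maven flag, while zeppelin-web as the module name is my reading of the source layout:

mvn install -pl zeppelin-web -DskipTests    # run from the parent project directory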
Installation Order:
1. First of all, install npm and Node in advance: brew install npm and npm install -g node.
2. Go to the zeppelin-web directory and run npm install. Following the description in package.json, it installs the Grunt components and Bower, producing a node_modules directory under the project.
3. Run bower --allow-root install, which installs the front-end libraries declared in bower.json, somewhat like mvn for Java. See http://bower.io/
4. Run grunt --force; the web files are assembled according to Gruntfile.js.
5. Finally, run mvn install -DskipTests to package the web project and generate a war under the target directory. (The whole sequence is summarized in the shell sketch below.)
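For reference, here are steps 1 through 5 as a single shell sketch; the checkout directory name incubator-zeppelin is an assumption, so adjust it to your tree:

brew install npm
npm install -g node
cd incubator-zeppelin/zeppelin-web
npm install                    # grunt components and bower, per package.json
bower --allow-root install     # front-end libraries, per bower.json
grunt --force                  # assemble the web files, per Gruntfile.js
mvn install -DskipTests        # package the war under target/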
mvn may fail here, because web.xml is not on the default path; the following needs to be added to pom.xml:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <webXml>app/WEB-INF/web.xml</webXml>
  </configuration>
</plugin>
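To sanity-check the packaging step, look for the generated war; this assumes the default Maven target path and that you are in the zeppelin-web directory:

ls target/*.war    # a zeppelin-web war should be listed after a successful build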
Start
After installation, go to the Zeppelin top-level directory and edit zeppelin-env.sh and zeppelin-site.xml in the conf folder. The default configuration is fine, but you must first strip the template suffix from the two shipped files.
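A minimal sketch of that step, assuming the shipped defaults carry a .template suffix as in the stock distribution:

cd conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml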
Then execute the start command under the Zeppelin top-level directory (see the sketch below).
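The exact script name is my assumption based on the stock Zeppelin layout; check bin/ in your build if it differs:

./bin/zeppelin-daemon.sh start    # assumed daemon script under bin/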
You should then be able to reach the Zeppelin home page at localhost:8080. If the home page does not appear, check the browser console for missing files; they were probably dropped when the web project was packaged, most likely because of dependency errors during the bower and grunt steps.
The main interface looks like this:
In the Zeppelin top-level directory you will see a notebook folder, which contains one subdirectory per notebook, named after the notebook, to keep multiple notebooks apart. Each subdirectory holds a note.json file that records the code entered in that notebook along with the execution results; it is loaded at startup.
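You can poke at this layout from the shell; a quick sketch, run from the Zeppelin top-level directory:

ls notebook/                   # one subdirectory per notebook
cat notebook/*/note.json       # paragraphs and results for each notebook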
Features
Open a notebook on the page and you can write Scala code directly.
A leading %md, %sh, %sql, %spark, %hive, or %tajo identifies what a paragraph does; if you write none, the default execution environment is Scala. Detailed parameter descriptions are available on the http://127.0.0.1:8080/#/interpreter page.
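For example, the three snippets below (each goes into its own notebook paragraph) exercise markdown, shell, and the default Scala environment; the Scala one relies on the sc that Zeppelin pre-creates:

%md
### a rendered markdown heading

%sh
echo "hello from a shell paragraph"

// no prefix: the default Scala environment, with sc already defined
val nums = sc.parallelize(1 to 100)
nums.sum()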
I also tried a few:
Very cool.
In addition, Zeppelin provides visualization for Spark SQL, and the built-in Tutorial notebook already has worked examples. Since my local compilation did not pin a Spark version, it ended up on a pre-1.3 Spark, where the toDF method is not available; toSchemaRDD is used instead, so change the paragraph to the following:
import sys.process._
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val zeppelinHome = ("pwd" !!).replace("\n", "")
val bankText = sc.textFile(s"$zeppelinHome/data/bank-full.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt)
).toSchemaRDD
bank.registerTempTable("bank")
After the bank table is generated, you can run queries against it; the effect is as follows:
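A chart like this is driven by a %sql paragraph, roughly the Tutorial notebook's query (check your copy for the exact text), using the dynamic form discussed below:

%sql
select age, count(1) from bank where age < ${maxAge=30} group by age order by age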
There are histograms, pie charts, scatter charts, line charts, and so on.
Note: the ${maxAge=} here is written the same way as in the Databricks Cloud demo. The example at the time took "soccer" as input and then picked out and displayed the football-related tweets in real time (it was during the FIFA World Cup).
Zeppelin calls this notation Dynamic Form:
http://zeppelin.incubator.apache.org/docs/dynamicform.html
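From memory of that page, besides the text input used above (${maxAge=30}), there is also a select-box variant written as ${name=default,option1|option2}; take this sketch as an assumption and defer to the linked docs:

%sql
select count(1) from bank where marital = "${marital=single,single|divorced|married}"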
Summary
Apache Zeppelin should be very attractive to distributed computing and data analysis practitioners: the codebase is small, the modules are clearly laid out, and you can try connecting different compute engines, running tasks, and visualizing the results. It is a fairly avant-garde project that is worth playing with.
There are no overly complex operations: you simply keep multiple notebooks, each doing its own analytical processing, with both the process and the results preserved. Spark support is especially good: the default environment is Scala, the default sc is already created (so Spark runs locally out of the box), and Spark SQL comes with visualization by default.
The end :)