Apache Zeppelin installation and introduction


Background

Apache Zeppelin provides a web-based notebook, similar to IPython Notebook, for data analysis and visualization. It can connect to different data processing engines, including Spark, Hive, and Tajo, and natively supports Scala, Java, shell, and Markdown. Its overall presentation and usage closely resemble Databricks Cloud, i.e. the demo from back then.

Install on Mac OS

Currently the Zeppelin version on GitHub is 0.5.0, and no pre-compiled package is available for download. Installation documentation: http://zeppelin.incubator.apache.org/docs/install/install.html

Most of the modules install without trouble; running mvn directly is enough.
The only part I was unfamiliar with during installation was the zeppelin-web project, which uses the node, grunt, and bower tools.

Based on my experience, modify the pom.xml file of the zeppelin-web project to split the front-end build into separate steps:

<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>0.0.23</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeVersion>v0.10.18</nodeVersion>
        <npmVersion>1.3.8</npmVersion>
      </configuration>
    </execution>
    <execution>
      <id>npm install</id>
      <goals>
        <goal>npm</goal>
      </goals>
    </execution>
    <execution>
      <id>bower install</id>
      <goals>
        <goal>bower</goal>
      </goals>
      <configuration>
        <arguments>--allow-root install</arguments>
      </configuration>
    </execution>
    <execution>
      <id>grunt build</id>
      <goals>
        <goal>grunt</goal>
      </goals>
      <configuration>
        <arguments>--no-color --force</arguments>
      </configuration>
    </execution>
  </executions>
</plugin>

Installation sequence:
1. Install node and npm in advance, e.g. brew install node (npm ships with node).
2. Go to the zeppelin-web directory and run npm install. It installs the grunt components described in package.json, installs bower, and produces a node_modules directory under the project.
3. Run bower --allow-root install to fetch the front-end library dependencies listed in bower.json, somewhat similar to mvn for Java. See http://bower.io/
4. Run grunt --force to assemble the web files according to Gruntfile.js.
5. Run mvn install -DskipTests to package the web project and generate a war under the target directory.
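Putting the steps above together, the manual build can be sketched as the following shell sequence. This is a sketch, not a script shipped by the project; it assumes node and npm are already on the PATH and that you are inside a checked-out Zeppelin source tree.

```shell
# Sketch of the manual zeppelin-web build (assumes node/npm already installed,
# e.g. via `brew install node`).
cd zeppelin-web

npm install                 # install grunt components and bower per package.json
bower --allow-root install  # fetch front-end library dependencies per bower.json
grunt --force               # assemble the web files per Gruntfile.js

mvn install -DskipTests     # package the web project; the war lands under target/
```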

Mvn errors may occur because web.xml is not in the default path. You need to add the following to pom.xml:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <webXml>app/WEB-INF/web.xml</webXml>
  </configuration>
</plugin>
Start

After installation, go to the conf folder under the zeppelin parent directory and set up zeppelin-env.sh and zeppelin-site.xml. The default configuration is fine; just remove the .template suffix from the two shipped files.
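That renaming step can be sketched as below. The /tmp path and the touch calls only simulate a conf directory so the snippet is self-contained; in a real install you would run the loop inside Zeppelin's own conf directory.

```shell
# Simulate a Zeppelin conf directory; in a real install this already exists.
demo=/tmp/zeppelin-conf-demo/conf
mkdir -p "$demo"
touch "$demo/zeppelin-env.sh.template" "$demo/zeppelin-site.xml.template"

# The actual step: copy each default file, dropping the .template suffix.
for f in "$demo"/*.template; do
  cp "$f" "${f%.template}"
done

ls "$demo"
```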

Run the following command in the zeppelin parent directory:

bin/zeppelin-daemon.sh start

You can access the Zeppelin homepage at localhost:8080. If the homepage does not load, check the browser console to see which files are missing. The web project package may be incomplete, most likely because of dependency errors while the bower and grunt commands were executed.
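One quick way to confirm the server is responding is to poll the homepage; a sketch, assuming the default port 8080 and that curl is installed:

```shell
# Print the HTTP status of the Zeppelin homepage; 200 means the web app is served.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/
```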

The main interface is shown in a screenshot (not reproduced here).

A notebook folder appears under the zeppelin parent directory, with one subdirectory per notebook, distinguished by the notebook's name. Each subdirectory holds a note.json file that records the code entered and the execution results of that notebook; it is loaded at startup.
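To get a feel for that layout, the sketch below fabricates a notebook directory and peeks at its note.json. The directory name and JSON fields are illustrative, not Zeppelin's exact schema.

```shell
# Fabricate a minimal notebook layout; real note.json files are much richer.
nb=/tmp/zeppelin-notebook-demo/notebook/2A94M5J1Z
mkdir -p "$nb"
cat > "$nb/note.json" <<'EOF'
{"name": "demo note", "paragraphs": [{"text": "%md hello"}]}
EOF

# Each subdirectory holds one notebook's code and results.
cat "$nb/note.json"
```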

Features

Open a notebook on the page and write Scala code directly.

The prefixes %md, %sh, %sql, %spark, %hive, and %tajo identify what is to be executed; if no prefix is written, the default execution environment is Scala. Detailed interpreter descriptions are provided on the http://127.0.0.1:8080/#/interpreter page.
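For example, each notebook paragraph starts with one of these prefixes (the snippets below are illustrative):

```
%md ### A markdown cell

%sh ls -l /tmp

%sql select count(*) from bank

val rdd = sc.parallelize(1 to 100)   // no prefix: runs as Scala/Spark
```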

I also tried a few of them (screenshots omitted). Very cool.

In addition, Zeppelin visualizes Spark SQL: open the Tutorial notebook, where the example is already written. Because my local build did not specify a Spark version, and versions earlier than 1.3 have no toDF method (toSchemaRDD is used instead), modify the following section:

import sys.process._

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val zeppelinHome = ("pwd" !!).replace("\n", "")
val bankText = sc.textFile(s"$zeppelinHome/data/bank-full.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
    s => Bank(s(0).toInt,
              s(1).replaceAll("\"", ""),
              s(2).replaceAll("\"", ""),
              s(3).replaceAll("\"", ""),
              s(5).replaceAll("\"", "").toInt
        )
).toSchemaRDD

bank.registerTempTable("bank")

After the bank table is registered, you can run queries against it and render the results as a histogram, pie chart, scatter chart, line chart, and so on.
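For instance, a query in the style of the tutorial notebook (field names follow the bank table defined above):

```sql
%sql
select age, count(1) as total
from bank
where age < 30
group by age
order by age
```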

Note:

The ${maxAge=} here is written the same way as in the Databricks Cloud demo (in that demo, you type "soccer" and then "train", and football-related tweets stream out in real time in the lower pane).
This notation is Zeppelin's Dynamic Form:
http://zeppelin.incubator.apache.org/docs/dynamicform.html
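A dynamic-form version of a bank query replaces the literal with ${maxAge=30}; Zeppelin then renders an input box pre-filled with 30 above the result:

```sql
%sql
select age, count(1) as total
from bank
where age < ${maxAge=30}
group by age
order by age
```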

Summary

Apache Zeppelin should be very attractive to distributed computing and data analysis practitioners: the codebase is small and the modules are clear. You can try hooking up different compute engines and experiment with task execution and visualization. It is a relatively cutting-edge project worth playing with.

There are not many complicated operations: notebooks are kept separate, each notebook carries its own analysis and processing, and both the process and the results are saved. In addition, Spark gets the best support: Scala is the default environment, sc is created automatically (so Spark can run locally out of the box), and Spark SQL results are visualized by default.

