Apache Zeppelin Installation and Introduction

Background
Apache Zeppelin provides a web-based notebook, similar to IPython Notebook, for data analysis and visualization. The backend can connect to different data processing engines, including Spark, Hive, and Tajo, and it natively supports Scala, Java, Shell, Markdown, and so on. Its overall presentation and usage closely resemble Databricks Cloud, as seen in that product's demo at the time.
Install on Mac OS
The Zeppelin version currently on GitHub is 0.5.0, with no pre-compiled packages available for download. Installation documentation: http://zeppelin.incubator.apache.org/docs/install/install.html
The other modules build without trouble; a direct mvn install goes through fine.
The only part I was unfamiliar with when installing was the zeppelin-web project, which uses front-end tools such as Node, Grunt, and Bower.
My approach was to read the pom.xml of the zeppelin-web project and walk through this part of the build by hand:
<plugin>
  <groupId>com.github.eirslett</groupId>
  <artifactId>frontend-maven-plugin</artifactId>
  <version>0.0.23</version>
  <executions>
    <execution>
      <id>install node and npm</id>
      <goals>
        <goal>install-node-and-npm</goal>
      </goals>
      <configuration>
        <nodeVersion>v0.10.18</nodeVersion>
        <npmVersion>1.3.8</npmVersion>
      </configuration>
    </execution>
    <execution>
      <id>npm install</id>
      <goals>
        <goal>npm</goal>
      </goals>
    </execution>
    <execution>
      <id>bower install</id>
      <goals>
        <goal>bower</goal>
      </goals>
      <configuration>
        <arguments>--allow-root install</arguments>
      </configuration>
    </execution>
    <execution>
      <id>grunt build</id>
      <goals>
        <goal>grunt</goal>
      </goals>
      <configuration>
        <arguments>--no-color --force</arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
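If you only want to rebuild this module, Maven can run it in isolation; -pl (project list) is a standard Maven flag, while zeppelin-web as the module name is my reading of the source layout:

mvn install -pl zeppelin-web -DskipTests    # run from the parent project directory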
Installation Order:
1. First of all, install npm and Node in advance: brew install npm and npm install -g node.
2. Go to the zeppelin-web directory and run npm install. Following the description in package.json, it installs the Grunt components and Bower, producing a node_modules directory under the project.
3. Run bower --allow-root install, which installs the front-end libraries declared in bower.json, somewhat like mvn for Java. See http://bower.io/
4. Run grunt --force; the web files are assembled according to Gruntfile.js.
5. Finally, run mvn install -DskipTests to package the web project and generate a war under the target directory. (The whole sequence is summarized in the shell sketch below.)
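For reference, here are steps 1 through 5 as a single shell sketch; the checkout directory name incubator-zeppelin is an assumption, so adjust it to your tree:

brew install npm
npm install -g node
cd incubator-zeppelin/zeppelin-web
npm install                    # grunt components and bower, per package.json
bower --allow-root install     # front-end libraries, per bower.json
grunt --force                  # assemble the web files, per Gruntfile.js
mvn install -DskipTests        # package the war under target/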
mvn may fail here, because web.xml is not on the default path; the following needs to be added to pom.xml:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <webXml>app/WEB-INF/web.xml</webXml>
  </configuration>
</plugin>
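To sanity-check the packaging step, look for the generated war; this assumes the default Maven target path and that you are in the zeppelin-web directory:

ls target/*.war    # a zeppelin-web war should be listed after a successful build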
Start
After installation, go to the Zeppelin top-level directory and edit zeppelin-env.sh and zeppelin-site.xml in the conf folder. The default configuration is fine, but you must first strip the template suffix from the two shipped files.
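A minimal sketch of that step, assuming the shipped defaults carry a .template suffix as in the stock distribution:

cd conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml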
Then execute the start command under the Zeppelin top-level directory (see the sketch below).
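The exact script name is my assumption based on the stock Zeppelin layout; check bin/ in your build if it differs:

./bin/zeppelin-daemon.sh start    # assumed daemon script under bin/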
You should then be able to reach the Zeppelin home page at localhost:8080. If the home page does not appear, check the browser console for missing files; they were probably dropped when the web project was packaged, most likely because of dependency errors during the bower and grunt steps.
The main interface looks like this:
In the Zeppelin top-level directory you will see a notebook folder, which contains one subdirectory per notebook, named after the notebook, to keep multiple notebooks apart. Each subdirectory holds a note.json file that records the code entered in that notebook along with the execution results; it is loaded at startup.
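You can poke at this layout from the shell; a quick sketch, run from the Zeppelin top-level directory:

ls notebook/                   # one subdirectory per notebook
cat notebook/*/note.json       # paragraphs and results for each notebook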
Features
Open a notebook on the page and you can write Scala code directly.
A leading %md, %sh, %sql, %spark, %hive, or %tajo identifies what a paragraph does; if you write none, the default execution environment is Scala. Detailed parameter descriptions are available on the http://127.0.0.1:8080/#/interpreter page.
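For example, the three snippets below (each goes into its own notebook paragraph) exercise markdown, shell, and the default Scala environment; the Scala one relies on the sc that Zeppelin pre-creates:

%md
### a rendered markdown heading

%sh
echo "hello from a shell paragraph"

// no prefix: the default Scala environment, with sc already defined
val nums = sc.parallelize(1 to 100)
nums.sum()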
I also tried a few:
Very cool.
In addition, Zeppelin provides visualization for Spark SQL, and the built-in Tutorial notebook already has worked examples. Since my local compilation did not pin a Spark version, it ended up on a pre-1.3 Spark, where the toDF method is not available; toSchemaRDD is used instead, so change the paragraph to the following:
import sys.process._
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val zeppelinHome = ("pwd" !!).replace("\n", "")
val bankText = sc.textFile(s"$zeppelinHome/data/bank-full.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt)
).toSchemaRDD
bank.registerTempTable("bank")
After the bank table is generated, you can run queries against it; the effect is as follows:
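A chart like this is driven by a %sql paragraph, roughly the Tutorial notebook's query (check your copy for the exact text), using the dynamic form discussed below:

%sql
select age, count(1) from bank where age < ${maxAge=30} group by age order by age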
There are histograms, pie charts, scatter charts, line charts, and so on.
Note: the ${maxAge=} here is written the same way as in the Databricks Cloud demo. The example at the time took "soccer" as input and then picked out and displayed the football-related tweets in real time (it was during the FIFA World Cup).
Zeppelin calls this notation Dynamic Form:
http://zeppelin.incubator.apache.org/docs/dynamicform.html
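From memory of that page, besides the text input used above (${maxAge=30}), there is also a select-box variant written as ${name=default,option1|option2}; take this sketch as an assumption and defer to the linked docs:

%sql
select count(1) from bank where marital = "${marital=single,single|divorced|married}"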
Summary
Apache Zeppelin should be very attractive to distributed computing and data analysis practitioners: the codebase is small, the modules are clearly laid out, and you can try connecting different compute engines, running tasks, and visualizing the results. It is a fairly avant-garde project that is worth playing with.
There are no overly complex operations: you simply keep multiple notebooks, each doing its own analytical processing, with both the process and the results preserved. Spark support is especially good: the default environment is Scala, the default sc is already created (so Spark runs locally out of the box), and Spark SQL comes with visualization by default.
The end :)