Objective
Apache Zeppelin is a web-based notebook (similar to the IPython notebook) that supports interactive data analysis and query in the form of web notes. You can use Scala and SQL online to query and analyze data and generate reports. It natively supports Spark, Scala, SQL, Shell, Markdown, and more. It is fully open source and, at the time of writing, still in the Apache incubation stage; it has already been adopted by major companies.
Zeppelin's backend data engine can be Spark, and you can add further data engines by implementing additional interpreters. Building Zeppelin locally makes Spark easier to use, and it is easy to showcase your work to customers.
Prepare
sudo apt-get update  # update apt
Installing the JDK
sudo apt-get install openjdk-8-jre openjdk-8-jdk
Installing Hadoop, Spark, and Git
Install Hadoop and Spark separately (the paths used later assume /usr/local/hadoop and /usr/local/spark), then install Git:
sudo apt-get install git
Installing Maven
sudo apt-get install maven
Installing npm
sudo apt-get install npm  # npm home: /usr/share/npm
Installing PhantomJS
Download phantomjs-1.9.8-linux-x86_64.tar.bz2 and extract it to /usr/local/phantomjs:
tar -xjvf phantomjs-1.9.8-linux-x86_64.tar.bz2
sudo mv phantomjs-1.9.8-linux-x86_64 /usr/local/phantomjs
Installing Apache Zeppelin
GitHub: https://github.com/apache/incubator-zeppelin
Downloads: http://zeppelin.apache.org/download.html
Apache Zeppelin officially provides both source packages and binary packages; download whichever you need.
- Binary package: download http://ftp.meisei-u.ac.jp/mirror/apache/dist/incubator/zeppelin/0.5.6-incubating/zeppelin-0.5.6-incubating-bin-all.tgz, then unpack it:
tar -xzvf zeppelin-0.5.6-incubating-bin-all.tgz
- Source build: clone the latest source from the Zeppelin Git repository into /usr/local/zeppelin and compile it. I compile from the latest source here:
git clone https://github.com/apache/incubator-zeppelin /usr/local/zeppelin
Compiling Apache Zeppelin
- Local mode: mvn clean package -DskipTests
- Cluster mode: mvn package -Pspark-2.0 -Dhadoop.version=2.7.1 -Phadoop-2.7 -DskipTests -X
Various problems may occur during compilation, but they are usually caused by network issues; simply re-run the compile command. If the compilation fails with an OOM error, however, set the following before retrying:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"
- Configure environment variables
$ vim ~/.bashrc
Add the following lines (adjust paths to match your installation):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/usr/local/spark
export HADOOP_HOME=/usr/local/hadoop
export PHANTOMJS_HOME=/usr/local/phantomjs
export ZEPPELIN_HOME=/usr/local/zeppelin
export PATH=.:$PATH:/usr/local/hadoop/bin:/usr/local/phantomjs/bin:/usr/local/spark/bin:/usr/local/zeppelin/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin
Then reload the file:
$ source ~/.bashrc
- Cluster-mode compilation
$ cd /usr/local/zeppelin
$ mvn package -Pspark-2.0 -Dhadoop.version=2.7.1 -Phadoop-2.7 -DskipTests -X
If you need to use YARN, you must add the -Pyarn profile to the mvn command when compiling Zeppelin.
Configuration
Zeppelin has two configuration files: an environment variable file (conf/zeppelin-env.sh) and a Java properties file (conf/zeppelin-site.xml). Configure them according to your requirements.
- Copy /usr/local/zeppelin/conf/zeppelin-env.sh.template and /usr/local/zeppelin/conf/zeppelin-site.xml.template to /usr/local/zeppelin/conf/zeppelin-env.sh and /usr/local/zeppelin/conf/zeppelin-site.xml.
- Edit conf/zeppelin-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=/usr/local/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
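Putting the two configuration steps together, a minimal sketch (assuming Zeppelin is installed under /usr/local/zeppelin, as above):

```shell
# Create the real config files from the shipped templates
cd /usr/local/zeppelin/conf
cp zeppelin-env.sh.template zeppelin-env.sh
cp zeppelin-site.xml.template zeppelin-site.xml

# Append the environment settings described above to zeppelin-env.sh
cat >> zeppelin-env.sh <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=/usr/local/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
EOF
```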
Start
Execute the following command in the $ZEPPELIN_HOME directory:
$ cd /usr/local/zeppelin
$ ./bin/zeppelin-daemon.sh start
The daemon is controlled with bin/zeppelin-daemon.sh start and bin/zeppelin-daemon.sh stop.
After startup, open http://localhost:8080 to access the Zeppelin home page.
Test
- Configuring the Spark Interpreter
- Getting started with Zeppelin
1. Text
Text content is output using the default Scala interpreter:
println("Hello Yuan siping!")
2. HTML
Use the %html display directive to output HTML from a paragraph:
%html
3. Table
Scala:
print(s"""%table name\tsize\nsun\t100\nmoon\t10""")
4.Tutorial with Local File
Data Refine:
Download the bank data from http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip, convert the semicolon-separated CSV data into an RDD of Bank objects, and filter out the header row:
val bankText = sc.textFile("/usr/data/bank/bank-full.csv")
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
val bank = bankText.map(s => s.split(";"))
  .filter(s => s(0) != "\"age\"")
  .map(s => Bank(s(0).toInt,
                 s(1).replaceAll("\"", ""),
                 s(2).replaceAll("\"", ""),
                 s(3).replaceAll("\"", ""),
                 s(5).replaceAll("\"", "").toInt))
bank.toDF().registerTempTable("bank")
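Outside Zeppelin, the same parsing logic can be sanity-checked with plain Scala, no SparkContext needed. The two sample lines below are made up for illustration, in the same format as bank-full.csv:

```scala
// Mirrors the bank-full.csv parsing above, on a plain List instead of an RDD
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

// One header line plus one made-up data line in bank-full.csv's format
val lines = List(
  "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
  "58;\"management\";\"married\";\"tertiary\";\"no\";2143"
)

val banks = lines
  .map(_.split(";"))
  .filter(f => f(0) != "\"age\"")   // skip the header row
  .map(f => Bank(
    f(0).toInt,
    f(1).replaceAll("\"", ""),
    f(2).replaceAll("\"", ""),
    f(3).replaceAll("\"", ""),
    f(5).replaceAll("\"", "").toInt))

println(banks)  // List(Bank(58,management,married,tertiary,2143))
```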
Data Retrieval:
You can see the age distribution of customers under 30 by executing the following statement:
%sql select age, count(1) from bank where age < 30 group by age order by age
To make the maximum age a dynamic input parameter maxAge (default 30), and view the age distribution of customers younger than maxAge:
%sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
To check the age distribution depending on the selected marital status:
%sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age
Installing Zeppelin on Ubuntu with Spark