Installing Zeppelin under an Ubuntu-based Spark Setup

Source: Internet
Author: User

Objective

Apache Zeppelin is a web-based notebook (similar to the IPython notebook) that supports interactive data analysis: an interactive data query and analysis tool in the form of web notes. You can use Scala and SQL online to query and analyze data and generate reports. It natively supports Spark, Scala, SQL, shell, Markdown, and more. It is fully open source and, at the time of writing, still in the Apache incubation stage, yet it is already in use at major companies such as Microsoft.

Zeppelin's backend data engine can be Spark, and you can add other data engines by implementing additional interpreters. Building Zeppelin locally makes Spark easier to use, and it makes it easy to showcase your work to customers.
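Each paragraph (cell) in a note selects its interpreter with a leading % directive, which is how additional data engines plug in. A minimal illustration of three cells, assuming the default interpreter bindings:

```
%md *rendered as Markdown*

%sh
echo "run by the shell interpreter"

%sql
select 1
```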

Prepare
sudo apt-get update    # update apt
Installing the JDK
sudo apt-get install openjdk-8-jre openjdk-8-jdk
Installing Hadoop and Spark (assumed already installed)
Installing git
sudo apt-get install git
Installing Maven
sudo apt-get install maven
Installing NPM
sudo apt-get install npm    # npm home: /usr/share/npm
Installing PhantomJS
Download phantomjs-1.9.8-linux-x86_64.tar.bz2
Extract it to /usr/local/phantomjs
Installing Apache Zeppelin
    • Source download
https://github.com/apache/incubator-zeppelin
http://zeppelin.apache.org/download.html
    • Unzip the installation

Apache Zeppelin officially provides both a source package and binary packages; download and install whichever suits your needs.

    1. Download Zeppelin's binary package from http://ftp.meisei-u.ac.jp/mirror/apache/dist/incubator/zeppelin/0.5.6-incubating/zeppelin-0.5.6-incubating-bin-all.tgz, then unpack it:
      tar -xzvf zeppelin-0.5.6-incubating-bin-all.tgz
    2. Or install Apache Zeppelin by compiling the source code; here the latest source is cloned from the Zeppelin git repository.

      $ git clone https://github.com/apache/incubator-zeppelin    # clone the latest source into /usr/local/zeppelin
Compiling Apache Zeppelin
    • Local mode: mvn clean package -DskipTests
    • Cluster mode: mvn package -Pspark-2.0 -Dhadoop.version=2.7.1 -Phadoop-2.7 -DskipTests -X
Various problems can occur during the build, but they are usually caused by network issues; simply re-execute the compile command. If the build fails with an OOM, however, set the following before retrying:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"
  1. Configure environment variables
     $ vim ~/.bashrc

    # add the following to ~/.bashrc
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    export SPARK_HOME=/usr/local/spark
    export HADOOP_HOME=/usr/local/hadoop
    export PHANTOMJS_HOME=/usr/local/phantomjs
    export ZEPPELIN_HOME=/usr/local/zeppelin
    export PATH=.:$PATH:/usr/local/hadoop/bin:/usr/local/phantomjs/bin:/usr/local/spark/bin:/usr/local/zeppelin/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin

    $ source ~/.bashrc
  2. Cluster Mode compilation
    $ cd /usr/local/zeppelin
    $ mvn package -Pspark-2.0 -Dhadoop.version=2.7.1 -Phadoop-2.7 -DskipTests -X
If you need to use YARN, you must specify the -Pyarn profile when compiling Zeppelin.
Configuration

The configuration files are the environment variable file (conf/zeppelin-env.sh) and the Java properties file (conf/zeppelin-site.xml). Configure them according to your requirements.

    1. Copy /usr/local/zeppelin/conf/zeppelin-env.sh.template to /usr/local/zeppelin/conf/zeppelin-env.sh, and /usr/local/zeppelin/conf/zeppelin-site.xml.template to /usr/local/zeppelin/conf/zeppelin-site.xml.
    2. Edit conf/zeppelin-env.sh
      export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
      export SPARK_HOME=/usr/local/spark
      export HADOOP_CONF_DIR=/usr/local/hadoop
      export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
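conf/zeppelin-site.xml can be adjusted similarly; for example, the port the web UI listens on (8080 by default) is controlled by the zeppelin.server.port property:

```xml
<property>
  <name>zeppelin.server.port</name>
  <value>8080</value>
  <description>Server port.</description>
</property>
```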
Start

Execute the following command in the ZEPPELIN_HOME directory:

$ cd /usr/local/zeppelin
$ ./bin/zeppelin-daemon.sh start

The daemon is started and stopped with bin/zeppelin-daemon.sh start and bin/zeppelin-daemon.sh stop.

After startup, open http://localhost:8080 to access the Zeppelin home page.

Test
    • Configuring the Spark Interpreter

    • Create a note

    • Getting started with Zeppelin

1. Text
Paragraphs are interpreted as Scala by default; output plain text with:

println("Hello Yuan siping!")

2. HTML

Output HTML via the display system (output that begins with %html is rendered as HTML; the markup here is illustrative):

println("%html <h3>hello</h3>")

3. Table

Scala:

print("%table name\tsize\nsun\t100\nmoon\t10")
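The same table can be produced from a shell cell, since Zeppelin's display system keys off the first token of a paragraph's output rather than the interpreter. A minimal sketch, runnable in any POSIX shell:

```shell
# Zeppelin renders any paragraph output starting with %table as a table,
# so a %sh cell can emit one too; %% escapes the percent sign for printf.
printf '%%table name\tsize\nsun\t100\nmoon\t10\n'
```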

4.Tutorial with Local File

Data Refine:
Download the bank data from http://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank.zip, convert the CSV data into an RDD of Bank objects, and filter out the header row:

val bankText = sc.textFile("/usr/data/bank/bank-full.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
            s(1).replaceAll("\"", ""),
            s(2).replaceAll("\"", ""),
            s(3).replaceAll("\"", ""),
            s(5).replaceAll("\"", "").toInt))

bank.toDF().registerTempTable("bank")
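As a quick sanity check outside Zeppelin, the header-filtering and quote-stripping done by the Scala snippet above can be sketched in plain shell. The two data rows below are made up for illustration in the bank-full.csv layout (the real file has more columns):

```shell
# Illustrative sample in bank-full.csv's layout: semicolon-separated,
# string fields double-quoted, first row a quoted header.
cat > /tmp/bank-sample.csv <<'EOF'
"age";"job";"marital";"education";"default";"balance"
58;"management";"married";"tertiary";"no";2143
44;"technician";"single";"secondary";"no";29
EOF

# Skip the header row (first field is "age") and strip the quotes,
# mirroring the filter/replaceAll steps in the Scala snippet above.
awk -F';' '$1 != "\"age\"" { gsub(/"/, ""); print $1, $2 }' /tmp/bank-sample.csv
```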

Data Retrieval:
You can see the age distribution by executing the following statement:

select age, count(1) from bank group by age

Dynamically input a maxAge parameter (30 by default) and view the age distribution for ages below maxAge:

select age, count(1) from bank where age < ${maxAge=30} group by age

View the age distribution according to the selected marital status:

select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age

