The previous section described SparkSQL's operating architecture; the following sections turn to its use. Before introducing the use of SparkSQL, we need to build a SparkSQL test environment. This test environment involves Hadoop's HDFS, Hive, Spark, and related data files:
- Hadoop version: 2.2.0
- Hive version: 0.13
- Spark version: 1.1.0-RC3
- MySQL version: 5.6.12
- Test data download location: http://pan.baidu.com/s/1eQCbT30#path=%252Fblog (sparksql_data.zip)
Test environment:
This test environment is built around one physical machine with 16 GB of memory and a 4-core/8-thread CPU. hadoop1, hadoop2, and hadoop3 are VirtualBox VMs that form the Hadoop cluster and the Spark cluster; the physical machine wyy acts as the client, where code is written and compute jobs are submitted. The overall test environment is configured as follows:
| Machine name | Configuration | Role | Software installed |
| --- | --- | --- | --- |
| hadoop1 | 4 GB memory, 1 core | Hadoop: NN/DN; Spark: Master/Worker | /app/hadoop/hadoop220, /app/hadoop/spark110, /app/scala2104, /usr/java/jdk1.7.0_21 |
| hadoop2 | 4 GB memory, 1 core | Hadoop: DN; Spark: Worker; hive0.13 client | /app/hadoop/hadoop220, /app/hadoop/spark110, /app/hadoop/hive013, /app/scala2104, /usr/java/jdk1.7.0_21 |
| hadoop3 | 4 GB memory, 1 core | Hadoop: DN; Spark: Worker; hive0.13 metastore service; MySQL server | /app/hadoop/hadoop220, /app/hadoop/spark110, /app/hadoop/hive013, /app/scala2104, /usr/java/jdk1.7.0_21, mysql5.6.12 |
| wyy | 16 GB memory, 4 cores | Client; hive0.13 client | /app/hadoop/hadoop220, /app/hadoop/spark110, /app/hadoop/hive013 |
The hadoop220, spark110, and hive013 installation directories above are owned by user hadoop (group hadoop); the other installation directories are owned by root:root.
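These ownership settings can be applied with chown. A minimal sketch, assuming the paths from the table above; the commands are only printed here, since they need to be run as root on each node:

```shell
# Build the chown commands for the hadoop-owned installation directories
# (paths follow the environment table; run the printed commands as root on each node).
DIRS="/app/hadoop/hadoop220 /app/hadoop/spark110 /app/hadoop/hive013"
CMDS=""
for d in $DIRS; do
  CMDS="${CMDS}chown -R hadoop:hadoop $d
"
done
printf '%s' "$CMDS"
```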
Test environment build order
1: Virtual cluster construction (hadoop1, hadoop2, hadoop3)
A: For the hadoop2.2.0 cluster build, see the blog post "hadoop2.2.0 test environment build" or the video at http://pan.baidu.com/s/1qwqfy4c (extract password: xv4i).
B: For MySQL installation, see the blog post "mysql5.6.12 for Linux installation".
C: For Hive installation, see the blog post "Hive 0.11.0 remote mode build"; the hive0.13 used in this test installs the same way as hive0.11. Hive is installed on hadoop3, hadoop2, and wyy. hadoop3 runs the metastore service; hadoop2 and wyy act as hive clients after configuring the metastore URIs.
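On the hive clients (hadoop2 and wyy), the remote metastore is reached through the `hive.metastore.uris` property. A minimal `hive-site.xml` fragment, assuming the metastore service on hadoop3 listens on Hive's default port 9083 (adjust if your metastore uses another port):

```xml
<!-- hive-site.xml on hadoop2 and wyy: use the remote metastore on hadoop3 -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop3:9083</value>
</property>
```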
D: For the Spark standalone cluster build, see the blog post "Spark1.0.0 on Standalone mode deployment". Note that this test uses spark1.1.0, whose deployment-package generation script make-distribution.sh takes different parameters; in spark1.1.0 the make-distribution.sh usage is:
```shell
./make-distribution.sh [--name] [--tgz] [--with-tachyon] <maven build options>
```
The parameters mean:
- `--with-tachyon`: whether the Tachyon in-memory file system is supported; without this parameter it is not supported.
- `--tgz`: generate spark-$VERSION-bin.tar.gz in the root directory; without this parameter no tgz file is generated, only the /dist directory.
- `--name NAME`: combined with `--tgz`, generates a spark-$VERSION-bin-$NAME.tgz deployment package; without this parameter, NAME defaults to the Hadoop version number.
- `<maven build options>`: options usable with the Maven build, such as `-P` and `-D`. For this build, a spark1.1.0 deployment package based on hadoop2.2.0 and YARN, integrating hive, ganglia, and kinesis-asl, can be generated with:
```shell
./make-distribution.sh --tgz --name 2.2.0 -Pyarn -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl -Phive
```
This finally generates the deployment package spark-1.1.0-bin-2.2.0.tgz, which is then installed according to the test-environment plan.
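Installing the package according to the plan can be sketched as follows; the hostnames and the /app/hadoop target path come from the environment table above, and the commands are printed rather than executed, since they need the cluster nodes to be reachable:

```shell
# Sketch: unpack the generated package into the planned path, then copy the
# tree to the other nodes. Hostnames and paths follow the environment table.
PKG=spark-1.1.0-bin-2.2.0.tgz
TARGET=/app/hadoop/spark110
INSTALL="tar -xzf $PKG
mv spark-1.1.0-bin-2.2.0 $TARGET
scp -r $TARGET hadoop2:/app/hadoop/
scp -r $TARGET hadoop3:/app/hadoop/"
printf '%s\n' "$INSTALL"
```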
2: Client construction
The client wyy runs Ubuntu while the spark virtual cluster runs CentOS, and the two operating systems use different default Java installation directories. So when installing Java on Ubuntu, we deliberately changed the Java installation path to match CentOS; otherwise, after every scp of the virtual cluster's configuration files, JAVA_HOME would have to be modified in the Hadoop and Spark run configuration files. The client's hadoop2.2.0, spark1.1.0, and hive0.13 were copied directly via scp from the virtual cluster, placed in the same directories with the same user attributes. The development tool is IntelliJ IDEA; programs are compiled, packaged, and copied to spark1.1.0's root directory /app/hadoop/spark110, then submitted to the virtual cluster with spark-submit.
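A spark-submit call of the kind described might look as follows. The class name week2.SQLOnSpark, the jar name, and the memory setting are placeholder assumptions for your own build; the master URL assumes the Spark master runs on hadoop1 at the standalone default port 7077. The command is printed rather than executed:

```shell
# Sketch: submit a locally built jar from the client wyy to the standalone cluster.
# week2.SQLOnSpark and sqlonspark.jar are placeholders for your own build artifacts.
SPARK_HOME=/app/hadoop/spark110
SUBMIT="$SPARK_HOME/bin/spark-submit --master spark://hadoop1:7077 \
--class week2.SQLOnSpark --executor-memory 1g $SPARK_HOME/sqlonspark.jar"
echo "$SUBMIT"
```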
3: File data preparation
Start hadoop2.2.0 (only HDFS needs to be started), then upload the data files to the corresponding directory.
4: Hive data preparation
In hive, define a database saledata and three tables tbldate, tblstock, and tblstockdetail, then load the data. The specific commands:
```sql
CREATE DATABASE saledata;
USE saledata;

-- date.txt defines the date classification, assigning each day its month, week,
-- quarter, etc. attributes: date, year-month, year, month, day, day-of-week,
-- week-of-year, quarter, ten-day period, half-month
CREATE TABLE tbldate (dateID string, theyearmonth string, theyear string,
  themonth string, thedate string, theweek string, theweeks string,
  thequot string, thetenday string, thehalfmonth string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- stock.txt defines the order headers: order number, trading location, trading date
CREATE TABLE tblstock (ordernumber string, locationid string, dateID string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- stockdetail.txt defines the order details: order number, row number, item,
-- quantity, price, amount
CREATE TABLE tblstockdetail (ordernumber string, rownum int, itemid string,
  qty int, price int, amount int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- load the data
LOAD DATA LOCAL INPATH '/home/mmicky/mboo/myclass/doc/sparksql/data/date.txt' INTO TABLE tbldate;
LOAD DATA LOCAL INPATH '/home/mmicky/mboo/myclass/doc/sparksql/data/stock.txt' INTO TABLE tblstock;
LOAD DATA LOCAL INPATH '/home/mmicky/mboo/myclass/doc/sparksql/data/stockdetail.txt' INTO TABLE tblstockdetail;
```
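Once loaded, the tables can be sanity-checked from the hive CLI on any client node; for example (the exact row counts depend on the downloaded data files):

```sql
-- Quick checks after loading; run in the hive CLI.
USE saledata;
SHOW TABLES;
SELECT COUNT(*) FROM tblstock;
SELECT * FROM tbldate LIMIT 3;
```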
Finally, the relevant data can be seen in HDFS.
5: Start enjoying the SparkSQL journey...
The Spark Big Data Fast Computing Platform course (Phase III) will be available soon; this article is part of the new course material.