Docker-based Spark-Hadoop distributed cluster II: Environment Testing

Building on the previous chapter, "Environment Construction", this chapter tests each module.

MySQL Test

1. MySQL node preparation

For easy testing, add some sample data on the MySQL node.

Go to the master node:

docker exec -it hadoop-maste /bin/bash

Enter the database node:

ssh hadoop-mysql

Create a database:

create database zeppelin_test;

Create a data table:

create table user_info(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,name VARCHAR(16),age INT);

Insert a few rows; the primary key auto-increments:

insert into user_info(name,age) values("aaa",10);
insert into user_info(name,age) values("bbb",20);
insert into user_info(name,age) values("ccc",30);

2. Zeppelin Configuration

Configure the driver and URL address:

default.driver  ====>   com.mysql.jdbc.Driver
default.url     ====>   jdbc:mysql://hadoop-mysql:3306/zeppelin_test

Have Zeppelin load the mysql-connector-java library (add the Maven artifact in the interpreter's dependency settings):

mysql:mysql-connector-java:8.0.12

3. Test MySQL Query

%jdbc
select * from user_info;

This should print the rows inserted earlier.
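
As a cross-check, you can also read the same table through Spark's JDBC data source. The following is a minimal sketch, not from the original article; the user/password values are placeholders for your MySQL credentials, and it assumes the mysql-connector-java jar is visible to the Spark interpreter:

// Read the MySQL table through Spark's JDBC data source.
// "root"/"root" below are placeholder credentials (assumption).
val jdbcDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://hadoop-mysql:3306/zeppelin_test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "user_info")
  .option("user", "root")
  .option("password", "root")
  .load()
jdbcDF.show()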

Hive Test

This test connects to Hive over JDBC. Recall a key setting in hive-site.xml from the previous section: to use a JDBC connection (i.e., TCP mode), hive.server2.transport.mode must be set to binary.
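
For reference, the relevant hive-site.xml entry looks like the sketch below; your file from the previous section should already contain something equivalent:

<property>
  <name>hive.server2.transport.mode</name>
  <!-- "binary" selects the TCP transport that JDBC clients use; the alternative is "http" -->
  <value>binary</value>
</property>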

1. Zeppelin Configuration

(1) Add a Hive interpreter (JDBC type) and modify the following settings:

default.driver  ====>   org.apache.hive.jdbc.HiveDriver
default.url     ====>   jdbc:hive2://hadoop-hive:10000

(2) Add the dependencies:

org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0
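
If the interpreter fails to connect, it can help to test the HiveServer2 endpoint with plain JDBC outside Zeppelin first. A minimal sketch, not from the original article; it assumes the two artifacts above are on the classpath and that HiveServer2 accepts anonymous logins:

import java.sql.DriverManager

// Load the Hive JDBC driver and connect to HiveServer2 (binary/TCP mode).
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://hadoop-hive:10000", "", "")

// List databases as a connectivity smoke test.
val rs = conn.createStatement().executeQuery("show databases")
while (rs.next()) println(rs.getString(1))
conn.close()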

2. Testing

Add a note in Zeppelin.

Add a DB:

%hive
CREATE SCHEMA user_hive

%hive
use user_hive

Create a table:

%hive
create table if not exists user_hive.employee(id int, name string, age int)

Insert data:

%hive
insert into user_hive.employee(id,name,age) values(1,"aaa",10)

Read it back:

%hive
select * from user_hive.employee

All of these operations succeed.

In addition, you can inspect Hive's metadata in MySQL. In the DBS table, look at the meta-information for the database you just created:

%jdbc
select * from hive.DBS;

The output displays the metadata for the database just created.
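
Table metadata can be inspected the same way; in the standard metastore schema, table entries live in the TBLS table (an optional extra check, an assumption beyond the original steps):

%jdbc
select * from hive.TBLS;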

Log in to the Hadoop web UI and you should also see the file information (the container environment maps Hadoop's port 50070 to port 51070 on the host):

http://localhost:51070/explorer.html#/home/hive/warehouse/user_hive.db

As you can see, under user_hive.db/employee there is the data file just created.
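
You can also dump the file's contents directly from a shell on any cluster node, an optional check using the same warehouse path:

hdfs dfs -cat /home/hive/warehouse/user_hive.db/employee/000000_0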

Distributed Testing

Building on the previous section, log in to both the master and the slave nodes; the same directory contains the same data, which shows that the Hive operations above took effect across the cluster. The checks are as follows:

Master node:

root@hadoop-maste:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0

Slave node:

root@hadoop-node1:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0

Test Spark Operations on Hive

Write two rows from Spark into the user_hive.db database created above, as follows:

import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.hive.HiveContext
//import hiveContext.implicits._

val hiveCtx = new HiveContext(sc)

// Build an RDD of two space-delimited records and split each into fields.
val employeeRDD = sc.parallelize(Array("6 rc 26", "7 gh 27")).map(_.split(" "))

// Schema matching the user_hive.employee table: id INT, name STRING, age INT.
val schema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// Convert the string fields into typed Rows and create a DataFrame.
val rowRDD = employeeRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))
val employeeDataFrame = hiveCtx.createDataFrame(rowRDD, schema)

// Register a temp table and insert its contents into the Hive table.
employeeDataFrame.registerTempTable("tempTable")
hiveCtx.sql("insert into user_hive.employee select * from tempTable")
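
Before returning to Zeppelin, you can read the table back through the same HiveContext to confirm the insert (a quick check that is not part of the original steps):

hiveCtx.sql("select * from user_hive.employee").show()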

After it runs, check Hive from Zeppelin again:

%hive
select * from employee

As you can see, the two new rows have been written.
