Docker-based Spark-Hadoop distributed cluster II: Environment Testing

Building on the previous chapter, "Environment Construction", this chapter tests each module.

MySQL Test

1. MySQL node preparation

For easy testing, add some sample data on the MySQL node.

Go to the master node:

docker exec -it hadoop-maste /bin/bash

Enter the database node:

ssh hadoop-mysql

Create a database:

create database zeppelin_test;

Create a data table:

create table user_info(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,name VARCHAR(16),age INT);

Insert a few rows; the primary key auto-increments:

insert into user_info(name,age) values("aaa",10);
insert into user_info(name,age) values("bbb",20);
insert into user_info(name,age) values("ccc",30);

2. Zeppelin Configuration

Configure the driver and URL address:

default.driver  ====>   com.mysql.jdbc.Driver
default.url     ====>   jdbc:mysql://hadoop-mysql:3306/zeppelin_test

Have Zeppelin load the mysql-connector-java library (add the Maven artifact in the interpreter's dependency settings):

mysql:mysql-connector-java:8.0.12

3. Test MySQL Query

%jdbc
select * from user_info;

This should print the rows inserted earlier.
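
As a cross-check, you can also read the same table through Spark's JDBC data source. The following is a minimal sketch, not from the original article; the user/password values are placeholders for your MySQL credentials, and it assumes the mysql-connector-java jar is visible to the Spark interpreter:

// Read the MySQL table through Spark's JDBC data source.
// "root"/"root" below are placeholder credentials (assumption).
val jdbcDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://hadoop-mysql:3306/zeppelin_test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "user_info")
  .option("user", "root")
  .option("password", "root")
  .load()
jdbcDF.show()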

Hive Test

This test connects to Hive over JDBC. Recall a key setting in hive-site.xml from the previous section: to use a JDBC connection (i.e., TCP mode), hive.server2.transport.mode must be set to binary.
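
For reference, the relevant hive-site.xml entry looks like the sketch below; your file from the previous section should already contain something equivalent:

<property>
  <name>hive.server2.transport.mode</name>
  <!-- "binary" selects the TCP transport that JDBC clients use; the alternative is "http" -->
  <value>binary</value>
</property>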

1. Zeppelin Configuration

(1) Add a Hive interpreter (JDBC type) and modify the following settings:

default.driver  ====>   org.apache.hive.jdbc.HiveDriver
default.url     ====>   jdbc:hive2://hadoop-hive:10000

(2) Add the dependencies:

org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0
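
If the interpreter fails to connect, it can help to test the HiveServer2 endpoint with plain JDBC outside Zeppelin first. A minimal sketch, not from the original article; it assumes the two artifacts above are on the classpath and that HiveServer2 accepts anonymous logins:

import java.sql.DriverManager

// Load the Hive JDBC driver and connect to HiveServer2 (binary/TCP mode).
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://hadoop-hive:10000", "", "")

// List databases as a connectivity smoke test.
val rs = conn.createStatement().executeQuery("show databases")
while (rs.next()) println(rs.getString(1))
conn.close()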

2. Testing

Add a note in Zeppelin.

Add a DB:

%hive
CREATE SCHEMA user_hive

%hive
use user_hive

Create a table:

%hive
create table if not exists user_hive.employee(id int, name string, age int)

Insert data:

%hive
insert into user_hive.employee(id,name,age) values(1,"aaa",10)

Read it back:

%hive
select * from user_hive.employee

All of these operations succeed.

In addition, you can inspect Hive's metadata in MySQL. In the DBS table, look at the meta-information for the database you just created:

%jdbc
select * from hive.DBS;

The output displays the metadata for the database just created.
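
Table metadata can be inspected the same way; in the standard metastore schema, table entries live in the TBLS table (an optional extra check, an assumption beyond the original steps):

%jdbc
select * from hive.TBLS;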

Log in to the Hadoop web UI and you should also see the file information (the container environment maps Hadoop's port 50070 to port 51070 on the host):

http://localhost:51070/explorer.html#/home/hive/warehouse/user_hive.db

As you can see, under user_hive.db/employee there is the data file just created.
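
You can also dump the file's contents directly from a shell on any cluster node, an optional check using the same warehouse path:

hdfs dfs -cat /home/hive/warehouse/user_hive.db/employee/000000_0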

Distributed Testing

Building on the previous section, log in to both the master and the slave nodes; the same directory contains the same data, which shows that the Hive operations above took effect across the cluster. The checks are as follows:

Master node:

root@hadoop-maste:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0

Slave node:

root@hadoop-node1:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0

Test Spark Operations on Hive

Write two rows from Spark into the user_hive.db database created above, as follows:

import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.hive.HiveContext
//import hiveContext.implicits._

val hiveCtx = new HiveContext(sc)

// Build an RDD of two space-delimited records and split each into fields.
val employeeRDD = sc.parallelize(Array("6 rc 26", "7 gh 27")).map(_.split(" "))

// Schema matching the user_hive.employee table: id INT, name STRING, age INT.
val schema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

// Convert the string fields into typed Rows and create a DataFrame.
val rowRDD = employeeRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))
val employeeDataFrame = hiveCtx.createDataFrame(rowRDD, schema)

// Register a temp table and insert its contents into the Hive table.
employeeDataFrame.registerTempTable("tempTable")
hiveCtx.sql("insert into user_hive.employee select * from tempTable")
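
Before returning to Zeppelin, you can read the table back through the same HiveContext to confirm the insert (a quick check that is not part of the original steps):

hiveCtx.sql("select * from user_hive.employee").show()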

After it runs, check Hive from Zeppelin again:

%hive
select * from employee

As you can see, the two new rows have been written.
