Building on the previous chapter, "Environment Construction", this chapter tests each module.
MySQL Test
1. MySQL node preparation
For easy testing, add some data on the MySQL node.
Go to the master node
docker exec -it hadoop-maste /bin/bash
Enter the database node
ssh hadoop-mysql
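Then open the MySQL client. The root account is assumed here; substitute whatever credentials your image actually uses:
mysql -u root -p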
Create a database
create database zeppelin_test;
Create a data table
create table user_info(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(16), age INT);
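To confirm the table was created as intended, describe prints the column definitions:
describe user_info;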
Insert a few rows; the primary key increments automatically:
insert into user_info(name,age) values("aaa",10);
insert into user_info(name,age) values("bbb",20);
insert into user_info(name,age) values("ccc",30);
2. Zeppelin Configuration
Configure the driver and URL address:
default.driver ====> com.mysql.jdbc.Driver
default.url ====> jdbc:mysql://hadoop-mysql:3306/zeppelin_test
Have Zeppelin load the mysql-connector-java library (fetched from the Maven repository):
mysql:mysql-connector-java:8.0.12
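One caveat: Connector/J 8.x can reject connections with a timezone error against older MySQL servers. If that happens, appending a serverTimezone parameter to the URL is the usual fix, e.g.:
default.url ====> jdbc:mysql://hadoop-mysql:3306/zeppelin_test?serverTimezone=UTC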
3. Test MySQL Query
%jdbc
select * from user_info;
It should print the rows inserted earlier.
Hive Test
This time we connect to Hive over JDBC. Note a key setting in hive-site.xml from the previous section: to use a JDBC connection (i.e., TCP mode), hive.server2.transport.mode must be set to binary.
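For reference, the corresponding entry in hive-site.xml takes the usual Hadoop property form:
<property>
  <name>hive.server2.transport.mode</name>
  <value>binary</value>
</property>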
1. Zeppelin Configuration
(1) Add a hive interpreter and modify the following configuration in JDBC mode:
default.driver ====> org.apache.hive.jdbc.HiveDriver
default.url ====> jdbc:hive2://hadoop-hive:10000
(2) Add the dependencies:
org.apache.hive:hive-jdbc:0.14.0
org.apache.hadoop:hadoop-common:2.6.0
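Before testing from Zeppelin, it can help to confirm that HiveServer2 accepts TCP connections at all. Assuming beeline is available on the Hive node, a quick connectivity check is:
beeline -u jdbc:hive2://hadoop-hive:10000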
2. Testing
Add a note in Zeppelin.
Create a database:
%hive
CREATE SCHEMA user_hive
%hive
use user_hive
Create a table:
%hive
create table if not exists user_hive.employee(id int, name string, age int)
Insert data:
%hive
insert into user_hive.employee(id,name,age) values(1,"aaa",10)
Query the table:
%hive
select * from user_hive.employee
All of these operations should succeed.
In addition, you can inspect Hive's metadata from MySQL. In the DBS table, look at the meta-information for the database just created:
%jdbc
select * from hive.DBS;
The output displays the metadata for the database just created.
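The metastore keeps table definitions in a TBLS table alongside DBS, so querying it should likewise show the employee table just created:
%jdbc
select * from hive.TBLS;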
Log in to the Hadoop admin UI and you should also see the file information (the container environment maps Hadoop's port 50070 to port 51070 on the host):
http://localhost:51070/explorer.html#/home/hive/warehouse/user_hive.db
As you can see, under user_hive.db/employee there is the data file just created.
Distributed testing
Building on the previous section, enter the master and slave nodes: the same directory holds the same data content, so the Hive operations from the previous section took effect on both master and slave nodes. The operations are as follows:
Master node:
root@hadoop-maste:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0
Slave node:
root@hadoop-node1:~# hdfs dfs -ls /home/hive/warehouse/user_hive.db/employee
Found 1 items
-rwxr-xr-x   2 gpadmin supergroup          9 2018-08-15 11:36 /home/hive/warehouse/user_hive.db/employee/000000_0
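To see the file's contents rather than just its listing, hdfs dfs -cat prints it; the 9-byte file should hold the single inserted row, with columns separated by Hive's default \001 control character:
hdfs dfs -cat /home/hive/warehouse/user_hive.db/employee/000000_0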
Test Spark Operations on Hive
Write two rows from Spark into the user_hive.db database just created, as follows:
import org.apache.spark.sql.{SQLContext, Row}
import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.hive.HiveContext
//import hiveContext.implicits._

val hiveCtx = new HiveContext(sc)
// Parse two raw "id name age" strings into arrays of fields
val employeeRDD = sc.parallelize(Array("6 rc 26", "7 gh 27")).map(_.split(" "))
// Schema matching the user_hive.employee table
val schema = StructType(List(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))
val rowRDD = employeeRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))
val employeeDataFrame = hiveCtx.createDataFrame(rowRDD, schema)
// Register a temp table, then insert its contents into the Hive table
employeeDataFrame.registerTempTable("tempTable")
hiveCtx.sql("insert into user_hive.employee select * from tempTable")
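The code above uses the Spark 1.x-style HiveContext. If your cluster runs Spark 2.x, the same write can be expressed through the SparkSession entry point instead; the sketch below assumes a Hive-enabled session bound to the name spark (as in Zeppelin and spark-shell), and the sample row is purely illustrative:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Schema matching user_hive.employee; one illustrative (hypothetical) row
val schema2 = StructType(List(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))
val rowRDD2 = sc.parallelize(Seq("8 xy 28")).map(_.split(" "))
  .map(p => Row(p(0).toInt, p(1).trim, p(2).toInt))

// SparkSession replaces HiveContext; temp views replace registerTempTable
spark.createDataFrame(rowRDD2, schema2).createOrReplaceTempView("tempTable2")
spark.sql("insert into user_hive.employee select * from tempTable2")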
After it runs, query Hive again:
%hive
select * from employee
As you can see, the new rows have already been written.