Constructing a data warehouse with Spark as its core

0. Description
In the big-data world, Hive is popular as a long-established data warehouse, and Spark takes care to stay compatible with Hive. But if you do not want to build your data warehouse on Hive, that is no problem: we can build a modern data warehouse with Spark alone. The evolution of Spark SQL shows that Spark itself can act as a data warehouse without Hive. Out of the box, Spark SQL stores its metadata in Derby; production environments generally use MySQL or PostgreSQL instead of Derby. This article shows how to store the Spark SQL metadata in MySQL.

1. Cluster planning
MySQL: chinac244 <--> chinac242; these two nodes form a master-master replication pair.
Spark Master: chinac88 <--> chinac82; these two nodes are configured for HA.
Spark Slaves: chinac88, chinac82, chinac27

2. Configuration files (modified on chinac27, then distributed to the cluster)
After extracting Spark, copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/hive-site.xml, then edit the file:
vim $SPARK_HOME/conf/hive-site.xml
Modify the following content
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://chinac244:3306/sparkmetadata?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>chinac</value>
  <description>Password to use against metastore database</description>
</property>
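One prerequisite the article does not spell out: Spark also needs the MySQL JDBC driver jar on its classpath, otherwise the metastore connection fails with "No suitable driver found". A minimal sketch of a sanity check, assuming the jar is dropped into $SPARK_HOME/jars (the layout of newer Spark releases; older releases use lib/ or the --driver-class-path option, and the jar file name here is an assumption):

```shell
# Sketch (jar name/path are assumptions for this cluster): verify that the
# MySQL JDBC driver jar is visible to Spark before starting the cluster.
check_mysql_driver() {
  # $1 = Spark installation directory
  ls "$1"/jars/mysql-connector-java-*.jar >/dev/null 2>&1 \
    && echo "driver found" \
    || echo "driver missing"
}

check_mysql_driver "$SPARK_HOME"
```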
3. Modify the time attributes (not done here)
Next, modify all of the time attributes in hive-site.xml: for every property whose value carries an "s" (seconds) suffix, delete the "s" and append three zeros; for every property whose value carries an "ms" suffix, simply delete the "ms". Spark cannot parse these unit suffixes and instead treats the values as plain numbers.

4. Distributing the configuration file
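The conversion described above can be sketched as a small shell helper (the function name and usage are my own; the article simply edits the values by hand in hive-site.xml):

```shell
# Sketch: normalize Hive-style duration values to plain millisecond numbers,
# since this Spark version parses them as bare integers.
# "300s" -> "300000" (drop "s", append three zeros); "5000ms" -> "5000".
to_millis() {
  case "$1" in
    *ms) printf '%s\n' "${1%ms}" ;;   # drop the "ms" suffix
    *s)  printf '%s000\n' "${1%s}" ;; # drop "s", multiply by 1000
    *)   printf '%s\n' "$1" ;;        # already a bare number
  esac
}
```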
scp $SPARK_HOME/conf/hive-site.xml chinac82:$SPARK_HOME/conf/hive-site.xml
scp $SPARK_HOME/conf/hive-site.xml chinac88:$SPARK_HOME/conf/hive-site.xml
5. Restart the Spark cluster
${SPARK_HOME}/sbin/stop-all.sh
nohup ${SPARK_HOME}/sbin/start-all.sh &
The startup output looks as follows.

6. Testing the configuration
a. View the database information in MySQL.
b. Execute the spark-sql command:
spark-sql --master spark://chinac88:7077,chinac82:7077

1. Create a data table (the original column list was not preserved; the statement created a delimited table with fields terminated by ',' and lines terminated by '\n'):

CREATE TABLE testspark (...) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
This statement generates the sparkmetadata database and the table's metadata in MySQL, and creates the corresponding directory in HDFS.

7. Further testing
Prepare data in the format shown below, then load it with the following statement:
LOAD DATA LOCAL INPATH '/root/software/test' OVERWRITE INTO TABLE testspark;
This statement uploads the file to HDFS, where the data can be viewed as shown below. Querying the table returns the loaded data:
SELECT * FROM testspark;
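To confirm that the metadata really lives in MySQL, you can inspect the standard Hive metastore schema tables (DBS and TBLS) that Spark created inside the sparkmetadata database. This is a sketch: the exact set of columns depends on the metastore schema version shipped with your Spark/Hive build.

```sql
-- Run in the MySQL client on chinac244.
USE sparkmetadata;

-- Databases registered in the metastore:
SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;

-- Tables registered in the metastore; testspark should appear here:
SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS;
```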
Dropping the table deletes its metadata from MySQL and its data from HDFS:
DROP TABLE testspark;
8. At this point, the Spark SQL metadata is stored in MySQL, and we no longer need the Hive data warehouse; Spark alone serves as the data warehouse.