Spark SQL metadata configuration to MySQL


Building a data warehouse with Spark at its core

0. Description

In the big data world, Hive is the veteran data warehouse, and Spark keeps compatibility with it. But if you do not want to build your warehouse on Hive, that is no problem: Spark can serve as the data warehouse on its own. The evolution of Spark SQL shows that Spark no longer needs Hive for this role. By default, Spark SQL keeps its metastore metadata in Derby, which production environments generally avoid in favor of MySQL or PostgreSQL. This article shows how to store Spark SQL metadata in MySQL.

1. Cluster planning

MySQL: chinac244 <--> chinac242, two nodes in a master-master replication pair.
Spark Master: chinac88 <--> chinac82, two nodes configured for HA.
Spark Slaves: chinac88, chinac82, chinac27

2. Configuration files (modified on chinac27, then distributed to the cluster)

After unpacking Spark, copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/hive-site.xml (a minimal sketch of this step follows), then edit the copy.
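A minimal sketch of the copy step, assuming HIVE_HOME and SPARK_HOME are both set on chinac27:

    # Reuse the existing Hive metastore configuration as a starting point
    cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml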
    vim $SPARK_HOME/conf/hive-site.xml
Modify the following content
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://chinac244:3306/sparkmetadata?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>chinac</value>
        <description>password to use against metastore database</description>
    </property>
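Note, not covered by the original article: for com.mysql.jdbc.Driver to load, the MySQL JDBC connector jar must be on Spark's classpath. The jar path and version below are assumptions; adjust them to your environment:

    # Make the MySQL connector visible to Spark (jars/ in Spark 2.x, lib/ in 1.x;
    # the path and version here are assumptions)
    cp /root/software/mysql-connector-java-5.1.38.jar $SPARK_HOME/jars/
    # Or pass it explicitly when launching spark-sql:
    # spark-sql --driver-class-path /root/software/mysql-connector-java-5.1.38.jar ...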
3. Modify the time attributes (not done here)

Next, adjust the time-valued properties in hive-site.xml: for every value given in seconds, delete the trailing "s" and append three zeros; for every value given in milliseconds, delete the trailing "ms". Spark cannot parse these unit suffixes and treats all such values as plain numbers of milliseconds; see the sketch below.
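For example (a hypothetical property value, not taken from the original article), a timeout of 600s becomes 600000:

    <!-- before: unit suffix that Spark cannot parse -->
    <value>600s</value>
    <!-- after: the same duration as a plain number of milliseconds -->
    <value>600000</value>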
4. Distributing the configuration file

    scp $SPARK_HOME/conf/hive-site.xml chinac82:$SPARK_HOME/conf/hive-site.xml
    scp $SPARK_HOME/conf/hive-site.xml chinac88:$SPARK_HOME/conf/hive-site.xml

5. Restart the Spark cluster

    ${SPARK_HOME}/sbin/stop-all.sh
    nohup ${SPARK_HOME}/sbin/start-all.sh &
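The original article showed the startup output in a screenshot. One quick check (an assumption on my part, using the standard JDK jps tool rather than anything from the source) is:

    # List the Java daemons on each node: the master nodes should show a Master
    # process, the slave nodes a Worker process
    jps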
6. Test the configuration

A. View the database information in MySQL.
B. Execute the spark-sql command:
    spark-sql --master spark://chinac88:7077,chinac82:7077

    -- 1. Create a data table
    CREATE TABLE testspark ( ... )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
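The column list of the original CREATE TABLE statement did not survive extraction; a minimal sketch with hypothetical columns id and name, matching the ',' and '\n' delimiters above, would be:

    -- id and name are illustrative column names, not from the original article
    CREATE TABLE testspark (id INT, name STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n';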
This statement creates the sparkmetadata database and the table's metadata in MySQL, and the corresponding table directory in HDFS.

7. Further testing

Prepare a test data file at /root/software/test, then load it with the following statement:
    LOAD DATA LOCAL INPATH '/root/software/test' OVERWRITE INTO TABLE testspark;
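The original article showed the test data in a screenshot. As an illustration only, a file matching the ',' delimiter and the hypothetical columns above might contain:

    1,zhangsan
    2,lisi
    3,wangwu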
This statement uploads the file to HDFS. Query the table to verify the loaded data:
    SELECT * FROM testspark;
Dropping the table removes its metadata from MySQL and its data from HDFS:
    DROP TABLE testspark;
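To confirm the effect on the MySQL side (a sketch not present in the original; the table and column names come from the standard Hive metastore schema that Spark SQL reuses):

    -- Run in the MySQL client; after the DROP, testspark no longer appears here
    USE sparkmetadata;
    SELECT TBL_NAME, TBL_TYPE FROM TBLS;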
8. At this point, the Spark SQL metadata is stored in MySQL and we no longer need the Hive data warehouse; Spark alone serves as the warehouse.
