Spark SQL metadata configuration to MySQL


Building a data warehouse with Spark at its core

0. Description

In the big data world, Hive is the veteran data warehouse, and Spark keeps compatibility with it. But if you do not want to build your warehouse on Hive, that is no problem: Spark can serve as the data warehouse on its own. The evolution of Spark SQL shows that Spark no longer needs Hive for this role. By default, Spark SQL keeps its metastore metadata in Derby, which production environments generally avoid in favor of MySQL or PostgreSQL. This article shows how to store Spark SQL metadata in MySQL.

1. Cluster planning

MySQL: chinac244 <--> chinac242, two nodes in a master-master replication pair.
Spark Master: chinac88 <--> chinac82, two nodes configured for HA.
Spark Slaves: chinac88, chinac82, chinac27

2. Configuration files (modified on chinac27, then distributed to the cluster)

After unpacking Spark, copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf/hive-site.xml (a minimal sketch of this step follows), then edit the copy.
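A minimal sketch of the copy step, assuming HIVE_HOME and SPARK_HOME are both set on chinac27:

    # Reuse the existing Hive metastore configuration as a starting point
    cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml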
    vim $SPARK_HOME/conf/hive-site.xml
Modify the following content
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://chinac244:3306/sparkmetadata?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>chinac</value>
        <description>password to use against metastore database</description>
    </property>
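Note, not covered by the original article: for com.mysql.jdbc.Driver to load, the MySQL JDBC connector jar must be on Spark's classpath. The jar path and version below are assumptions; adjust them to your environment:

    # Make the MySQL connector visible to Spark (jars/ in Spark 2.x, lib/ in 1.x;
    # the path and version here are assumptions)
    cp /root/software/mysql-connector-java-5.1.38.jar $SPARK_HOME/jars/
    # Or pass it explicitly when launching spark-sql:
    # spark-sql --driver-class-path /root/software/mysql-connector-java-5.1.38.jar ...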
3. Modify the time attributes (not done here)

Next, adjust the time-valued properties in hive-site.xml: for every value given in seconds, delete the trailing "s" and append three zeros; for every value given in milliseconds, delete the trailing "ms". Spark cannot parse these unit suffixes and treats all such values as plain numbers of milliseconds; see the sketch below.
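For example (a hypothetical property value, not taken from the original article), a timeout of 600s becomes 600000:

    <!-- before: unit suffix that Spark cannot parse -->
    <value>600s</value>
    <!-- after: the same duration as a plain number of milliseconds -->
    <value>600000</value>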
4. Distributing the configuration file

    scp $SPARK_HOME/conf/hive-site.xml chinac82:$SPARK_HOME/conf/hive-site.xml
    scp $SPARK_HOME/conf/hive-site.xml chinac88:$SPARK_HOME/conf/hive-site.xml

5. Restart the Spark cluster

    ${SPARK_HOME}/sbin/stop-all.sh
    nohup ${SPARK_HOME}/sbin/start-all.sh &
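The original article showed the startup output in a screenshot. One quick check (an assumption on my part, using the standard JDK jps tool rather than anything from the source) is:

    # List the Java daemons on each node: the master nodes should show a Master
    # process, the slave nodes a Worker process
    jps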
6. Test the configuration

A. View the database information in MySQL.
B. Execute the spark-sql command:
    spark-sql --master spark://chinac88:7077,chinac82:7077

    -- 1. Create a data table
    CREATE TABLE testspark ( ... )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
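The column list of the original CREATE TABLE statement did not survive extraction; a minimal sketch with hypothetical columns id and name, matching the ',' and '\n' delimiters above, would be:

    -- id and name are illustrative column names, not from the original article
    CREATE TABLE testspark (id INT, name STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n';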
This statement creates the sparkmetadata database and the table's metadata in MySQL, and the corresponding table directory in HDFS.

7. Further testing

Prepare a test data file at /root/software/test, then load it with the following statement:
    LOAD DATA LOCAL INPATH '/root/software/test' OVERWRITE INTO TABLE testspark;
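The original article showed the test data in a screenshot. As an illustration only, a file matching the ',' delimiter and the hypothetical columns above might contain:

    1,zhangsan
    2,lisi
    3,wangwu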
This statement uploads the file to HDFS. Query the table to verify the loaded data:
    SELECT * FROM testspark;
Dropping the table removes its metadata from MySQL and its data from HDFS:
    DROP TABLE testspark;
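To confirm the effect on the MySQL side (a sketch not present in the original; the table and column names come from the standard Hive metastore schema that Spark SQL reuses):

    -- Run in the MySQL client; after the DROP, testspark no longer appears here
    USE sparkmetadata;
    SELECT TBL_NAME, TBL_TYPE FROM TBLS;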
8. At this point, the Spark SQL metadata is stored in MySQL and we no longer need the Hive data warehouse; Spark alone serves as the warehouse.
