Connecting the Spark SQL Metastore to PostgreSQL




This tutorial records how to connect the Thriftserver Metastore in Spark 1.3.1 to PostgreSQL.






For compiling PostgreSQL, see: http://www.cnblogs.com/chenfool/p/4530925.html






One: Start the PostgreSQL service



1 First, add the PostgreSQL lib directory to the LD_LIBRARY_PATH environment variable

export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}
I added this command directly to the ~/.bash_profile configuration file, so the environment variable is loaded automatically every time the user logs in.
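
For example (assuming the user's login shell reads ~/.bash_profile; the single quotes keep the variable unexpanded in the file):

echo 'export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}' >> ~/.bash_profile
source ~/.bash_profile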

2 Create a data directory

mkdir -p database/pg_data
3 Initialize the data directory of pg

bin/initdb -D database/pg_data/
4 Create pg's logs directory

mkdir -p database/pg_data/logs
5 Modify the postgresql configuration to allow other users to log in remotely

linenum=$(cat -n database/pg_data/pg_hba.conf | grep "# IPv4 local connections:" | awk '{print $1}'); let "linenum=linenum+1"; varStr="host all all 0.0.0.0/0 trust"; sed -i "${linenum}a ${varStr}" database/pg_data/pg_hba.conf;
sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '0.0.0.0'/g" database/pg_data/postgresql.conf
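
To double-check that both edits landed, you can grep the two files (a quick sanity check; the matched line numbers will vary with your PostgreSQL version):

grep -n "0.0.0.0" database/pg_data/pg_hba.conf database/pg_data/postgresql.conf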
6 Start the postgresql service, running it as the sdbadmin user.

bin/postgres -D database/pg_data/ >> database/pg_data/logs/logfile 2>&1 &
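To confirm the service actually came up before continuing, pg_ctl can report its status (assuming the same data directory as above):

bin/pg_ctl -D database/pg_data status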
7 Create a database named "foo" in postgresql

bin/createdb -p 5432 foo
8 After logging in to the pg shell, change the password of the sdbadmin user

Log in to the pg shell:

bin/psql foo
Execute the following command in the shell to change the password of the sdbadmin user to sdbadmin

alter user sdbadmin with password 'sdbadmin';
You can test the login as the root user:

/opt/sequoiadb/bin/psql --username=sdbadmin -W foo
After entering the sdbadmin password, you can log in to the pg shell normally.

Continue executing commands in the pg shell to create a database named metastore

CREATE DATABASE metastore;
Switch database to metastore

\c metastore
Execute the sql script that ships with hive. Since my hive version is 1.2.0, the script I run is also the 1.2.0 one (note: before logging in to the pg shell, switch to the ${HIVE_HOME}/scripts/metastore/upgrade/postgres/ directory, otherwise errors will occur).

However, the spark official website says that spark sql 1.3.1 currently works with hive 0.12 and 0.13, so the scripts for those versions should work as well (this command simply creates in postgresql the tables to be used later, sets some values, and the like).

\ i /opt/hive/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql
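
If the script runs cleanly, the metastore tables now exist in the database; still inside the pg shell, \dt should list them (tables such as "DBS" and "TBLS" belong to the standard Hive metastore schema):

\dt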
 

 

Two: Connect the Spark Thriftserver to PostgreSQL

1 Refer to my earlier tutorial on using mysql as the hive metastore: http://www.cnblogs.com/chenfool/p/3574789.html

Create an empty file named hive-site.xml in spark's conf directory, then write the following configuration into it:

<configuration>
   <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:postgresql://ubuntu2:5432/metastore</value>
      <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.postgresql.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>sdbadmin</value>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>sdbadmin</value>
   </property>
   <property>
      <name>datanucleus.autoCreateSchema</name>
      <value>false</value>
      <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
   </property>
</configuration>
javax.jdo.option.ConnectionURL: fill in the postgresql service address, port number, and database
javax.jdo.option.ConnectionUserName: fill in the postgresql username
javax.jdo.option.ConnectionPassword: fill in the user's password
2 Download the jdbc driver for postgresql

Download link: https://jdbc.postgresql.org/download/postgresql-9.3-1103.jdbc41.jar
3 Put the jar package in spark's lib directory and modify the spark-env.sh file

mv postgresql-9.3-1103.jdbc41.jar /opt/spark-1.3.1-bin-hadoop2.6/lib/
Modify the spark-env.sh file. Since I had already connected SequoiaDB to Spark sql, there are sequoiadb-related jar packages in SPARK_CLASSPATH as well:

SPARK_CLASSPATH="/opt/sequoiadb/java/sequoiadb.jar:/opt/sequoiadb/spark/spark-sequoiadb_2.10-1.12.jar:/opt/sequoiadb/hadoop/hadoop-connector-2.2.jar:/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar"
4 Set the CLASSPATH and add the postgresql jdbc driver path

export CLASSPATH=/opt/postgresql-9.3-1103.jdbc41.jar:${CLASSPATH}
If not set, the following error will be reported when starting thriftserver

Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error: The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

 5 Start the spark thriftserver service

/opt/spark-1.3.1-bin-hadoop2.6/sbin/start-thriftserver.sh --master spark://ubuntu1:7800 --hiveconf "hive.metastore.warehouse.dir=hdfs://ubuntu1:9000/user/hive/warehouse"
The netstat command should show that port 10000 is listening, as shown below.
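
For example (10000 is the default thriftserver port):

netstat -nltp | grep 10000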

And in logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out you should see

INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000

which proves that the startup was successful.
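
A quick way to pull that line out of the log (the exact log file name will differ with your hostname):

grep "ThriftBinaryCLIService listening" logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out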

 

For how to use the thriftserver, refer to my previous blog post:

http://www.cnblogs.com/chenfool/p/4502212.html
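
For a quick smoke test without reading that post, the beeline client shipped with spark can connect directly (a minimal sketch, assuming thriftserver is running on ubuntu2 with the default port 10000 as configured above):

/opt/spark-1.3.1-bin-hadoop2.6/bin/beeline -u jdbc:hive2://ubuntu2:10000 -n sdbadmin

Once connected, a simple statement such as show tables; confirms that the metastore is reachable.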

 

Reference article:

http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.1.0/bk_ambari_reference/content/nndb-using-hive-postresql.html

http://www.sequoiadb.com/cn/index.php?p=whitepaper_7

 
