This tutorial records how to connect the Spark 1.3.1 Thriftserver metastore to PostgreSQL.
For compiling PostgreSQL, see: http://www.cnblogs.com/chenfool/p/4530925.html
Part One: Start the PostgreSQL service
1 First, you need to add PostgreSQL's lib directory to the LD_LIBRARY_PATH environment variable
export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}
I added this command directly to the ~/.bash_profile configuration file, so the environment variable is loaded automatically every time the user logs in.
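A minimal sketch of doing that, assuming the same /opt/sequoiadb install path as above:
echo 'export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}' >> ~/.bash_profile
source ~/.bash_profile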
2 Create a data directory
mkdir -p database/pg_data
3 Initialize the data directory of pg
bin/initdb -D database/pg_data/
4 Create pg's logs directory
mkdir -p database/pg_data/logs
5 Modify the postgresql configuration to allow other users to log in to postgresql remotely
linenum=$(cat -n database/pg_data/pg_hba.conf | grep "# IPv4 local connections:" | awk '{print $1}'); let "linenum=linenum+1"; varStr="host all all 0.0.0.0/0 trust"; sed -i "${linenum}a ${varStr}" database/pg_data/pg_hba.conf;
sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '0.0.0.0'/g" database/pg_data/postgresql.conf
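You can quickly verify that both edits took effect (a simple check, using the same paths as above):
grep -n "0.0.0.0/0" database/pg_data/pg_hba.conf
grep -n "^listen_addresses" database/pg_data/postgresql.conf
The first command should print the new "host all all 0.0.0.0/0 trust" rule, and the second should print listen_addresses = '0.0.0.0'.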
6 Start the postgresql service; it is started as the sdbadmin user.
bin/postgres -D database/pg_data/ >> database/pg_data/logs/logfile 2>&1 &
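To confirm that the server came up (assuming pg_ctl sits in the same bin/ directory as postgres):
bin/pg_ctl status -D database/pg_data/
It should report that the server is running, along with its PID.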
7 Create a database named "foo" in postgresql
bin/createdb -p 5432 foo
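You can confirm the database was created by listing all databases:
bin/psql -p 5432 -l
"foo" should appear in the list.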
8 After logging in to the pg shell, change the password of the sdbadmin user
Log in to the pg shell
bin/psql foo
Execute the following command in the shell to change the password of the sdbadmin user to sdbadmin
alter user sdbadmin with password 'sdbadmin';
You can test as root
/opt/sequoiadb/bin/psql --username=sdbadmin -W foo
After entering the sdbadmin password, you can log in to the pg shell normally.
Continue executing commands in the pg shell to create a database named metastore
CREATE DATABASE metastore;
Switch database to metastore
\c metastore
Execute the Hive SQL script. Since my Hive version is 1.2.0, the script I run is also the 1.2.0 one. (Note: you need to cd into the ${HIVE_HOME}/scripts/metastore/upgrade/postgres/ directory before logging in to the pg shell, otherwise errors will occur.)
However, the Spark website states that Spark SQL 1.3.1 is built against Hive 0.12 and 0.13, so you may want to run the script for one of those versions instead. (This command simply creates in postgresql the tables, variables, and the like that the metastore will need later.)
\i /opt/hive/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql
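Still in the pg shell, you can sanity-check that the script ran (the table names below come from the standard Hive metastore schema):
\dt
You should see the metastore tables such as "DBS", "TBLS" and "SDS".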
Part Two: Connect the Spark Thriftserver to PostgreSQL
1 Refer to the tutorial I wrote before on using mysql as hive's metastore: http://www.cnblogs.com/chenfool/p/3574789.html
We create an empty file named hive-site.xml directly in spark's conf directory, and then write the following configuration into it:
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://ubuntu2:5432/metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
    <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
  </property>
</configuration>
javax.jdo.option.ConnectionURL: fill in the postgresql service address, port number and database name
javax.jdo.option.ConnectionUserName: fill in the postgresql username
javax.jdo.option.ConnectionPassword: fill in that user's password
2 Download the jdbc driver for postgresql
Download link https://jdbc.postgresql.org/download/postgresql-9.3-1103.jdbc41.jar
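For example, fetch it directly on the server (assuming wget is installed):
wget https://jdbc.postgresql.org/download/postgresql-9.3-1103.jdbc41.jar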
3 Put the jar package under spark's lib directory and modify the spark-env.sh file
mv postgresql-9.3-1103.jdbc41.jar /opt/spark-1.3.1-bin-hadoop2.6/lib/
Modify the spark-env.sh file. Since I had already connected SequoiaDB with Spark SQL, SPARK_CLASSPATH also contains the sequoiadb-related jar packages:
SPARK_CLASSPATH="/opt/sequoiadb/java/sequoiadb.jar:/opt/sequoiadb/spark/spark-sequoiadb_2.10-1.12.jar:/opt/sequoiadb/hadoop/hadoop-connector-2.2.jar:/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar"
4 Set the CLASSPATH and add the postgresql jdbc driver path
export CLASSPATH=/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar:${CLASSPATH}
If not set, the following error will be reported when starting thriftserver
Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
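If you want the CLASSPATH setting to survive new login shells, you can append it to ~/.bash_profile just like the LD_LIBRARY_PATH setting in Part One, step 1:
echo 'export CLASSPATH=/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar:${CLASSPATH}' >> ~/.bash_profile
source ~/.bash_profile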
5 Start the spark thriftserver service
/opt/spark-1.3.1-bin-hadoop2.6/sbin/start-thriftserver.sh --master spark://ubuntu1:7800 --hiveconf "hive.metastore.warehouse.dir=hdfs://ubuntu1:9000/user/hive/warehouse"
The netstat command should now show that port 10000 is listening.
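For example (Linux netstat; the -p flag needs root to show the owning process):
netstat -nltp | grep 10000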
And in logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out you should see
INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000
which proves that the startup was successful.
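A quick way to look for that line (same log file as above):
grep "ThriftBinaryCLIService listening" /opt/spark-1.3.1-bin-hadoop2.6/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out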
For how to use the thriftserver, you can refer to my previous blog:
http://www.cnblogs.com/chenfool/p/4502212.html
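As a quick smoke test (a sketch; beeline ships with Spark, and ubuntu2/sdbadmin are the host and user used throughout this tutorial), you can connect over JDBC:
/opt/spark-1.3.1-bin-hadoop2.6/bin/beeline -u jdbc:hive2://ubuntu2:10000 -n sdbadmin
Once connected, running show tables; at the beeline prompt should return without errors.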
Reference article:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.1.0/bk_ambari_reference/content/nndb-using-hive-postresql.html
http://www.sequoiadb.com/cn/index.php?p=whitepaper_7