This tutorial records how to connect the Spark 1.3.1 Thriftserver metastore to PostgreSQL.
For compiling PostgreSQL, see: http://www.cnblogs.com/chenfool/p/4530925.html
Part One: Start the PostgreSQL service
1 First, you need to add PostgreSQL's lib directory to the LD_LIBRARY_PATH environment variable
export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}
I added this command directly to the ~/.bash_profile configuration file, so the environment variable is loaded automatically every time the user logs in.
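A minimal sketch of doing that, assuming the same /opt/sequoiadb install path as above:
echo 'export LD_LIBRARY_PATH=/opt/sequoiadb/lib:${LD_LIBRARY_PATH}' >> ~/.bash_profile
source ~/.bash_profile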
2 Create a data directory
mkdir -p database/pg_data
3 Initialize the data directory of pg
bin/initdb -D database/pg_data/
4 Create pg's logs directory
mkdir -p database/pg_data/logs
5 Modify the postgresql configuration to allow other users to log in to postgresql remotely
linenum=$(cat -n database/pg_data/pg_hba.conf | grep "# IPv4 local connections:" | awk '{print $1}'); let "linenum=linenum+1"; varStr="host all all 0.0.0.0/0 trust"; sed -i "${linenum}a ${varStr}" database/pg_data/pg_hba.conf;
sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '0.0.0.0'/g" database/pg_data/postgresql.conf
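You can quickly verify that both edits took effect (a simple check, using the same paths as above):
grep -n "0.0.0.0/0" database/pg_data/pg_hba.conf
grep -n "^listen_addresses" database/pg_data/postgresql.conf
The first command should print the new "host all all 0.0.0.0/0 trust" rule, and the second should print listen_addresses = '0.0.0.0'.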
6 Start the postgresql service; it is started as the sdbadmin user.
bin/postgres -D database/pg_data/ >> database/pg_data/logs/logfile 2>&1 &
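To confirm that the server came up (assuming pg_ctl sits in the same bin/ directory as postgres):
bin/pg_ctl status -D database/pg_data/
It should report that the server is running, along with its PID.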
7 Create a database named "foo" in postgresql
bin/createdb -p 5432 foo
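You can confirm the database was created by listing all databases:
bin/psql -p 5432 -l
"foo" should appear in the list.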
8 After logging in to the pg shell, change the password of the sdbadmin user
Log in to the pg shell
bin/psql foo
Execute the following command in the shell to change the password of the sdbadmin user to sdbadmin
alter user sdbadmin with password 'sdbadmin';
You can test as root
/opt/sequoiadb/bin/psql --username=sdbadmin -W foo
After entering the sdbadmin password, you can log in to the pg shell normally.
Continue executing commands in the pg shell to create a database named metastore
CREATE DATABASE metastore;
Switch database to metastore
\c metastore
Execute the Hive SQL script. Since my Hive version is 1.2.0, the script I run is also the 1.2.0 one. (Note: you need to cd into the ${HIVE_HOME}/scripts/metastore/upgrade/postgres/ directory before logging in to the pg shell, otherwise errors will occur.)
However, the Spark website states that Spark SQL 1.3.1 is built against Hive 0.12 and 0.13, so you may want to run the script for one of those versions instead. (This command simply creates in postgresql the tables, variables, and the like that the metastore will need later.)
\i /opt/hive/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql
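Still in the pg shell, you can sanity-check that the script ran (the table names below come from the standard Hive metastore schema):
\dt
You should see the metastore tables such as "DBS", "TBLS" and "SDS".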
Part Two: Connect the Spark Thriftserver to PostgreSQL
1 Refer to the tutorial I wrote before on using mysql as hive's metastore: http://www.cnblogs.com/chenfool/p/3574789.html
We create an empty file named hive-site.xml directly in spark's conf directory, and then write the following configuration into it:
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://ubuntu2:5432/metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
    <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
  </property>
</configuration>
javax.jdo.option.ConnectionURL: fill in the postgresql service address, port number and database name
javax.jdo.option.ConnectionUserName: fill in the postgresql username
javax.jdo.option.ConnectionPassword: fill in that user's password
2 Download the jdbc driver for postgresql
Download link https://jdbc.postgresql.org/download/postgresql-9.3-1103.jdbc41.jar
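For example, fetch it directly on the server (assuming wget is installed):
wget https://jdbc.postgresql.org/download/postgresql-9.3-1103.jdbc41.jar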
3 Put the jar package under spark's lib directory and modify the spark-env.sh file
mv postgresql-9.3-1103.jdbc41.jar /opt/spark-1.3.1-bin-hadoop2.6/lib/
Modify the spark-env.sh file. Since I had already connected SequoiaDB with Spark SQL, SPARK_CLASSPATH also contains the sequoiadb-related jar packages:
SPARK_CLASSPATH="/opt/sequoiadb/java/sequoiadb.jar:/opt/sequoiadb/spark/spark-sequoiadb_2.10-1.12.jar:/opt/sequoiadb/hadoop/hadoop-connector-2.2.jar:/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar"
4 Set the CLASSPATH and add the postgresql jdbc driver path
export CLASSPATH=/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar:${CLASSPATH}
If not set, the following error will be reported when starting thriftserver
Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
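If you want the CLASSPATH setting to survive new login shells, you can append it to ~/.bash_profile just like the LD_LIBRARY_PATH setting in Part One, step 1:
echo 'export CLASSPATH=/opt/spark-1.3.1-bin-hadoop2.6/lib/postgresql-9.3-1103.jdbc41.jar:${CLASSPATH}' >> ~/.bash_profile
source ~/.bash_profile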
5 Start the spark thriftserver service
/opt/spark-1.3.1-bin-hadoop2.6/sbin/start-thriftserver.sh --master spark://ubuntu1:7800 --hiveconf "hive.metastore.warehouse.dir=hdfs://ubuntu1:9000/user/hive/warehouse"
The netstat command should now show that port 10000 is listening.
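For example (Linux netstat; the -p flag needs root to show the owning process):
netstat -nltp | grep 10000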
And in logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out you should see
INFO ThriftCLIService: ThriftBinaryCLIService listening on 0.0.0.0/0.0.0.0:10000
which proves that the startup was successful.
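A quick way to look for that line (same log file as above):
grep "ThriftBinaryCLIService listening" /opt/spark-1.3.1-bin-hadoop2.6/logs/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-ubuntu2.out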
For how to use the thriftserver, you can refer to my previous blog:
http://www.cnblogs.com/chenfool/p/4502212.html
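As a quick smoke test (a sketch; beeline ships with Spark, and ubuntu2/sdbadmin are the host and user used throughout this tutorial), you can connect over JDBC:
/opt/spark-1.3.1-bin-hadoop2.6/bin/beeline -u jdbc:hive2://ubuntu2:10000 -n sdbadmin
Once connected, running show tables; at the beeline prompt should return without errors.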
Reference article:
http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
http://docs.hortonworks.com/HDPDocuments/Ambari-1.6.1.0/bk_ambari_reference/content/nndb-using-hive-postresql.html
http://www.sequoiadb.com/cn/index.php?p=whitepaper_7