Configure Apache HIVE metadata DB as PostgreSQL

Source: http://amutu.com/blog/2013/06/hive-metastore-db-postgresql/

By default, HIVE stores its metadata in derby. As a lightweight DB, derby is convenient during development and testing. In a production environment, however, we also need to consider ease of use, disaster tolerance, stability, and the availability of monitoring and O&M tools, all of which derby lacks. MySQL and PostgreSQL are two widely used open-source database systems and often replace derby in production. Many articles on the Internet already cover the MySQL configuration, so I will not repeat it here. This article describes how to configure PostgreSQL as the HIVE metadata DB.

HIVE version: HIVE 0.7-snapshot, HIVE 0.8-snapshot

Step 1: Add a user and DB for the metadata in PG

First, create an account and DB for the HIVE metadata in PostgreSQL.

-- Log on to PG as an administrator:

psql postgres -U postgres

-- Create user hive_user:

create user hive_user;

-- Create DB metastore_db, owned by hive_user:

create database metastore_db with owner = hive_user;

-- Set the password of hive_user:

\password hive_user

After completing the preceding steps, make sure that pg_hba.conf in PostgreSQL allows the IP address of the machine where HIVE runs to access PG.
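As a rough illustration of the access rule mentioned above, the following Python sketch builds one pg_hba.conf line for a single HIVE host. The helper name pg_hba_entry, the example address 192.0.2.10, and the md5 authentication method are assumptions made for illustration and do not come from the original article; pick the method that matches your PG setup.

```python
import ipaddress


def pg_hba_entry(hive_host_ip, database="metastore_db", user="hive_user", method="md5"):
    """Return one pg_hba.conf line that lets a single host connect.

    The method default (md5) is an assumption, not taken from the article.
    """
    # Validate the address and build a single-host CIDR (/32 for IPv4, /128 for IPv6).
    addr = ipaddress.ip_address(hive_host_ip)
    cidr = f"{addr}/{addr.max_prefixlen}"
    # Column order in pg_hba.conf: TYPE  DATABASE  USER  ADDRESS  METHOD
    return f"host    {database}    {user}    {cidr}    {method}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address.
    print(pg_hba_entry("192.0.2.10"))
```

The printed line would be appended to pg_hba.conf, after which PG has to reload its configuration for the change to take effect.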

Step 2: Download the PostgreSQL JDBC driver

Create the auxlib directory in the HIVE_HOME directory:

mkdir auxlib

The HIVE_HOME directory should now contain bin, lib, auxlib, conf, and other directories.

Download the PostgreSQL JDBC driver:

wget http://jdbc.postgresql.org/download/postgresql-9.0-801.jdbc4.jar

Place the downloaded postgresql-9.0-801.jdbc4.jar in auxlib.

Step 3: Modify the HIVE configuration file

Create a new hive-site.xml file under HIVE_HOME (normally in its conf directory) with the following content, replacing pg_server_ip, pg_server_port, hive_user, and hive_user_pass with the values for your PG server.

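<configuration>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://pg_server_ip:pg_server_port/metastore_db?</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_user</value>
  <description>Username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_user_pass</value>
  <description>Password to use against metastore database</description>
</property>

</configuration>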
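For readers who prefer to generate this file rather than edit it by hand, here is a minimal Python sketch that emits the same four properties. The function name build_hive_site and the sample host, port, and credentials are hypothetical; only the property names and the driver class come from the configuration above.

```python
import xml.etree.ElementTree as ET


def build_hive_site(host, port, database, user, password):
    """Return hive-site.xml content (as a string) for a PostgreSQL metastore."""
    properties = [
        ("javax.jdo.option.ConnectionURL", f"jdbc:postgresql://{host}:{port}/{database}"),
        ("javax.jdo.option.ConnectionDriverName", "org.postgresql.Driver"),
        ("javax.jdo.option.ConnectionUserName", user),
        ("javax.jdo.option.ConnectionPassword", password),
    ]
    root = ET.Element("configuration")
    for name, value in properties:
        # Each setting becomes one <property> with <name> and <value> children.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values only; replace them with those of your PG server.
    print(build_hive_site("192.0.2.20", 5432, "metastore_db", "hive_user", "hive_user_pass"))
```

The output is a single unindented line of XML, which is well-formed and can be redirected into hive-site.xml.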

Step 4: Initialize the metadata table

The metastore DB contains no tables by default. When HIVE first uses a metadata table and finds that it does not exist, it creates the table automatically. This works without problems on derby and MySQL, so neither needs this step when used as the metastore DB.

With PostgreSQL, this automatic initialization can run into a deadlock in the PostgreSQL database. For example, execute the following HIVE statements:

> create table kv (key int, value string) partitioned by (ds string);

OK

> alter table kv add partition (ds = '20160901');

HIVE hangs at this point, while executing this statement.

Checking the PostgreSQL database shows two connections inside a transaction. One of them is:

<IDLE> in transaction

That is, its transaction is open but idle. The other one is:

ALTER TABLE "PARTITIONS" ADD CONSTRAINT "PARTITIONS_FK1" FOREIGN KEY ("SD_ID") REFERENCES "SDS" ("SD_ID") INITIALLY DEFERRED

which is in the waiting state.

Checking the log further shows that the process is roughly as follows:

HIVE issues the alter table kv add partition statement. The DataNucleus layer then starts the first transaction, with SERIALIZABLE isolation, which locks metadata tables, including SDS. During this transaction, DataNucleus finds that tables such as PARTITIONS do not exist and creates them automatically. To do so it starts a second transaction, also with SERIALIZABLE isolation, and the first transaction goes idle inside its transaction. After creating the PARTITIONS table, the second transaction also has to add constraints to it, which requires an exclusive lock on the referenced table SDS. That lock is already held by the first transaction, so the second transaction must wait until the first one ends, while the first transaction is itself waiting for the second one to end. The result is a deadlock.
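The circular wait described above can be modelled as a small wait-for graph. The Python sketch below is only a toy model: the labels txn1 and txn2 and the helper find_cycle are hypothetical names, and the code does not talk to PG or DataNucleus.

```python
def find_cycle(waits_for):
    """Return the nodes that form a cycle in a wait-for graph, or None.

    waits_for maps each waiter to the single party it is blocked on.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waits until it ends or revisits a node.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# txn1: first transaction, holds the lock on SDS, waits for schema creation to finish.
# txn2: second transaction, creates PARTITIONS, waits for the lock on SDS.
waits_for = {"txn1": "txn2", "txn2": "txn1"}

if __name__ == "__main__":
    print(find_cycle(waits_for))
```

One plausible reading of the symptoms is that only the txn2 edge is a lock wait inside PG, while txn1 waits on the client side. That would be consistent with PG showing the first connection as idle rather than reporting a deadlock itself, but it is an interpretation and not something the original article states.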

A similar situation occurs with:

> create table test (key int);

OK

> drop table test;

Dropping a table also drops its indexes. At this point the index metadata table does not exist yet, so HIVE goes to create it, which again produces a deadlock.

There are three ways to solve this deadlock problem:

Method 1:

Use pg_terminate_backend() in PG to end the first transaction. This lets the second transaction complete, so the metadata tables are created successfully.
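A sketch of what Method 1 might look like in practice: the Python helper below only builds the SQL text, which an administrator would then run in psql. The function name terminate_idle_sql is hypothetical. The column names depend on the PG version (procpid and current_query before 9.2, pid and state from 9.2 on). Note that this query ends every idle-in-transaction session on the given database, which is broader than "the first transaction", so check the result of the matching SELECT before running it.

```python
def terminate_idle_sql(database, server_version):
    """Return SQL that terminates idle-in-transaction backends of one database.

    server_version is a (major, minor) tuple such as (9, 0).
    """
    if server_version >= (9, 2):
        # From 9.2 on, pg_stat_activity exposes pid and state.
        pid_col, condition = "pid", "state = 'idle in transaction'"
    else:
        # Before 9.2 the columns were procpid and current_query.
        pid_col, condition = "procpid", "current_query = '<IDLE> in transaction'"
    # Double any single quotes so the database name is a valid SQL literal.
    db_literal = database.replace("'", "''")
    return (
        f"SELECT pg_terminate_backend({pid_col}) FROM pg_stat_activity "
        f"WHERE datname = '{db_literal}' AND {condition};"
    )


if __name__ == "__main__":
    print(terminate_idle_sql("metastore_db", (9, 0)))
```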

Method 2:

Have HIVE separate the creation of the metadata tables from the process of adding data to them:

> create table kv (key int, value string) partitioned by (ds string);

OK

> show partitions kv;

OK

> alter table kv add partition (ds = '20160901');

OK

No deadlock occurs when the statements above are executed. The show partitions kv statement is read-only and takes no locks, so when it finds that tables such as PARTITIONS do not exist, these tables are created without a deadlock.

For the index table, run:

> show index on kv;

This creates the IDXS table.
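The ordering that Method 2 relies on can be captured in a small script generator. This Python sketch is an illustration only: the helper names warmup_statements and init_script are hypothetical, and the statements are the ones shown above.

```python
def warmup_statements(table):
    """Return the read-only HiveQL statements to run before the first write."""
    return [
        f"show partitions {table};",  # triggers creation of PARTITIONS and related tables
        f"show index on {table};",    # triggers creation of the IDXS table
    ]


def init_script(table, first_write):
    """Return a HiveQL script that runs the warm-up reads, then the first write."""
    return "\n".join(warmup_statements(table) + [first_write])


if __name__ == "__main__":
    print(init_script("kv", "alter table kv add partition (ds = '20160901');"))
```

The generated text could be saved to a file and run with hive -f, once the table itself has been created.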

Method 3:

Use the SchemaTool provided by DataNucleus and give it HIVE's metastore/src/model/package.jdo file as input. This tool can create the metadata tables automatically. For details, see:

http://www.datanucleus.org/products/accessplatform_2_0/rdbms/schematool.html

Summary

This article described how to configure PostgreSQL as the HIVE metadata DB and how to solve the resulting deadlock problem. I hope it helps those who use HIVE and PostgreSQL.
