Apache Hive does not collect stats: troubleshooting

Source: Internet
Author: User

Environment:

Hive: apache-hive-1.1.0

Hadoop: hadoop-2.5.0-cdh5.3.2

Hive metadata and stats are stored using MySQL.

The relevant Hive stats parameters are as follows:

hive.stats.autogather: Automatically gathers statistics during INSERT OVERWRITE commands; enabled by default; set to true

hive.stats.dbclass: The database that stores temporary Hive statistics; default is jdbc:derby; set to jdbc:mysql

hive.stats.jdbcdriver: The JDBC driver for the database that stores temporary Hive statistics; set to com.mysql.jdbc.Driver

hive.stats.dbconnectionstring: Connection string for the temporary statistics database; default is jdbc:derby:;databaseName=TempStatsStore;create=true; set to jdbc:mysql://[ip:port]/[dbname]?user=[username]&password=[password]

hive.stats.default.publisher: The publisher class used by default when dbclass is neither JDBC nor HBase; it must implement the StatsPublisher interface; empty by default; left at the default

hive.stats.default.aggregator: The aggregator class used by default when dbclass is neither JDBC nor HBase; it must implement the StatsAggregator interface; empty by default; left at the default

hive.stats.jdbc.timeout: JDBC connection timeout; default is 30 seconds; left at the default

hive.stats.retries.max: Maximum number of retries when publishing or aggregating statistics fails while updating the database; default is 0 (no retry); left at the default

hive.stats.retries.wait: Base wait window between retries; default is 3000 milliseconds; left at the default

hive.client.stats.publishers: Comma-separated list of statistics publisher classes invoked on the counters of each job; empty by default; each class must implement the org.apache.hadoop.hive.ql.stats.ClientStatsPublisher interface; left at the default
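As a rough illustration of the non-default settings listed above, the following Java sketch assembles them and prints them as Hive `set` statements. The class name, host, database name, user, and password are placeholders invented for this example and are not taken from the article; the password is also not URL-encoded here, which a real setup may need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StatsSettingsSketch {
    // Builds the MySQL connection string in the form shown in the parameter list.
    static String connectionString(String hostPort, String db, String user, String password) {
        return "jdbc:mysql://" + hostPort + "/" + db + "?user=" + user + "&password=" + password;
    }

    // Collects only the parameters the article changes from their defaults
    // (plus hive.stats.autogather, which stays at true), in insertion order.
    static Map<String, String> settings(String connectionString) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("hive.stats.autogather", "true");
        s.put("hive.stats.dbclass", "jdbc:mysql");
        s.put("hive.stats.jdbcdriver", "com.mysql.jdbc.Driver");
        s.put("hive.stats.dbconnectionstring", connectionString);
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        String url = connectionString("192.0.2.10:3306", "hive_stats", "hive", "secret");
        settings(url).forEach((k, v) -> System.out.println("set " + k + "=" + v + ";"));
    }
}
```

The same key/value pairs could instead be placed in hive-site.xml; the sketch only shows which values belong together.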

Symptom:

Executing INSERT OVERWRITE TABLE does not return numRows and rawDataSize correctly; the result looks similar to the following:

[numFiles=1, numRows=0, totalSize=59, rawDataSize=0]

No corresponding stats are inserted into the Hive stats MySQL database.
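The symptom can be spotted mechanically: data was written (totalSize is greater than 0) while numRows and rawDataSize are both 0. A minimal sketch, assuming the stats line has the bracketed key=value form shown above; the class and method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class StatsLineCheck {
    // Parses "[k1=v1, k2=v2, ...]" into a map of lower-cased keys to long values.
    static Map<String, Long> parse(String line) {
        Map<String, Long> stats = new LinkedHashMap<>();
        String body = line.trim().replaceAll("^\\[|\\]$", "");
        for (String pair : body.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                stats.put(kv[0].trim().toLowerCase(Locale.ROOT), Long.parseLong(kv[1].trim()));
            }
        }
        return stats;
    }

    // True when bytes were written but neither rows nor raw data size were counted.
    static boolean looksUncollected(Map<String, Long> stats) {
        return stats.getOrDefault("totalsize", 0L) > 0
            && stats.getOrDefault("numrows", 0L) == 0
            && stats.getOrDefault("rawdatasize", 0L) == 0;
    }

    public static void main(String[] args) {
        String line = "[numFiles=1, numRows=0, totalSize=59, rawDataSize=0]";
        System.out.println("stats missing: " + looksUncollected(parse(line)));
    }
}
```

A table that is genuinely empty would have totalSize equal to 0 and is not flagged by this check.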

The initial suspicion was a problem in Hive stats itself, but the console printed too little information to pinpoint it.

Running hive --hiveconf hive.root.logger=INFO,console prints verbose logs, which contained the following:

[Error 30001]: StatsPublisher cannot be initialized. There was a error in the initialization of StatsPublisher, and retrying might help. If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false

Specified key was too long; max key length is 767 bytes

The cause is relatively simple: in Hive 1.1.0 the ID column length defaults to 4000 and the ID column is set as the primary key, which exceeds the MySQL key length limit and raises the error.

org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsSetupConstants

// MySQL - 65535, SQL Server - 8000, Oracle - 4000, Derby - 32762, Postgres - large.
public static final int ID_COLUMN_VARCHAR_SIZE = 4000;

org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher: public boolean init(Configuration hconf)

if (colSize < JDBCStatsSetupConstants.ID_COLUMN_VARCHAR_SIZE) {
  String alterTable = JDBCStatsUtils.getAlterIdColumn();
  stmt.executeUpdate(alterTable);
}

This code shows that if the table's ID column is shorter than 4000, it is automatically altered to 4000. Shrinking the column in MySQL alone therefore does not help; the only option is to modify the source and change 4000 to 255. MySQL here uses UTF8 encoding, where one character takes up to 3 bytes, so 255*3=765<767, and 255 characters are sufficient for the current cluster.

public static final int ID_COLUMN_VARCHAR_SIZE = 255;
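The arithmetic behind the choice of 255 can be checked with a small sketch. It assumes, as the article does, 3 bytes per character for MySQL utf8 and the 767-byte limit quoted in the error message; the class and method names are illustrative only.

```java
final class IndexKeyLength {
    static final int MAX_KEY_BYTES = 767;      // limit quoted in the MySQL error message
    static final int UTF8_BYTES_PER_CHAR = 3;  // MySQL utf8, as assumed in the article

    // Worst-case number of bytes an indexed VARCHAR(n) column needs.
    static int keyBytes(int varcharLength) {
        return varcharLength * UTF8_BYTES_PER_CHAR;
    }

    // Whether a VARCHAR(n) key stays within the limit.
    static boolean fits(int varcharLength) {
        return keyBytes(varcharLength) <= MAX_KEY_BYTES;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4000, 255}) {
            System.out.println("varchar(" + n + "): " + keyBytes(n) + " bytes, fits=" + fits(n));
        }
        // prints:
        // varchar(4000): 12000 bytes, fits=false
        // varchar(255): 765 bytes, fits=true
    }
}
```

Under these assumptions 255 is also the largest length that fits, since 256*3=768 already exceeds 767.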

After recompiling, packaging, and pushing the build to the test environment, testing showed that the problem still existed.

[numFiles=1, numRows=0, totalSize=59, rawDataSize=0]

Running hive --hiveconf hive.root.logger=INFO,console again to print verbose logs showed no exceptions.

To track the problem down, set hive.stats.reliable=true so that a stats failure fails the query instead of being ignored.

Re-running the command now failed. The job's error message showed that the problem occurred in

org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsAggregator

try {
  Class.forName(driver).newInstance();
} catch (Exception e) {
  LOG.error("Error during instantiating JDBC driver " + driver + ". ", e);
  return false;
}


This code runs on YARN, and the failure was caused by the com.mysql.jdbc.Driver class not being found there. Placing the MySQL driver JAR under the yarn/lib/ directory, pushing it to the whole cluster, and rerunning the test script resolved the problem.
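A quick way to confirm this kind of failure is to check whether the driver class can be loaded from a given classpath. The sketch below is a standalone diagnostic, not part of Hive; the class name DriverCheck is made up for this example. It only tests the classpath of the JVM it runs in, so it has to be launched with the same classpath as the YARN tasks to be meaningful.

```java
final class DriverCheck {
    // Returns true if the named class can be loaded and initialized on this classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the driver class named in the article; another name can be passed as an argument.
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + " loadable: " + isLoadable(driver));
    }
}
```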




This article is from the "Supermagi" blog; please retain this source link: http://supermagi.blog.51cto.com/10191319/1649905
