Hive Source Reading 02-org.apache.hadoop.hive.ql.stats Overview

Source: Internet
Author: User


The org.apache.hadoop.hive.ql.stats package contains the classes and interfaces shown in the following figure:

[Figure: stats (classes and interfaces in the package)]

Among these, the interfaces are:

ClientStatsPublisher:

Contains the run method. No class in Hive implements this interface; it exists primarily as an extension point for Hive stats. Which implementations are used is determined by the hive.client.stats.publishers parameter, a comma-separated list of stats publishers that are called for each job.

This parameter is empty by default. Each client stats publisher is given as the name of a Java class that implements the ClientStatsPublisher interface.
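As an illustration of this extension point, the following is a minimal sketch rather than Hive's own code. ClientStatsPublisherSketch is a local stand-in for the real interface so that the example compiles without Hive on the classpath; the parameter list of run, the class names and the sample property value are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for ClientStatsPublisher; the run() parameters are an assumption.
interface ClientStatsPublisherSketch {
    void run(Map<String, Double> counterValues, String jobID);
}

// Hypothetical publisher that records one line per counter of a job.
class RecordingClientStatsPublisher implements ClientStatsPublisherSketch {
    final List<String> lines = new ArrayList<>();

    @Override
    public void run(Map<String, Double> counterValues, String jobID) {
        for (Map.Entry<String, Double> e : counterValues.entrySet()) {
            lines.add(jobID + " " + e.getKey() + "=" + e.getValue());
        }
    }
}

class ClientStatsPublisherDemo {
    // Splits the comma-separated value of hive.client.stats.publishers
    // into individual class names, skipping blanks.
    static List<String> parsePublisherList(String value) {
        List<String> names = new ArrayList<>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical property value; the parameter is empty by default.
        System.out.println(parsePublisherList("com.example.PubA, com.example.PubB"));

        RecordingClientStatsPublisher publisher = new RecordingClientStatsPublisher();
        Map<String, Double> counters = new LinkedHashMap<>();
        counters.put("rows", 42.0);
        publisher.run(counters, "job_1");
        System.out.println(publisher.lines);
    }
}
```

In a real deployment the class would implement Hive's ClientStatsPublisher and its fully qualified name would be listed in hive.client.stats.publishers.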

StatsPublisher:

The interface for publishing stats. It declares the methods init, connect, publishStat and closeConnection; any class that needs to publish stats must implement this interface.

init method:

This method performs one-time initialization, possibly creating the database and the table if they do not exist. To ensure that initialization happens only once, it must be called on the Hive client side rather than from the mappers/reducers.

Parameters: hconf: the HiveConf containing the configuration parameters for connecting to the intermediate stats database.

Return value: true if initialization succeeds, otherwise false.

connect method:

Connects to the intermediate stats database.

Parameters: hconf: the HiveConf containing the configuration parameters for connecting to the intermediate stats database.

Return value: true if the connection succeeds, otherwise false.

publishStat method:

This method publishes the given statistics to persistent storage, which may be HBase or MySQL.

Parameters: fileID: a string identifier under which the statistics are published. The statistics are published by all mappers/reducers and then aggregated; the ID is unique for each output partition of each task.

Example: output directory name (unique per FileSinkOperator) + partition specification (dynamic partitioning only) + taskID (the last component of the task's file name)

stats: a collection of key-value pairs, where each key is the name of a published statistic and each value is the value of that statistic.

Return value: true if publishing succeeds, otherwise false.
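The composite fileID described above can be sketched as follows. This is only an illustration: the class name, the "/" separator and the sample values are assumptions, not taken from the Hive source.

```java
class StatsKeySketch {
    // Separator between the key components; chosen for illustration only.
    static final String SEP = "/";

    // Output directory plus the partition specification, which is
    // present only when dynamic partitioning is used.
    static String keyPrefix(String outputDir, String partitionSpec) {
        if (partitionSpec == null || partitionSpec.isEmpty()) {
            return outputDir + SEP;
        }
        return outputDir + SEP + partitionSpec + SEP;
    }

    // Full fileID: the prefix followed by the task ID.
    static String fileId(String outputDir, String partitionSpec, String taskId) {
        return keyPrefix(outputDir, partitionSpec) + taskId;
    }

    public static void main(String[] args) {
        System.out.println(fileId("/tmp/out", "ds=2015-05-01", "000000_0"));
        System.out.println(fileId("/tmp/out", null, "000001_0"));
    }
}
```

Because every task writing to the same output partition shares the first components of the key, all of its records can later be found by that common prefix.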

closeConnection method:

Closes the connection to the temporary storage.
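The call order of the four methods can be sketched with an in-memory publisher. This is a simplified illustration under stated assumptions: StatsPublisherSketch is a local stand-in for the real interface, a plain Map takes the place of HiveConf, and FakeStatsDb plays the role of the intermediate stats database (HBase or MySQL in practice).

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for StatsPublisher; parameter types are assumptions.
interface StatsPublisherSketch {
    boolean init(Map<String, String> hconf);
    boolean connect(Map<String, String> hconf);
    boolean publishStat(String fileID, Map<String, String> stats);
    boolean closeConnection();
}

// Hypothetical stand-in for the intermediate stats database.
class FakeStatsDb {
    Map<String, Map<String, String>> table; // null until init creates it
}

class InMemoryStatsPublisher implements StatsPublisherSketch {
    private final FakeStatsDb db;
    private boolean connected;

    InMemoryStatsPublisher(FakeStatsDb db) {
        this.db = db;
    }

    @Override
    public boolean init(Map<String, String> hconf) {
        // Create the "table" only if it does not exist yet.
        if (db.table == null) {
            db.table = new HashMap<>();
        }
        return true;
    }

    @Override
    public boolean connect(Map<String, String> hconf) {
        // Connecting fails if init has never been run against this database.
        connected = db.table != null;
        return connected;
    }

    @Override
    public boolean publishStat(String fileID, Map<String, String> stats) {
        if (!connected) {
            return false;
        }
        db.table.put(fileID, new HashMap<>(stats));
        return true;
    }

    @Override
    public boolean closeConnection() {
        connected = false;
        return true;
    }
}

class StatsPublisherLifecycleDemo {
    public static void main(String[] args) {
        Map<String, String> hconf = new HashMap<>();
        FakeStatsDb db = new FakeStatsDb();

        // Client side: initialize once.
        new InMemoryStatsPublisher(db).init(hconf);

        // Task side: connect, publish, close.
        InMemoryStatsPublisher task = new InMemoryStatsPublisher(db);
        Map<String, String> stats = new HashMap<>();
        stats.put("numRows", "10");
        boolean ok = task.connect(hconf) && task.publishStat("/tmp/out/000000_0", stats);
        task.closeConnection();
        System.out.println(ok + " " + db.table);
    }
}
```

Keeping init separate from connect mirrors the split described above: the client prepares the storage once, and each task only connects to it.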

StatsAggregator:

The interface for aggregating stats.

connect method:

Connects to the intermediate stats database.

Parameters: hconf: the HiveConf containing the configuration parameters for connecting to the intermediate stats database.

sourceTask: the source task.

Return value: true if the connection succeeds, otherwise false.

aggregateStats method:

This method aggregates a given statistic from the partial stats published by the tasks. When aggregation is complete, it automatically clears the records that were aggregated.

Parameters: keyPrefix: the prefix of the keys that StatsPublisher used when publishing stats. For example, if StatsPublisher published stats under a composite key of the form:

output directory name (unique per FileSinkOperator) + partition specification (dynamic partitioning only) + taskID (the last component of the task's file name)

then keyPrefix is the first two components of that key.

statType: a string naming the published statistic, for example numRows.

Return value: a long value converted to a String, or null if any exception or error occurs.
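Aggregation by key prefix can be sketched as follows. This is a simplified in-memory illustration, not Hive's implementation: the class name, the publish helper and the sample keys are assumptions, and only summation of long values is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class PrefixAggregatorSketch {
    // fileID -> (statType -> value), as published by the individual tasks.
    final Map<String, Map<String, String>> published = new TreeMap<>();

    // Hypothetical helper that stores one statistic for one fileID.
    void publish(String fileID, String statType, String value) {
        published.computeIfAbsent(fileID, k -> new HashMap<>()).put(statType, value);
    }

    // Sums statType over every record whose key starts with keyPrefix and
    // removes the values it has read. Returns the sum as a String, or null
    // if a value cannot be parsed as a long.
    String aggregateStats(String keyPrefix, String statType) {
        try {
            long total = 0;
            for (Map.Entry<String, Map<String, String>> e : published.entrySet()) {
                if (!e.getKey().startsWith(keyPrefix)) {
                    continue;
                }
                // remove() returns the stored value and clears it in one step.
                String value = e.getValue().remove(statType);
                if (value != null) {
                    total += Long.parseLong(value);
                }
            }
            return Long.toString(total);
        } catch (NumberFormatException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        PrefixAggregatorSketch agg = new PrefixAggregatorSketch();
        agg.publish("/out/ds=1/000000_0", "numRows", "10");
        agg.publish("/out/ds=1/000001_0", "numRows", "5");
        agg.publish("/out/ds=2/000000_0", "numRows", "7");
        System.out.println(agg.aggregateStats("/out/ds=1/", "numRows"));
    }
}
```

The prefix selects all tasks that wrote to one output partition, so a single call yields the total for that partition.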

closeConnection method:

Closes the connection to the temporary storage.

cleanUp method:

This method is called after all stats have been aggregated. Because multiple statistics are supported, cleanup is not performed automatically after aggregation.

After this method is called, closeConnection must be called.

This method can also be used to clear statistics that were published but never aggregated. This typically happens when a job fails or is forcibly stopped after publishing some statistics.

Parameters: keyPrefix: the prefix of the keys that StatsPublisher used when publishing stats.

Return value: true if cleanup succeeds, otherwise false.
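The requirement that closeConnection follows cleanUp can be sketched with a try/finally block. This is an illustration under assumptions: CleanupAggregatorSketch is a hypothetical in-memory stand-in, not the real StatsAggregator, and its connect method takes no parameters for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory aggregator holding published records by key.
class CleanupAggregatorSketch {
    final Map<String, String> records = new TreeMap<>();
    private boolean connected;

    boolean connect() {
        connected = true;
        return true;
    }

    // Removes every record whose key starts with keyPrefix, whether or not
    // it was aggregated. Fails if there is no open connection.
    boolean cleanUp(String keyPrefix) {
        if (!connected) {
            return false;
        }
        records.keySet().removeIf(k -> k.startsWith(keyPrefix));
        return true;
    }

    boolean closeConnection() {
        connected = false;
        return true;
    }
}

class CleanupDemo {
    // Runs cleanUp and always closes the connection afterwards.
    static boolean cleanUpAndClose(CleanupAggregatorSketch agg, String keyPrefix) {
        try {
            return agg.cleanUp(keyPrefix);
        } finally {
            agg.closeConnection();
        }
    }

    public static void main(String[] args) {
        CleanupAggregatorSketch agg = new CleanupAggregatorSketch();
        agg.records.put("/out/ds=1/000000_0", "10");
        agg.records.put("/out/ds=2/000000_0", "7");
        agg.connect();
        System.out.println(cleanUpAndClose(agg, "/out/ds=1/") + " " + agg.records.keySet());
    }
}
```

The same helper covers the failure case: after a failed or killed job, calling it with the job's key prefix discards whatever the tasks managed to publish.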

StatsCollectionTaskIndependent:

This is a marker interface. It distinguishes stats publishers/aggregators that do not track stats per task from those that do.
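How a marker interface is typically checked can be sketched as follows. All type names here are hypothetical stand-ins, and the sketch assumes that the marker denotes implementations that do not track stats per task.

```java
// Stand-in for an aggregator interface; methods omitted for brevity.
interface StatsAggregatorSketch {
}

// Stand-in for the marker interface: it declares no methods at all.
interface TaskIndependentMarker {
}

// Hypothetical aggregator that does not track stats per task.
class TaskIndependentAggregator implements StatsAggregatorSketch, TaskIndependentMarker {
}

// Hypothetical aggregator that tracks stats per task.
class PerTaskAggregator implements StatsAggregatorSketch {
}

class MarkerDemo {
    // The only way to consult a marker interface is an instanceof check.
    static boolean tracksStatsPerTask(StatsAggregatorSketch aggregator) {
        return !(aggregator instanceof TaskIndependentMarker);
    }

    public static void main(String[] args) {
        System.out.println(tracksStatsPerTask(new TaskIndependentAggregator()));
        System.out.println(tracksStatsPerTask(new PerTaskAggregator()));
    }
}
```

Callers can use such a check to decide, for example, whether a task ID needs to be part of the key under which stats are published.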
