The package org.apache.hadoop.hive.ql.stats contains the classes and interfaces shown in the following figure:
[Figure: stats, the classes and interfaces of org.apache.hadoop.hive.ql.stats]
Among them are the following interfaces:
ClientStatsPublisher:
It declares a single run method. No class in Hive itself implements this interface; it exists mainly as an extension point for Hive stats. Implementations are enabled through the hive.client.stats.publishers parameter, a comma-separated list of client stats publishers that are invoked for each job.
The parameter is empty by default. Each entry is the name of a Java class that implements the ClientStatsPublisher interface.
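A minimal sketch of a custom client stats publisher follows. It assumes the run(Map<String, Double> counterValues, String jobID) signature used by Hive releases of this period; to stay compilable without Hive on the classpath, it declares a local stand-in interface instead of importing org.apache.hadoop.hive.ql.stats.ClientStatsPublisher. The class LoggingClientStatsPublisher and its output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in for org.apache.hadoop.hive.ql.stats.ClientStatsPublisher
// (assumed signature; check it against the Hive version you build with).
interface ClientStatsPublisher {
  void run(Map<String, Double> counterValues, String jobID);
}

// Hypothetical publisher that turns each job's counters into log lines.
class LoggingClientStatsPublisher implements ClientStatsPublisher {
  private final List<String> lines = new ArrayList<>();

  @Override
  public void run(Map<String, Double> counterValues, String jobID) {
    // Sort counter names so the output order is deterministic.
    for (Map.Entry<String, Double> e : new TreeMap<>(counterValues).entrySet()) {
      lines.add(jobID + " " + e.getKey() + "=" + e.getValue());
    }
  }

  List<String> getLines() {
    return lines;
  }
}
```

Such a class would be enabled with a setting like hive.client.stats.publishers=com.example.LoggingClientStatsPublisher, where the package name is again hypothetical.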
StatsPublisher:
The interface for publishing stats. It declares the methods init, connect, publishStat and closeConnection, and every class that publishes stats must implement it.
init method:
Performs one-time initialization, which may include creating the database and table if they do not already exist. To ensure it runs only once, this method must be called on the Hive client side, not from mappers or reducers.
Parameters: hconf: the HiveConf containing the parameters needed to connect to the intermediate stats database.
Return value: true if initialization succeeds, otherwise false.
connect method:
Connects to the intermediate stats database.
Parameters: hconf: the HiveConf containing the parameters needed to connect to the intermediate stats database.
Return value: true if the connection succeeds, otherwise false.
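The division of labor between init and connect can be illustrated with an in-memory sketch. A static map plays the intermediate stats database and java.util.Properties stands in for HiveConf; the property stats.db.url, the table name PARTITION_STATS and the classes FakeStatsDb and InMemoryStatsPublisher are all hypothetical. init behaves like CREATE TABLE IF NOT EXISTS, so repeating it is harmless, while connect only attaches to storage that init has already prepared.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory "database": table name -> (key -> value) rows.
class FakeStatsDb {
  static final Map<String, Map<String, String>> TABLES = new ConcurrentHashMap<>();
}

class InMemoryStatsPublisher {
  static final String TABLE = "PARTITION_STATS"; // hypothetical table name
  private Map<String, String> table; // set by connect()

  // One-time setup, meant to run on the Hive client, not in mappers/reducers.
  boolean init(Properties conf) {
    if (conf.getProperty("stats.db.url") == null) {
      return false; // no storage configured
    }
    // Like CREATE TABLE IF NOT EXISTS: an existing table and its rows are kept.
    FakeStatsDb.TABLES.putIfAbsent(TABLE, new ConcurrentHashMap<>());
    return true;
  }

  // Per-task connection to storage that init() has already created.
  boolean connect(Properties conf) {
    if (conf.getProperty("stats.db.url") == null) {
      return false;
    }
    table = FakeStatsDb.TABLES.get(TABLE);
    return table != null; // fails if init() was never run
  }

  boolean isConnected() {
    return table != null;
  }
}
```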
publishStat method:
Publishes the given statistics to persistent storage, such as HBase or MySQL.
Parameters: fileID: a string identifier for the statistics, which all mappers/reducers publish and which are aggregated later; it is unique for each task's output partition.
For example: output directory name (unique per FileSinkOperator) + partition specification (dynamic partitioning only) + taskID (the last component of the task file name).
stats: a collection of key-value pairs, where each key is the name of a published statistic and each value is that statistic's value.
Return value: true if successful, otherwise false.
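The following sketch shows one way to build a fileID from the three parts listed above and to publish a stats map under it. The "/" separator, the class MapStatsPublisher and the in-memory map standing in for HBase or MySQL are assumptions for illustration; the Map<String, String> type of stats reflects Hive's StatsPublisher of this era but should be checked against your version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory publisher: fileID -> (stat name -> value).
class MapStatsPublisher {
  private final Map<String, Map<String, String>> storage;

  MapStatsPublisher(Map<String, Map<String, String>> storage) {
    this.storage = storage;
  }

  // Builds a fileID from the parts listed above; "/" is an assumed separator.
  static String buildFileId(String outputDir, String dynamicPartSpec, String taskFile) {
    StringBuilder key = new StringBuilder(outputDir).append('/');
    // The partition spec is only present for dynamic-partition inserts.
    if (dynamicPartSpec != null && !dynamicPartSpec.isEmpty()) {
      key.append(dynamicPartSpec).append('/');
    }
    // The task ID is the last path component of the task's output file.
    return key.append(taskFile.substring(taskFile.lastIndexOf('/') + 1)).toString();
  }

  boolean publishStat(String fileID, Map<String, String> stats) {
    if (fileID == null || stats == null) {
      return false;
    }
    // Merge into the row for this fileID, creating it on first publish.
    storage.computeIfAbsent(fileID, id -> new HashMap<>()).putAll(stats);
    return true;
  }
}
```

Because values are merged with putAll, a task attempt that is re-run with the same fileID overwrites its earlier numbers instead of adding a duplicate row.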
closeConnection method:
Closes the connection to the temporary storage.
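Taken together, the intended call order is init once on the client, then connect, publishStat and closeConnection inside each task. The sketch below encodes that order. The interface is a local stand-in (Properties replaces HiveConf), and RecordingPublisher and PublisherLifecycle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Local stand-in for the StatsPublisher contract described above
// (Properties replaces HiveConf so the sketch compiles without Hive).
interface StatsPublisher {
  boolean init(Properties hconf);
  boolean connect(Properties hconf);
  boolean publishStat(String fileID, Map<String, String> stats);
  boolean closeConnection();
}

// Hypothetical publisher that only records which methods were called.
class RecordingPublisher implements StatsPublisher {
  final List<String> calls = new ArrayList<>();
  public boolean init(Properties hconf) { calls.add("init"); return true; }
  public boolean connect(Properties hconf) { calls.add("connect"); return true; }
  public boolean publishStat(String fileID, Map<String, String> stats) {
    calls.add("publishStat:" + fileID);
    return true;
  }
  public boolean closeConnection() { calls.add("closeConnection"); return true; }
}

final class PublisherLifecycle {
  private PublisherLifecycle() {}

  // Client side: run once before the job's tasks start.
  static boolean setUp(StatsPublisher p, Properties conf) {
    return p.init(conf);
  }

  // Task side: each mapper/reducer connects, publishes, and always closes.
  static boolean publishFromTask(StatsPublisher p, Properties conf,
                                 String fileID, Map<String, String> stats) {
    if (!p.connect(conf)) {
      return false;
    }
    try {
      return p.publishStat(fileID, stats);
    } finally {
      p.closeConnection();
    }
  }
}
```

Calling closeConnection in a finally block releases the connection even if publishStat throws.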
StatsAggregator:
The interface for aggregating (collecting) stats.
connect method:
Connects to the intermediate stats database.
Parameters: hconf: the HiveConf containing the parameters needed to connect to the intermediate stats database.
sourceTask: the task whose stats are to be aggregated.
Return value: true if the connection succeeds, otherwise false.
aggregateStats method:
Aggregates the given statistic across the tasks that published it; once aggregation is complete, the records used are deleted automatically.
Parameters: keyPrefix: the prefix of the keys the StatsPublisher used when publishing the stats. For example, if the StatsPublisher published stats under the composite key
output directory name (unique per FileSinkOperator) + partition specification (dynamic partitioning only) + taskID (the last component of the task file name),
then keyPrefix is the first two components of that key.
statType: the name of the statistic to aggregate (a stat name, not a key), for example numRows.
Return value: the aggregated long value converted to a String, or null if any exception or error occurs.
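A minimal in-memory sketch of aggregateStats, assuming the key layout from the example above with "/" as separator (an assumption, as is the class name MapStatsAggregator). It sums statType over every key that starts with keyPrefix, returns the total as a String, returns null on malformed input, and deletes the used values only after the sum has succeeded.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical aggregator over an in-memory store:
// full key (outputDir/partSpec/taskId) -> (stat name -> value).
class MapStatsAggregator {
  private final Map<String, Map<String, String>> storage;

  MapStatsAggregator(Map<String, Map<String, String>> storage) {
    this.storage = storage;
  }

  // Sums statType over all keys starting with keyPrefix, then deletes the
  // values it used. Returns the total as a String, or null on any error.
  String aggregateStats(String keyPrefix, String statType) {
    if (keyPrefix == null || statType == null) {
      return null;
    }
    long total = 0;
    for (Map.Entry<String, Map<String, String>> row : storage.entrySet()) {
      String value = row.getKey().startsWith(keyPrefix) ? row.getValue().get(statType) : null;
      if (value != null) {
        try {
          total += Long.parseLong(value);
        } catch (NumberFormatException e) {
          return null; // nothing has been deleted yet
        }
      }
    }
    // Aggregation succeeded: remove the used values and any emptied rows.
    Iterator<Map.Entry<String, Map<String, String>>> it = storage.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Map<String, String>> row = it.next();
      if (row.getKey().startsWith(keyPrefix)) {
        row.getValue().remove(statType);
        if (row.getValue().isEmpty()) {
          it.remove();
        }
      }
    }
    return Long.toString(total);
  }
}
```

Note that keyPrefix should end on a component boundary: with the prefix /out/ds=1 (no trailing slash), keys under /out/ds=10/ would also match. This sketch removes only the aggregated stat from each row and drops a row once it is empty, which is one reading of "the records used are deleted".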
closeConnection method:
Closes the connection to the temporary storage.
cleanUp method:
Called after the stats have been aggregated. Once multiple statistics are supported, cleanup will no longer happen automatically after aggregation.
closeConnection must be called after this method.
This method can also clear statistics that were never aggregated, typically when a job fails or is killed after publishing some of them.
Parameters: keyPrefix: the key prefix the StatsPublisher used when publishing the stats.
Return value: true if cleanup succeeds, otherwise false.
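The failure path described above, where partially published stats are discarded and the connection is then closed, might look like the following sketch. StatsCleanup is a hypothetical stand-in holding just the two StatsAggregator methods involved, and MapStatsCleanup and FailedJobHandler are likewise illustrative names.

```java
import java.util.Map;

// Hypothetical stand-in for the two StatsAggregator methods used here.
interface StatsCleanup {
  boolean cleanUp(String keyPrefix);
  boolean closeConnection();
}

// Hypothetical in-memory implementation: full key -> (stat name -> value).
class MapStatsCleanup implements StatsCleanup {
  private final Map<String, Map<String, String>> storage;
  boolean closed = false;

  MapStatsCleanup(Map<String, Map<String, String>> storage) {
    this.storage = storage;
  }

  // Deletes every row published under keyPrefix, aggregated or not.
  public boolean cleanUp(String keyPrefix) {
    if (keyPrefix == null || closed) {
      return false;
    }
    storage.keySet().removeIf(key -> key.startsWith(keyPrefix));
    return true;
  }

  public boolean closeConnection() {
    closed = true;
    return true;
  }
}

final class FailedJobHandler {
  private FailedJobHandler() {}

  // After a failed or killed job: drop its partial stats, then close,
  // following the rule that closeConnection comes after cleanUp.
  static boolean discardPartialStats(StatsCleanup agg, String keyPrefix) {
    try {
      return agg.cleanUp(keyPrefix);
    } finally {
      agg.closeConnection();
    }
  }
}
```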
StatsCollectionTaskIndependent:
A marker interface that distinguishes stats publishers/aggregators that do not track stats per task from those that do.
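Since a marker interface declares no methods, code can only test for it with instanceof. The sketch below uses a local stand-in and hypothetical classes; the statsKey helper, which drops the task ID from the key for task-independent implementations, is an assumed use of the marker and not a description of Hive's own code.

```java
// Local stand-in for org.apache.hadoop.hive.ql.stats.StatsCollectionTaskIndependent.
interface StatsCollectionTaskIndependent {}

// Hypothetical aggregator that reads job-wide totals, so it is task independent.
class JobLevelAggregator implements StatsCollectionTaskIndependent {}

// Hypothetical aggregator that reads one record per task.
class PerTaskAggregator {}

final class StatsMarkers {
  private StatsMarkers() {}

  // The marker carries no methods; its presence is checked with instanceof.
  static boolean tracksStatsPerTask(Object publisherOrAggregator) {
    return !(publisherOrAggregator instanceof StatsCollectionTaskIndependent);
  }

  // Assumed use: only per-task implementations need the task ID in the key.
  static String statsKey(Object publisherOrAggregator, String keyPrefix, String taskId) {
    return tracksStatsPerTask(publisherOrAggregator) ? keyPrefix + taskId : keyPrefix;
  }
}
```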
Hive Source Reading 02-org.apache.hadoop.hive.ql.stats Overview