Hive configuration items explained in detail


hive.exec.script.maxerrsize: the maximum number of bytes a map/reduce task may write to standard error, to keep a runaway script from filling the disk partition with logs; default 100000;

hive.exec.script.allow.partial.consumption: whether Hive allows a script to exit successfully without consuming all of its standard input; default false (off);

hive.script.operator.id.env.var: when a user writes a custom map/reduce with the TRANSFORM function, the name of the environment variable that holds the unique script operator ID; default HIVE_SCRIPT_OPERATOR_ID;

hive.exec.compress.output: controls whether the output of a Hive query is compressed; the compression itself is configured through Hadoop's mapred.output.compress properties; default false (uncompressed);

hive.exec.compress.intermediate: controls whether the intermediate results of a Hive query are compressed, using the same Hadoop configuration; default false;
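As a quick illustration, both switches can be set per session before running a query. A minimal sketch; the codec property is the standard Hadoop one and the choice of codec is cluster-dependent:

    -- compress the final query output
    SET hive.exec.compress.output=true;
    -- compress the data passed between intermediate map/reduce stages
    SET hive.exec.compress.intermediate=true;
    -- choose the codec through Hadoop's own property (adjust to your cluster)
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;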

hive.exec.parallel: whether Hive executes the stages of a job in parallel; default false. Many operations, such as joins and sub-queries with no dependencies between them, can run independently, and in those cases enabling parallel execution can speed things up considerably (see the sketch after the next item);

hive.exec.parallel.thread.number: how many jobs may run at the same time when parallel execution is enabled; default 8;
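A minimal sketch of enabling parallel stage execution for one session:

    -- run independent stages of a query at the same time
    SET hive.exec.parallel=true;
    -- cap the number of concurrent stages (8 is the default)
    SET hive.exec.parallel.thread.number=8;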

hive.exec.rowoffset: whether to provide a virtual column with row offsets; default false (not provided). Hive also has two always-available virtual columns: INPUT__FILE__NAME, the path of the input file, and BLOCK__OFFSET__INSIDE__FILE, the offset of the record's block within the file. They are useful for troubleshooting queries that return unexpected or null results;
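For example, the two always-available virtual columns can be selected directly; a minimal sketch against a hypothetical table t with a column c:

    -- find where suspicious rows physically live
    SELECT INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE, c
    FROM t
    WHERE c IS NULL;
    -- with SET hive.exec.rowoffset=true; the additional
    -- ROW__OFFSET__INSIDE__BLOCK virtual column also becomes available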

hive.task.progress: controls whether Hive periodically updates task progress counters during execution. Enabling it helps the JobTracker monitor tasks more closely, at some performance cost. It is turned on automatically when dynamic partitioning (hive.exec.dynamic.partition) is enabled;

hive.exec.pre.hooks: pre-execution hooks: a comma-separated list of Java classes implementing the org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface. Once configured, every Hive task runs these hooks before execution; empty by default;

hive.exec.post.hooks: as above, but run after execution; empty by default;

hive.exec.failure.hooks: as above, but run when the job throws an exception; empty by default (a configuration sketch follows);
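All three hook lists are configured the same way; a sketch using hypothetical hook classes (com.example.hooks.AuditHook and com.example.hooks.AlertHook stand in for your own implementations of ExecuteWithHookContext):

    -- hypothetical classes; each must implement ExecuteWithHookContext
    SET hive.exec.pre.hooks=com.example.hooks.AuditHook;
    SET hive.exec.post.hooks=com.example.hooks.AuditHook;
    SET hive.exec.failure.hooks=com.example.hooks.AlertHook;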

hive.mergejob.maponly: try to generate a map-only job to perform the merge, provided CombineHiveInputFormat is supported; default true;

hive.mapjoin.smalltable.filesize: the file-size threshold for map join conversion; if a table's input file is smaller than this value, Hive tries to convert the common join into a map join; default 25MB;

hive.mapjoin.localtask.max.memory.usage: the maximum fraction of memory the map join's local task may use for its in-memory hash table of key/value pairs; beyond this value the local task exits automatically; default 0.9;

hive.mapjoin.followby.gby.localtask.max.memory.usage: as above, except it applies when the map join is followed by a GROUP BY, and controls the local task's memory limit for such queries; default 0.55;

hive.mapjoin.check.memory.rows: check memory usage after this many rows have been processed; default 100000;

hive.heartbeat.interval: the interval, in milliseconds, for sending heartbeats, used in map join and filter operations; default 1000;

hive.auto.convert.join: whether to convert a common join into a map join based on the size of the input files; default false (a combined sketch of these map-join settings follows);
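Taken together, the map-join settings above might be tuned like this (the values shown are simply the defaults, repeated for illustration):

    -- let Hive rewrite qualifying common joins as map joins
    SET hive.auto.convert.join=true;
    -- a table under ~25MB counts as small enough to hash in memory
    SET hive.mapjoin.smalltable.filesize=25000000;
    -- give up on the local hash-table task past 90% memory usage
    SET hive.mapjoin.localtask.max.memory.usage=0.9;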

hive.script.auto.progress: whether Hive's TRANSFORM/MAP/REDUCE scripts automatically send progress information to the TaskTracker, so that unresponsive tasks are not killed by mistake. Normally progress is reported only when the script writes to standard error; with this option on, progress is sent regardless of any output to standard error, so a script caught in an infinite loop can keep running indefinitely without the TaskTracker ever detecting it;

hive.script.serde: the SerDe that constrains the input and output of the user's transform script (see the TRANSFORM sketch after the next two items); default org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe;

hive.script.recordreader: the default record reader when reading data back from a script; default org.apache.hadoop.hive.ql.exec.TextRecordReader;

hive.script.recordwriter: the default record writer when writing data to a script; default org.apache.hadoop.hive.ql.exec.TextRecordWriter;
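The script-related settings above all come into play in TRANSFORM queries. A minimal sketch, where parse.py is a hypothetical script that reads tab-separated rows on stdin and writes tab-separated key/value pairs to stdout, and src is a hypothetical source table:

    ADD FILE /tmp/parse.py;
    SELECT TRANSFORM (col1, col2)
        USING 'python parse.py'
        AS (key, value)
    FROM src;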

hive.input.format: the input format; default org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. If it causes problems, you can fall back to org.apache.hadoop.hive.ql.io.HiveInputFormat;
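The fallback is a one-line session setting:

    -- use the plain input format if CombineHiveInputFormat misbehaves
    SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;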

hive.udtf.auto.progress: whether Hive sends progress information to the TaskTracker while a UDTF is executing; default false;

hive.mapred.reduce.tasks.speculative.execution: whether speculative execution is enabled for reduce tasks; default true;

hive.exec.counters.pull.interval: how often a running job polls the JobTracker for counters. A small value increases JobTracker load; a large value delays visibility into running tasks; the value is a trade-off; default 1000;

hive.enforce.bucketing: whether bucketing is enforced; default false. When enabled, data written into a bucketed table is actually bucketed. In my experience bucketing has little effect on full-table scans or on queries that filter by partition columns; its main use is sampling (see the sketch after the next item);

hive.enforce.sorting: when forced sorting is enabled, data inserted into the table is forcibly sorted; default false;
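A minimal sketch of both settings in use, with a hypothetical bucketed table; the sampling query at the end is where bucketing pays off:

    SET hive.enforce.bucketing=true;
    SET hive.enforce.sorting=true;
    -- hypothetical table, bucketed and sorted by user_id
    CREATE TABLE page_views_bucketed (user_id BIGINT, url STRING)
        CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS;
    INSERT OVERWRITE TABLE page_views_bucketed
        SELECT user_id, url FROM page_views;
    -- sample a single bucket instead of scanning the whole table
    SELECT * FROM page_views_bucketed TABLESAMPLE (BUCKET 1 OUT OF 32 ON user_id);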

hive.optimize.reducededuplication: if the data has already been aggregated by the same key, remove the redundant map/reduce job; this setting is recommended by the documentation and should be left on; default true;

hive.exec.dynamic.partition: whether dynamic partitions are allowed in DML/DDL; default false (a combined sketch follows the related settings below);

hive.exec.dynamic.partition.mode: default strict. In strict mode, at least one partition column must be given a static value; the remaining partition columns can be dynamic;

hive.exec.max.dynamic.partitions: upper limit on the number of dynamic partitions; default 1000;

hive.exec.max.dynamic.partitions.pernode: the maximum number of dynamic partitions each mapper/reducer node may create; default 100;

hive.exec.max.created.files: the maximum number of HDFS files a MapReduce job may create; default 100000;

hive.exec.default.partition.name: when dynamic partitioning is enabled, rows whose partition column is null or an empty string are inserted into this partition; default name __HIVE_DEFAULT_PARTITION__;
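Putting the dynamic-partition settings together, a minimal sketch (logs and staging_logs are hypothetical tables; in strict mode, at least one of dt/country below would need a static value):

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.max.dynamic.partitions=1000;
    SET hive.exec.max.dynamic.partitions.pernode=100;
    -- partition values are taken from the trailing SELECT columns;
    -- rows with NULL/empty country land in __HIVE_DEFAULT_PARTITION__
    INSERT OVERWRITE TABLE logs PARTITION (dt, country)
        SELECT msg, dt, country FROM staging_logs;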

hive.fetch.output.serde: the SerDe that FetchTask uses to serialize fetch output; default org.apache.hadoop.hive.serde2.DelimitedJSONSerDe;

hive.exec.mode.local.auto: whether Hive decides automatically to run a job in local mode; default false. Broadly, only small jobs qualify: small total input, few map tasks, and at most one reducer;
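A minimal sketch of letting Hive choose local mode; the two threshold properties shown are the usual companions, though their names and defaults can vary by Hive version:

    SET hive.exec.mode.local.auto=true;
    -- run locally only if total input is under ~128MB ...
    SET hive.exec.mode.local.auto.inputbytes.max=134217728;
    -- ... and the job needs no more than 4 map tasks
    SET hive.exec.mode.local.auto.tasks.max=4;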
