1. Optimization of Hive itself
-"Large table split small table"
-"Filter Field"
-"Store by field category"
-"External table and partition table"
-"External table: Delete Only the metadata information, do not delete the data file
Multiple external tables are used by multiple people to operate on the same data file
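A minimal DDL sketch of an external table (the table name, columns, and HDFS path are illustrative):

```sql
-- EXTERNAL keeps the data file when the table is dropped; only the
-- metadata entry in the metastore is removed.
CREATE EXTERNAL TABLE IF NOT EXISTS emp_ext (
  id   INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/emp';  -- shared path; other external tables may point here too
```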
-"Partition table: Databases, tables, partitions in hive are all folders
Improved retrieval efficiency
-"Manually Created
-"Dynamic Partitioning"
-"External table + partition Table"
-"Storage of data"
-"Storage Format: Column Storage"
-"Compression
2. SQL optimization
  - Filter first, then join
3. MapReduce optimization
  - Parallel execution
    Independent stages can run concurrently, e.g. job1 & job2 in parallel, then job3.
    hive.exec.parallel=true
    hive.exec.parallel.thread.number=8
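A sketch of a query shape that benefits: the two aggregation subqueries have no dependency on each other, so with parallel execution enabled Hive can run them at the same time (table names are illustrative):

```sql
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;

-- job1 and job2 (the two GROUP BY subqueries) run concurrently;
-- job3 (the UNION ALL stage) waits for both.
SELECT * FROM (
  SELECT dept, COUNT(*) AS cnt FROM emp_2023 GROUP BY dept
  UNION ALL
  SELECT dept, COUNT(*) AS cnt FROM emp_2024 GROUP BY dept
) u;
```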
-"JVM Reuse"
mapreduce.job.jvm.numtasks= $number
Because every time the JVM is turned on and off, it requires a lot of resources.
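For example (the value 10 is illustrative; tune it per workload):

```sql
-- Reuse one JVM for up to 10 tasks of this job instead of
-- launching a fresh JVM per task
SET mapreduce.job.jvm.numtasks=10;
```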
-"Speculative execution
Mapreduce.map.speculative=true
Mapreduce.reduce.speculative=true
Hive.mapred.reduce.tasks.speculative.execution=true
-"Number of maps and reduce"
-"Map number: Bad man-made setting"
-"Size of HDFs block: dfs.blocks.size=128m
Size of Shard: minisize/maxsize
Mapreduce.input.fileinputformat.split.minisize
-"Enterprise Scenario"
-"file large, less 200M 100 map by default block processing
-"File small, multi 40M 400 map by Shard
    - Reduce count
      Rule of thumb: (0.95 to 1.75) * number of nodes * containers per node
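Two ways to steer the reduce count (values are illustrative):

```sql
-- Fix the number of reducers explicitly
SET mapreduce.job.reduces=8;

-- Or let Hive derive it: one reducer per this many input bytes (256 MB)
SET hive.exec.reducers.bytes.per.reducer=268435456;
```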
  - Local mode: run the entire job on the current node
<property>
<name>hive.exec.mode.local.auto</name>
<value>true</value>
<description>Let Hive determine whether to run in local mode automatically</description>
</property>
    Conditions:
    1. The job's total input size must not exceed
       hive.exec.mode.local.auto.inputbytes.max (default 128 MB)
    2. The number of map tasks the job processes must not exceed
       hive.exec.mode.local.auto.input.files.max (default 4)
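The same thing can be done per session with SET statements (thresholds shown at their defaults):

```sql
SET hive.exec.mode.local.auto=true;
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
SET hive.exec.mode.local.auto.input.files.max=4;
```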
Summary: Hive tuning covers three areas: Hive itself, SQL, and MapReduce.