Hive in Depth: Enterprise Architecture Optimization, Hive SQL Optimization, Compression, and Distributed Caching (a core product for enterprise Hadoop applications)
Course Lecturer: Cloudy
Course Category: Hadoop
Audience: Beginners
Number of lessons: 10
Technology used: Hive
Project covered: Hive enterprise-level optimization
Inquiries (QQ): 1840215592
First, the course environment:
Cloudera CDH4 (Hadoop 2.0)
Hive 0.9.0
Second, the required technical basis:
Basic knowledge of Hadoop, Hive, and Linux; there are no other requirements (the course suits developers from either a Java or a .NET background).
Third, the course introduction:
1. Course content overview
A good architecture beats any individual optimization, so what strategies are there for building the Hive job architecture?
Good HQL also improves efficiency, so how do you write efficient HQL?
Tuning Hive parameters can sometimes yield good results as well.
2. Course Outline
Chapter One: Architecture optimization strategies (5 lectures)
The main performance bottleneck in Hadoop is I/O load, so reducing I/O load is the centerpiece of optimization.
Chapter outline:
Study of job architecture optimization
Multiple strategies for reducing I/O load and the scenarios where each applies
Summary tables and source tables
Sound design of table partitioning and dynamic partitioning
Compression and distributed cache
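As a rough illustration of the partitioning and compression topics listed above, the following HiveQL sketch enables dynamic partitioning and output compression for one load job. The table names raw_logs and logs_by_day, their columns, and the choice of the Snappy codec are assumptions made for the example, not part of the course material; the codec must be installed on the cluster, and the property names shown are the ones used by Hive releases of this era.

```sql
-- Hypothetical tables: raw_logs (staging) and logs_by_day (partitioned target).

-- Allow partitions to be created from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Compress intermediate map output and the final job output to cut I/O.
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

CREATE TABLE IF NOT EXISTS logs_by_day (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
STORED AS SEQUENCEFILE;

-- The dynamic partition column (dt) must come last in the SELECT list.
INSERT OVERWRITE TABLE logs_by_day PARTITION (dt)
SELECT user_id, url, dt
FROM raw_logs;
```

Queries that filter on dt then read only the matching partitions, which is one common way to reduce I/O load.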
Chapter Two: Hive SQL syntax-level and property-parameter-level optimization (4 lectures)
Summary of syntax-level optimization techniques
Determining and controlling the number of map and reduce tasks, with case analysis
Avoiding and resolving data skew
Analyzing the execution plan to find the root cause of skew
Property parameters
Efficient joins, MAPJOIN, and SEMI JOIN
Reducing the number of jobs by merging MapReduce stages
MapReduce intermediate parameters
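The topics in this chapter can be previewed with a short HiveQL sketch. The tables orders, dim_region, and vip_users and the parameter values are hypothetical and chosen only for illustration; suitable values depend on the data and the cluster.

```sql
-- Hypothetical tables: orders (large), dim_region (small), vip_users.

-- Let Hive convert joins against a small table into map-side joins.
SET hive.auto.convert.join=true;
-- Partial aggregation on the map side, plus two-stage aggregation for skewed keys.
SET hive.map.aggr=true;
SET hive.groupby.skewindata=true;
-- Influence the reducer count through the input bytes handled per reducer.
SET hive.exec.reducers.bytes.per.reducer=256000000;

-- Explicit map join hint: the small table r is loaded into memory on each mapper.
SELECT /*+ MAPJOIN(r) */ o.order_id, r.region_name
FROM orders o
JOIN dim_region r ON o.region_id = r.region_id;

-- LEFT SEMI JOIN: keep orders whose user appears in vip_users.
-- Columns of the right-hand table may appear only in the ON clause.
SELECT o.order_id
FROM orders o
LEFT SEMI JOIN vip_users v ON o.user_id = v.user_id;

-- Inspect the execution plan to see the stages and where the work lands.
EXPLAIN
SELECT region_id, COUNT(*)
FROM orders
GROUP BY region_id;
```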
Chapter Three: Getting to know and using Impala (1 lecture)
Impala is a Cloudera product modeled on Hive, and a stable release is already available.
In theory it performs better than Hive, but the functionality and extensibility of the current version are far from being able to replace Hive.
The product may become influential in the future.
Features: like Hive, it is a SQL-like product
It shares Hive's metastore
Lecture 1: Hive architecture and the forms Hive jobs take
Lecture 2: Overview of Hive optimization strategies, and architecture optimization case one
Lecture 3: Architecture optimization case two: reducing I/O load, strategy I
Lecture 4: Architecture optimization case two: reducing I/O load, strategy II
Lecture 5: Architecture optimization case two: reducing I/O load, strategy III (compression and distributed caching)
Lecture 6: Hive syntax- and parameter-level optimization, part one
Lecture 7: Hive syntax- and parameter-level optimization, part two
Lecture 8: Hive syntax- and parameter-level optimization, part three
Hive in Depth: Enterprise Architecture Optimization Video Tutorial