Hive usage summary

It has been almost a year since I started working with Hive. Here is a summary of what I have learned from that experience:
1) Keep in mind that Hive is just a Hadoop-based data-warehouse tool that converts SQL into MapReduce jobs. Its strength is data statistics: development and testing are convenient and flexible, but complex logic is hard to express in a single query. For complex ETL logic, use temporary tables to process it in stages, or write MapReduce programs directly.
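As a concrete sketch of the staged approach above: each stage is one HQL statement, and later stages read the temporary table produced by earlier ones. The table and column names (`raw_events`, `stage_daily`, `report`) and the `run_hql` callback are made up for illustration; in practice the callback would submit the statement to Hive.

```python
# A minimal sketch of staging complex ETL with temporary tables.
# All table/column names here are hypothetical.

STAGES = [
    # Stage 1: shrink the raw data down to just what downstream steps need.
    """CREATE TEMPORARY TABLE stage_daily AS
       SELECT user_id, to_date(event_time) AS dt, COUNT(*) AS events
       FROM raw_events
       GROUP BY user_id, to_date(event_time)""",
    # Stage 2: the final aggregation reads the small temporary table.
    """INSERT OVERWRITE TABLE report
       SELECT dt, COUNT(DISTINCT user_id) AS users, SUM(events) AS events
       FROM stage_daily
       GROUP BY dt""",
]

def run_stages(stages, run_hql):
    """Execute each ETL stage in order with the supplied HQL runner."""
    for hql in stages:
        run_hql(hql)  # e.g. subprocess.run(["hive", "-e", hql], check=True)
```

Because each stage is a separate statement, a failure points directly at the stage that broke, and intermediate results can be inspected in the temporary table.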
2) Check whether your Hive SQL can cause data skew. Solving skew starts with understanding your data distribution: for example, whether some keys occur many times more often than others, or whether the join keys are frequently empty (NULL).
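One cheap way to check the distribution described above is to sample the join or group key and compare the hottest key's frequency to a typical key's. This is a hand-rolled heuristic, not a Hive feature; the threshold is up to you.

```python
from collections import Counter

def skew_ratio(keys):
    """Ratio of the most frequent key's count to the median key count.
    A large ratio (say, over 10) suggests the join/group key is skewed
    and one reducer will receive far more rows than the others."""
    counts = sorted(Counter(keys).values())
    median = counts[len(counts) // 2]
    return counts[-1] / median

# A skewed sample: one value dominates, as an empty join key often does.
sample = ["null"] * 90 + ["a", "b", "c", "d", "e"] * 2
```

For example, `skew_ratio(sample)` is 45.0, a clear red flag, while a uniform key distribution gives a ratio near 1.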
3) A stable scheduling system is very important. Hive and Tez jobs can fail for unexpected, transient reasons, so a scheduler that automatically re-runs a failed online step 2 or 3 times saves a great deal of manual intervention.
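The automatic re-run behavior above amounts to a retry wrapper around each scheduled step. A minimal sketch, assuming each step is a callable that raises on failure:

```python
import time

def run_with_retries(step, max_attempts=3, backoff_seconds=0):
    """Run one scheduler step, retrying on failure.

    Transient Hive/Tez errors often succeed on a re-run, so retry up to
    max_attempts times before giving up and surfacing the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted retries: let the scheduler alert a human
            time.sleep(backoff_seconds)  # wait before the automatic re-run
```

A real scheduler would also log each attempt and use a non-zero backoff; those are omitted here to keep the sketch short.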
4) When driving Hive from Perl or Python, run one HQL statement per step as much as possible. That makes unexpected errors much easier to locate and recover from.
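The one-statement-per-step idea above can be sketched as a loop that submits statements individually and reports exactly which step failed, rather than losing the error inside one big multi-statement invocation. `run_hql` is again a placeholder for however you submit HQL (e.g. the `hive -e` CLI).

```python
def run_statements(statements, run_hql):
    """Run HQL statements one at a time.

    On failure, raise an error naming the failing step and statement so
    the pipeline can be resumed or debugged from that exact point."""
    for i, hql in enumerate(statements, 1):
        try:
            run_hql(hql)  # e.g. subprocess.run(["hive", "-e", hql], check=True)
        except Exception as exc:
            raise RuntimeError(f"step {i} failed: {hql!r}") from exc
```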
5) Try to understand how HQL is translated into MapReduce jobs; this helps with both performance tuning and troubleshooting.
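To make the translation above concrete, here is a toy simulation of how a `SELECT key, COUNT(*) ... GROUP BY key` query maps onto the three MapReduce phases. This is an illustrative model of the execution pattern, not Hive's actual implementation:

```python
from collections import defaultdict

def group_by_count(rows, key_col):
    """Simulate GROUP BY key / COUNT(*) as map -> shuffle -> reduce."""
    # Map phase: each mapper emits a (key, 1) pair per input row.
    mapped = [(row[key_col], 1) for row in rows]
    # Shuffle phase: all pairs with the same key are routed to one reducer.
    # This is also where data skew bites: a hot key overloads one reducer.
    shuffled = defaultdict(list)
    for key, one in mapped:
        shuffled[key].append(one)
    # Reduce phase: each reducer sums the values for its key.
    return {key: sum(ones) for key, ones in shuffled.items()}
```

Seeing the query this way makes it obvious, for instance, why a skewed key slows the whole job: every row for that key lands on a single reducer in the shuffle phase.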