What is Hive?
1) Hive is a data warehouse on Hadoop (HDFS/MapReduce) for managing and querying structured and unstructured data;
2) It is a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop;
3) Hive defines a simple SQL-like query language called HQL (HiveQL), which lets users familiar with SQL query the data;
4) It lets developers write custom functions (UDFs) in Java to handle complex analysis tasks that the built-in functions cannot complete;
5) Hive has no dedicated data format (separators can be set flexibly).
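Item 5 can be illustrated with a small sketch. Because a Hive text table is just delimited text, a reader only needs to know which separators were configured. The Python below is not Hive code: the function name parse_rows and the sample data are hypothetical, and the only Hive-specific fact assumed is that the default field separator of a text table is Ctrl-A (\x01).

```python
def parse_rows(text, field_sep="\x01", line_sep="\n"):
    """Split delimited text into rows of string fields.

    field_sep defaults to Ctrl-A, the default field separator
    of Hive text tables; any other separator can be passed in.
    """
    return [line.split(field_sep) for line in text.split(line_sep) if line]


# The same two rows, stored with two different separators (sample data).
comma_data = "1,alice\n2,bob\n"
ctrl_a_data = "1\x01alice\n2\x01bob\n"

rows_comma = parse_rows(comma_data, field_sep=",")
rows_ctrl_a = parse_rows(ctrl_a_data)
```

Both calls produce the same rows, which is the point of item 5: the separator is a property of the table definition, not of the data itself.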
ETL process (extract-transform-load): data is extracted from a relational database into HDFS, Hive serves as the data warehouse for computation and analysis, and the results are then imported back into a relational database.
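The three ETL stages can be sketched end to end on one machine. This is only an analogy under stated assumptions: an in-memory SQLite database stands in for the relational database, a temporary directory stands in for HDFS, and a plain Python aggregation stands in for the Hive query. The table names visits and page_counts and the sample rows are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile
from collections import Counter


def extract(conn, path):
    """Extract: dump a relational table to a delimited file."""
    rows = conn.execute("SELECT user_id, page FROM visits").fetchall()
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def transform(path):
    """Transform: aggregate the staged file (the warehouse's role)."""
    with open(path, newline="") as f:
        return Counter(page for _user_id, page in csv.reader(f))


def load(conn, counts):
    """Load: write the aggregated results back to a relational table."""
    conn.execute("CREATE TABLE page_counts (page TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO page_counts VALUES (?, ?)",
                     sorted(counts.items()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)",
                 [(1, "home"), (2, "home"), (1, "cart")])

with tempfile.TemporaryDirectory() as staging_dir:
    staged = os.path.join(staging_dir, "visits.csv")
    extract(conn, staged)
    load(conn, transform(staged))

result = conn.execute(
    "SELECT page, visits FROM page_counts ORDER BY page").fetchall()
```

In a real deployment the extract and load steps would be done by a transfer tool and the transform step by HQL; only the shape of the pipeline carries over from this sketch.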
Official tutorial: https://cwiki.apache.org/confluence/display/Hive/Tutorial
Hive is a data warehouse built on Hadoop. It:
1) uses HQL as the query interface;
2) uses HDFS for storage;
3) uses MapReduce for computing.
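To make the link between the query interface and MapReduce concrete, here is a conceptual sketch of how a grouped count such as SELECT page, COUNT(*) FROM visits GROUP BY page breaks down into map, shuffle, and reduce steps. It is a simplification, not Hive's actual execution plan; the table visits, the column page, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (group key, 1) pair for every input record."""
    for record in records:
        yield record["page"], 1


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values of each key, giving COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


# Sample rows standing in for files stored in HDFS.
visits = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]
counts = reduce_phase(shuffle(map_phase(visits)))
```

The user writes only the query; the split into these phases is what the MapReduce layer takes care of.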
Hive application scenarios
Data sources:
1) File data, for example a China Mobile device that generates a large number of fixed-format files every day;
2) Databases.
The two data sources have one thing in common: to use Hive, the data must first be put into Hive. Generally, the following two methods are used:
1) File data: load it into Hive;
2) Database: import it into Hive with Sqoop.
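The two loading paths can be sketched as small helpers that only build the command text; nothing is executed. The helpers assume the HiveQL LOAD DATA statement and the sqoop import options --connect, --username, --table, --hive-import, and --hive-table. Every path, table name, user name, and JDBC URL below is hypothetical.

```python
def load_statement(path, table, local=True, overwrite=False):
    """Build a HiveQL LOAD DATA statement for a file."""
    local_kw = "LOCAL " if local else ""
    overwrite_kw = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA {local_kw}INPATH '{path}' {overwrite_kw}INTO TABLE {table}"


def sqoop_import_args(jdbc_url, username, source_table, hive_table):
    """Build the argument list for a Sqoop import into a Hive table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--table", source_table,
        "--hive-import",
        "--hive-table", hive_table,
    ]


# Hypothetical file, tables, and connection details.
load_hql = load_statement("/tmp/device.log", "device_log")
import_args = sqoop_import_args(
    "jdbc:mysql://dbhost/shop", "etl_user", "orders", "orders")
```

The argument list could be handed to subprocess.run on a machine where Sqoop is installed; credentials handling is left out of this sketch.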
Offline data processing:
Hive has high execution latency, so it is typically used for data analysis that does not require real-time results.
Because of this latency, Hive has an advantage when processing big data and none when processing small data.
How can a front-end system access the results when the processed data is stored in Hive tables?
First, transfer the Hive result data to a relational database. Sqoop is used to import and export the data.
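A minimal sketch of that export step, again only building the command rather than running it. It assumes the sqoop export options --connect, --username, --table, --export-dir, and --input-fields-terminated-by, and that the result table is stored as text with Hive's default Ctrl-A separator. The JDBC URL, user name, table name, and warehouse path are hypothetical.

```python
def sqoop_export_args(jdbc_url, username, target_table, export_dir,
                      field_sep="\\001"):
    """Build the argument list for exporting a Hive table's files
    into a relational table with Sqoop."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--table", target_table,
        "--export-dir", export_dir,
        # Tell Sqoop how the files written by Hive are delimited.
        "--input-fields-terminated-by", field_sep,
    ]


# Hypothetical connection details and table location.
export_args = sqoop_export_args(
    "jdbc:mysql://dbhost/report", "etl_user", "page_counts",
    "/user/hive/warehouse/page_counts")
```

Once the rows are in the relational table, the front-end system queries that database as usual and never waits on a Hive job.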