First, what is Hive
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for extracting, transforming, and loading data (ETL), along with a mechanism to store, query, and analyze large-scale data held in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users who already know SQL query the data. The language also lets developers who are familiar with MapReduce plug in custom mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot do.
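To make the link between an HQL query and mappers/reducers more concrete, here is a minimal sketch in plain Python. It only imitates, in a single process, how a GROUP BY count can be expressed as a map step and a reduce step. The table name visits, the column page, and the function names are made up for illustration, and the plans Hive really generates are more involved than this.

```python
from collections import defaultdict

# Hypothetical HQL query that this sketch mirrors (names are made up):
#   SELECT page, COUNT(*) FROM visits GROUP BY page;

def mapper(row):
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    yield row["page"], 1

def reducer(key, values):
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)

def run_query(rows):
    """Simulate the map, shuffle, and reduce phases in one process."""
    groups = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(k, v) for k, v in groups.items())

visits = [{"page": "/home"}, {"page": "/about"}, {"page": "/home"}]
result = run_query(visits)
print(result)  # {'/home': 2, '/about': 1}
```

A custom mapper or reducer plays the same role as the two small functions here, only with logic that plain HQL cannot express.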
Second, the architecture of Hive
The official website provides a diagram of the Hive architecture.
From an architectural perspective, Hive is the data warehouse infrastructure built on top of Hadoop.
1. Hive has three user interfaces: CLI, HiveServer, and WebUI.
① The CLI is the command-line client; users operate Hive directly from the command line.
② HiveServer supports JDBC/ODBC access. Hive exposes a Thrift service, and Thrift clients are currently available for C++, Java, PHP, Python, and Ruby.
③ The WebUI gives Hive a more intuitive web-based operations page. However, it is not recommended when processing large amounts of data.
2. The Metastore is the metadata store. It holds the structural information for all Hive tables and partitions, including the columns and their types, and the serializers and deserializers needed to read and write data in HDFS.
The metastore can be set up in three ways:
① Embedded Derby mode
② Local mode
③ Remote mode
These three modes will be covered in detail in a later blog post.
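As a rough preview of how the three modes differ, the sketch below renders the handful of hive-site.xml properties that typically distinguish them. The property names are real Hive settings, but the host names, ports, and database names are placeholders, MySQL is chosen only as an example backend for local mode, and a real setup needs more properties (credentials, for instance) than shown here.

```python
import xml.etree.ElementTree as ET

# Property names are real Hive settings; hosts, ports, and database
# names below are placeholders chosen for illustration only.
METASTORE_MODES = {
    # Embedded: Derby runs inside the Hive process itself.
    "embedded": {
        "javax.jdo.option.ConnectionURL":
            "jdbc:derby:;databaseName=metastore_db;create=true",
        "javax.jdo.option.ConnectionDriverName":
            "org.apache.derby.jdbc.EmbeddedDriver",
    },
    # Local: metastore code runs in the Hive process and talks to an
    # external database (MySQL assumed here as an example).
    "local": {
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:3306/hive",
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    },
    # Remote: Hive connects to a separate metastore service over Thrift.
    "remote": {
        "hive.metastore.uris": "thrift://metastore-host:9083",
    },
}

def render_hive_site(mode):
    """Return a hive-site.xml fragment for the given metastore mode."""
    root = ET.Element("configuration")
    for name, value in METASTORE_MODES[mode].items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

print(render_hive_site("remote"))
```

The main point is that embedded and local modes configure a database connection directly, while remote mode only points at a metastore service.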
3. The relationship between Hadoop and Hive
Hive is a component of the Hadoop ecosystem that serves as a data warehouse. Its data is stored in the Hadoop file system (HDFS), and it provides a SQL interface on top of Hadoop, so that data in the file system can be manipulated through SQL statements. Hive therefore depends on Hadoop.
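One way to picture "Hive's data is stored in the Hadoop file system" is to look at where a table ends up on HDFS. The sketch below computes the directory for a managed table under the default layout, assuming the default warehouse directory /user/hive/warehouse and no custom LOCATION. The function name and the sample table, database, and partition values are hypothetical.

```python
import posixpath

# Default value of hive.metastore.warehouse.dir; a cluster may override it.
WAREHOUSE_DIR = "/user/hive/warehouse"

def table_location(table, database="default", partition=None):
    """Return the default HDFS directory for a managed table or partition."""
    parts = [WAREHOUSE_DIR]
    if database != "default":
        parts.append(database + ".db")  # non-default databases get a ".db" directory
    parts.append(table)
    # Each partition column becomes a "column=value" subdirectory.
    for column, value in (partition or {}).items():
        parts.append(f"{column}={value}")
    return posixpath.join(*parts)

print(table_location("visits"))
# /user/hive/warehouse/visits
print(table_location("visits", database="web", partition={"dt": "2015-01-01"}))
# /user/hive/warehouse/web.db/visits/dt=2015-01-01
```

The files inside these directories are ordinary HDFS files, and the metastore records which directory belongs to which table or partition.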
A picture found on the Internet shows their relationship very clearly:
Third, the installation of Hive
1. Open the services wizard and choose to install Hive. MapReduce must be installed before Hive.
2. The wizard first asks us to select a set of dependencies for Hive.
3. Customize the role assignment according to your actual environment.
4. Select the database. You can choose the embedded database for now and change it later. Test the connection and, if it succeeds, click Continue.
5. The wizard shows the installation progress.