1. Hive Introduction
Hive is an open-source, Hadoop-based data warehouse tool used to store and process massive amounts of structured data. It stores the data in the Hadoop file system rather than in a database, but provides a database-like mechanism for storing and processing it, and uses HQL (a SQL-like language) to manage and process the data automatically. We can regard the structured data in Hive much as we would tables in a traditional database.
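To give a feel for that SQL-like interface, here is a minimal HQL sketch; the table name and columns are hypothetical, chosen only for illustration:
CREATE TABLE page_views (
  ip STRING,
  url STRING,
  view_time BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

SELECT url, count(*) AS hits
FROM page_views
GROUP BY url;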
Hive is a framework that occupies an important position in the Hadoop ecosystem and is used in many real-world businesses; the popularity of Hadoop is in large part due to the presence of Hive. So what exactly is Hive, and why does it occupy such an important position in the Hadoop family? This article focuses on Hive's architecture.
Label: First, an overview of the job: the flow is to first delete the existing files on HDFS with tHDFSDelete, then import the data from the Oracle source tables into HDFS, then establish a Hive connection -> create the Hive table -> use tJava to get the system time -> use tHiveLoad to import the files on HDFS into the Hive table. The settings for each of these components are described below.
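For the table-creation and load steps of that flow, the HiveQL issued behind the scenes would look roughly like the following sketch; the table name, columns, and HDFS path are hypothetical placeholders:
CREATE TABLE IF NOT EXISTS ora_import (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOAD DATA INPATH (without LOCAL) moves files already on HDFS into the table
LOAD DATA INPATH '/user/etl/ora_import/' INTO TABLE ora_import;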
This article introduces the basics of databases (database/schema) and tables (table) in Hive, covering only a few common, basic points for reasons of space.
Databases and tables in Hive
Look at a sketch first: [figure: Hive structure]
As can be seen from the figure, Hive as a "database" is structurally similar to a traditional database.
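A minimal sketch of the kind of database and table operations the article has in mind; the database and table names are placeholders chosen here for illustration:
CREATE DATABASE IF NOT EXISTS demo_db;
USE demo_db;
CREATE TABLE IF NOT EXISTS emp (
  id INT,
  name STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
SHOW DATABASES;
SHOW TABLES;
DESCRIBE emp;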
Unlike many tutorials, which introduce the concepts first, I like to install first and then use examples to introduce the concepts, so install Hive first. Check whether the corresponding yum source has been installed; if the yum source from blog.csdn.net/nsrainbow/article/details/42429339 has not been installed, set it up according to the yum source file described in that tutorial.
I. Installation of Hive
The components are arranged as follows:
172.16.57.75 bd-ops-test-75 mysql-server
172.16.57.77 bd-ops-test-77 Hiveserver2
1. Install Hive
Install Hive on 77:
# yum install hive hive-metastore
Hive Installation Deployment (note the version compatibility: for Hadoop 1.x, please install and test hive-0.9.0; for Hadoop 2.x, please install and test hive-0.12.0 or the latest version.)
hive-0.9.0: http://pan.baidu.com/s/1rj6f8
hive-0.12.0: http://mirrors.hust.edu.cn/apache/
[A "," B "," C "]
3
[D "," E "," F "]
4
[D "," E "," F "]
Add a lateral view:
SELECT myCol1, myCol2 FROM baseTable LATERAL VIEW explode(col1) myTable1 AS myCol1 LATERAL VIEW explode(col2) myTable2 AS myCol2;
The execution result is:
int mycol1    string mycol2
1             "a"
1             "b"
1             "c"
2             "a"
2             "b"
2             "c"
3             "d"
3             "e"
3             "f"
4             "d"
4             "e"
4             "f"
Union syntax
select_statement UNION ALL select_statement UNION ALL select_statement ...
UNION is used to combine the result sets of multiple SELECT statements into a single result set. Currently only UNION ALL (bag union) is supported, so duplicate rows are not eliminated. The number and names of the columns returned by each SELECT statement must be the same; otherwise a syntax error is thrown.
If some extra processing is required on the UNION result, the entire statement can be embedded in a FROM clause as a subquery.
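A minimal sketch of that pattern, assuming two hypothetical tables t1 and t2 with compatible columns:
SELECT u.id, count(*) AS cnt
FROM (
  SELECT id, name FROM t1
  UNION ALL
  SELECT id, name FROM t2
) u
GROUP BY u.id;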
Label: After the modification, start the Thrift server in Spark and then connect in beeline mode under Spark's bin directory, or write a .sh file so it can be executed directly each time; the .sh file contains something like: ./beeline -u jdbc:hive2://yangsy132:10000/default -n root -p Yangsiyi. Spark on Hive requires configuring Hive's metastore to use MySQL
Hive has been steadily closing the performance gap with other engines. Most of the would-be replacements for Hive were introduced around 2012, back when analysts could wait what felt like forever for Hive queries to complete. Yet while Impala, Spark, Drill and others made big strides, Hive merely followed along, improving slowly
Hive can convert SQL-like query statements into Hadoop MapReduce jobs, so that people familiar with relational databases can also take advantage of Hadoop's powerful parallel computing capability. Hive provides strong built-in function support, but there are always special cases that the built-in functions cannot cover, which requires us to define our own functions (UDFs). Next, let's take a look at how that is done.
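A minimal sketch of how a custom UDF is wired up from the Hive side once the Java class has been built; the jar path, class name, function name, and table are hypothetical placeholders:
-- register the jar that contains the compiled (hypothetical) UDF class
ADD JAR /path/to/my-udfs.jar;
-- bind a SQL-callable name to that class
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.udf.MyLower';
-- use it like any built-in function
SELECT my_lower(name) FROM some_table LIMIT 10;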
2.3 Hive Internals: P44. The jar files under $HIVE_HOME/lib implement specific pieces of functionality (the CLI module among them); other components include the Thrift service, which allows remote access from other processes, as well as the ability to access Hive using JDBC and ODBC; all Hive clients need a metastore service (metadata service), which Hive uses to store table schemas and other metadata
Label: Environment: hadoop 2.2.0, hive 0.13.1, Ubuntu 14.04 LTS, java version "1.7.0_60", oracle10g. *** Welcome to reprint; please indicate the source *** http://blog.csdn.net/u010967382/article/details/38709751 Download the installation package at the following address: http://mirrors.cnnic.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz The installation package is extracted on the server to /home/fulong/
[Spark] [Hive] [Python] [SQL] A small example of Spark reading a Hive table
$ cat customers.txt
1    Ali      us
2    Bsb      ca
3    Carls    mx
$ hive
hive> CREATE TABLE IF NOT EXISTS customers (
    >   cust_id string,
    >   name string,
    >   country string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH '/home/training/customers.txt' INTO TABLE customers;