Hive on Hadoop: A Preliminary Introduction

Hive Introduction

Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract/transform/load (ETL) work, along with mechanisms for storing, querying, and analyzing large datasets stored in Hadoop. Hive defines a simple SQL-like query language, called HQL (HiveQL), that lets users who are familiar with SQL query the data. At the same time, the language lets MapReduce developers plug in their own custom mappers and reducers to handle complex analytical work that the built-in mappers and reducers cannot complete.

Hive does not require one special data format. It works well on top of Thrift records and delimiter-separated text, and it also lets users specify their own data formats.
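For example, the layout of a plain-text table can be declared explicitly at creation time with ROW FORMAT DELIMITED (a sketch; the table and column names here are made up for illustration):

```sql
-- Hypothetical table over tab-separated text files, one record per line.
CREATE TABLE page_views (
  user_id STRING,
  url     STRING,
  ts      BIGINT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```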

Main Features:

  • Storage: maps a structured data file to a database table.

  • Provides an SQL-like language that implements most common SQL query functionality.

  • SQL statements are converted into MapReduce jobs, which makes Hive well suited to the statistical analysis workloads of a data warehouse.

Deficiencies:

Hive stores and reads data row by row (e.g., SequenceFile). This is inefficient when a query needs only one column of a table: all of the data must be read before the wanted column can be extracted. Row storage also takes up more disk space.

Current Optimization:

Because of the deficiencies above, column-oriented storage structures have been introduced that regroup the records of a distributed data processing system by column, reducing the number of disk accesses and improving query-processing performance. Because the values of one column share the same data type and tend to be similar, storing them together also compresses better and saves more storage space.

Hive Installation and Configuration
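The trade-off described above can be illustrated with a small self-contained sketch (Python here, with made-up data; real column formats used with Hive are far more sophisticated): a column layout lets a query scan only the bytes of the column it needs, and values of one column are often similar enough to compress well.

```python
import zlib

# Hypothetical page-view table: (country, id, date) rows; names are made up.
rows = [("us" if i % 4 else "cn", i, "2008-06-08") for i in range(1000)]

# Row-oriented layout: records stored one after another, as in SequenceFile.
row_bytes = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented layout: each column stored contiguously.
cols = list(zip(*rows))
col_blobs = [",".join(map(str, c)).encode() for c in cols]

# Reading only the 'country' column: the column layout touches just one blob,
# while the row layout forces a scan over every byte of every record.
country_blob = col_blobs[0]
print("bytes scanned (row layout):   ", len(row_bytes))
print("bytes scanned (column layout):", len(country_blob))

# Values in one column share a type and are often similar, so per-column
# compression tends to do well (zlib stands in for a real codec here).
print("compressed column sizes:", [len(zlib.compress(b)) for b in col_blobs])
```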

Installation Requirements

  • Java 1.6

  • Hadoop

Download a Hive release from the official website and unzip it into the corresponding directory:

$ tar -xzvf hive-x.y.z.tar.gz

Set the system environment variables (on Unix, in the /etc/profile file):

$ export HIVE_HOME=.../hive-x.y.z
$ export PATH=$PATH:$HIVE_HOME/bin

First, Hive operation — two ways:

1. Non-interactively: build a table, enter data, and query from the command line.

2. Interactively: build a table, enter data, and run queries from the Hive shell.
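The code samples that originally accompanied the two modes above did not survive in this copy. A rough sketch of what such examples typically look like, using the `pokes` sample table from the Hive getting-started guide (names and paths here are illustrative):

```sql
-- Build table
CREATE TABLE pokes (foo INT, bar STRING);

-- Enter data (kv1.txt is one of the sample files shipped with Hive)
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' INTO TABLE pokes;

-- Query
SELECT foo, bar FROM pokes WHERE foo > 100;
SELECT COUNT(*) FROM pokes;
```

Interactively, these statements are typed at the `hive>` prompt; non-interactively, a single statement can be passed to the CLI with `hive -e '<statement>'`, or several can be collected in a script file and run with `hive -f script.q`.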

Second, syntax analysis (English portions are from the Hive wiki LanguageManual DML):

1. Loading files into tables

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move data files into locations corresponding to Hive tables.

When data is loaded into a table, no conversion is made to the data. The Load operation simply copies/moves the data to the location of the Hive table.


Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Analysis:

  • The Load operation is simply a copy/move operation that moves the data file into the location corresponding to the Hive table.

  • filepath can be: a relative path, such as project/data1; an absolute path, such as /user/hive/project/data1; or a complete URI with a scheme, for example hdfs://namenode:9000/user/hive/project/data1.

  • The target of the load can be a table or a partition. If the table is partitioned, the partition name must be specified.

  • filepath can refer to a file (in which case Hive moves the file into the table's directory) or a directory (in which case Hive moves all of the files in that directory into the table's corresponding directory).

  • If LOCAL is specified: the load command looks for filepath in the local file system. A relative path is interpreted relative to the user's current working directory. The user can also specify a full URI for the local file, such as file:///user/hive/project/data1. The load command copies the files in filepath to the target file system, which is determined by the table's location attribute, and the copied data files are then moved into the table's data location.

  • If LOCAL is not specified: if filepath points to a complete URI, Hive uses that URI directly. Otherwise, if no scheme or authority is given, Hive uses the scheme and authority of the NameNode URI from fs.default.name in the Hadoop configuration; if the path is not absolute, Hive interprets it relative to /user/<username>. Hive then moves the contents of the files specified in filepath into the path specified by the table (or partition).

  • If the OVERWRITE keyword is used, the contents (if any) of the target table (or partition) are deleted, and then the contents of the file/directory that filepath points to are added to the table/partition. If the target table (or partition) already has a file whose name conflicts with a file name in filepath, the existing file is replaced by the new one.

Notes:

  • filepath cannot contain subdirectories.

  • If the LOCAL keyword is not given, filepath must refer to files within the same file system as the table's (or partition's) location.

  • Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in SequenceFile format, the files being loaded are also SequenceFiles, and vice versa.

  • Please read CompressedStorage if your data file is compressed.

Example:

Import data from the local file system into a table, appending to a partition:

LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE c02 PARTITION (date='2008-06-', country='US');

Import data from the local file system into a table, appending records:

LOAD DATA LOCAL INPATH './examples/files/kv1.txt' INTO TABLE pokes;

Import data from HDFS into a table, overwriting the original table:

LOAD DATA INPATH '/user/admin/sqlldrdat/cnclickstat/20101101/18/clickstat_gp_fatdt0/0' OVERWRITE INTO
