Hive on Hadoop: A Preliminary Introduction

Hive Introduction

Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract/transform/load (ETL) work, along with mechanisms for storing, querying, and analyzing large datasets stored in Hadoop. Hive defines a simple SQL-like query language, called HQL (HiveQL), that lets users who are familiar with SQL query the data. At the same time, the language lets MapReduce developers plug in their own custom mappers and reducers to handle complex analytical work that the built-in mappers and reducers cannot complete.

Hive does not require one special data format. It works well on top of Thrift records and delimiter-separated text, and it also lets users specify their own data formats.
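For example, the layout of a plain-text table can be declared explicitly at creation time with ROW FORMAT DELIMITED (a sketch; the table and column names here are made up for illustration):

```sql
-- Hypothetical table over tab-separated text files, one record per line.
CREATE TABLE page_views (
  user_id STRING,
  url     STRING,
  ts      BIGINT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```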

Main Features:

  • Storage: maps a structured data file to a database table.

  • Provides an SQL-like language that implements most common SQL query functionality.

  • SQL statements are converted into MapReduce jobs, which makes Hive well suited to the statistical analysis workloads of a data warehouse.

Deficiencies:

Hive stores and reads data row by row (e.g., SequenceFile). This is inefficient when a query needs only one column of a table: all of the data must be read before the wanted column can be extracted. Row storage also takes up more disk space.

Current Optimization:

Because of the deficiencies above, column-oriented storage structures have been introduced that regroup the records of a distributed data processing system by column, reducing the number of disk accesses and improving query-processing performance. Because the values of one column share the same data type and tend to be similar, storing them together also compresses better and saves more storage space.

Hive Installation and Configuration
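The trade-off described above can be illustrated with a small self-contained sketch (Python here, with made-up data; real column formats used with Hive are far more sophisticated): a column layout lets a query scan only the bytes of the column it needs, and values of one column are often similar enough to compress well.

```python
import zlib

# Hypothetical page-view table: (country, id, date) rows; names are made up.
rows = [("us" if i % 4 else "cn", i, "2008-06-08") for i in range(1000)]

# Row-oriented layout: records stored one after another, as in SequenceFile.
row_bytes = "\n".join(",".join(map(str, r)) for r in rows).encode()

# Column-oriented layout: each column stored contiguously.
cols = list(zip(*rows))
col_blobs = [",".join(map(str, c)).encode() for c in cols]

# Reading only the 'country' column: the column layout touches just one blob,
# while the row layout forces a scan over every byte of every record.
country_blob = col_blobs[0]
print("bytes scanned (row layout):   ", len(row_bytes))
print("bytes scanned (column layout):", len(country_blob))

# Values in one column share a type and are often similar, so per-column
# compression tends to do well (zlib stands in for a real codec here).
print("compressed column sizes:", [len(zlib.compress(b)) for b in col_blobs])
```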

Installation Requirements

  • Java 1.6

  • Hadoop

Download a Hive release from the official website and unzip it into the corresponding directory:

$ tar -xzvf hive-x.y.z.tar.gz

Set the system environment variables (on Unix, in the /etc/profile file):

$ export HIVE_HOME=.../hive-x.y.z
$ export PATH=$PATH:$HIVE_HOME/bin

First, Hive operation — two ways:

1. Non-interactively: build a table, enter data, and query from the command line.

2. Interactively: build a table, enter data, and run queries from the Hive shell.
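The code samples that originally accompanied the two modes above did not survive in this copy. A rough sketch of what such examples typically look like, using the `pokes` sample table from the Hive getting-started guide (names and paths here are illustrative):

```sql
-- Build table
CREATE TABLE pokes (foo INT, bar STRING);

-- Enter data (kv1.txt is one of the sample files shipped with Hive)
LOAD DATA LOCAL INPATH './examples/files/kv1.txt' INTO TABLE pokes;

-- Query
SELECT foo, bar FROM pokes WHERE foo > 100;
SELECT COUNT(*) FROM pokes;
```

Interactively, these statements are typed at the `hive>` prompt; non-interactively, a single statement can be passed to the CLI with `hive -e '<statement>'`, or several can be collected in a script file and run with `hive -f script.q`.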

Second, syntax analysis (English portions are from the Hive wiki LanguageManual DML):

1. Loading files into tables

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move data files into locations corresponding to Hive tables.

When data is loaded into a table, no conversion is made to the data. The Load operation simply copies/moves the data to the location of the Hive table.


Syntax:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Analysis:

  • The Load operation is simply a copy/move operation that moves the data file into the location corresponding to the Hive table.

  • filepath can be: a relative path, such as project/data1; an absolute path, such as /user/hive/project/data1; or a complete URI with a scheme, for example hdfs://namenode:9000/user/hive/project/data1.

  • The target of the load can be a table or a partition. If the table is partitioned, the partition name must be specified.

  • filepath can refer to a file (in which case Hive moves the file into the table's directory) or a directory (in which case Hive moves all of the files in that directory into the table's corresponding directory).

  • If LOCAL is specified: the load command looks for filepath in the local file system. A relative path is interpreted relative to the user's current working directory. The user can also specify a full URI for the local file, such as file:///user/hive/project/data1. The load command copies the files in filepath to the target file system, which is determined by the table's location attribute, and the copied data files are then moved into the table's data location.

  • If LOCAL is not specified: if filepath points to a complete URI, Hive uses that URI directly. Otherwise, if no scheme or authority is given, Hive uses the scheme and authority of the NameNode URI from fs.default.name in the Hadoop configuration; if the path is not absolute, Hive interprets it relative to /user/<username>. Hive then moves the contents of the files specified in filepath into the path specified by the table (or partition).

  • If the OVERWRITE keyword is used, the contents (if any) of the target table (or partition) are deleted, and then the contents of the file/directory that filepath points to are added to the table/partition. If the target table (or partition) already has a file whose name conflicts with a file name in filepath, the existing file is replaced by the new one.

Notes:

  • filepath cannot contain subdirectories.

  • If the LOCAL keyword is not given, filepath must refer to files within the same file system as the table's (or partition's) location.

  • Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in SequenceFile format, the files being loaded are also SequenceFiles, and vice versa.

  • Please read CompressedStorage if your data file is compressed.

Example:

Import data from the local file system into a table, appending to a partition:

LOAD DATA LOCAL INPATH '/tmp/pv_2008-06-08_us.txt' INTO TABLE c02 PARTITION (date='2008-06-', country='US');

Import data from the local file system into a table, appending records:

LOAD DATA LOCAL INPATH './examples/files/kv1.txt' INTO TABLE pokes;

Import data from HDFS into a table, overwriting the original table:

LOAD DATA INPATH '/user/admin/sqlldrdat/cnclickstat/20101101/18/clickstat_gp_fatdt0/0' OVERWRITE INTO
