Hive CREATE Index

Last Update:2016-01-31 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Indexes are standard database technologies that support indexing after the hive0.7 version. Hive provides limited indexing functionality, unlike the traditional relational database with the "key" concept, where users can create indexes on some columns to speed up certain operations, and index data created for one table is saved in another table. Hive's indexing function is now relatively late and offers fewer options. However, the index is designed to be controlled using the built-in pluggable Java code that allows users to extend this functionality to suit their needs. Not that all queries will benefit from the Hive Index, of course. Users can use the explain syntax to analyze whether the HIVEQL statement can use indexes to improve the performance of user queries. Like an index in an RDBMS, it is necessary to evaluate whether index creation is reasonable, after all, the index requires more disk space and there is a cost to creating a maintenance index. The user has to weigh the benefits and costs of getting the index.

Let's talk about how to create an index:

1. Create a table first:

[Java]View PlainCopy

Hive> CREATE TABLE User (id int, name string)
> ROW FORMAT Delimited
> Fields TERMINATED by ' \ t '
> STORED as Textfile;

2. Import data:

[Java]View PlainCopy

hive> Load Data local inpath '/export1/tmp/wyp/row.txt '
> overwrite into table user;

3. Test before creating an index:

[Java]View PlainCopy

Hive> SELECT * from user where id =500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there ' s no reduce operator
Cannot run job locally:input Size (= 356888890) is larger than
Hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0247, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0247/
Kill Command=/home/q/hadoop/bin/hadoop Job-kill job_1384246387966_0247
Hadoop job information for stage-1:number of mappers:2; Number of reducers:0
2013-11- :336 stage-1 map = 0, reduce = 0% /c9>
2013-11- :stage-1 map=50%,reduce=0, Cumulative CPU 2.0 sec
2013-11- :xx,531 stage-1 map=100%,reduce=0, Cumulative CPU 5.63 sec
2013-11- :560 stage-1 map=100%,reduce=0, Cumulative CPU 5.63 sec /c10>
MapReduce Total Cumulative CPU time: 5 seconds 630 msec
Ended Job = job_1384246387966_0247
MapReduce Jobs Launched:
Job 0:map: 2 cumulative CPU: 5.63 sec
HDFs Read: 361084006 hdfs Write: 357 SUCCESS
Total MapReduce CPU time spent: 5 seconds 630 msec
Ok
500000 Wyp.
Time taken: 14.107 seconds, fetched: 1 row (s)

It took a total of 14.107s.

4. Create an index on user:

[Java]View PlainCopy

Hive> CREATE index User_index on table user (ID)
> as ' Org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler '
> with deferred rebuild
> in TABLE user_index_table;
Hive> ALTER index user_index on user rebuild;
Hive> select * from user_index_table limit 5;
0 HDFs://mycluster/user/hive/warehouse/table02/000000_0 [0]
1 HDFs://mycluster/user/hive/warehouse/table02/000000_0 [352]
2 HDFs://mycluster/user/hive/warehouse/table02/000000_0 [704]
3 HDFs://mycluster/user/hive/warehouse/table02/000000_0 [1056]
4 HDFs://mycluster/user/hive/warehouse/table02/000000_0 [1408]
Time taken: 0.244 seconds, fetched: 5 row (s)

This creates an index on the user table.

5. Test the user after the index is created:

[Java]View PlainCopy

Hive> SELECT * from user where id =500000;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there ' s no reduce operator
Cannot run job locally:input Size (= 356888890) is larger than
Hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0247, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0247/
Kill Command=/home/q/hadoop/bin/hadoop Job-kill job_1384246387966_0247
Hadoop job information for stage-1:number of mappers:2; Number of reducers:0
2013-11- :336 stage-1 map = 0, reduce = 0%
2013-11- :stage-1 map=50%,reduce=0, Cumulative CPU 2.0 sec
2013-11-: xx,253 stage-1 map=100%,reduce=0, Cumulative CPU 5.27 sec /c10>
2013-11-: 650 stage-1 map=100%,reduce=0, Cumulative CPU 5.27 sec
MapReduce Total Cumulative CPU time: 5 seconds 630 msec
Ended Job = job_1384246387966_0247
MapReduce Jobs Launched:
Job 0:map: 2 cumulative CPU: 5.63 sec
HDFs Read: 361084006 hdfs Write: 357 SUCCESS
Total MapReduce CPU time spent: 5 seconds 630 msec
Ok
500000 Wyp.
Time taken: 13.042 seconds, fetched: 1 row (s)

The time is 13.042s, which is not very different from the effect of no index.

There is still a bug in creating the index in hive: If the schema information from the table is from Serde,hive, the index cannot be created:

[Java]View PlainCopy

hive> CREATE INDEX Employees_index
> On TABLE Employees (country)
> as ' Org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler '
> with DEFERRED REBUILD
> idxproperties (' creator ' = ' me ',' created_at ' = ' some_time ')
> in TABLE employees_index_table
> COMMENT ' Employees indexed by country and name. ';
Failed:error in metadata:java.lang.RuntimeException: \
Check the index columns, they should appear in the table being indexed.
Failed:execution Error, return code 1 from \
Org.apache.hadoop.hive.ql.exec.DDLTask

This bug occurred in Hive0.10.0, 0.10.1, 0.11.0, in Hive0.12.0 has been repaired, see for details: https://issues.apache.org/jira/browse/HIVE-4251

Article from past memory: http://www.iteblog.com/archives/836

Hive CREATE Index

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Hive CREATE Index

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Hive CREATE Index

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support