Install Hive on Ubuntu 10.10

The configuration went through without any problems, so the next step was to use Hive to run some SQL and fire off my first MapReduce job.
 
 
1. Create a table

create table packCount (userinfo STRING, udid STRING, ip STRING, net STRING, noth STRING, noth2 STRING, noth3 STRING, phone STRING, num STRING, city STRING, pack STRING, numm STRING, downtime STRING) row format delimited fields terminated by ',';
2. Upload the data

load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt' overwrite into table packCount;
3. Write the SQL

select pack, count(udid) as PV, count(distinct udid) as UV from packCount where udid != '' and udid != 'null' group by pack;
4. The original data has 11 fields, but the first attempt at creating the table only declared 9 of them, so the table had to be deleted and recreated.
At first I only ran: dfs -rmr /user/hive/warehouse/packcount;
But every attempt to create the table again was met with a complaint that the table packCount already exists. It then dawned on me that the table had never been deleted inside Hive itself; drop table packCount; also has to be run at the hive> prompt, as sketched below.
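A minimal sketch of the full cleanup, assuming the same table name and warehouse path as above. Removing the HDFS directory alone is not enough, because the metastore still holds the table definition:

hive> dfs -rmr /user/hive/warehouse/packcount;
hive> drop table packCount;
-- now re-running the 13-column create table statement from step 1 succeeds

(In fact, as #2 below shows, the drop alone would have removed the data files too.)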
 
Reference: http://www.hadoopor.com/thread-409-1-1.html
 
======= Open questions ========
 
1. Pay attention to the load execution process below. The Deleted line from HDFS at the end is the part I don't know how to make sense of.
 
Hive load data execution process:

hive> load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt' overwrite into table packCount;
Copying data from file:/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt
Copying file:/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt
Loading data to table default.packcount
Deleted hdfs://localhost:9000/user/hive/warehouse/packcount
OK
Time taken: 2.611 seconds
The file is uploaded from the local disk into the Hive data warehouse. Page 381 of the Chinese edition of Hadoop: The Definitive Guide says that load moves the file from hdfs://....... into the packCount table directory under the Hive warehouse directory, i.e. hdfs://user/hive/warehouse/packCount, and that "source and target files can only be moved within the same filesystem." My understanding was that the file is uploaded from the local disk into HDFS and then moved into the Hive warehouse, with HDFS serving only as a staging stop, yet the whole thing ends with a Deleted line. What is wrong with this picture??? Help me explain.
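For reference, the two forms of load behave differently (a sketch with hypothetical staging paths, not taken from the experiment above): without local the source must already sit in HDFS and the file is moved, not copied, into the warehouse directory; with local the file is first copied up from the local disk, and the local copy stays where it is.

-- source already in HDFS: moved into the warehouse directory (hypothetical path)
hive> load data inpath '/user/wjk/staging/log.txt' into table packCount;
-- source on the local disk: copied into HDFS first; the local file is untouched
hive> load data local inpath '/home/wjk/staging/log.txt' into table packCount;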
 
"Deleting a table in hive does not delete the data, because the data itself is in dfs, it only loads in, and the metadata only records the relationship between the table and the data ", in this case, how can we explain this delete ?? Confused
 
============================ 2.24 =========================
 
#1. "The overwrite keyword in the load statement tells Hive to delete all existing files in the directory corresponding to the table. If this keyword is omitted, Hive simply adds the new files to the directory." I suspected that the Deleted line at the end was caused by overwrite, and ran an experiment to check.
 
 
#2. First I ran drop table packCount in Hive, then ran dfs -rmr /user/hive/warehouse/packcount; the console reported an error, "No such file or directory", meaning the directory no longer existed in DFS.
Cause: I had been using managed tables all along, so the metadata and the data are both deleted by the drop operation. (For a table declared with external, a drop deletes only the metadata; see the sketch below.)
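A minimal sketch of the managed/external contrast, with a hypothetical table name and location that are not part of the experiment above:

-- managed table: the data lives under the warehouse directory and is owned by Hive
create table managed_demo (line STRING);
-- external table: Hive only records the location; it does not own the files
create external table external_demo (line STRING) location '/user/wjk/external_demo';
-- deletes the metadata AND the files under /user/hive/warehouse/managed_demo
drop table managed_demo;
-- deletes only the metadata; the files under /user/wjk/external_demo survive
drop table external_demo;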
 
 
#3. Re-create the table and load the data again, this time with the overwrite keyword removed.
hive> load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt' into table packCount;
Copying data from file:/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt
Copying file:/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt
Loading data to table default.packcount
OK
Time taken: 40.711 seconds
The Deleted line is gone! So it was overwrite's doing after all.
 
 
 
#4. Sorting out the thread, it can perhaps be understood like this: when load data runs, Hive makes a check. If the overwrite keyword is present, it must first delete all files in the table's corresponding directory, which is where the Deleted hdfs://localhost:9000/user/hive/warehouse/packcount line comes from; only after that delete is the data placed into DFS and loaded into Hive. Hive itself holds just the metadata; the actual data is stored in DFS. Finally, a drop on a managed table deletes the data along with the metadata, while on an external table it deletes only the metadata.
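To put the conclusion in one place, the two load variants side by side (a sketch; the file names are illustrative, not from the logs above):

-- with overwrite: the table directory is emptied first (hence the Deleted line)
hive> load data local inpath '/home/wjk/part1.txt' overwrite into table packCount;
-- without overwrite: the new file is simply added next to the existing ones
hive> load data local inpath '/home/wjk/part2.txt' into table packCount;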
 
####: These are just the notes of an amateur; if anything above is wrong, I hope more experienced readers will correct me. Thank you.

From Wang Jiankui Jerrick's blog