Install Hive on Ubuntu 10.10

The configuration went through without any problems, so the next step was to use Hive to run some SQL and fire off my first MapReduce job.
 
 
1. Create a table

create table packCount (userinfo STRING, udid STRING, ip STRING, net STRING, noth STRING, noth2 STRING, noth3 STRING, phone STRING, num STRING, city STRING, pack STRING, numm STRING, downtime STRING) row format delimited fields terminated by ',';
2. Upload the data

load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt' overwrite into table packCount;
3. Write the SQL

select pack, count(udid) as PV, count(distinct udid) as UV from packCount where udid != '' and udid != 'null' group by pack;
4. The original data has 11 fields, but the first attempt at creating the table only declared 9 of them, so the table had to be deleted and recreated.
At first I only ran: dfs -rmr /user/hive/warehouse/packcount;
But every attempt to create the table again was met with a complaint that the table packCount already exists. It then dawned on me that the table had never been deleted inside Hive itself; drop table packCount; also has to be run at the hive> prompt, as sketched below.
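A minimal sketch of the full cleanup, assuming the same table name and warehouse path as above. Removing the HDFS directory alone is not enough, because the metastore still holds the table definition:

hive> dfs -rmr /user/hive/warehouse/packcount;
hive> drop table packCount;
-- now re-running the 13-column create table statement from step 1 succeeds

(In fact, as #2 below shows, the drop alone would have removed the data files too.)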
 
Reference: http://www.hadoopor.com/thread-409-1-1.html
 
======= Open questions ========
 
1. Pay attention to the load execution process below. The Deleted line from HDFS at the end is the part I don't know how to make sense of.
 
Hive load data execution process:

hive> load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt' overwrite into table packCount;
Copying data from file:/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt
Copying file:/home/wjk/hadoop/cdh3/packin/active_log2012020101.txt
Loading data to table default.packcount
Deleted hdfs://localhost:9000/user/hive/warehouse/packcount
OK
Time taken: 2.611 seconds
The file is uploaded from the local disk into the Hive data warehouse. Page 381 of the Chinese edition of Hadoop: The Definitive Guide says that load moves the file from hdfs://....... into the packCount table directory under the Hive warehouse directory, i.e. hdfs://user/hive/warehouse/packCount, and that "source and target files can only be moved within the same filesystem." My understanding was that the file is uploaded from the local disk into HDFS and then moved into the Hive warehouse, with HDFS serving only as a staging stop, yet the whole thing ends with a Deleted line. What is wrong with this picture??? Help me explain.
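For reference, the two forms of load behave differently (a sketch with hypothetical staging paths, not taken from the experiment above): without local the source must already sit in HDFS and the file is moved, not copied, into the warehouse directory; with local the file is first copied up from the local disk, and the local copy stays where it is.

-- source already in HDFS: moved into the warehouse directory (hypothetical path)
hive> load data inpath '/user/wjk/staging/log.txt' into table packCount;
-- source on the local disk: copied into HDFS first; the local file is untouched
hive> load data local inpath '/home/wjk/staging/log.txt' into table packCount;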
 
"Deleting a table in hive does not delete the data, because the data itself is in dfs, it only loads in, and the metadata only records the relationship between the table and the data ", in this case, how can we explain this delete ?? Confused
 
============================ 2.24 =========================
 
#1. "The overwrite keyword in the load statement tells Hive to delete all existing files in the directory corresponding to the table. If this keyword is omitted, Hive simply adds the new files to the directory." I suspected that the Deleted line at the end was caused by overwrite, and ran an experiment to check.
 
 
#2. First I ran drop table packCount in Hive, then ran dfs -rmr /user/hive/warehouse/packcount; the console reported an error, "No such file or directory", meaning the directory no longer existed in DFS.
Cause: I had been using managed tables all along, so the metadata and the data are both deleted by the drop operation. (For a table declared with external, a drop deletes only the metadata; see the sketch below.)
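A minimal sketch of the managed/external contrast, with a hypothetical table name and location that are not part of the experiment above:

-- managed table: the data lives under the warehouse directory and is owned by Hive
create table managed_demo (line STRING);
-- external table: Hive only records the location; it does not own the files
create external table external_demo (line STRING) location '/user/wjk/external_demo';
-- deletes the metadata AND the files under /user/hive/warehouse/managed_demo
drop table managed_demo;
-- deletes only the metadata; the files under /user/wjk/external_demo survive
drop table external_demo;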
 
 
#3. Re-create the table and load the data again, this time with the overwrite keyword removed.
hive> load data local inpath '/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt' into table packCount;
Copying data from file:/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt
Copying file:/home/wjk/hadoop/cdh3/packin/active_log-2012020101.txt
Loading data to table default.packcount
OK
Time taken: 40.711 seconds
The Deleted line is gone! So it was overwrite's doing after all.
 
 
 
#4. Sorting out the thread, it can perhaps be understood like this: when load data runs, Hive makes a check. If the overwrite keyword is present, it must first delete all files in the table's corresponding directory, which is where the Deleted hdfs://localhost:9000/user/hive/warehouse/packcount line comes from; only after that delete is the data placed into DFS and loaded into Hive. Hive itself holds just the metadata; the actual data is stored in DFS. Finally, a drop on a managed table deletes the data along with the metadata, while on an external table it deletes only the metadata.
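To put the conclusion in one place, the two load variants side by side (a sketch; the file names are illustrative, not from the logs above):

-- with overwrite: the table directory is emptied first (hence the Deleted line)
hive> load data local inpath '/home/wjk/part1.txt' overwrite into table packCount;
-- without overwrite: the new file is simply added next to the existing ones
hive> load data local inpath '/home/wjk/part2.txt' into table packCount;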
 
####: These are just the notes of an amateur; if anything above is wrong, I hope more experienced readers will correct me. Thank you.

From Wang Jiankui Jerrick's blog