Querying massive data with a Hadoop + Hive architecture

Source: Internet
Author: User
Tags: chmod, mkdir, ssh, hadoop fs

References: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

1. Install and start Hadoop. References:

single-node: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

multi-node: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Note: if you build a multi-node cluster, the nodes must be able to SSH to each other without a password:

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@slave
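The key-generation steps above can be sketched safely in a throwaway directory (so nothing in your real $HOME/.ssh is touched); the paths and file names below mirror the tutorial but are only a local illustration:

```shell
#!/bin/sh
# Sketch: generate a passphrase-less RSA key pair in a temp dir and append
# the public key to an authorized_keys file, mirroring the steps above.
set -e
DEMO=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$DEMO/id_rsa" -q   # -P "" = empty passphrase, as in the tutorial
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"
chmod 600 "$DEMO/authorized_keys"              # sshd rejects group/world-writable key files
ls -l "$DEMO"
```

In a real cluster you would run ssh-keygen once on the master and use ssh-copy-id (or cat over ssh) to push the public key to each slave.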

If you encounter the error "JAVA_HOME is not set", see:

http://hi.baidu.com/tdfrank/blog/item/fa55a597b26c197e55fb969b.html


Furthermore:

a) The slave configuration files (core-site.xml and mapred-site.xml) need to point to the master.

b) Safe mode can be turned off with: hadoop dfsadmin -safemode leave

c) The "Incompatible namespaceIDs" problem can be solved by editing the namespaceID on the datanode (workaround 2 in the tutorial appears to work).
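Workaround (c) amounts to a one-line edit of the datanode's VERSION file under its data directory. As a hedged sketch, the snippet below fakes that file in a temp dir and rewrites the namespaceID; the sample IDs are made up, and in practice you would copy the real namespaceID from the namenode's own VERSION file:

```shell
#!/bin/sh
# Sketch of workaround (c): make the datanode's namespaceID match the namenode.
# HDFS stores it in ${dfs.data.dir}/current/VERSION; we fake that layout here.
set -e
DATADIR=$(mktemp -d)/current
mkdir -p "$DATADIR"
printf 'namespaceID=123456789\nstorageType=DATA_NODE\n' > "$DATADIR/VERSION"

NN_ID=987654321   # in practice: the namespaceID from the namenode's VERSION file
sed -i "s/^namespaceID=.*/namespaceID=$NN_ID/" "$DATADIR/VERSION"
cat "$DATADIR/VERSION"
```

Stop the datanode before editing the real file, then restart it.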


2. Install Hive.

$ tar -xzvf hive-x.y.z.tar.gz

$ cd hive-x.y.z

$ export HIVE_HOME=$PWD

$ export PATH=$HIVE_HOME/bin:$PATH
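The two exports can be verified in one short script; "hive-x.y.z" is a placeholder for whatever version you actually unpacked, and here we simply use the current directory:

```shell
#!/bin/sh
# Sketch: set HIVE_HOME to the unpacked directory and prepend its bin/ to PATH,
# then confirm that PATH now starts with $HIVE_HOME/bin.
set -e
HIVE_HOME=$PWD            # run this from inside the hive-x.y.z directory
export HIVE_HOME
PATH=$HIVE_HOME/bin:$PATH
export PATH
echo "PATH now starts with: $HIVE_HOME/bin"
```

Putting these lines in ~/.bashrc (or equivalent) makes them survive new shells.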


3. Create Hive directories in HDFS

$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
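The g+w bit being set here is the same symbolic mode chmod uses on a local filesystem, so the effect can be illustrated without a running cluster (the warehouse path below is just a local temp directory, not HDFS):

```shell
#!/bin/sh
# Sketch: the group-write bit that step 3 sets on /tmp and /user/hive/warehouse,
# demonstrated on a local directory (hadoop fs -chmod takes the same g+w syntax).
set -e
WAREHOUSE=$(mktemp -d)/warehouse
mkdir -p "$WAREHOUSE"
chmod g+w "$WAREHOUSE"
ls -ld "$WAREHOUSE"   # mode string should show group write, e.g. drwxrwxr-x
```

Group write matters because Hive (possibly running as a different user) must be able to create tables under the warehouse directory.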


4. Start Hive

$ export HIVE_HOME=<hive-install-dir>

$ $HIVE_HOME/bin/hive


5. Usage

Let us illustrate with the WordCount example.

After entering the Hive shell, create a table:

DROP TABLE words;
CREATE TABLE words
(
  word STRING,
  count INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Import the map/reduce output data:

LOAD DATA INPATH '/user/hduser/bin-output/part-r-00000' INTO TABLE words;

Enter HQL to query:

SELECT * FROM words WHERE word LIKE 'zoo%';
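The part-r-00000 file being loaded is plain tab-delimited "word<TAB>count" text, exactly what the DELIMITED table definition expects. The sketch below fabricates a tiny sample of that format (the words and counts are made up) and uses awk as a local stand-in for the LIKE 'zoo%' predicate:

```shell
#!/bin/sh
# Sketch: build a sample WordCount output file and filter it the way the
# HQL query "SELECT * FROM words WHERE word LIKE 'zoo%'" would.
set -e
OUT=$(mktemp -d)
printf 'apple\t4\nzoo\t2\nzookeeper\t7\n' > "$OUT/part-r-00000"

# Local stand-in for the LIKE 'zoo%' filter: keep rows whose first
# tab-separated field starts with "zoo".
awk -F'\t' '$1 ~ /^zoo/' "$OUT/part-r-00000"
```

Hive compiles such a query into a map/reduce job over the table's files; on this toy input both filters return the "zoo" and "zookeeper" rows.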


Related articles on Hive and HBase:

http://simpleframework.net/blog/v/13164.html

http://jiedushi.blog.51cto.com/673653/608371


Troubleshooting the incompatibility between Hive 0.8 and Hadoop 1.0:

https://issues.apache.org/jira/browse/HIVE-2631

http://svn.apache.org/viewvc?view=revision&revision=1215279

