Linux Hadoop Pseudo-Distributed Installation and Deployment in Detail


What is Impala?

Impala is an open source real-time query project released by Cloudera. According to benchmarks across a variety of products, it improves SQL query speed by 3 to 90 times over the original MapReduce-based Hive. Impala is modeled on Google's Dremel, but surpasses its inspiration in SQL functionality.

1. Install JDK

$ sudo yum install jdk-6u41-linux-amd64.rpm
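
To confirm the JDK is installed and on the PATH, a quick sanity check (the exact version output will vary with your build):

$ java -version
$ readlink -f $(which java)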

2. Install CDH4 in Pseudo-Distributed Mode

$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/cloudera-cdh4.repo
$ sudo yum install hadoop-conf-pseudo

Format the NameNode:

$ sudo -u hdfs hdfs namenode -format

Start HDFS:

$ for x in `cd /etc/init.d; ls hadoop-hdfs-*`; do sudo service $x start; done
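
If the init scripts report OK, the NameNode, DataNode, and SecondaryNameNode should now be running. A quick way to confirm is jps, assuming the Oracle JDK path used elsewhere in this guide:

$ sudo /usr/java/default/bin/jps | egrep 'NameNode|DataNode'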

Create the /tmp directory:

$ sudo -u hdfs hadoop fs -rm -r /tmp
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Create the YARN and log directories:

$ sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging

$ sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-yarn/staging/history/done_intermediate
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp/hadoop-yarn/staging/history/done_intermediate

$ sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging

$ sudo -u hdfs hadoop fs -mkdir /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

Check the HDFS file tree:

$ sudo -u hdfs hadoop fs -ls -R /

drwxrwxrwt   - hdfs   supergroup  0 2013-05-14 15:31 /tmp
drwxr-xr-x   - hdfs   supergroup  0 2013-05-14 15:31 /tmp/hadoop-yarn
drwxrwxrwt   - mapred mapred      0 2013-05-14 15:31 /tmp/hadoop-yarn/staging
drwxr-xr-x   - mapred mapred      0 2013-05-14 15:31 /tmp/hadoop-yarn/staging/history
drwxrwxrwt   - mapred mapred      0 2013-05-14 15:31 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxr-xr-x   - hdfs   supergroup  0 2013-05-14 15:31 /var
drwxr-xr-x   - hdfs   supergroup  0 2013-05-14 15:31 /var/log
drwxr-xr-x   - yarn   mapred      0 2013-05-14 15:31 /var/log/hadoop-yarn

Start YARN:

$ sudo service hadoop-yarn-resourcemanager start
$ sudo service hadoop-yarn-nodemanager start
$ sudo service hadoop-mapreduce-historyserver start
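
Once these services are up, the ResourceManager web UI should answer on port 8088 (the same port the job tracking URLs later in this guide point at). A quick check from the shell:

$ curl -s http://localhost:8088/cluster | head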

Create a user directory (for example, for user dong.guo):

$ sudo -u hdfs hadoop fs -mkdir /user/dong.guo
$ sudo -u hdfs hadoop fs -chown dong.guo /user/dong.guo

Test uploading files:

$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input

Found 4 items
-rw-r--r--   1 dong.guo supergroup 1461 2013-05-14 03:30 input/core-site.xml
-rw-r--r--   1 dong.guo supergroup 1854 2013-05-14 03:30 input/hdfs-site.xml
-rw-r--r--   1 dong.guo supergroup 1325 2013-05-14 03:30 input/mapred-site.xml
-rw-r--r--   1 dong.guo supergroup 2262 2013-05-14 03:30 input/yarn-site.xml

Configure the HADOOP_MAPRED_HOME environment variable:

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
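
This export only lasts for the current shell. To make it survive new sessions, you could append it to your shell profile (a sketch, assuming bash):

$ echo 'export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce' >> ~/.bashrc
$ source ~/.bashrc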

Run a test job:

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'

After the job completes, you can see the following directories:

$ hadoop fs -ls

Found 2 items
drwxr-xr-x   - dong.guo supergroup          0 2013-05-14 03:30 input
drwxr-xr-x   - dong.guo supergroup          0 2013-05-14 03:32 output23

$ hadoop fs -ls output23

Found 2 items
-rw-r--r--   1 dong.guo supergroup          0 2013-05-14 03:32 output23/_SUCCESS
-rw-r--r--   1 dong.guo supergroup        150 2013-05-14 03:32 output23/part-r-00000

$ hadoop fs -cat output23/part-r-00000 | head

1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.datanode.data.dir

3. Install Hive

$ sudo yum install hive hive-metastore hive-server

$ sudo yum install mysql-server

$ sudo service mysqld start

$ cd ~
$ wget 'http://cdn.mysql.com/Downloads/Connector-J/mysql-connector-java-5.1.25.tar.gz'
$ tar xzf mysql-connector-java-5.1.25.tar.gz
$ sudo cp mysql-connector-java-5.1.25/mysql-connector-java-5.1.25-bin.jar /usr/lib/hive/lib/

$ sudo /usr/bin/mysql_secure_installation

[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password: hadoophive
Re-enter new password: hadoophive
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it? [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

$ mysql -u root -phadoophive

mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hadoophive';
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hadoophive';
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'%';
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'localhost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'%';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'localhost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
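
To confirm the grants took effect, you can log back in as the hive user and list the metastore tables created by the schema script (a quick sanity check):

$ mysql -u hive -phadoophive metastore -e 'SHOW TABLES;' | head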

$ sudo mv /etc/hive/conf/hive-site.xml /etc/hive/conf/hive-site.xml.bak
$ sudo vim /etc/hive/conf/hive-site.xml


<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
    <description>The URL of the MySQL database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoophive</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://127.0.0.1:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hbase.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar</value>
  </property>
</configuration>

$ sudo service hive-metastore start

Starting (hive-metastore): [  OK  ]

$ sudo service hive-server start

Starting (hive-server): [  OK  ]
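
With both daemons up, the metastore's Thrift service should be listening on port 9083 as configured above. A quick check (netstat ships in the net-tools package on CentOS 6):

$ sudo netstat -tlnp | grep 9083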

$ sudo -u hdfs hadoop fs -mkdir /user/hive
$ sudo -u hdfs hadoop fs -chown hive /user/hive
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod 777 /tmp
$ sudo -u hdfs hadoop fs -chmod o+t /tmp
$ sudo -u hdfs hadoop fs -mkdir /data
$ sudo -u hdfs hadoop fs -chown hdfs /data
$ sudo -u hdfs hadoop fs -chmod 777 /data
$ sudo -u hdfs hadoop fs -chmod o+t /data

$ sudo chown -R hive:hive /var/lib/hive
$ sudo vim /tmp/kv1.txt

1 www.baidu.com
2 www.google.com
3 www.sina.com.cn
4 www.163.com
5 heylinx.com
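
The LOAD statement below expects the two fields to be separated by a literal tab character. If your editor inserts spaces, you can generate an equivalent file from the shell instead (printf reuses its format string for each pair of arguments):

$ printf '%s\t%s\n' 1 www.baidu.com 2 www.google.com 3 www.sina.com.cn 4 www.163.com 5 heylinx.com | sudo tee /tmp/kv1.txt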

$ sudo -u hive hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201305140801_825709760.txt

hive> CREATE TABLE IF NOT EXISTS pokes (foo INT, bar STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';

hive> SHOW TABLES;
OK
pokes
Time taken: 0.415 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/kv1.txt' OVERWRITE INTO TABLE pokes;
Copying data from file:/tmp/kv1.txt
Copying file: file:/tmp/kv1.txt
Loading data to table default.pokes
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted /user/hive/warehouse/pokes
Table default.pokes stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 79, raw_data_size: 0]
OK
Time taken: 1.681 seconds

$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

4. Install Impala
$ cd /etc/yum.repos.d/
$ sudo wget http://archive.cloudera.com/impala/redhat/6/x86_64/impala/cloudera-impala.repo
$ sudo yum install impala impala-shell
$ sudo yum install impala-server impala-state-store

$ sudo vim /etc/hadoop/conf/hdfs-site.xml

...
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
  <name>dfs.client.file-block-storage-locations.timeout</name>
  <value>3000</value>
</property>
<property>
  <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
  <value>true</value>
</property>

$ sudo cp -rpa /etc/hadoop/conf/core-site.xml /etc/impala/conf/
$ sudo cp -rpa /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/

$ sudo service hadoop-hdfs-datanode restart

$ sudo service impala-state-store restart
$ sudo service impala-server restart

$ sudo /usr/java/default/bin/jps
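
Note that jps only lists Java processes; the Impala daemons themselves (impalad and statestored are native binaries) will not appear in its output. To confirm they are running, check the process table instead:

$ ps aux | egrep '[i]mpalad|[s]tatestored'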

5. Install HBase

$ sudo yum install hbase

$ sudo vim /etc/security/limits.conf

hdfs  - nofile 32768
hbase - nofile 32768

$ sudo vim /etc/pam.d/common-session

session required pam_limits.so
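
The raised limits only apply to sessions opened after the PAM change, and sudo may not run the full PAM session stack, so a fresh login gives the authoritative answer. As a rough check:

$ sudo -u hdfs sh -c 'ulimit -n'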

$ sudo vim /etc/hadoop/conf/hdfs-site.xml

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

(The misspelling "xcievers" is intentional; that is the property's actual historical name in Hadoop.)

$ sudo cp /usr/lib/impala/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar

$ sudo /etc/init.d/hadoop-hdfs-namenode restart
$ sudo /etc/init.d/hadoop-hdfs-datanode restart

$ sudo yum install hbase-master
$ sudo service hbase-master start

$ sudo -u hive hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hive/hive_job_log_hive_201305140905_2005531704.txt
hive> CREATE TABLE hbase_table_1 (key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
OK
Time taken: 3.587 seconds

hive> INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=5;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1368502088579_0004, Tracking URL = http://ip-10-197-10-4:8088/proxy/application_1368502088579_0004/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1368502088579_0004
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2013-05-14 09:12:45,340 Stage-0 map = 0%, reduce = 0%
2013-05-14 09:12:53,165 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.63 sec
MapReduce Total cumulative CPU time: 2 seconds 630 msec
Ended Job = job_1368502088579_0004
1 Rows loaded to hbase_table_1
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.63 sec HDFS Read: 288 HDFS Write: 0 SUCCESS
Total MapReduce CPU time spent: 2 seconds 630 msec
OK
Time taken: 21.063 seconds

hive> SELECT * FROM hbase_table_1;
OK
5 heylinx.com
Time taken: 0.685 seconds

hive> SELECT COUNT(*) FROM pokes;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1368502088579_0005, Tracking URL = http://ip-10-197-10-4:8088/proxy/application_1368502088579_0005/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1368502088579_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-14 10:32:04,711 Stage-1 map = 0%, reduce = 0%
2013-05-14 10:32:11,461 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:12,554 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:13,642 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:14,760 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:15,918 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:16,991 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:18,111 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec
2013-05-14 10:32:19,188 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.04 sec
MapReduce Total cumulative CPU time: 4 seconds 40 msec
Ended Job = job_1368502088579_0005
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.04 sec HDFS Read: 288 HDFS Write: 2 SUCCESS
Total MapReduce CPU time spent: 4 seconds 40 msec
OK
5
Time taken: 28.195 seconds

6. Test Impala Performance

You can view runtime parameters in the Impala server's built-in web interface at http://ec2-204-236-182-78.us-west-1.compute.amazonaws.com:25000

$ impala-shell

[ip-10-197-10-4.us-west-1.compute.internal:21000] > CREATE TABLE IF NOT EXISTS pokes (foo int, bar STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Query: CREATE TABLE IF NOT EXISTS pokes (foo int, bar STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
[ip-10-197-10-4.us-west-1.compute.internal:21000] > SHOW TABLES;
Query: SHOW TABLES
Query finished, fetching results ...
+-------+
| name  |
+-------+
| pokes |
+-------+
Returned 1 row(s) in 0.00s

[ip-10-197-10-4.us-west-1.compute.internal:21000] > SELECT * FROM pokes;
Query: SELECT * FROM pokes
Query finished, fetching results ...
+-----+-----------------+
| foo | bar             |
+-----+-----------------+
| 1   | www.baidu.com   |
| 2   | www.google.com  |
| 3   | www.sina.com.cn |
| 4   | www.163.com     |
| 5   | heylinx.com     |
+-----+-----------------+
Returned 5 row(s) in 0.28s

[ip-10-197-10-4.us-west-1.compute.internal:21000] > SELECT COUNT(*) FROM pokes;
Query: SELECT COUNT(*) FROM pokes
Query finished, fetching results ...
+----------+
| count(*) |
+----------+
| 5        |
+----------+
Returned 1 row(s) in 0.34s
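
For repeatable timing runs, impala-shell can also execute a single statement non-interactively with its -q option, which makes it easy to script comparisons against Hive:

$ impala-shell -q 'SELECT COUNT(*) FROM pokes;'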

Comparing the two COUNT(*) results, Hive took 28.195 seconds while Impala took only 0.34 seconds, which shows that Impala really does outperform Hive.
