Open Source Cloud Computing Technology Series (IV): Cloudera Installation and Configuration

Source: Internet
Author: User

To save space, let's get straight to the point.

First, set up a Debian 5.0 system in a VirtualBox virtual machine.

Debian has always been among the purest of the open source Linux distributions: easy to use and efficient to run. The latest release, 5.0, has a fresh look and feels quite different from the previous one.

Downloading debian-501-i386-CD-1.iso is enough to install the system; everything else can be installed and configured conveniently as packages over the network, which is one of Debian's strengths. The detailed process is omitted here; all the information you need is available at www.debian.org.

Let's experience the ease and simplicity of the stable Hadoop version, 0.18.3.

Step 1. Configure the Cloudera Repository

Create a new configuration file with vi /etc/apt/sources.list.d/cloudera.list

debian:~# more /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
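The same two lines can also be generated without opening an editor. The following is only a minimal sketch, not the method used in this article: the function name cloudera_sources is hypothetical, and the codename argument is an assumption added so the release can be changed in one place. The URL and the contrib component are taken from the listing above.

```shell
#!/bin/sh
# Print the two APT source lines for the Cloudera repository.
# cloudera_sources is a hypothetical helper name, not part of any package.
cloudera_sources() {
    dist="${1:-lenny}"   # Debian codename; defaults to lenny as in the article
    printf 'deb http://archive.cloudera.com/debian %s contrib\n' "$dist"
    printf 'deb-src http://archive.cloudera.com/debian %s contrib\n' "$dist"
}

# Example (run as root):
#   cloudera_sources lenny > /etc/apt/sources.list.d/cloudera.list
```

Printing to standard output and redirecting keeps the function free of side effects, so the result can be inspected before it is written under /etc.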

Add the Cloudera Key

debian:~# curl -s http://archive.cloudera.com/debian/archive.key | apt-key add -
OK

Update APT Index

debian:~# apt-get update
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release.gpg
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Translation-en_US
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Packages/DiffIndex
Get:1 http://archive.cloudera.com lenny Release.gpg [197B]
Get:2 http://volatile.debian.org lenny/volatile Release.gpg [189B]
Ign http://volatile.debian.org lenny/volatile/main Translation-en_US
Hit http://ftp.us.debian.org lenny Release.gpg
Ign http://archive.cloudera.com lenny/contrib Translation-en_US
Hit http://security.debian.org lenny/updates Release.gpg
Ign http://security.debian.org lenny/updates/main Translation-en_US
Get:3 http://volatile.debian.org lenny/volatile Release [40.7kB]
Ign http://ftp.us.debian.org lenny/main Translation-en_US
Hit http://security.debian.org lenny/updates Release
Get:4 http://archive.cloudera.com lenny Release [2391B]
Hit http://ftp.us.debian.org lenny Release
Ign http://security.debian.org lenny/updates/main Packages/DiffIndex
Ign http://archive.cloudera.com lenny/contrib Packages
Ign http://security.debian.org lenny/updates/main Sources/DiffIndex
Ign http://ftp.us.debian.org lenny/main Packages/DiffIndex
Ign http://ftp.us.debian.org lenny/main Sources/DiffIndex
Hit http://security.debian.org lenny/updates/main Packages
Hit http://ftp.us.debian.org lenny/main Packages
Ign http://archive.cloudera.com lenny/contrib Sources
Ign http://volatile.debian.org lenny/volatile/main Packages/DiffIndex
Hit http://security.debian.org lenny/updates/main Sources
Ign http://volatile.debian.org lenny/volatile/main Sources/DiffIndex
Hit http://ftp.us.debian.org lenny/main Sources
Get:5 http://archive.cloudera.com lenny/contrib Packages [4480B]
Get:6 http://volatile.debian.org lenny/volatile/main Packages [7471B]
Get:7 http://volatile.debian.org lenny/volatile/main Sources [2350B]
Get:8 http://archive.cloudera.com lenny/contrib Sources [1431B]
Fetched 59.2kB in 4s (12.5kB/s)
Reading package lists... Done
debian:~#

View Cloudera Packages

debian:~# apt-cache search hadoop
hadoop - A software platform for processing vast amounts of data
hadoop-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-datanode - Data Node for Hadoop
hadoop-doc - Documentation for Hadoop
hadoop-jobtracker - Job Tracker for Hadoop
hadoop-namenode - Name Node for Hadoop
hadoop-native - Native libraries for Hadoop (e.g., compression)
hadoop-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-secondarynamenode - Secondary Name Node for Hadoop
hadoop-tasktracker - Task Tracker for Hadoop
hive - A data warehouse infrastructure built on top of Hadoop
libhdfs0 - JNI bindings to access Hadoop HDFS from C
pig - A platform for analyzing large data sets using Hadoop
debian:~#
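When only the package names are needed, for example to feed them to another command, the search output can be trimmed. This is a small sketch that assumes each line has the "name - description" form shown above; package_names is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only the package name from "name - description" lines,
# as printed by `apt-cache search`. Lines without " - " are skipped.
package_names() {
    awk -F' - ' 'NF >= 2 { print $1 }'
}

# Example:
#   apt-cache search hadoop | package_names
```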

OK, the preparation is done. Now the actual installation begins, and it is just as convenient.

We chose to install Hadoop in pseudo-distributed mode, which lets us experience Hadoop's full functionality on a single machine.
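In pseudo-distributed mode every Hadoop daemon runs on the same host, and the site configuration tells them where to find each other. Once the hadoop-conf-pseudo package is installed, its settings can be inspected with a sketch like the one below. It is a line-oriented reader, not a real XML parser, and it assumes the usual layout with <name> and <value> on separate lines. The function name hadoop_conf_value is hypothetical; fs.default.name, mapred.job.tracker and dfs.replication are standard Hadoop 0.18 property names, but the values in your file may differ.

```shell
#!/bin/sh
# Print the <value> of a named property from a Hadoop XML config file.
# Assumes <name> and <value> sit on separate lines inside <property>.
hadoop_conf_value() {
    file="$1"
    name="$2"
    awk -v want="$name" '
        /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
        /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$file"
}

# Example, after installing hadoop-conf-pseudo:
#   hadoop_conf_value /etc/hadoop/conf.pseudo/hadoop-site.xml fs.default.name
#   hadoop_conf_value /etc/hadoop/conf.pseudo/hadoop-site.xml dfs.replication
```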

Yesterday we tried hadoop-conf-pseudo 0.18.3-0cloudera0.3.0~intrepid; today Cloudera released a trial package based on the latest Hadoop version, 0.20. Such is the pace of open source software: something new every day.

Java 6 is required.

Configuration

debian:~/codeblue2/client/examples# more /etc/apt/sources.list
#
# deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main

deb http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb-src http://ftp.us.debian.org/debian/ lenny main contrib non-free

deb http://security.debian.org/ lenny/updates main contrib non-free
deb-src http://security.debian.org/ lenny/updates main contrib non-free

deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free

Then run apt-get update.

debian:~# apt-get install sun-java6-jre

The installation is foolproof, so the output is omitted here.
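Since the output is omitted, it is worth confirming that a Java 6 runtime is actually the one on the PATH. The sketch below parses the banner printed by java -version, assuming the legacy "1.x" version scheme that Java 6 uses; java_major is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the major version from a `java -version` banner line,
# e.g. 'java version "1.6.0_12"' gives 6 (legacy "1.x" scheme only).
java_major() {
    sed -n 's/.* version "1\.\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example (java -version writes its banner to stderr):
#   java -version 2>&1 | java_major
```

If the command prints nothing, either java is not installed or its banner does not follow the "1.x" scheme assumed here.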

Before trying 0.20, let's install 0.18.3 first; after all, it is the stable version.

debian:~# apt-get -y install hadoop-conf-pseudo
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  hadoop hadoop-native liblzo2-2
The following NEW packages will be installed:
  hadoop hadoop-conf-pseudo hadoop-native liblzo2-2
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.0MB/12.1MB of archives.
After this operation, 21.5MB of additional disk space will be used.
Get:1 http://archive.cloudera.com lenny/contrib hadoop 0.18.3-4cloudera0.3.0~lenny [11.9MB]
Get:2 http://archive.cloudera.com lenny/contrib hadoop-conf-pseudo 0.18.3-4cloudera0.3.0~lenny [93.1kB]
Get:3 http://archive.cloudera.com lenny/contrib hadoop-native 0.18.3-4cloudera0.3.0~lenny [92.7kB]
Fetched 4336kB in 23s (184kB/s)
Selecting previously deselected package liblzo2-2.
(Reading database ... 103556 files and directories currently installed.)
Unpacking liblzo2-2 (from .../lzo2/liblzo2-2_2.03-1_i386.deb) ...
Selecting previously deselected package hadoop.
Unpacking hadoop (from .../hadoop_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-conf-pseudo.
Unpacking hadoop-conf-pseudo (from .../hadoop-conf-pseudo_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-native.
Unpacking hadoop-native (from .../hadoop-native_0.18.3-4cloudera0.3.0~lenny_i386.deb) ...
Processing triggers for man-db ...
Setting up liblzo2-2 (2.03-1) ...
Setting up hadoop (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-conf-pseudo (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-native (0.18.3-4cloudera0.3.0~lenny) ...

See where the files were installed.

debian:~# dpkg -L hadoop-conf-pseudo
/.
/etc
/etc/hadoop
/etc/hadoop/conf.pseudo
/etc/hadoop/conf.pseudo/hadoop-default.xml
/etc/hadoop/conf.pseudo/configuration.xsl
/etc/hadoop/conf.pseudo/log4j.properties
/etc/hadoop/conf.pseudo/slaves
/etc/hadoop/conf.pseudo/sslinfo.xml.example
/etc/hadoop/conf.pseudo/hadoop-env.sh
/etc/hadoop/conf.pseudo/masters
/etc/hadoop/conf.pseudo/hadoop-metrics.properties
/etc/hadoop/conf.pseudo/commons-logging.properties
/etc/hadoop/conf.pseudo/hadoop-site.xml
/usr
/usr/share
/usr/share/doc
/usr/share/doc/hadoop-conf-pseudo
/usr/share/doc/hadoop-conf-pseudo/copyright
/usr/share/doc/hadoop-conf-pseudo/changelog.Debian.gz
/usr/share/doc/hadoop-conf-pseudo/changelog.gz
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/hadoop-conf-pseudo

debian:~# ls -l /var/lib/hadoop/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 current
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 image

Start Hadoop Services:

debian:~# /etc/init.d/hadoop-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-debian.out
hadoop-namenode.

debian:~# /etc/init.d/hadoop-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-debian.out
hadoop-datanode.

debian:~# /etc/init.d/hadoop-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-debian.out
hadoop-jobtracker.
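Starting the daemons one by one gets repetitive, so the calls above can be wrapped in a loop. This is a minimal sketch under a few assumptions: the function name hadoop_services and the DRY_RUN variable are hypothetical, and the default daemon list contains only the three init scripts shown above. Other daemons can be passed as extra arguments if their init scripts are installed.

```shell
#!/bin/sh
# Run an init script action for each Hadoop daemon, in order.
# With DRY_RUN=1 the commands are printed instead of executed.
hadoop_services() {
    action="$1"
    shift
    # Default to the three services started in the article.
    [ "$#" -gt 0 ] || set -- namenode datanode jobtracker
    for svc in "$@"; do
        cmd="/etc/init.d/hadoop-$svc $action"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1   # stop at the first daemon that fails
        fi
    done
}

# Examples (run as root):
#   DRY_RUN=1 hadoop_services start    # show what would be run
#   hadoop_services start              # start the default three daemons
#   hadoop_services stop jobtracker datanode namenode
```

The dry-run switch makes it possible to check the command list before touching any running service.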

Check that the processes are running normally:

hadoop 7926 1 0 03:01 ? 00:00:12 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8007 1 1 03:02 ? 00:00:14 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8053 1 0 03:02 ? 00:00:13 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8108 1 0 03:02 ? 00:00:11 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dhadoop.log
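The command that produced this listing is not shown in the original; the columns resemble ps -ef output. Assuming that format, the check can be scripted by counting the Java processes owned by the hadoop user. The helper name count_hadoop_java is hypothetical.

```shell
#!/bin/sh
# Count lines of `ps -ef` style output whose first column (the user)
# is "hadoop" and whose command contains /bin/java.
count_hadoop_java() {
    awk '$1 == "hadoop" && /\/bin\/java/ { n++ } END { print n + 0 }'
}

# Example:
#   ps -ef | count_hadoop_java
```

A count lower than the number of daemons you started suggests that one of them exited; its log under /var/log/hadoop is the first place to look.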

Hive and Pig can each be installed with a single command as well, which is convenient and practical.

apt-get install hive

apt-get install pig

OK, let's autoremove 0.18.3 and try the latest 0.20.

debian:~# apt-get autoremove hadoop-conf-pseudo

debian:~# wget http://archive.cloudera.com/hadoop-summit-09/hadoop-20-debs/deb_lenny_i386/hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb

debian:~# dpkg -i hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb
Selecting previously deselected package hadoop-0.20.
(Reading database ... 103589 files and directories currently installed.)
Unpacking hadoop-0.20 (from hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb) ...
Setting up hadoop-0.20 (0.20.0-1cloudera0.5.0~lenny) ...
Processing triggers for man-db ...
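Debian package files follow the name_version_arch.deb naming convention, so the downloaded file name already says which package and version dpkg is about to install. The sketch below splits a file name into those three fields; deb_field is a hypothetical helper name.

```shell
#!/bin/sh
# Split a Debian package file name of the form name_version_arch.deb.
# Usage: deb_field FILE name|version|arch
deb_field() {
    base="${1##*/}"        # drop any directory part
    base="${base%.deb}"    # drop the extension
    case "$2" in
        name)    echo "${base%%_*}" ;;
        arch)    echo "${base##*_}" ;;
        version) v="${base#*_}"; echo "${v%_*}" ;;
        *)       return 1 ;;
    esac
}

# Example:
#   deb_field 'hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb' version
# After installation, the version field can be compared with:
#   dpkg-query -W -f='${Version}\n' hadoop-0.20
```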

More on the new developments in 0.20 later.
