To save space, let's get straight to the point.
First, set up a Debian 5.0 system in a VirtualBox virtual machine.
Debian has always been among the purest of the open source Linux distributions: easy to use and efficient to run. The latest release, 5.0, has a fresh look and does not feel like the previous one.
You only need to download debian-501-i386-CD-1.iso to install; everything else can be pulled in conveniently over the network thanks to Debian's strong package management. The installation process is omitted here; you can find all the information you need at www.debian.org.
Let's experience the ease and simplicity of the stable Hadoop version, 0.18.3.
Step 1. Configure the Cloudera repository
Create a new configuration file: vi /etc/apt/sources.list.d/cloudera.list
debian:~# more /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
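The same file can also be written non-interactively instead of with vi. This is a minimal sketch, assuming a POSIX shell; the function name write_cloudera_list is hypothetical, and the two lines it writes are the repository entries shown above.

```shell
# Hypothetical helper (not part of any package): write the two Cloudera
# APT source lines to the file given as the first argument.
write_cloudera_list() {
    target="$1"
    cat > "$target" <<'EOF'
deb http://archive.cloudera.com/debian lenny contrib
deb-src http://archive.cloudera.com/debian lenny contrib
EOF
}

# Usage (as root):
#   write_cloudera_list /etc/apt/sources.list.d/cloudera.list
```

Taking the target path as an argument makes it easy to try the helper on a scratch file before touching /etc.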
Add the Cloudera key
debian:~# curl -s http://archive.cloudera.com/debian/archive.key | apt-key add -
OK
Update the APT index
debian:~# apt-get update
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release.gpg
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Translation-en_US
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny Release
Ign cdrom://[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10] lenny/main Packages/DiffIndex
Get:1 http://archive.cloudera.com lenny Release.gpg [197B]
Get:2 http://volatile.debian.org lenny/volatile Release.gpg [189B]
Ign http://volatile.debian.org lenny/volatile/main Translation-en_US
Hit http://ftp.us.debian.org lenny Release.gpg
Ign http://archive.cloudera.com lenny/contrib Translation-en_US
Hit http://security.debian.org lenny/updates Release.gpg
Ign http://security.debian.org lenny/updates/main Translation-en_US
Get:3 http://volatile.debian.org lenny/volatile Release [40.7kB]
Ign http://ftp.us.debian.org lenny/main Translation-en_US
Hit http://security.debian.org lenny/updates Release
Get:4 http://archive.cloudera.com lenny Release [2391B]
Hit http://ftp.us.debian.org lenny Release
Ign http://security.debian.org lenny/updates/main Packages/DiffIndex
Ign http://archive.cloudera.com lenny/contrib Packages
Ign http://security.debian.org lenny/updates/main Sources/DiffIndex
Ign http://ftp.us.debian.org lenny/main Packages/DiffIndex
Ign http://ftp.us.debian.org lenny/main Sources/DiffIndex
Hit http://security.debian.org lenny/updates/main Packages
Hit http://ftp.us.debian.org lenny/main Packages
Ign http://archive.cloudera.com lenny/contrib Sources
Ign http://volatile.debian.org lenny/volatile/main Packages/DiffIndex
Hit http://security.debian.org lenny/updates/main Sources
Ign http://volatile.debian.org lenny/volatile/main Sources/DiffIndex
Hit http://ftp.us.debian.org lenny/main Sources
Get:5 http://archive.cloudera.com lenny/contrib Packages [4480B]
Get:6 http://volatile.debian.org lenny/volatile/main Packages [7471B]
Get:7 http://volatile.debian.org lenny/volatile/main Sources [2350B]
Get:8 http://archive.cloudera.com lenny/contrib Sources [1431B]
Fetched 59.2kB in 4s (12.5kB/s)
Reading package lists... Done
debian:~#
View the Cloudera packages
debian:~# apt-cache search hadoop
hadoop - A software platform for processing vast amounts of data
hadoop-conf-pseudo - Pseudo-distributed Hadoop configuration
hadoop-datanode - Data Node for Hadoop
hadoop-doc - Documentation for Hadoop
hadoop-jobtracker - Job Tracker for Hadoop
hadoop-namenode - Name Node for Hadoop
hadoop-native - Native libraries for Hadoop (e.g., compression)
hadoop-pipes - Interface to author Hadoop MapReduce jobs in C++
hadoop-secondarynamenode - Secondary Name Node for Hadoop
hadoop-tasktracker - Task Tracker for Hadoop
hive - A data warehouse infrastructure built on top of Hadoop
libhdfs0 - JNI bindings to access Hadoop HDFS from C
pig - A platform for analyzing large data sets using Hadoop
debian:~#
OK, the preparation is done. Now for the actual installation, which is just as convenient.
We chose to install Hadoop in pseudo-distributed mode, which lets you experience Hadoop fully on a single machine.
Yesterday we tried hadoop-conf-pseudo 0.18.3-0cloudera0.3.0~intrepid; today Cloudera released a trial package based on the latest Hadoop, 0.20. That is the speed of open source software: something new every day.
Java 6 is required.
Configuration
debian:~/codeblue2/client/examples# more /etc/apt/sources.list
#
# deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main
deb cdrom:[Debian GNU/Linux 5.0.1 _Lenny_ - Official i386 CD Binary-1 20090413-00:10]/ lenny main
deb http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb-src http://ftp.us.debian.org/debian/ lenny main contrib non-free
deb http://security.debian.org/ lenny/updates main contrib non-free
deb-src http://security.debian.org/ lenny/updates main contrib non-free
deb http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free
deb-src http://volatile.debian.org/debian-volatile lenny/volatile main contrib non-free
Then run apt-get update.
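The sun-java6-jre package lives in the non-free section of lenny, so it can help to confirm that at least one active deb line enables non-free before installing. This is a rough sketch; the function name has_nonfree is hypothetical, and the case-insensitive match is only a convenience.

```shell
# Hypothetical helper: succeed if the given sources file contains an
# uncommented "deb" line (not "deb-src") that lists the non-free section.
has_nonfree() {
    grep -Eiq '^[[:space:]]*deb[[:space:]].*[[:space:]]non-free([[:space:]]|$)' "$1"
}

# Usage:
#   has_nonfree /etc/apt/sources.list || echo "non-free is not enabled"
```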
debian:~# apt-get install sun-java6-jre
The installation is foolproof; the output is omitted here.
Before trying 0.20, install 0.18.3 first; after all, it is the stable version.
debian:~# apt-get -y install hadoop-conf-pseudo
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  hadoop hadoop-native liblzo2-2
The following NEW packages will be installed:
  hadoop hadoop-conf-pseudo hadoop-native liblzo2-2
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.0MB/12.1MB of archives.
After this operation, 21.5MB of additional disk space will be used.
Get:1 http://archive.cloudera.com lenny/contrib hadoop 0.18.3-4cloudera0.3.0~lenny [11.9MB]
Get:2 http://archive.cloudera.com lenny/contrib hadoop-conf-pseudo 0.18.3-4cloudera0.3.0~lenny [93.1kB]
Get:3 http://archive.cloudera.com lenny/contrib hadoop-native 0.18.3-4cloudera0.3.0~lenny [92.7kB]
Fetched 4336kB in 23s (184kB/s)
Selecting previously deselected package liblzo2-2.
(Reading database ... 103556 files and directories currently installed.)
Unpacking liblzo2-2 (from .../lzo2/liblzo2-2_2.03-1_i386.deb) ...
Selecting previously deselected package hadoop.
Unpacking hadoop (from .../hadoop_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-conf-pseudo.
Unpacking hadoop-conf-pseudo (from .../hadoop-conf-pseudo_0.18.3-4cloudera0.3.0~lenny_all.deb) ...
Selecting previously deselected package hadoop-native.
Unpacking hadoop-native (from .../hadoop-native_0.18.3-4cloudera0.3.0~lenny_i386.deb) ...
Processing triggers for man-db ...
Setting up liblzo2-2 (2.03-1) ...
Setting up hadoop (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-conf-pseudo (0.18.3-4cloudera0.3.0~lenny) ...
Setting up hadoop-native (0.18.3-4cloudera0.3.0~lenny) ...
See where the files were installed.
debian:~# dpkg -L hadoop-conf-pseudo
/.
/etc
/etc/hadoop
/etc/hadoop/conf.pseudo
/etc/hadoop/conf.pseudo/hadoop-default.xml
/etc/hadoop/conf.pseudo/configuration.xsl
/etc/hadoop/conf.pseudo/log4j.properties
/etc/hadoop/conf.pseudo/slaves
/etc/hadoop/conf.pseudo/sslinfo.xml.example
/etc/hadoop/conf.pseudo/hadoop-env.sh
/etc/hadoop/conf.pseudo/masters
/etc/hadoop/conf.pseudo/hadoop-metrics.properties
/etc/hadoop/conf.pseudo/commons-logging.properties
/etc/hadoop/conf.pseudo/hadoop-site.xml
/usr
/usr/share
/usr/share/doc
/usr/share/doc/hadoop-conf-pseudo
/usr/share/doc/hadoop-conf-pseudo/copyright
/usr/share/doc/hadoop-conf-pseudo/changelog.Debian.gz
/usr/share/doc/hadoop-conf-pseudo/changelog.gz
/usr/share/lintian
/usr/share/lintian/overrides
/usr/share/lintian/overrides/hadoop-conf-pseudo
debian:~# ls -l /var/lib/hadoop/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 current
drwxr-xr-x 2 hadoop hadoop 4096 2009-06-24 02:58 image
Start the Hadoop services:
debian:~# /etc/init.d/hadoop-namenode start
Starting Hadoop namenode daemon: starting namenode, logging to /var/log/hadoop/hadoop-hadoop-namenode-debian.out
hadoop-namenode.
debian:~# /etc/init.d/hadoop-datanode start
Starting Hadoop datanode daemon: starting datanode, logging to /var/log/hadoop/hadoop-hadoop-datanode-debian.out
hadoop-datanode.
debian:~# /etc/init.d/hadoop-jobtracker start
Starting Hadoop jobtracker daemon: starting jobtracker, logging to /var/log/hadoop/hadoop-hadoop-jobtracker-debian.out
hadoop-jobtracker.
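The three start commands above follow one pattern, so they can be wrapped in a loop. This is a minimal sketch; the function name start_hadoop_services and its optional directory argument are hypothetical, and it covers only the three daemons started above.

```shell
# Hypothetical helper: run "<dir>/hadoop-<service> start" for each daemon,
# stopping at the first failure. The directory defaults to /etc/init.d.
start_hadoop_services() {
    init_dir="${1:-/etc/init.d}"
    for svc in namenode datanode jobtracker; do
        "$init_dir/hadoop-$svc" start || return 1
    done
}

# Usage (as root):
#   start_hadoop_services
```

The order matches the transcript: namenode first, then datanode, then jobtracker.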
Check that the processes are running normally:
hadoop 7926 1 0 03:01 ? 00:00:12 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8007 1 1 03:02 ? 00:00:14 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8053 1 0 03:02 ? 00:00:13 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dcom.sun.man
hadoop 8108 1 0 03:02 ? 00:00:11 /usr/lib/jvm/java-6-sun//bin/java -Xmx100m -Dhadoop.log
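The listing above looks like ps -ef output filtered for Hadoop, although the original command is not shown. As a sketch under that assumption, the hypothetical helper below counts the Java processes owned by the hadoop user in ps -ef style input.

```shell
# Hypothetical helper: read "ps -ef" style lines on stdin and print how many
# belong to user "hadoop" and mention java on the command line.
count_hadoop_java() {
    awk '$1 == "hadoop" && /java/ { n++ } END { print n + 0 }'
}

# Usage:
#   ps -ef | count_hadoop_java
```

Reading from stdin keeps the helper independent of the exact ps options in use.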
Installing Hive and Pig also takes just one command each, quick and painless.
debian:~# apt-get install hive
debian:~# apt-get install pig
OK, now let's autoremove 0.18.3 and try the latest, 0.20.
debian:~# apt-get autoremove hadoop-conf-pseudo
debian:~# wget http://archive.cloudera.com/hadoop-summit-09/hadoop-20-debs/deb_lenny_i386/hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb
debian:~# dpkg -i hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb
Selecting previously deselected package hadoop-0.20.
(Reading database ... 103589 files and directories currently installed.)
Unpacking hadoop-0.20 (from hadoop-0.20_0.20.0-1cloudera0.5.0~lenny_all.deb) ...
Setting up hadoop-0.20 (0.20.0-1cloudera0.5.0~lenny) ...
Processing triggers for man-db ...
More on the new developments in 0.20 later.