Installing Hadoop and Mahout on Mac OS
1. Download Hadoop and Mahout:
You can download them directly from labs.renren.com/apache-#/hadoop and labs.renren.com/apache-#/mahout.
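A minimal sketch of unpacking the downloads, assuming the hadoop-0.21.0 release used later in this guide and Mahout 0.5 (both archive names are assumptions; substitute the versions you actually downloaded):
# assumed archive names; adjust to the downloaded versions
tar -xzf hadoop-0.21.0.tar.gz -C ~/Documents/DevRes
tar -xzf mahout-distribution-0.5.tar.gz -C ~/Documents/DevRes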
2. Configure the hadoop configuration file:
(1) core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000/</value>
  </property>
</configuration>
(2) mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
(3) hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
(4) Add the following configuration information at the end of the hadoop-env.sh file:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
export HADOOP_INSTALL=/Users/alex/Documents/DevRes/hadoop-0.21.0
export PATH=$PATH:$HADOOP_INSTALL/bin
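A quick way to confirm the paths above resolve (an illustrative check, not part of the original steps):
# verify the JDK and Hadoop installs referenced in hadoop-env.sh
$JAVA_HOME/bin/java -version
$HADOOP_INSTALL/bin/hadoop version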
3. Configure SSH
Choose System Preferences > Sharing > Remote Login.
Configure keyless login:
(1) generate a key (Public Key ):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
ssh-keygen generates the key pair; -t specifies the key type (dsa, i.e. DSA authentication); -P supplies the passphrase (empty here); -f specifies the file the key is written to.
(2) Add the public key to the authentication file:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
After the setup is complete, you can log on to the local machine over SSH without entering a password.
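A simple way to verify the keyless login (illustrative):
# should log in without prompting for a password
ssh localhost
exit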
4. Run Hadoop:
Format the NameNode:
bin/hadoop namenode -format
Start all processes:
bin/start-all.sh
If an error is reported:
Unable to load realm info from SCDynamicStore
Add at the end of the hadoop-env.sh file:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
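Once start-all.sh completes, one way to confirm the daemons are up is the JDK's jps tool; in a pseudo-distributed setup it should list processes such as NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker:
jps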
5. Test WordCount
First generate the input file input.txt:
hello world
hello hadoop
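One way to create this file from the shell:
# write the two sample lines into input.txt
printf 'hello world\nhello hadoop\n' > input.txt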
Create a directory input in HDFS:
bin/hadoop fs -mkdir input
Put the input file in this directory:
bin/hadoop fs -put input.txt input
Run WordCount from Hadoop's examples jar (the jar name varies by release, so substitute your version):
bin/hadoop jar hadoop-<version>-examples.jar wordcount input output
The results are written to the output folder; list its contents:
bin/hadoop fs -ls output
Three entries are displayed: _SUCCESS, _logs, and part-r-00000; the actual results are stored in part-r-00000:
bin/hadoop fs -cat output/part-r-00000
The final result is:
hadoop 1
hello 2
world 1
6. Configure Mahout:
Add the following configuration to the end of the /etc/profile file:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home
export MAHOUT_HOME=/path/to/mahout
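To pick up the new variables in the current shell before testing (illustrative):
# reload the profile and confirm the variables are set
source /etc/profile
echo $JAVA_HOME
echo $MAHOUT_HOME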
Run bin/mahout --help. If help information is displayed, the installation succeeded.
7. Configure Hadoop and Mahout in Eclipse
(1) Configuring Hadoop works the same as on other operating systems: copy Hadoop's eclipse-plugin jar into Eclipse's plugins folder, then set the Hadoop installation path under Preferences > Hadoop Map/Reduce in Eclipse.
(2) After Hadoop is configured, create a Map/Reduce project and add the four jar packages from the Mahout directory (core, core-job, math, and util) to the project's build path.
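The exact jar names vary by release; in a Mahout distribution they typically follow a pattern like the one below (an assumption, so verify against your download):
# locate the Mahout jars to add to the Eclipse build path
ls $MAHOUT_HOME/mahout-core-*.jar $MAHOUT_HOME/mahout-math-*.jar $MAHOUT_HOME/mahout-utils-*.jar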