Pseudo-distributed
Three ways to install Hadoop:
- Local (Standalone) Mode
- Pseudo-distributed Mode
- Fully-distributed Mode
Required before installation
$ sudo apt-get install ssh
$ sudo apt-get install rsync
See: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Pseudo-distributed configuration
Modify the following configuration files:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configuring SSH
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
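You can verify that passphraseless SSH works by logging in to localhost; it should not prompt for a password:
$ ssh localhost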
If you want to run MapReduce jobs on YARN, you need to follow the steps below:
- Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
- Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
- Browse the web interface for the ResourceManager; by default it is available at:
- ResourceManager: http://localhost:8088/
- Run a MapReduce job.
- When you're done, stop the daemons with:
$ sbin/stop-yarn.sh
After you start YARN, enter http://localhost:8088/ in a browser and you can see the ResourceManager web interface.
- Format the filesystem:
$ bin/hdfs namenode -format
- Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
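To confirm the daemons actually started, you can check with jps (the JDK's process lister). With both HDFS and YARN running, the output should look roughly like the following; the process IDs will differ on your machine:
$ jps
3088 NameNode
3215 DataNode
3406 SecondaryNameNode
3571 ResourceManager
3702 NodeManager
3850 Jps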
- Browse the web interface for the NameNode; by default it is available at:
- NameNode: http://localhost:50070/
Entering that URL should bring up the NameNode web interface. Then run a test:
- Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
- Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
- Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
- Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
Or
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
Check the job's progress and view the results. Once the test runs successfully, you can start writing your own code locally.
Using the Eclipse plugin with Hadoop 2.6
Download the plugin source:
git clone https://github.com/winghc/hadoop2x-eclipse-plugin.git
To compile the plugin:
cd src/contrib/eclipse-plugin
ant jar -Dversion=2.6.0 -Declipse.home=/usr/local/eclipse -Dhadoop.home=/usr/local/hadoop-2.6.0   # adjust the paths to your own installation
- Copy the compiled jar into the Eclipse plugins directory and restart Eclipse
- Configure the Hadoop installation directory
Window -> Preferences -> Hadoop Map/Reduce -> Hadoop installation directory
- Configure the Map/Reduce view
Window -> Open Perspective -> Other -> Map/Reduce, click "OK"
Window -> Show View -> Other -> Map/Reduce Locations, click "OK"
- The console area will now show an extra tab, "Map/Reduce Locations"
On the "Map/reduce Locations" tab, click the icon < elephant +> or right click on the blank, select "New Hadoop location ...", and the dialog box "new Hadoop locations ..." pops up, Configure the following: Change HA1 to your own Hadoop user
Note: The MR Master and DFS Master configurations must be consistent with configuration files such as mapred-site.xml and core-site.xml.
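As a rough sketch of what the location dialog might contain (field names vary slightly by plugin version, and the Map/Reduce Master port 9001 below is only an assumption to adjust for your own setup):
Location name:      my-hadoop          (any name you like)
Map/Reduce Master:  Host: localhost    Port: 9001   (assumed; match your MapReduce settings)
DFS Master:         Host: localhost    Port: 9000   (must match fs.defaultFS in core-site.xml)
User name:          your Hadoop user   (HA1 in the screenshot)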
Open the Project Explorer to view the HDFS file system.
File -> New -> Project -> Map/Reduce Project -> Next
Write the WordCount class (remember to start the Hadoop services first):
package com.zongtui;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

/**
 * ClassName: WordCount <br/>
 * Function: TODO ADD FUNCTION. <br/>
 *
 * @author zhangfeng
 * @version
 * @since JDK 1.7
 */
public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
user/admin123/input/hadoop is the HDFS folder (which you create yourself) into which you upload the files you want to process; output1 receives the output results.
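When running the job from Eclipse, the input and output paths go in as the two program arguments (args[0] and args[1] in WordCount's main). Assuming the folders above, one plausible argument line in Run Configurations -> Arguments would be (the exact URIs depend on your fs.defaultFS and user name):
hdfs://localhost:9000/user/admin123/input/hadoop hdfs://localhost:9000/user/admin123/output1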
Run the program on the Hadoop cluster: right-click --> Run As --> Run on Hadoop. The final output will appear in the corresponding folder in HDFS. At this point, the Ubuntu hadoop-2.6.0 Eclipse plugin configuration is complete.
An exception you may encounter:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/output already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:564)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
    at com.zongtui.WordCount.main(WordCount.java:83)
1. Change the output path, or
2. Delete the existing output directory and run the job again.
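For option 2, removing the stale output directory from the command line (using the /output path shown in the exception; adjust if yours differs) would look like:
$ bin/hdfs dfs -rm -r /output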
After the run is finished, look at the results:
Follow me through Hadoop (1): hadoop2.6 installation and use