Tags: hadoop
Summary: This article describes how to install three Ubuntu virtual machines in VirtualBox, build a Hadoop environment, and finally run the wordcount routine from Hadoop's built-in examples.
1. Lab Environment
Virtualbox version: 4.3.2 r90405
Ubuntu virtual machine version: Ubuntu 11.04
Ubuntu virtual machine JDK version: jdk-1.6.0_45
Hadoop cannot be started properly (1)
Failed to start after executing $ bin/start-all.sh.
Exception 1
Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
localhost: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:214)
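This error typically means fs.defaultFS (fs.default.name in older releases) was never pointed at an HDFS URI, so the NameNode address falls back to file://. A minimal core-site.xml sketch that would resolve it, assuming a single-node setup listening on port 9000:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>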
Hadoop is becoming increasingly popular, and at its core is MapReduce. It plays an important role in Hadoop's parallel computing and is also the basis for program development on Hadoop. To learn more, let's take a look at wordcount, a simple MapReduce example.
Chapter 1 Meet Hadoop
Data is large, but transfer speeds have not improved much: it takes a long time to read all the data from a single disk, and writing is even slower. The obvious way to reduce the time is to read from multiple disks at once. The first problem to solve is hardware failure. The second problem is that most analysis tasks need to be able to combine data from different hardware.
Chapter 3 The Hadoop Distributed Filesystem
Filesystems that manage the storage across a network of machines are called distributed filesystems.
Once you have installed the Hadoop plug-in successfully, configure the plug-in to connect it to your Hadoop platform.
Create a new Hadoop Location in Map/Reduce Locations: in this view, right-click and choose New Hadoop Location. In the pop-up dialog box, you need to configure the Location.
The value needs to be changed to 1. Set the HDFS replication factor:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Modify mapred-site.xml
This file is the MapReduce configuration file; it configures the JobTracker address and port.
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
4. Before starting Hadoop, format the file system.
Add the environment variables to /etc/profile:
export HADOOP_HOME=/home/hexianghui/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
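After reloading the profile, the hadoop command should be on the PATH; a quick check:
source /etc/profile
hadoop version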
7. Configure Hadoop
The main configuration files of Hadoop are under hadoop-0.20.2/conf.
(1) Configure the Java environment in conf/hadoop-env.sh.
Jar
Runs a jar file. Users can package their MapReduce code into a jar file and execute it with this command.
Usage: hadoop jar <jar> [mainClass] args...
Streaming jobs are also run via this command; examples can be found in the streaming examples.
The word count example can likewise be run with the jar command; see the wordcount example.
Job
Command to interact with MapReduce jobs.
Usage: hadoop job [GENERIC_OPTIONS]
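For instance (the jar name, main class, paths, and job id below are illustrative placeholders, not from the original text):
hadoop jar myjob.jar com.example.MyJob input output
hadoop job -list
hadoop job -status job_201401011234_0001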
First, the configured environment:
System: Ubuntu 14.04
IDE: Eclipse 4.4.1
Hadoop: Hadoop 2.2.0
For older versions of Hadoop, you can directly copy the Hadoop installation directory's contrib/eclipse-plugin/hadoop-0.20.203.0-eclipse-plugin.jar into the Eclipse installation directory's plugins/ folder (not personally verified). For Hadoop 2, you need to build the jar file yourself.
1. Prepare a text file containing some words, such as hello, good, and so on.
2. Upload the prepared file to HDFS.
3. Go to the mapreduce directory under the share folder.
4. Execute the wordcount program in the hadoop-mapreduce-examples jar (a full run is sketched below):
hadoop jar hadoop-mapreduce-examples-2.4.1.jar
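A complete run might look like this (the local file name and HDFS paths are illustrative assumptions):
hdfs dfs -put words.txt /input
hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000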
1. Open conf/hadoop-env.sh and configure it: find the line #export JAVA_HOME=..., remove the #, and set it to the path of the local JDK.
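For example (the JDK path is an assumption; use your own installation path):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk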
This way, you do not need to enter the password to log on to the local machine using SSH.
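A minimal sketch of the standard passwordless-SSH setup (assuming OpenSSH):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost   # should now log in without a password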
4. Run Hadoop:
Format:
bin/hadoop namenode -format
Start all processes:
bin/start-all.sh
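Once started, you can verify the daemons with the JDK's jps tool; on a single node you would expect processes such as NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker (names vary by Hadoop version):
jps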
If an error is reported:
Unable to load realm info from SCDynamicStore
Add at the end of the hadoop-env.sh file:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
/jimmy/Downloads/hadoop-1.2.1.tar.gz hadoop
3. Decompress
cd hadoop
sudo tar -zxf hadoop-1.2.1.tar.gz
By default, it is extracted to the hadoop/hadoop-1.2.1 directory.
4. Modify Hadoop environment variables
sudo chown -R jimmy hadoop-1.2
Objective: Some time ago I learned how to deploy a pseudo-distributed Hadoop environment. Because work has been busy, my learning progress stalled for a while, so today I am taking the time to write up and share my recent results. This article is about how to use VMware to build your own Hadoop cluster. If you want to know about the pseudo-distributed mode and
The configuration must be consistent with Hadoop's. For example, in the pseudo-distributed configuration I used, fs.defaultFS is set to hdfs://localhost:9000, so the Port for DFS Master should also be changed to 9000.
Location Name can be anything. For Map/Reduce Master Host, fill in your machine's IP (localhost also works); the Port defaults to 50020. The final settings are as follows:
For data processing, the key-value pair model is flexible.
How to understand Hadoop's MapReduce:
Here's an article I find interesting; a link for everyone to learn from: "How I explained MapReduce to my wife."
The conceptual stuff sounds a little tedious, so let's move on to our own MapReduce program.
We all know that there is a wordcount example among the Hadoop examples.
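As a reference, here is a minimal sketch of the classic WordCount, written against the Hadoop 2.x API (the class names follow the conventional example, not this text): the mapper emits a <word, 1> key-value pair per token, and the reducer sums the counts per word.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emits <word, 1> for every token in the input line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // combiner reuses the reducer for local aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}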