Hadoop 2 Pseudo-Distributed Deployment


I. Introduction

II. Installation and deployment

III. Run the Hadoop example and test the deployment environment

IV. Points to note


I. Introduction

Hadoop is a distributed system infrastructure developed by the Apache Foundation. The core of the Hadoop framework consists of HDFS and MapReduce. HDFS provides storage for massive amounts of data: it is highly fault-tolerant, it is designed to be deployed on inexpensive (low-cost) hardware, and it provides high-throughput access to application data, which makes it suitable for applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. MapReduce provides the computation over these massive data sets. Together they make Hadoop a platform on which it is easy to develop and run applications that process large-scale data.

HDFS is written in Java, so any machine that supports Java can run a NameNode or a DataNode. HDFS uses a master/slave architecture: an HDFS cluster consists of one NameNode and a number of DataNodes. The NameNode is the central server that manages the file system namespace and client access to files. A DataNode, typically one per node in the cluster, manages the storage attached to the node it runs on. HDFS exposes the file system namespace and lets users store data in the form of files. Internally, a file is split into one or more blocks, which are stored on a set of DataNodes. The NameNode performs namespace operations such as opening, closing, and renaming files or directories, and it determines the mapping of blocks to specific DataNodes. The DataNodes serve read and write requests from file system clients, and they create, delete, and replicate blocks under the coordination of the NameNode.
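Once the pseudo-distributed cluster set up later in this article is running, this division of labour can be seen from the command line; a minimal sketch (the paths used here are arbitrary examples):

#hdfs dfs -mkdir -p /user/root/demo //pure namespace operation, handled by the NameNode
#hdfs dfs -put /etc/hosts /user/root/demo/hosts //the file content is split into blocks and written to DataNodes
#hdfs fsck /user/root/demo/hosts -files -blocks //shows how the NameNode maps the file to blocks on the DataNodes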

Hadoop MapReduce is an easy-to-use software framework: applications written with it can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant way. A MapReduce job usually splits the input data set into independent chunks that are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps and then feeds them to the reduce tasks. Both the input and the output of a job are typically stored in the file system. The framework takes care of scheduling and monitoring the tasks and re-executes any tasks that fail. (The WordCount program in Section III below is a complete example.)

Hadoop is a distributed computing platform that is easy for users to set up and use. Users can easily develop and run applications that process massive amounts of data on it. Its main advantages are:

1. High reliability. Hadoop's ability to store and process data bit by bit is trustworthy.

2. High scalability. Hadoop distributes data and computation across clusters of available computers, and these clusters can easily be extended to thousands of nodes.

3. High efficiency. Hadoop can dynamically move data between nodes and ensure the dynamic balance of individual nodes, so processing is very fast.

4. High fault tolerance. Hadoop has the ability to automatically save multiple copies of data and automatically reassign failed tasks.

5. Low cost. Compared with all-in-one appliances, commercial data warehouses, and data marts such as QlikView and Yonghong Z-Suite, Hadoop is open source, which greatly reduces the software cost of a project.

Hadoop pseudo-distributed mode simulates a distributed Hadoop cluster on a single machine. Since resources are limited here, Hadoop is deployed in pseudo-distributed mode on a Linux virtual machine to simulate a distributed Hadoop deployment.

II. Installation and deployment

The JDK installation and configuration on the Linux virtual machine is not described here, as it was covered earlier.

Step one: Configure passwordless SSH authentication

1. Check if SSH is installed
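For example, on an RPM-based distribution such as CentOS (an assumption; adjust the package query to your distribution):

#rpm -qa | grep openssh //check that the OpenSSH client and server packages are installed
#service sshd status //check that the SSH daemon is running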


2. Generate a new SSH key with an empty passphrase and enable passwordless login

#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3. Test whether logging in still requires a password

#ssh localhost

If you can log in without being prompted for a password, the configuration succeeded. The first time you connect you will be asked whether to continue connecting; enter yes.

Note:

If passwordless login is not configured, then every time Hadoop is started a password has to be entered to log in to each DataNode. Since this is normally done on a cluster, that would be very inconvenient to operate.
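If ssh localhost still prompts for a password after the steps above, a common cause is overly loose permissions on the key files; a quick check (assuming the default ~/.ssh location):

#chmod 700 ~/.ssh
#chmod 600 ~/.ssh/authorized_keys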

Step Two: Deploy Hadoop

1. Download Hadoop from http://hadoop.apache.org/

Here we download hadoop-2.6.0.tar.gz.

2. #mkdir /usr/local/hadoop

3. #tar -zxvf hadoop-2.6.0.tar.gz //extract the archive (so that it ends up under /usr/local/hadoop)

4. #vi /etc/profile //configure Hadoop environment variables

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

#source /etc/profile //make the configuration take effect
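A quick way to confirm that the variables were picked up (hadoop version simply prints the installed release and exits):

#echo $HADOOP_HOME
#hadoop version //should report Hadoop 2.6.0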

5. Configure core-site.xml, hdfs-site.xml and mapred-site.xml

#cd hadoop-2.6.0 //change into the extracted Hadoop directory

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.74.129:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hadoop-2.6.0/hdfs/name</value>
        <description>Storage path on the NameNode</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/hadoop-2.6.0/hdfs/data</value>
        <description>Storage path on the DataNode</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://192.168.74.129:9001</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/usr/local/hadoop/hadoop-2.6.0/mapred/local</value>
        <description>Path where MapReduce stores its own data</description>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>/usr/local/hadoop/hadoop-2.6.0/mapred/system</value>
        <description>Shared path for MapReduce system-level files</description>
    </property>
</configuration>
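Note that the stock hadoop-2.6.0 tarball ships only mapred-site.xml.template under etc/hadoop, so mapred-site.xml usually has to be created from the template first (skip this if the file already exists):

#cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml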
6. Modify the JDK path in hadoop-env.sh

#vi etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_67

7. #hadoop namenode -format //format the HDFS file system, i.e. create an empty file system

Successful execution:

The directories configured in core-site.xml, hdfs-site.xml and mapred-site.xml are created:


Step three: Start the Hadoop service

1. #sbin/start-all.sh //start all services (sbin/stop-all.sh shuts them down)
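In Hadoop 2.x, start-all.sh is kept mainly for compatibility and simply delegates to two scripts; starting the subsystems separately makes failures easier to localize:

#sbin/start-dfs.sh //starts the NameNode, DataNode and SecondaryNameNode
#sbin/start-yarn.sh //starts the ResourceManager and NodeManager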


2. #jps //verify that the Hadoop daemons are running
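On a pseudo-distributed setup like this one, jps should list roughly the following daemons in addition to Jps itself (the process IDs will differ):

#jps //expect: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager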

3. View Hadoop information in a browser

http://192.168.74.129:50070 //HDFS (NameNode) web page

http://192.168.74.129:8088 //cluster administration (YARN) web page


4. The logs can be viewed under /usr/local/hadoop/hadoop-2.6.0/logs


III. Run the Hadoop example and test the deployment environment


1. Let's run the official WordCount example to check whether the Hadoop environment we deployed works correctly. The word-count code is as follows:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // Split each input line into tokens and emit (word, 1) for each token.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      // Sum the counts for each word and emit (word, total).
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Upload WordCount.java to the extracted Hadoop directory:


2. Check whether the user directory exists

#hadoop fs -ls //right after deployment no directory has been created yet, so this reports that there is no such directory:

3. Create a new text file, file0, in a local test directory and enter the content whose words we want to count:
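For example, to create the local test directory and sample file used in the following steps (the two sample sentences are only placeholders; any text will do):

#mkdir -p /usr/local/hadoop/hadoop-2.6.0/test
#echo "Hello World Bye World" > /usr/local/hadoop/hadoop-2.6.0/test/file0
#echo "Hello Hadoop Goodbye Hadoop" >> /usr/local/hadoop/hadoop-2.6.0/test/file0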


4. Create the input and output directories

First create the folders on HDFS

#bin/hdfs dfs -mkdir -p /user/root/input

#bin/hdfs dfs -mkdir -p /user/root/output
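To confirm that both directories were created:

#bin/hdfs dfs -ls /user/root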

5. Upload the text to be counted to the input directory on HDFS

#bin/hdfs dfs -put /usr/local/hadoop/hadoop-2.6.0/test/* /user/root/input //upload test/file0 to /user/root/input on HDFS

6. View the uploaded file

#bin/hdfs dfs -cat /user/root/input/file0

7. Compile the WordCount.java class we just wrote

#bin/hadoop com.sun.tools.javac.Main WordCount.java
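If the compiler class cannot be found at this step, the JDK's tools.jar usually has to be put on Hadoop's classpath first (JAVA_HOME is the same path configured in hadoop-env.sh above):

#export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar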


8. Package the compiled WordCount*.class files into a jar

#jar cf wc.jar WordCount*.class

9. Run the word count

#bin/hadoop jar wc.jar WordCount /user/root/input /user/root/output/count


10. View the output

#bin/hdfs dfs -cat /user/root/output/count/part-r-00000



IV. Points to note

1. When running namenode -format, an error is reported if the hostname is configured incorrectly:

Analysis:

#hostname

The hostname returned here cannot be found in the /etc/hosts file.

Workaround:

1) #vi /etc/hosts //fix the IP-to-hostname mapping (see the sample entries below)

2) #vi /etc/sysconfig/network //fix the hostname

3) #/etc/rc.d/init.d/network restart //restart the network service

If the change still does not take effect, reboot the Linux virtual machine.
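For reference, the matching pair of entries might look like this, using the IP address from this article and a placeholder hostname (replace hadoop01 with the machine's actual hostname):

192.168.74.129 hadoop01 //line added to /etc/hosts
HOSTNAME=hadoop01 //line in /etc/sysconfig/network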


2. #hadoop fs -ls may print the following warning:

Java HotSpot(TM) Server VM warning: You have loaded library /usr/local/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.

It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.


Solution:

#vi /etc/profile

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

#source /etc/profile

After this, running hadoop fs -ls no longer shows the warning:






