HDFS design ideas, HDFS use, viewing cluster status, uploading files to HDFS, downloading files from HDFS, viewing information in the YARN web management interface, running a MapReduce program, MapReduce demo


26 Preliminary use of the cluster

Design ideas of HDFS

- Design ideas

Divide and conquer: large files and large batches of files are distributed across a large number of servers, which makes it easy to analyze massive data with a divide-and-conquer approach.

- Role in big data systems:

Provides data storage services for a variety of distributed computing frameworks (such as MapReduce, Spark, Tez, ...).

- Key concepts: file splitting (blocks), replicated storage, metadata (the sketch below touches all three for a single file).
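
As a quick illustration of these three concepts, the following is a minimal Java sketch (the file path is just an example; any existing HDFS file works) that asks the NameNode's metadata for one file's block size, replication factor, and block locations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileLayoutInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // example path; replace with any file already in HDFS
        FileStatus st = fs.getFileStatus(new Path("/findbugs-1.3.9/license-asm.txt"));
        System.out.println("block size : " + st.getBlockSize());    // how the file is cut
        System.out.println("replication: " + st.getReplication());  // how many copies are stored
        // block locations are part of the metadata kept by the NameNode
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println("block @ " + b.getOffset() + " on " + String.join(",", b.getHosts()));
        }
        fs.close();
    }
}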

26.1 HDFS use

1. View cluster status

Command: hdfs dfsadmin -report

As you can see, there are 3 datanodes available in the cluster.

You can also view the HDFS cluster information in the web console by opening http://hadoop:50070/ in a browser.
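
The same summary can be obtained programmatically. Below is a minimal Java sketch, assuming the Hadoop configuration files (core-site.xml / hdfs-site.xml) are on the classpath so fs.defaultFS points at this cluster; it prints the aggregate capacity figures that hdfs dfsadmin -report summarizes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class HdfsStatusReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        FsStatus status = fs.getStatus();           // aggregate DFS capacity and usage
        System.out.println("Capacity : " + status.getCapacity());
        System.out.println("Used     : " + status.getUsed());
        System.out.println("Remaining: " + status.getRemaining());
        fs.close();
    }
}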

2. Uploading files to HDFS

To view directory information in HDFS:

Command: hadoop fs -ls /

Upload files:

Command: hadoop fs -put ./findbugs-1.3.9 /

[toto@hadoop software]$ hadoop fs -put ./findbugs-1.3.9 /
put: '/findbugs-1.3.9/license-asm.txt': File exists
put: '/findbugs-1.3.9/license-applejavaextensions.txt': File exists
put: '/findbugs-1.3.9/license-bcel.txt': File exists
put: '/findbugs-1.3.9/license-commons-lang.txt': File exists
put: '/findbugs-1.3.9/license-docbook.txt': File exists
put: '/findbugs-1.3.9/license-dom4j.txt': File exists
put: '/findbugs-1.3.9/license-jformatstring.txt': File exists

View the list of uploaded files (hadoop fs -ls / or hadoop fs -ls /findbugs-1.3.9).
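
For reference, here is a minimal Java sketch of the same upload-and-list steps through the FileSystem API (the local directory and target path simply mirror the shell commands above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadAndList {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // equivalent of: hadoop fs -put ./findbugs-1.3.9 /
        fs.copyFromLocalFile(new Path("./findbugs-1.3.9"), new Path("/"));
        // equivalent of: hadoop fs -ls /findbugs-1.3.9
        for (FileStatus st : fs.listStatus(new Path("/findbugs-1.3.9"))) {
            System.out.println(st.getPath() + "\t" + st.getLen());
        }
        fs.close();
    }
}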

Downloading files from HDFS

Command: hadoop fs -get /findbugs-1.3.9/license-asm.txt

[toto@hadoop learn]$ cd /home/toto/learn
/home/toto/learn
[toto@hadoop learn]$ pwd
/home/toto/learn
[toto@hadoop learn]$ hadoop fs -get /findbugs-1.3.9/license-asm.txt
[toto@hadoop learn]$ ls
license-asm.txt
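
The same download can also be done from Java; a minimal sketch (the destination is assumed to be the current working directory, matching the shell session above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DownloadFile {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // equivalent of: hadoop fs -get /findbugs-1.3.9/license-asm.txt
        fs.copyToLocalFile(new Path("/findbugs-1.3.9/license-asm.txt"), new Path("."));
        fs.close();
    }
}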

YARN's management interface is: http://hadoop:8088/cluster

26.2 Run a sample MapReduce program

To run a MapReduce program, you need to start HDFS (YARN must also be running, since the job is submitted to the ResourceManager). The start command is:

[toto@hadoop hadoop-2.8.0]$ cd /home/toto/software/hadoop-2.8.0
[toto@hadoop hadoop-2.8.0]$ sbin/start-dfs.sh

There are working MapReduce examples under /home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce:

[toto@hadoop mapreduce]$ cd /home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce
[toto@hadoop mapreduce]$ pwd
/home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce
[toto@hadoop mapreduce]$ ll
total 5088
-rw-r--r--. 1 toto hadoop  562900 Mar 13:31 hadoop-mapreduce-client-app-2.8.0.jar
-rw-r--r--. 1 toto hadoop  782739 Mar 13:31 hadoop-mapreduce-client-common-2.8.0.jar
-rw-r--r--. 1 toto hadoop 1571179 Mar 13:31 hadoop-mapreduce-client-core-2.8.0.jar
-rw-r--r--. 1 toto hadoop  195000 Mar 13:31 hadoop-mapreduce-client-hs-2.8.0.jar
-rw-r--r--. 1 toto hadoop   31533 Mar 13:31 hadoop-mapreduce-client-hs-plugins-2.8.0.jar
-rw-r--r--. 1 toto hadoop   66999 Mar 13:31 hadoop-mapreduce-client-jobclient-2.8.0.jar
-rw-r--r--. 1 toto hadoop 1587158 Mar 13:31 hadoop-mapreduce-client-jobclient-2.8.0-tests.jar
-rw-r--r--. 1 toto hadoop   75495 Mar 13:31 hadoop-mapreduce-client-shuffle-2.8.0.jar
-rw-r--r--. 1 toto hadoop  301934 Mar 13:31 hadoop-mapreduce-examples-2.8.0.jar
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 jdiff
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 lib
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 lib-examples
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 sources
[toto@hadoop mapreduce]$

Run the example with the following command:

[toto@hadoop mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.8.0.jar pi 5 5
Number of Maps  = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
17/05/29 14:47:36 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.106.80:8032
17/05/29 14:47:37 INFO input.FileInputFormat: Total input files to process: 5
17/05/29 14:47:37 INFO mapreduce.JobSubmitter: number of splits: 5
17/05/29 14:47:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495998405307_0001
17/05/29 14:47:39 INFO impl.YarnClientImpl: Submitted application application_1495998405307_0001
17/05/29 14:47:39 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1495998405307_0001/
17/05/29 14:47:39 INFO mapreduce.Job: Running job: job_1495998405307_0001
17/05/29 14:48:00 INFO mapreduce.Job: Job job_1495998405307_0001 running in uber mode : false
17/05/29 14:48:00 INFO mapreduce.Job:  map 0% reduce 0%

Open the YARN management interface (http://hadoop:8088/cluster/apps) to see how the program is running:

26.3 MapReduce use

MapReduce is the distributed computing programming framework in Hadoop; with it, you only need to write a small amount of business-logic code to get a program that processes massive data concurrently.

26.3.1 Demo development -- WordCount

1. Requirement

Count the total number of occurrences of each word across a large set of text files (for example, terabytes of data).

2. The MapReduce implementation idea (a small worked example follows the two phase descriptions below)

Map phase:

a) Read the source data file in HDFS line by line

b) Cut each line into words

c) Build a key-value pair (word, 1) for each word

d) Send the key-value pairs to the reducers

Reduce phase:

a) Receive the word key-value pairs output by the map phase

b) Group key-value pairs with the same word together

c) For each group, iterate over all the "values" in the group and sum them to get the total count for that word

d) Write (word, total count) to a file in HDFS
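
For example, assuming two input lines "hello tom" and "hello jim": the map phase emits (hello,1), (tom,1), (hello,1), (jim,1); the framework groups pairs with the same word into (hello,[1,1]), (tom,[1]), (jim,[1]); and the reduce phase sums each group, outputting (hello,2), (tom,1), (jim,1).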

3. The concrete code implementation

(1) Define a Mapper class

First, define the four generic types: KEYIN: LongWritable, VALUEIN: Text; KEYOUT: Text, VALUEOUT: IntWritable.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Life cycle of the map method: the framework calls it once per line of data
    // key:   the offset of the start of this line in the file
    // value: the contents of this line
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Get the line of data and convert it to a String
        String line = value.toString();
        // Cut the line into words
        String[] words = line.split(" ");
        // Traverse the array and output <word, 1>
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}

(2) Define a Reducer class

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Life cycle: the framework calls the reduce method once for each group of KV pairs passed in
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Define a counter
        int count = 0;
        // Traverse all the values of this group of KV pairs and accumulate them into count
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}

(3) Define a main class to describe the job and submit the job

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountRunner {

    // Describe the business logic (which class is the mapper, which is the reducer,
    // where the data to process is, where to put the output ...) as a Job object,
    // then submit the described job to the cluster to run
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);

        // Specify the jar package in which this job resides
        wcjob.setJar("/home/hadoop/wordcount.jar");
        wcjob.setJarByClass(WordCountRunner.class);

        wcjob.setMapperClass(WordCountMapper.class);
        wcjob.setReducerClass(WordCountReducer.class);

        // Set the data types of the output key and value of our mapper class
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(IntWritable.class);

        // Set the data types of the output key and value of our reducer class
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(IntWritable.class);

        // Specify where the data to process is located
        FileInputFormat.setInputPaths(wcjob, "hdfs://hdp-server01:9000/wordcount/data/big.txt");

        // Specify where to save the results after processing is complete
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hdp-server01:9000/wordcount/output/"));

        // Submit this job to the YARN cluster
        boolean res = wcjob.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}

26.3.2 Packaging and running the program

1. Package the program

2. Prepare input data

vi /home/hadoop/test.txt

hello tom
hello jim
hello ketty
hello world
ketty tom

Create an input data folder on HDFS:

hadoop fs -mkdir -p /wordcount/input

Upload words.txt to HDFS:

hadoop fs -put /home/hadoop/words.txt /wordcount/input

3. Upload the program jar package to any server on the cluster

4. Start the WordCount program jar package with the following command:

$ hadoop jar wordcount.jar cn.toto.bigdata.mrsimple.WordCountDriver /wordcount/input /wordcount/out

5. View the execution results

$ hadoop fs -cat /wordcount/out/part-r-00000
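
Assuming the uploaded file contains the five lines created above, the result file should look roughly like this (each line is a word, a tab, and its total count):

hello	4
jim	1
ketty	2
tom	2
world	1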

Original link http://blog.csdn.net/tototuzuoquan/article/details/72802439

