26 Preliminary use of the cluster
Design ideas of HDFS
- Design idea: divide and conquer. Large files and large batches of files are split up and distributed across a large number of servers, so that massive data can be analyzed with a divide-and-conquer approach.
- Role in a big data system: provides data storage services for all kinds of distributed computing frameworks (such as MapReduce, Spark, Tez, ...).
- Key concepts: file cutting (blocks), replica storage, metadata.
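To make these concepts concrete, here is a minimal sketch using the HDFS Java API that prints how a file is cut into blocks and which DataNodes hold the replicas; the NameNode address hdfs://hadoop:9000, the user, and the file path are assumptions for illustration only:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI and user are assumptions; adjust to your cluster
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop:9000"), conf, "toto");
        // Illustrative path: any file already stored in HDFS
        FileStatus status = fs.getFileStatus(new Path("/findbugs-1.3.9/LICENSE-ASM.txt"));
        // A file is cut into blocks; each block is stored as several replicas,
        // and the NameNode's metadata records where each replica lives
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}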
26.1 Using HDFS
1. View cluster status
Command: hdfs dfsadmin -report
As you can see, there are 3 DataNodes available in the cluster.
You can also view the HDFS cluster information in the web console by opening http://hadoop:50070/ in a browser.
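The same status information can also be read programmatically; a minimal sketch (NameNode address and user are the same assumptions as above):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class ClusterStatusSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop:9000"), conf, "toto");
        // Roughly the capacity numbers that hdfs dfsadmin -report prints
        FsStatus status = fs.getStatus();
        System.out.println("capacity  = " + status.getCapacity());
        System.out.println("used      = " + status.getUsed());
        System.out.println("remaining = " + status.getRemaining());
        fs.close();
    }
}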
2. Uploading files to HDFS
View directory information in HDFS
Command: hadoop fs -ls /
Upload files
Command: hadoop fs -put ./findbugs-1.3.9 /
[toto@hadoop software]$ hadoop fs -put ./findbugs-1.3.9 /
put: '/findbugs-1.3.9/LICENSE-ASM.txt': File exists
put: '/findbugs-1.3.9/LICENSE-AppleJavaExtensions.txt': File exists
put: '/findbugs-1.3.9/LICENSE-bcel.txt': File exists
put: '/findbugs-1.3.9/LICENSE-commons-lang.txt': File exists
put: '/findbugs-1.3.9/LICENSE-docbook.txt': File exists
put: '/findbugs-1.3.9/LICENSE-dom4j.txt': File exists
put: '/findbugs-1.3.9/LICENSE-jFormatString.txt': File exists
View the uploaded files (hadoop fs -ls / or hadoop fs -ls /findbugs-1.3.9)
Download files from HDFS
Command: hadoop fs -get /findbugs-1.3.9/LICENSE-ASM.txt
[toto@hadoop learn]$ cd /home/toto/learn
[toto@hadoop learn]$ pwd
/home/toto/learn
[toto@hadoop learn]$ hadoop fs -get /findbugs-1.3.9/LICENSE-ASM.txt
[toto@hadoop learn]$ ls
LICENSE-ASM.txt
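The shell commands above have direct Java API equivalents; a minimal sketch (same assumed NameNode address and user):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI and user are assumptions; adjust to your cluster
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop:9000"), conf, "toto");
        // Equivalent of: hadoop fs -put ./findbugs-1.3.9 /
        fs.copyFromLocalFile(new Path("./findbugs-1.3.9"), new Path("/"));
        // Equivalent of: hadoop fs -get /findbugs-1.3.9/LICENSE-ASM.txt
        fs.copyToLocalFile(new Path("/findbugs-1.3.9/LICENSE-ASM.txt"),
                new Path("/home/toto/learn"));
        fs.close();
    }
}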
YARN's management interface is: http://hadoop:8088/cluster
26.2 Trial run of a MapReduce program
Before running a MapReduce program you need to start HDFS; the start command is:
[toto@hadoop hadoop-2.8.0]$ cd /home/toto/software/hadoop-2.8.0
[toto@hadoop hadoop-2.8.0]$ sbin/start-dfs.sh
There are ready-made MapReduce examples under /home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce:
[toto@hadoop mapreduce]$ cd /home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce
[toto@hadoop mapreduce]$ pwd
/home/toto/software/hadoop-2.8.0/share/hadoop/mapreduce
[toto@hadoop mapreduce]$ ll
total 5088
-rw-r--r--. 1 toto hadoop  562900 Mar 13:31 hadoop-mapreduce-client-app-2.8.0.jar
-rw-r--r--. 1 toto hadoop  782739 Mar 13:31 hadoop-mapreduce-client-common-2.8.0.jar
-rw-r--r--. 1 toto hadoop 1571179 Mar 13:31 hadoop-mapreduce-client-core-2.8.0.jar
-rw-r--r--. 1 toto hadoop  195000 Mar 13:31 hadoop-mapreduce-client-hs-2.8.0.jar
-rw-r--r--. 1 toto hadoop   31533 Mar 13:31 hadoop-mapreduce-client-hs-plugins-2.8.0.jar
-rw-r--r--. 1 toto hadoop   66999 Mar 13:31 hadoop-mapreduce-client-jobclient-2.8.0.jar
-rw-r--r--. 1 toto hadoop 1587158 Mar 13:31 hadoop-mapreduce-client-jobclient-2.8.0-tests.jar
-rw-r--r--. 1 toto hadoop   75495 Mar 13:31 hadoop-mapreduce-client-shuffle-2.8.0.jar
-rw-r--r--. 1 toto hadoop  301934 Mar 13:31 hadoop-mapreduce-examples-2.8.0.jar
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 jdiff
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 lib
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 lib-examples
drwxr-xr-x. 2 toto hadoop    4096 Mar 13:31 sources

Run the example MapReduce job with the command:

[toto@hadoop mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.8.0.jar pi 5 5
Number of Maps  = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
17/05/29 14:47:36 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.106.80:8032
17/05/29 14:47:37 INFO input.FileInputFormat: Total input files to process : 5
17/05/29 14:47:37 INFO mapreduce.JobSubmitter: number of splits:5
17/05/29 14:47:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495998405307_0001
17/05/29 14:47:39 INFO impl.YarnClientImpl: Submitted application application_1495998405307_0001
17/05/29 14:47:39 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1495998405307_0001/
17/05/29 14:47:39 INFO mapreduce.Job: Running job: job_1495998405307_0001
17/05/29 14:48:00 INFO mapreduce.Job: Job job_1495998405307_0001 running in uber mode : false
17/05/29 14:48:00 INFO mapreduce.Job:  map 0% reduce 0%
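The two arguments after pi are the number of map tasks (5) and the number of samples per map (5); when the job finishes, it prints an estimated value of Pi.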
Open YARN's management interface (http://hadoop:8088/cluster/apps) to see how the program is running:
26.2 Using MapReduce
MapReduce is the distributed computing programming framework in Hadoop: by programming against it, you only need to write a small amount of business-logic code to obtain a powerful program that processes massive data concurrently, while the framework takes care of the distribution.
26.2.1 Demo development: WordCount
1. Requirement
Count the total number of occurrences of each word across a large number of text files (for example, terabyte-scale data).
2. MapReduce implementation idea
Map phase:
a) Read the source data file in HDFS line by line
b) Split each line into words
c) Construct a key-value pair (word, 1) for each word
d) Send the key-value pairs to the reduce phase
Reduce phase:
a) Receive the word key-value pairs output by the map phase
b) Group the key-value pairs with the same word together
c) For each group, iterate over all the "values" in the group and sum them to get the total number of occurrences of the word
d) Output (word, total count) to a file in HDFS
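As a concrete illustration: for a line "hello tom", the map phase emits <hello, 1> and <tom, 1>; the framework then shuffles and groups by key, so the reduce phase receives groups such as <hello, [1, 1, 1, ...]> and writes out <hello, N>, where N is the sum of the 1s.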
3. Concrete code implementation
(1) Define a Mapper class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// First define the four generic types:
// KEYIN: LongWritable, VALUEIN: Text, KEYOUT: Text, VALUEOUT: IntWritable
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Life cycle of the map method: the framework calls it once per line of data
    // key: the offset of the start of this line within the file
    // value: the contents of this line
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Get the line of data and convert it to a String
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Iterate over the array, outputting <word, 1>
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
(2) Define a Reducer class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Life cycle: the framework calls the reduce method once per KV group passed in
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Define a counter
        int count = 0;
        // Iterate over all values of this KV group and accumulate into count
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
(3) Define a main class to describe the job and submit the job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountRunner {
    // Describe the business-logic information (which class is the mapper, which is
    // the reducer, where the data to process lives, where to put the output ...)
    // as a Job object, then submit the described job to the cluster to run
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);
        // Specify the jar package where this job's classes live
        wcjob.setJar("/home/hadoop/wordcount.jar");
        wcjob.setJarByClass(WordCountRunner.class);
        wcjob.setMapperClass(WordCountMapper.class);
        wcjob.setReducerClass(WordCountReducer.class);
        // Set the data types of the output key and value of the mapper class
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(IntWritable.class);
        // Set the data types of the output key and value of the reducer class
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(IntWritable.class);
        // Specify where the input data lives
        FileInputFormat.setInputPaths(wcjob, "hdfs://hdp-server01:9000/wordcount/data/big.txt");
        // Specify where to save the results after processing completes
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hdp-server01:9000/wordcount/output/"));
        // Submit this job to the YARN cluster
        boolean res = wcjob.waitForCompletion(true);
        System.exit(res ? 0 : 1);
    }
}
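Note: setJar() hard-codes a local path to the job jar, while setJarByClass() lets the framework locate the jar containing the given class; in practice specifying one of the two is enough.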
26.2.2 Packaging and running the program
1. Package the program
2. Prepare input data
vi /home/hadoop/words.txt
hello tom
hello jim
hello ketty
hello world
ketty tom
Create an input data folder on HDFS:
hadoop fs -mkdir -p /wordcount/input
Upload words.txt to HDFS:
hadoop fs -put /home/hadoop/words.txt /wordcount/input
3. Upload the program jar package to any server in the cluster
4. Start the WordCount program jar with the command:
$ hadoop jar wordcount.jar cn.toto.bigdata.mrsimple.WordCountDriver /wordcount/input /wordcount/out
5. View the execution results
$ hadoop fs -cat /wordcount/out/part-r-00000
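With the sample words.txt above, the output would look roughly like this (each line is a word and its count, separated by a tab):

hello	4
jim	1
ketty	2
tom	2
world	1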
Original link http://blog.csdn.net/tototuzuoquan/article/details/72802439