Cloud Computing: Hadoop Two-Hour Quick Start Guide - Part 1


I bought a book: the second edition of Hadoop: The Definitive Guide. It is very well written, but my mind jumps around too much. I read the first two chapters and then skipped ahead (no way around it; one of the book's own suggestions is to go straight to Appendix A, installing Hadoop, and Appendix C, preparing the NCDC weather data).

Appendix A: study it and decide based on your own needs. At the learning stage, local (standalone) mode is enough; do not bother with any cluster mode, it only wastes your enthusiasm and computer resources. You can find the detailed steps online, and many people have written them up, but what you need is the most basic mode: install Java, unpack Hadoop, and add the Hadoop bin directory to the PATH environment variable. That is all. Clusters and the rest can wait; we will talk about them later. I believe you can set up a Linux system, install the JDK, unpack Hadoop, configure JAVA_HOME, and configure the PATH environment variable within 30 minutes.

 

Appendix C: preparing the NCDC weather data. This chapter was a huge time sink; I spent two days on it (all in spare time after work) and even downloaded the required test data from the NCDC website as described there. After studying the data for a long time, I realized that for a test it is enough to pick two years of data from two weather stations and merge them into one test file.

Next, I will share my understanding of the first two chapters. I hope later readers can get through the content of Chapter 2 in a single evening.

1. Preparation (30 minutes)

1) Common HDFS commands (a Java sketch of the same operations follows this list)

1. List directory contents

# bin/hadoop dfs -ls /

 

2. Create the directory test

# bin/hadoop dfs -mkdir test

 

3. Delete the directory test

# bin/hadoop dfs -rmr test

 

4. Upload files to an HDFS directory

# bin/hadoop dfs -put *.txt test

 

5. Download a file

# bin/hadoop dfs -get test test1

 

6. View a file

# bin/hadoop dfs -tail test

# bin/hadoop dfs -cat test
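For readers who want to do the same things from Java, here is a minimal sketch (my own addition, not from the article) of these operations through Hadoop's FileSystem API. The HdfsBasics class name, the data.txt file, and the paths are made-up examples; in local standalone mode FileSystem.get(conf) simply returns the local file system.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsBasics {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);   // local file system in standalone mode

        // 1. list a directory            (dfs -ls /)
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }

        // 2. create a directory          (dfs -mkdir test)
        fs.mkdirs(new Path("test"));

        // 4. upload a local file         (dfs -put data.txt test)
        fs.copyFromLocalFile(new Path("data.txt"), new Path("test"));

        // 5. download a file             (dfs -get test/data.txt test1)
        fs.copyToLocalFile(new Path("test/data.txt"), new Path("test1"));

        // 6. print a file                (dfs -cat test/data.txt)
        FSDataInputStream in = fs.open(new Path("test/data.txt"));
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();

        // 3. delete the directory        (dfs -rmr test)
        fs.delete(new Path("test"), true);
    }
}

The delete is done last here so the other operations have something to work on; the shell commands above are thin wrappers over this same API, so whichever you use, the effect is the same.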

 

2) Prepare the NCDC weather data

Open the FTP site:

ftp://ftp.ncdc.noaa.gov/pub/data/gsod/

Pick the data files for any two years, for example:

2011/007018-99999-2011.op.gz

2010/999999-94645-2012.op.gz

Upload the two .gz files to the Linux server, decompress them, and run the following command to merge the two files:

cat 007018-99999-2011.op 999999-94645-2012.op > ncdc-all.op

Upload the merged temperature file to HDFS for analysis:

hadoop dfs -put ncdc-all.op ./HDFS/temperature

2. Write the map-reduce functions and the scheduling function (job) (60 minutes)

Let's look at the code first; it does no harm. For details, see the attachment. Attachments cannot be uploaded here, so I have put them in the resources section:

http://download.csdn.net/detail/xzknet/4905094

Oh, by the way, I forgot to mention the attachment password: it is the same as the file name. Haha.
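While you wait for the download, here is a minimal sketch of what such a job can look like, using the old org.apache.hadoop.mapred API (the same classes that show up in the job log below): a mapper that emits (year, daily MAX temperature) for every line of the merged GSOD .op file, a reducer that keeps the maximum per year, and a driver (the scheduling function) that wires them together. This is my own reconstruction, not the code from the attachment; the class names and the GSOD field positions (YEARMODA in the third column, MAX in the eighteenth) are assumptions you may need to adjust to your data.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MaxTemperature {

    // Mapper: for each data line emit (year, daily MAX temperature).
    public static class MaxTemperatureMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, FloatWritable> {
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, FloatWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            // Skip the header line that each .op file starts with.
            if (line.startsWith("STN---")) {
                return;
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 18) {
                return; // malformed or empty line
            }
            String year = fields[2].substring(0, 4);   // YEARMODA, e.g. 20100101 (assumed column)
            String max = fields[17].replace("*", "");  // MAX column, may carry a trailing '*' (assumed column)
            if (!max.startsWith("9999")) {             // 9999.9 marks a missing value
                output.collect(new Text(year), new FloatWritable(Float.parseFloat(max)));
            }
        }
    }

    // Reducer: keep the largest temperature seen for each year.
    public static class MaxTemperatureReducer extends MapReduceBase
            implements Reducer<Text, FloatWritable, Text, FloatWritable> {
        public void reduce(Text key, Iterator<FloatWritable> values,
                           OutputCollector<Text, FloatWritable> output, Reporter reporter)
                throws IOException {
            float maxValue = Float.NEGATIVE_INFINITY;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new FloatWritable(maxValue));
        }
    }

    // Driver (the "scheduling function"): wires mapper and reducer into one job.
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("max temperature");
        FileInputFormat.addInputPath(conf, new Path(args[0]));    // e.g. /home/xzknet/demo/ch02/HDFS
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // e.g. output
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(FloatWritable.class);
        conf.setMapperClass(MaxTemperatureMapper.class);
        conf.setReducerClass(MaxTemperatureReducer.class);
        JobClient.runJob(conf);
    }
}

Export this as ch02.jar (or hadooptest.jar) with MaxTemperature as the main class, and you can run it exactly as described in step 3 below.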

3. Upload the jar package to the Linux server and run it (30 minutes)

Package the code written in step 2 into hadooptest.jar and put it in a local directory, for example:

/home/xzknet/demo/ch02/

Run the following command in that directory:

hadoop jar ./ch02.jar /home/xzknet/demo/ch02/HDFS output

The system returns the following information:

xzknet@bogon:~/demo/ch02$ hadoop jar ./ch02.jar /home/xzknet/demo/ch02/HDFS output
12/12/18 07:35:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/12/18 07:35:48 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/12/18 07:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/18 07:35:48 INFO mapred.JobClient: Running job: job_local_0001
12/12/18 07:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/18 07:35:48 INFO mapred.MapTask: numReduceTasks: 1
12/12/18 07:35:48 INFO mapred.MapTask: io.sort.mb = 100
12/12/18 07:35:48 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/18 07:35:48 INFO mapred.MapTask: record buffer = 262144/327680
12/12/18 07:35:48 INFO mapred.MapTask: Starting flush of map output
12/12/18 07:35:49 INFO mapred.MapTask: Finished spill 0
12/12/18 07:35:49 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/18 07:35:49 INFO mapred.LocalJobRunner: file:/home/xzknet/demo/ch02/HDFS/temperature:0+99802
12/12/18 07:35:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.Merger: Merging 1 sorted segments
12/12/18 07:35:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 7878 bytes
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/12/18 07:35:49 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/xzknet/demo/ch02/output
12/12/18 07:35:49 INFO mapred.LocalJobRunner: reduce > reduce
12/12/18 07:35:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/12/18 07:35:49 INFO mapred.JobClient: map 100% reduce 100%
12/12/18 07:35:49 INFO mapred.JobClient: Job complete: job_local_0001
12/12/18 07:35:49 INFO mapred.JobClient: Counters: 13
12/12/18 07:35:49 INFO mapred.JobClient:   FileSystemCounters
12/12/18 07:35:49 INFO mapred.JobClient:     FILE_BYTES_READ=244318
12/12/18 07:35:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78028
12/12/18 07:35:49 INFO mapred.JobClient:   Map-Reduce Framework
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce input groups=2
12/12/18 07:35:49 INFO mapred.JobClient:     Combine output records=0
12/12/18 07:35:49 INFO mapred.JobClient:     Map input records=718
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce output records=2
12/12/18 07:35:49 INFO mapred.JobClient:     Spilled Records=1432
12/12/18 07:35:49 INFO mapred.JobClient:     Map output bytes=6444
12/12/18 07:35:49 INFO mapred.JobClient:     Map input bytes=99802
12/12/18 07:35:49 INFO mapred.JobClient:     Combine input records=0
12/12/18 07:35:49 INFO mapred.JobClient:     Map output records=716
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce input records=716

Look in the current directory: an output directory has been generated. Open it and you will see the generated result file; the directory also contains a .crc file, laid out just like an HDFS folder.

We can open the result file directly and see the highest temperature for each of the two years.

xzknet@bogon:~/demo/ch02$ ls
ch02.jar  HDFS  ncdc-all.op  output
xzknet@bogon:~/demo/ch02$ cd output
xzknet@bogon:~/demo/ch02/output$ ls
part-00000
xzknet@bogon:~/demo/ch02/output$ tail part-00000

2010 85.4

2012 73.7

Note: when exporting a jar package from Eclipse, the export options include a Main class setting. Set it; otherwise, when running hadoop jar ***.jar, you must specify the main class name (including its package path) after the jar name.

 

This is not quite the same as the command in the book; it uses the local approach.

Here ch02.jar is local, the temperature data file to be analyzed is on HDFS, and the output is on HDFS as a folder.

xzknet@bogon:~/demo/ch02$ hadoop dfs -cat ./output/part-00000

2010 85.4

2012 73.7

xzknet@bogon:~/demo/ch02$ hadoop dfs -tail ./output/part-00000

2010 85.4

2012 73.7

Congratulations, you are getting started: you have just developed your first cloud computing program with Hadoop.
