I bought a copy of the second edition of Hadoop: The Definitive Guide. It is well written, but it jumps around a lot: before I had finished the first two chapters I had already been sent off to read Appendix A (installing Hadoop) and Appendix C (preparing the NCDC weather data).
Whether you need Appendix A depends on your goals. For learning, local (standalone) mode is enough; don't bother with any cluster mode yet, it only wastes your time and your machine's resources. You can find plenty of walkthroughs online, but all you need is the most basic setup: install Java, unpack Hadoop, and add Hadoop's bin directory to your PATH environment variable. That's it. Leave clusters and the rest for later. I believe you can set up a Linux system, install the JDK, unpack Hadoop, and configure JAVA_HOME and PATH in about 30 minutes.
Appendix C (preparing the NCDC weather data) was the big time sink: it took me two days (all in my spare time after work). I downloaded the test data from the NCDC site as the book describes, and after staring at it for a long time I realized that for a simple test you only need two years of data from two weather stations, merged into a single file.
What follows is my take on the first two chapters. I hope it lets later readers work through the content of Chapter 2 in a single evening.
1. Preparation (30 minutes)
1) Common HDFS commands
1. List directory contents
# bin/hadoop dfs -ls /
2. Create a directory named test
# bin/hadoop dfs -mkdir test
3. Delete the directory test
# bin/hadoop dfs -rmr test
4. Upload files to an HDFS directory
# bin/hadoop dfs -put *.txt test
5. Download a file
# bin/hadoop dfs -get test test1
6. View a file
# bin/hadoop dfs -tail test
# bin/hadoop dfs -cat test
2) Prepare the weather data
Open the FTP site:
ftp://ftp.ncdc.noaa.gov/pub/data/gsod/
Pick data files for any two years to analyze, for example:
2011/007018-99999-2011.op.gz
2010/999999-94645-2012.op.gz
Upload the two .gz files to the Linux server, decompress them, and merge the two files with:
cat 007018-99999-2011.op 999999-94645-2012.op > ncdc-all.op
Then upload the merged temperature file to HDFS for analysis:
hadoop dfs -put ncdc-all.op ./HDFS/temperature
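For reference, each line of a GSOD .op file is one day's record for one station, so the year and daily maximum temperature can be pulled out of the YEARMODA and MAX fields. The field positions used below (YEARMODA as the 3rd whitespace-separated token, MAX as the 18th, with a trailing '*' flag on derived values) are my assumptions about the GSOD layout, not code from the book, so treat this as a sketch:

```java
// Sketch only: pull the year and daily MAX temperature out of one GSOD .op
// record. Field positions (YEARMODA = 3rd token, MAX = 18th token) and the
// trailing '*' flag on MAX are assumptions about the GSOD layout, not taken
// from the book's code.
public class OpRecord {
    // Year is the first four digits of the YEARMODA field.
    public static String year(String line) {
        return line.trim().split("\\s+")[2].substring(0, 4);
    }

    // MAX temperature (Fahrenheit); a trailing '*' (derived value) is stripped.
    public static double maxTemp(String line) {
        String raw = line.trim().split("\\s+")[17];
        if (raw.endsWith("*")) raw = raw.substring(0, raw.length() - 1);
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        // A made-up record in the assumed 22-field layout.
        String sample = "007018 99999 20110101 42.0 24 35.1 24 1012.3 24 "
                + "998.2 24 6.2 24 4.5 24 9.9 999.9 52.3* 33.1 0.00G 999.9 000000";
        System.out.println(year(sample) + "\t" + maxTemp(sample));
    }
}
```

If the field indices don't match your files, dump one line and count the columns before trusting the parse.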
2. Write the map and reduce functions and the job driver (60 minutes)
Reading the code directly is the quickest way to understand this part; see the attachment. Since attachments cannot be uploaded here, I have put it up as a downloadable resource:
http://download.csdn.net/detail/xzknet/4905094
By the way, the archive password is the same as the file name.
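Since the attached code isn't reproduced in the post, here is a minimal plain-Java sketch of what the map and reduce logic boil down to. This is my own illustration, not the attachment, and it deliberately sidesteps the Hadoop Mapper/Reducer API: the map step emits a (year, temperature) pair per record, and the reduce step keeps the maximum per year.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the MapReduce logic in plain Java (illustration only; the real
// job uses Hadoop's Mapper/Reducer classes). Input lines are assumed to be
// "year temperature" pairs already extracted from the .op records.
public class MaxTemperatureSketch {
    // "map" and "reduce" collapsed: group by year, keep the maximum reading.
    public static Map<String, Double> maxPerYear(List<String> records) {
        Map<String, Double> max = new HashMap<>();
        for (String r : records) {
            String[] parts = r.trim().split("\\s+");
            String year = parts[0];
            double temp = Double.parseDouble(parts[1]);
            max.merge(year, temp, Math::max); // reduce step: keep the larger value
        }
        return max;
    }
}
```

In the real job, the grouping by year is what the shuffle phase does for you; the reducer only ever sees one year's readings at a time.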
3. Upload the jar package to the Linux server (30 minutes)
Package the code written in step 2 into a jar (ch02.jar in the examples below) and put it in a local directory, for example:
/home/xzknet/demo/ch02/
Run the following command in that directory:
hadoop jar ./ch02.jar /home/xzknet/demo/ch02/HDFS output
The system returns the following information:
xzknet@bogon:~/demo/ch02$ hadoop jar ./ch02.jar /home/xzknet/demo/ch02/HDFS output
12/12/18 07:35:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/12/18 07:35:48 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/12/18 07:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/18 07:35:48 INFO mapred.JobClient: Running job: job_local_0001
12/12/18 07:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1
12/12/18 07:35:48 INFO mapred.MapTask: numReduceTasks: 1
12/12/18 07:35:48 INFO mapred.MapTask: io.sort.mb = 100
12/12/18 07:35:48 INFO mapred.MapTask: data buffer = 79691776/99614720
12/12/18 07:35:48 INFO mapred.MapTask: record buffer = 262144/327680
12/12/18 07:35:48 INFO mapred.MapTask: Starting flush of map output
12/12/18 07:35:49 INFO mapred.MapTask: Finished spill 0
12/12/18 07:35:49 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/12/18 07:35:49 INFO mapred.LocalJobRunner: file:/home/xzknet/demo/ch02/HDFS/temperature:0+99802
12/12/18 07:35:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.Merger: Merging 1 sorted segments
12/12/18 07:35:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 7878 bytes
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/12/18 07:35:49 INFO mapred.LocalJobRunner:
12/12/18 07:35:49 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/12/18 07:35:49 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/xzknet/demo/ch02/output
12/12/18 07:35:49 INFO mapred.LocalJobRunner: reduce > reduce
12/12/18 07:35:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/12/18 07:35:49 INFO mapred.JobClient: map 100% reduce 100%
12/12/18 07:35:49 INFO mapred.JobClient: Job complete: job_local_0001
12/12/18 07:35:49 INFO mapred.JobClient: Counters: 13
12/12/18 07:35:49 INFO mapred.JobClient:   FileSystemCounters
12/12/18 07:35:49 INFO mapred.JobClient:     FILE_BYTES_READ=244318
12/12/18 07:35:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=78028
12/12/18 07:35:49 INFO mapred.JobClient:   Map-Reduce Framework
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce input groups=2
12/12/18 07:35:49 INFO mapred.JobClient:     Combine output records=0
12/12/18 07:35:49 INFO mapred.JobClient:     Map input records=718
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce output records=2
12/12/18 07:35:49 INFO mapred.JobClient:     Spilled Records=1432
12/12/18 07:35:49 INFO mapred.JobClient:     Map output bytes=6444
12/12/18 07:35:49 INFO mapred.JobClient:     Map input bytes=99802
12/12/18 07:35:49 INFO mapred.JobClient:     Combine input records=0
12/12/18 07:35:49 INFO mapred.JobClient:     Map output records=716
12/12/18 07:35:49 INFO mapred.JobClient:     Reduce input records=716
List the current directory and you will see that an output directory has been generated. Open it and you can see the result file; the directory also contains a .crc checksum file, because this is a Hadoop filesystem folder. Open the result file directly and you can see the highest temperature for each of the two years.
xzknet@bogon:~/demo/ch02$ ls
ch02.jar HDFS ncdc-all.op output
xzknet@bogon:~/demo/ch02$ cd output
xzknet@bogon:~/demo/ch02/output$ ls
part-00000
xzknet@bogon:~/demo/ch02/output$ tail part-00000
2010 85.4
2012 73.7
Note: when exporting the jar from Eclipse, the export dialog has a main class option; set it. Otherwise, when running hadoop jar ***.jar, you must specify the fully qualified main class name (including the package path) after the jar file.
This is not exactly the same as the command in the book, which uses local paths: here ch02.jar is local, while the temperature data file being analyzed is on HDFS, and the output, which is a folder, is also on HDFS.
xzknet@bogon:~/demo/ch02$ hadoop dfs -cat ./output/part-00000
2010 85.4
2012 73.7
xzknet@bogon:~/demo/ch02$ hadoop dfs -tail ./output/part-00000
2010 85.4
2012 73.7
Congratulations, you are now up and running: that was my first "cloud computing" program, developed with Hadoop.