
MapReduce (Serialized from Hadoop: The Definitive Guide)

Last Update: 2015-03-17 Source: Internet

Author: User


"Book serial" MapReduce is a kind of programming model that can be used in data processing. The model is simple, but it is not easy to write useful programs. Hadoop can run MapReduce programs written in various languages. In this chapter, we'll see the same program written in Java, Ruby, Python, and C + + languages. Most importantly, the MapReduce program is essentially running in parallel, so large data analysis tasks can be delegated to any operator with enough machines. The advantage of MapReduce is to deal with large datasets, so let's look at a dataset first.

A meteorological dataset

In our example, we will write a program that mines weather data. Weather sensors at many locations around the world collect data every hour, producing a large volume of log data. This data is a good candidate for analysis with MapReduce because it is semi-structured and record-oriented.

The format of the data

We will use data from the National Climatic Data Center (NCDC, http://www.ncdc.noaa.gov/). The data is stored in a line-oriented ASCII format, in which each line is one record. The format supports a rich set of meteorological elements, many of which are optional or have variable data lengths. For simplicity, we focus on the basic elements, such as temperature, which always have a fixed length.
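Because every record is a single ASCII line, a program can stream through a file one line at a time. A minimal Python sketch of that reading loop, using only the standard `io` module; the helper name `iter_records` and the sample bytes are illustrative assumptions, not part of any NCDC tooling:

```python
import io
from typing import BinaryIO, Iterator


def iter_records(raw: BinaryIO) -> Iterator[str]:
    """Yield one record per line from a line-oriented, ASCII-encoded stream.

    Decoding strictly as ASCII makes any non-ASCII byte raise
    UnicodeDecodeError instead of silently producing a malformed record.
    """
    # newline="" keeps line endings untranslated; they are stripped below.
    text = io.TextIOWrapper(raw, encoding="ascii", newline="")
    for line in text:
        record = line.rstrip("\r\n")
        if record:  # ignore empty lines, such as a blank final line
            yield record


if __name__ == "__main__":
    # Two made-up, shortened records; real NCDC lines are much longer.
    data = io.BytesIO(b"0057332130\n0043011990\n")
    for record in iter_records(data):
        print(record)
```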

Example 2-1 shows a sample line with some of the salient fields annotated. The line has been split into multiple lines so that each field stands out; in the actual file, the fields are packed into a single line with no delimiters.

Example 2-1. Format of a National Climatic Data Center record

0057
332130   # USAF weather station identifier
99999    # WBAN weather station identifier
19500101 # observation date
0300     # observation time
4
+51317   # latitude (degrees x 1000)
+028783  # longitude (degrees x 1000)
FM-12
+0171    # elevation (meters)
99999
V020
...      # wind direction (degrees)
1        # quality code
0072
00450    # sky ceiling height (meters)
1        # quality code
010000   # visibility distance (meters)
1        # quality code
9
-0128    # air temperature (degrees Celsius x 10)
1        # quality code
-0139    # dew point temperature (degrees Celsius x 10)
1        # quality code
10268    # atmospheric pressure (hectopascals x 10)
1        # quality code
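Since the fields are packed with no delimiters, a program extracts them by character position. The sketch below derives start/end offsets for the leading fields purely by adding up the widths of the values shown in Example 2-1 (0057, 332130, 99999, and so on), and assumes every record uses those same widths. Fields after the elevation are left out because the record as reproduced here does not show every intervening value, so their offsets cannot be derived from it. `FIELDS`, `parse_header_fields`, and `scaled` are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical offset table derived by summing the widths of the values in
# Example 2-1; each entry is a Python slice range [start, end).
FIELDS: Dict[str, Tuple[int, int]] = {
    "usaf_id": (4, 10),
    "wban_id": (10, 15),
    "date": (15, 23),       # YYYYMMDD
    "time": (23, 27),       # HHMM
    "latitude": (28, 34),   # degrees x 1000
    "longitude": (34, 41),  # degrees x 1000
    "elevation": (46, 51),  # meters
}


def parse_header_fields(record: str) -> Dict[str, str]:
    """Slice the fixed-width leading fields out of one record line."""
    if len(record) < max(end for _, end in FIELDS.values()):
        raise ValueError("record too short for the assumed layout")
    return {name: record[start:end] for name, (start, end) in FIELDS.items()}


def scaled(value: str, factor: int) -> float:
    """Undo fixed-point scaling, e.g. '+51317' with factor 1000 gives 51.317."""
    return int(value) / factor


if __name__ == "__main__":
    # Leading values of Example 2-1, concatenated with no delimiters.
    sample = "".join(["0057", "332130", "99999", "19500101", "0300", "4",
                      "+51317", "+028783", "FM-12", "+0171"])
    fields = parse_header_fields(sample)
    print(fields["date"], scaled(fields["latitude"], 1000))  # 19500101 51.317
```

The same `scaled` helper would turn the air temperature value -0128 (degrees Celsius x 10) into -12.8.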

Data files are organized by date and weather station. There is a directory for each year from 1901 to 2001, and each directory contains one gzipped file per weather station holding that station's readings for the year. For example, here are the first entries for 1990:

% ls raw/1990 | head
010010-99999-1990.gz
010014-99999-1990.gz
010015-99999-1990.gz
010016-99999-1990.gz
010017-99999-1990.gz
010030-99999-1990.gz
010040-99999-1990.gz
010080-99999-1990.gz
010100-99999-1990.gz
010150-99999-1990.gz
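The listing suggests that each file name has the form `<USAF id>-<WBAN id>-<year>.gz`, matching the two station identifiers in Example 2-1. That pattern is inferred from the names shown, not taken from NCDC documentation. Under that assumption, a short Python sketch can enumerate one year's station files and stream records out of a compressed file with the standard `gzip` module; `StationFile`, `station_files`, and `read_gz_records` are hypothetical names.

```python
import gzip
import re
from pathlib import Path
from typing import Iterator, List, NamedTuple

# Assumed naming pattern, inferred from names such as 010010-99999-1990.gz.
NAME_RE = re.compile(r"^(\d{6})-(\d{5})-(\d{4})\.gz$")


class StationFile(NamedTuple):
    path: Path
    usaf_id: str
    wban_id: str
    year: int


def station_files(year_dir: Path) -> List[StationFile]:
    """List the per-station .gz files in one year directory, sorted by name."""
    found: List[StationFile] = []
    for path in sorted(year_dir.iterdir()):
        m = NAME_RE.match(path.name)
        if m:  # skip anything that does not follow the assumed pattern
            found.append(StationFile(path, m.group(1), m.group(2), int(m.group(3))))
    return found


def read_gz_records(path: Path) -> Iterator[str]:
    """Yield one record per line from a gzip-compressed ASCII file."""
    with gzip.open(path, "rt", encoding="ascii") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield line
```

For example, `station_files(Path("raw/1990"))` would return one entry per station file, and each entry's `path` can be passed to `read_gz_records`.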

Because there are thousands of weather stations, the whole dataset consists of a large number of relatively small files. It is generally easier and more efficient to process a smaller number of large files (with Hadoop's default input handling, for instance, every file becomes at least one separate map task), so the data is preprocessed to concatenate each year's files into a single file. Appendix C describes how this is done.
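Purely as an illustration of the idea, the sketch below merges one year's per-station files into a single file by byte concatenation. This works because a gzip file may contain several compressed members back to back (RFC 1952), and Python's `gzip` module decompresses them in sequence. The function name `merge_year` and the directory layout are assumptions; this is not necessarily the procedure described in Appendix C.

```python
import shutil
from pathlib import Path
from typing import List


def merge_year(year_dir: Path, out_path: Path) -> List[Path]:
    """Concatenate every .gz file in year_dir into one multi-member .gz file.

    Decompressing the result yields the records of all inputs, in sorted
    file-name order. Returns the list of input files that were merged.
    """
    # Collect inputs before opening the output, and never read the output
    # file itself if it happens to live inside year_dir.
    inputs = sorted(
        p for p in year_dir.glob("*.gz") if p.resolve() != out_path.resolve()
    )
    with open(out_path, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # copy compressed bytes as-is
    return inputs
```

Copying the compressed bytes directly avoids decompressing and recompressing each station file.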


This article is an English version of an article originally published in Chinese on aliyun.com.
