MapReduce Training Task: Getting the Highest Score Record from a Score Table

Source: Internet
Author: User


Training 1: Count the number of daily user visits

Task Description:

Count the total number of user visits on each calendar day in 2016. The original data file provides the user name and access date of each visit, and the task is to compute the cumulative number of visits by all users for each calendar day. To implement this with MapReduce, first decide on the processing logic of the Mapper and the Reducer, then write the core code according to that logic. Finally, write the complete program in Eclipse, package it, and submit it to the cluster for execution.

Analysis approach and logic

(1) Input/output format

The access dates from the social networking site are text, and the number of visits is an integer. The key-value pairs therefore have the form <access date, number of visits>, so both the mapper output and the reducer output use the Text class for keys and the IntWritable class for values.

(2) Calculation logic implemented by the Mapper

The main task of the map function is to read each record from the user access file and emit the key-value pair <access date, 1> for every access, giving each access date an initial count of 1.

(3) Calculation logic implemented by the Reducer

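The reducer reads the key-value pairs <access date, 1> output by the mapper. Because the framework groups these pairs by access date, the reducer adds up the 1s for each date and outputs <access date, total visits>.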
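The Mapper and Reducer logic above can be tried out without a cluster. The following is a minimal, Hadoop-free sketch in plain Java (9 or later) that imitates the three phases locally: `map` emits <access date, 1>, a `TreeMap` stands in for Hadoop's shuffle and sort, and `reduce` sums the counts. The class name, the method names, and the `userName,accessDate` line format are hypothetical, because the actual layout of user_login.txt is not shown here. In the real job the same logic would live in Mapper and Reducer subclasses that write Text and IntWritable.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free simulation of the Training 1 job: map -> shuffle -> reduce.
// ASSUMPTION: each input line is "userName,accessDate" (hypothetical layout;
// the real user_login.txt format is not reproduced in the text).
class DailyAccessCountSketch {

    // Map: turns one input line into <access date, 1>; returns null for lines
    // that do not contain exactly two comma-separated fields.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 2) {
            return null;
        }
        return new SimpleImmutableEntry<>(fields[1].trim(), 1);
    }

    // Shuffle stand-in: Hadoop groups map output by key; a TreeMap does it here.
    static Map<String, List<Integer>> shuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce: adds up all the 1s received for one access date.
    static int reduce(String accessDate, Iterable<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "alice,2016-06-01", "bob,2016-06-01", "alice,2016-06-02");
        shuffle(lines).forEach((date, counts) ->
                System.out.println(date + "\t" + reduce(date, counts)));
        // prints (tab-separated):
        // 2016-06-01    2
        // 2016-06-02    1
    }
}
```

Running `main` prints one tab-separated line per date, which mirrors the key<TAB>value layout that Hadoop's default TextOutputFormat writes.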

For example, the access dates in user_login.txt are formatted as follows:

The program code is as follows:

Operation Steps:

1. Write the code in DailyAccessCount.java. The complete content is as follows.

2. Compile and package the code into DailyAccessCount.jar.

3. Upload DailyAccessCount.jar to a node of the Hadoop cluster, and upload the input file user_login.txt to HDFS:

hdfs dfs -put /root/hadooptmp/user_login.txt /user/test

4. On a node of the Hadoop cluster, submit the job with the hadoop jar command:

hadoop jar DailyAccessCount.jar \
/user/root/user_login.txt \
/user/root/accesscount

Training requirements:

Analyze the code above, then compile and run it according to the given DailyAccessCount.txt program code, and record the run results below.

Training 2: Get the highest score record from the score table

1. Training Essentials

(1) Master the MapReduce execution process.

(2) Master the basics of writing MapReduce programs.

(3) Master the output formats of MapReduce programs.

2. Requirements Description

The sample file subject_score is score table A. Each row of the file contains two fields: subject and score. Find the record with the highest score for each subject in the score table and write the results to top-score table B.

Part of score table A:

Chinese 73
Math 97
English 21
Physics 72
Chemistry 49
Biology 69
Chinese 106
Math 112
English 38

Part of top-score table B:

Chinese 99
Math 149
English 122
Physics 143
Chemistry 120

3. Implementation approach and steps

(1) In the Mapper class, the map() function processes the data in score table A: it splits each line it reads on the space (the line format is "subject score"), and sets the subject as the output key and the score as the output value, so the mapper's output key-value type is <Text, IntWritable>.

(2) In the Reducer, the input key-value type is the mapper's output type, <Text, IntWritable>. For each key (that is, each subject), the reduce() function traverses the values it receives (an Iterable<IntWritable> holding the scores), compares them to find the highest value (the highest score), and finally outputs the key-value pair <subject, top score>.
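The Mapper and Reducer steps described above can be checked locally before writing the Hadoop version. Below is a minimal sketch in plain Java (9 or later) that assumes each line of subject_score is "subject score" separated by whitespace, as in table A. The class and method names are hypothetical, and a `TreeMap` stands in for the grouping that Hadoop's shuffle performs. In the actual job, `map` corresponds to a Mapper emitting <Text, IntWritable> and `reduce` to a Reducer iterating over Iterable<IntWritable>.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local, Hadoop-free simulation of the Training 2 job (highest score per subject).
// Assumes each line of subject_score is "subject score", separated by whitespace.
class TopScoreSketch {

    // Map: splits one line into <subject, score>; returns null for lines that
    // do not have two fields or whose score is not an integer.
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            return null;
        }
        try {
            return new SimpleImmutableEntry<>(fields[0], Integer.parseInt(fields[1]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Reduce: walks every score received for one subject and keeps the largest.
    static int reduce(String subject, Iterable<Integer> scores) {
        int max = Integer.MIN_VALUE;
        for (int score : scores) {
            max = Math.max(max, score);
        }
        return max;
    }

    // Driver: map every line, group scores by subject (the shuffle), reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> top = new TreeMap<>();
        grouped.forEach((subject, scores) -> top.put(subject, reduce(subject, scores)));
        return top;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Chinese 73", "English 21", "Chinese 106", "English 38");
        run(lines).forEach((subject, max) -> System.out.println(subject + "\t" + max));
        // prints (tab-separated):
        // Chinese    106
        // English    38
    }
}
```

Starting `max` at Integer.MIN_VALUE means the first real score always replaces it; since the grouping step only calls `reduce` for subjects that have at least one score, the sentinel never reaches the output.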

4. Training requirements:

Referring to the Training 1 program code, write the program that gets the highest score record for each subject from the score table, and paste the code and the run results below. Reference article: 80811698
