MapReduce training task: get the highest score record from the score table
Training 1: Count daily user visits
Task Description:
Count the total number of user visits for each calendar day in 2016. The original data file provides the user name and access date. The task is to compute the cumulative number of visits by all users for each calendar day. To do this with MapReduce, first work out the processing logic of the Mapper and the Reducer, then write the core code according to that logic. Finally, write the complete code in Eclipse, package it, and submit it to the cluster for execution.
Analysis and logic
(1) Input/output format
Access dates on the social networking site are stored as text, and visit counts are integers. The key-value pairs therefore have the form <access date, number of visits>, so both the mapper output and the reducer output use the Text class for the key and the IntWritable class for the value.
(2) Calculation logic to be implemented by the mapper
The main job of the map function is to read the user access data file and emit a key-value pair for every access date with an initial count of 1: <access date, 1>.
(3) Calculation logic to be implemented by the reducer
The reducer reads the key-value pairs <access date, 1> output by the mapper, sums the values for each access date, and outputs <access date, total number of visits>.
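The mapper and reducer logic described above can be sketched in plain Java without any Hadoop classes. This is only an illustration of the logic, not the DailyAccessCount program itself: the class name DailyAccessCountSketch, the method names, and above all the input layout "userName,accessDate" are assumptions, since the real format of the data file is not shown here. In an actual Hadoop job, the String and Integer types would be replaced by Text and IntWritable, and the grouping done in run() would be performed by the framework between the map and reduce phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the real job; it only mimics the map and reduce steps.
class DailyAccessCountSketch {

    // Map step: turn one input line into an <access date, 1> pair.
    // ASSUMPTION: each line looks like "userName,accessDate".
    public static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.split(",");
        String accessDate = fields[1].trim();
        return new AbstractMap.SimpleEntry<>(accessDate, 1);
    }

    // Reduce step: add up all the 1s emitted for a single access date.
    public static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups the map output by key (the work Hadoop does during the shuffle),
    // then reduces each group to <access date, total visits>.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<String> lines = Arrays.asList(
                "alice,2016-01-01", "bob,2016-01-01", "alice,2016-01-02");
        run(lines).forEach((date, count) -> System.out.println(date + "\t" + count));
    }
}
```

With the three made-up rows in main, the sketch reports 2 visits for 2016-01-01 and 1 visit for 2016-01-02.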
Example: the access date format in user_login.txt is as follows:
The program code is as follows:
Operation Steps:
1. Write the code in DailyAccessCount.java; the complete content is as follows.
2. Compile and build DailyAccessCount.jar.
3. Upload DailyAccessCount.jar to a Hadoop cluster server node, and upload the data file to HDFS:
hdfs dfs -put /root/hadooptmp/user_login.txt /user/test
4. In a terminal on the Hadoop cluster server, submit the job with the hadoop jar command:
hadoop jar DailyAccessCount.jar \
/user/root/user_login.txt \
/user/root/accesscount
Training requirements:
Analyze the code above, then compile and run it using the program code given in DailyAccessCount.txt. Paste the run results below.
Training 2: Get the highest score record from the score table
1. Training Essentials
(1) Master the execution process of MapReduce.
(2) Master the basics of writing MapReduce programs.
(3) Master the output format of the MapReduce program.
2. Requirements Description
There is a sample file, subject_score, which is score table A. Each row of the file contains two fields: subject and score. Find the record with the highest score for each subject in the score table and output the results to highest-score table B.
Part of score table A:
Subject | Score
Chinese | 73
Mathematics | 97
English | 21
Physics | 72
Chemistry | 49
Biology | 69
Chinese | 106
Mathematics | 112
English | 38
Part of highest-score table B:
Subject | Highest score
Chinese | 99
Mathematics | 149
English | 122
Physics | 143
Chemistry | 120
3. Implementation ideas and steps
(1) In the Mapper class, the map function reads the data in the score table, splits each line it reads on a space, and emits a key-value pair with the subject as the key and the score as the value. The output key-value type is <Text, IntWritable>.
(2) In the Reducer, because the map output key-value type is <Text, IntWritable>, the reduce function receives key-value pairs of type <Text, Iterable<IntWritable>>. For each key (that is, each subject), it traverses and compares the values (that is, the scores), finds the largest value (that is, the highest score), and finally outputs the key-value pair <subject, highest score>.
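The two steps above can be sketched in plain Java without any Hadoop classes. This is a hedged illustration of the logic only: the class name HighestScoreSketch and its method names are hypothetical, and the sketch assumes each line of subject_score holds a subject and an integer score separated by whitespace. In an actual Hadoop job, String and Integer would be replaced by Text and IntWritable, and the grouping by subject would be done by the framework rather than by run().

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the real job; it only mimics the map and reduce steps.
class HighestScoreSketch {

    // Map step: split one line on whitespace and emit <subject, score>.
    // ASSUMPTION: each line looks like "subject score", with an integer score.
    public static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<>(fields[0], Integer.parseInt(fields[1]));
    }

    // Reduce step: traverse all scores of one subject and keep the largest.
    public static int reduce(Iterable<Integer> scores) {
        int max = Integer.MIN_VALUE;
        for (int score : scores) {
            max = Math.max(max, score);
        }
        return max;
    }

    // Groups the map output by subject (the work Hadoop does during the shuffle),
    // then reduces each group to <subject, highest score>.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // A few rows in the assumed "subject score" layout.
        List<String> lines = Arrays.asList("Chinese 73", "Chinese 106", "English 38");
        run(lines).forEach((subject, top) -> System.out.println(subject + "\t" + top));
    }
}
```

For the three rows in main, the sketch keeps 106 for Chinese and 38 for English.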
4. Training requirements:
Referring to the Training 1 program code, write the program that gets the highest score record from the score table. Paste the code and the run results below. Reference article: 80811698