This semester I took "Introduction to Distributed Systems". Its lab portion follows MIT 6.824 Spring 2015, and together with the pre-class readings and homework, which I find very difficult, the pressure is huge :(
The labs are divided into 5 parts. According to the TA's earlier assessment, the difficulty increases roughly linearly from one part to the next.
Since the code provided by MIT is written in Go, you need some familiarity with the language before starting the labs. I used An Introduction to Programming in Go, a thin booklet that quickly covers Go's basic syntax.
6.824 Lab 1: MapReduce
Per the lab requirements, Part I asks you to complete a full word count program by implementing its map function and reduce function. The first step, of course, is to read the provided code and understand the RunSingle() procedure in mapreduce.go. Once you know the inputs and outputs of the map and reduce functions, Part I is basically done.
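For readers new to the lab, here is a minimal word-count Map/Reduce pair in Go. It is a generic sketch, not the lab's solution: the real skeleton fixes its own function signatures and KeyValue type, whereas this version assumes slice-based signatures, defines KeyValue locally, and treats any run of letters as a word (an assumption; the lab's exact word definition may differ).

```go
package main

import (
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is defined here so the sketch is self-contained; the lab has its own.
type KeyValue struct {
	Key   string
	Value string
}

// Map splits one chunk of text into words and emits ("word", "1") per occurrence.
// Assumption: a word is any maximal run of Unicode letters.
func Map(value string) []KeyValue {
	words := strings.FieldsFunc(value, func(r rune) bool { return !unicode.IsLetter(r) })
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// Reduce receives every value emitted for one word and returns the total as a string.
func Reduce(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // skip malformed values instead of failing the whole key
		}
		total += n
	}
	return strconv.Itoa(total)
}
```

Because Map emits one ("word", "1") pair per occurrence, Reduce only has to add up the values it receives for a key.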
I drew a simple flowchart of RunSingle(), using nMap = 3 and nReduce = 2 as an example: a text file goes in, and the word count result comes out.
(Note the difference between the blue and black lines in the reduce() stage; it shows a trick in word count that I really like.)
So MapReduce turns out not to be very mysterious. Of course, this is just a single-threaded, sequential execution of word count.
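The single-threaded flow described above (split, map, partition, reduce, merge) can be sketched in memory as follows. This is an illustration under stated assumptions, not RunSingle() itself: the lab writes intermediate results to files, whereas this sketch keeps them in slices. runSequential, mapWC, reduceWC and ihash are hypothetical names, and FNV-1a is assumed as the partitioning hash.

```go
package main

import (
	"hash/fnv"
	"strconv"
	"strings"
)

type KeyValue struct{ Key, Value string }

// ihash chooses a reduce partition for a key (FNV-1a is an assumption of this sketch).
func ihash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// mapWC emits ("word", "1") for every whitespace-separated word in chunk.
func mapWC(chunk string) []KeyValue {
	var kvs []KeyValue
	for _, w := range strings.Fields(chunk) {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceWC sums the counts collected for one word.
func reduceWC(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, _ := strconv.Atoi(v)
		total += n
	}
	return strconv.Itoa(total)
}

// runSequential runs split -> map -> partition -> reduce -> merge in one thread.
func runSequential(text string, nMap, nReduce int) map[string]string {
	words := strings.Fields(text)
	n := len(words)
	// intermediate[m][r] holds the pairs that map task m routed to reduce task r.
	intermediate := make([][][]KeyValue, nMap)
	for m := 0; m < nMap; m++ {
		// Split: map task m gets a contiguous range of words.
		chunk := strings.Join(words[m*n/nMap:(m+1)*n/nMap], " ")
		intermediate[m] = make([][]KeyValue, nReduce)
		for _, kv := range mapWC(chunk) {
			r := int(ihash(kv.Key) % uint32(nReduce))
			intermediate[m][r] = append(intermediate[m][r], kv)
		}
	}
	result := make(map[string]string)
	for r := 0; r < nReduce; r++ {
		// Reduce task r gathers its partition from every map task.
		grouped := make(map[string][]string)
		for m := 0; m < nMap; m++ {
			for _, kv := range intermediate[m][r] {
				grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
			}
		}
		// Merge: partitions never share a key, so outputs combine directly.
		for k, vs := range grouped {
			result[k] = reduceWC(k, vs)
		}
	}
	return result
}
```

Calling runSequential(text, 3, 2) mirrors the nMap = 3, nReduce = 2 example: each reduce task reads its own partition from all three map tasks, and because a word always hashes to the same partition, the final merge never sees the same word twice.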
Part II: Distributing MapReduce Jobs
This part simulates distributed MapReduce with multiple threads on a single machine; the main task is to complete the RunMaster() function in mapreduce/master.go. It requires goroutines and channels, which is where Go really shines: with goroutines and channels, multithreaded programming is no longer out of reach (at least for me).
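As a minimal illustration of the goroutine-and-channel style this part relies on, the sketch below starts one goroutine per job and collects the results over an unbuffered channel. runConcurrently and its work callback are hypothetical names used only for this example.

```go
package main

// runConcurrently starts one goroutine per job and waits until every job
// has reported back on the done channel. Results are stored by job index.
func runConcurrently(nJobs int, work func(job int) int) []int {
	type result struct{ job, value int }
	done := make(chan result)
	for j := 0; j < nJobs; j++ {
		go func(j int) {
			done <- result{job: j, value: work(j)}
		}(j)
	}
	out := make([]int, nJobs)
	for i := 0; i < nJobs; i++ {
		r := <-done // blocks until some goroutine finishes
		out[r.job] = r.value
	}
	return out
}
```

Receiving exactly nJobs times is what makes the function wait for all goroutines; no explicit lock is needed because only the calling goroutine writes to out.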
The word count pipeline is the same as in Part I (split, map, reduce, merge), except that in the map and reduce phases RunMaster() must now hand each job to a worker. How to assign jobs, and how to tolerate worker failures, are the questions to think about. My approach was mainly based on the lab's hint, "Modify the MapReduce struct to keep track of any additional state (e.g., the set of available workers)". So I added an availableWorkers field to the MapReduce struct to record which workers are idle at a given moment. For each job (a map job or a reduce job), the master either assigns it to an idle worker from availableWorkers or to a newly registered worker arriving on mr.registerChannel.
As for Part III (handling worker failures), when I tested my Part II code, I found that it already passed the Part III tests too. Awkward...
In summary, although my Part II and Part III code passes the tests, one key spot still needs improvement. For each job, the master has to decide whether to hand it to an idle worker in availableWorkers or to a newly registered worker from registerChannel. Note that availableWorkers has type map[string]*WorkerInfo while registerChannel is a channel, and Go's select statement can only wait on channel operations, so the two cannot be watched together... To work around this I had to add a time-based lock =.= and it never felt elegant... Once I have digested more of Go, I'll see if there is a better way to do it.
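One way to avoid mixing a map with a channel, sketched here as an alternative rather than the lab's intended design, is to let a single scheduling goroutine own the idle list and learn about every change through channels: new registrations arrive on register, and finished or failed jobs report back on a results channel. Since both events are channel receives, one select can wait on them together, and since only that goroutine touches idle, no lock or timeout is needed. schedule, result and the call callback are hypothetical names.

```go
package main

// schedule runs jobs 0..nJobs-1 on workers that arrive over register.
// Only this goroutine reads or writes idle and pending, so no mutex is needed.
func schedule(nJobs int, register <-chan string, call func(worker string, job int) bool) {
	type result struct {
		worker string
		job    int
		ok     bool
	}
	results := make(chan result)
	var idle []string // idle worker addresses, owned by this goroutine
	pending := make([]int, 0, nJobs)
	for j := 0; j < nJobs; j++ {
		pending = append(pending, j)
	}
	remaining := nJobs // jobs that have not yet succeeded
	for remaining > 0 {
		// Pair idle workers with pending jobs; each call runs in its own goroutine.
		for len(idle) > 0 && len(pending) > 0 {
			w := idle[len(idle)-1]
			idle = idle[:len(idle)-1]
			j := pending[len(pending)-1]
			pending = pending[:len(pending)-1]
			go func(w string, j int) {
				results <- result{worker: w, job: j, ok: call(w, j)}
			}(w, j)
		}
		// Wait for whichever comes first: a new worker or a finished job.
		select {
		case w, ok := <-register:
			if !ok {
				register = nil // a nil channel is never ready, so select ignores it
				continue
			}
			idle = append(idle, w)
		case r := <-results:
			if r.ok {
				idle = append(idle, r.worker) // the worker is free again
				remaining--
			} else {
				pending = append(pending, r.job) // requeue the job, drop the worker
			}
		}
	}
}
```

This is Go's "share memory by communicating" idea applied to the worker pool: instead of guarding a shared map with a lock, every state change becomes a message to the one goroutine that owns the state.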
Note: the collaboration policy explicitly says, "Do not publish your code or make it available to future 6.824 students, for example by making your code visible on GitHub." So I won't be sharing my code.