[Distributed System Learning] 6.824 LEC2 RPC and thread notes

Source: Internet
Author: User
Tags: SOAP

6.824 lectures usually require preparation before class: you first read a paper, pose a question about it, and answer a question that was assigned. Then comes the lecture, and afterwards the lab is assigned.

Preparation for the second lecture: the crawler

The second lecture has no paper; instead, you implement the web crawler exercise from the Go tour. The original implementation in the Go tour is serial and may fetch the same URL more than once. You are asked to parallelize it and deduplicate the URLs.

The simple idea: to achieve parallelism, fetch each URL in its own goroutine; to deduplicate, record every fetched URL in a map.

But there is one subtlety: when the Crawl function returns, all URLs must have been fetched, so a mechanism is needed to wait for all goroutines to complete. A quick search shows sync.WaitGroup fits. Here is an intuitive implementation:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
	var collector Collector
	collector.fetchedURL = make(map[string]bool)
	crawlInt(url, depth, fetcher, &collector)
	collector.Wait()
}

type Collector struct {
	sync.Mutex
	sync.WaitGroup
	fetchedURL map[string]bool
}

func crawlInt(url string, depth int, fetcher Fetcher, collector *Collector) {
	if depth <= 0 {
		return
	}
	collector.Lock()
	if _, ok := collector.fetchedURL[url]; ok {
		// already visited
		collector.Unlock()
		return
	}
	collector.fetchedURL[url] = true
	collector.Unlock()
	body, urls, err := fetcher.Fetch(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	collector.Add(len(urls))
	fmt.Printf("found: %s %q\n", url, body)
	for _, u := range urls {
		go func(u string) {
			crawlInt(u, depth-1, fetcher, collector)
			collector.Done()
		}(u)
	}
}

But looking at the official answer, it is strikingly concise: not only does it not use a WaitGroup, it does not use a lock at all.

// Concurrent crawler with channels
func dofetch(url1 string, ch chan []string, fetcher Fetcher) {
	body, urls, err := fetcher.Fetch(url1)
	if err != nil {
		fmt.Println(err)
		ch <- []string{}
	} else {
		fmt.Printf("found: %s %q\n", url1, body)
		ch <- urls
	}
}

func master(ch chan []string, fetcher Fetcher) {
	n := 1
	fetched := make(map[string]bool)
	for urls := range ch {
		for _, u := range urls {
			if _, ok := fetched[u]; ok == false {
				fetched[u] = true
				n += 1
				go dofetch(u, ch, fetcher)
			}
		}
		n -= 1
		if n == 0 {
			break
		}
	}
}

func CrawlConcurrentChannel(url string, fetcher Fetcher) {
	ch := make(chan []string)
	go func() {
		ch <- []string{url}
	}()
	master(ch, fetcher)
}

The crawl entry point is CrawlConcurrentChannel. Each value on ch is the slice of URLs returned by one fetch. Why is no lock needed? Because the fetched map is checked and updated only in the master goroutine.

The URL slices on ch may of course contain duplicates, but the master checks the map, so a URL is never fetched twice.

Whether all pages have been crawled is determined by n: it counts fetches in flight (starting at 1 for the initial seed send), and when it reaches 0 everything sent on ch has been processed. Over the whole run, the number of sends on ch equals the number of fetches started; the count refers to everything ever put in, not the channel's contents at one instant.

RPC in Go

We have already met it in the previous lab. It feels a bit like SOAP, but it is not as complex as SOAP, where you also need to define a WSDL.

At least once vs. at most once

At least once: the RPC library waits for the reply; if it times out, it resends. After several tries with still no reply, it reports an error.

Does this solve the problem? What happens if the request that deducts a balance is re-sent?

So "at least once" is valid only for read-only operations and for idempotent operations, ones that can safely be re-executed. For example, the map and reduce tasks in our previous lab are idempotent.

At most once: the problem becomes how to detect duplicate requests.

The client can send a unique ID (XID) with each request for detecting duplicates. The server does the following:

server:
    if seen[xid]:
        r = old[xid]
    else:
        r = handler()
        old[xid] = r
        seen[xid] = true

Here are the questions to be addressed:

1. How does the client guarantee that the XID is unique? Today a UUID works; alternatively, hash the IP address plus a sequence number.

2. The server needs to clean up old requests at some point; otherwise every request goes into the seen map and it will blow up. The client can include a "received reply #<x" message in each RPC so that the server can discard those entries.

3. If the server is still processing a request when a duplicate of it arrives, the server does not want to execute it a second time; it can set a "pending" flag and make the duplicate wait, or simply ignore it.

The policy of Go's RPC package is "at most once".
