Golang Study Notes: Experiencing Concurrent Programming in Go (II)

Source: Internet
Author: User

The last exercise in A Tour of Go is a web crawler. Glancing at the table of contents, I really thought I would be writing a crawler; only after reading the code carefully did I realize the exercise just has you simulate a crawler using a channel or a mutex.

The initial code is as follows:

	package main

	import (
		"fmt"
	)

	type Fetcher interface {
		// Fetch returns the body of the URL and
		// a slice of URLs found on that page.
		Fetch(url string) (body string, urls []string, err error)
	}

	// Crawl uses fetcher to recursively crawl
	// pages starting with url, to a maximum of depth.
	func Crawl(url string, depth int, fetcher Fetcher) {
		// TODO: Fetch URLs in parallel.
		// TODO: Don't fetch the same URL twice.
		// This implementation does neither:
		if depth <= 0 {
			return
		}
		body, urls, err := fetcher.Fetch(url)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("found: %s %q\n", url, body)
		for _, u := range urls {
			Crawl(u, depth-1, fetcher)
		}
		return
	}

	func main() {
		Crawl("http://golang.org/", 4, fetcher)
	}

	// fakeFetcher is a Fetcher that returns canned results.
	type fakeFetcher map[string]*fakeResult

	type fakeResult struct {
		body string
		urls []string
	}

	func (f fakeFetcher) Fetch(url string) (string, []string, error) {
		if res, ok := f[url]; ok {
			return res.body, res.urls, nil
		}
		return "", nil, fmt.Errorf("not found: %s", url)
	}

	// fetcher is a populated fakeFetcher.
	var fetcher = fakeFetcher{
		"http://golang.org/": &fakeResult{
			"The Go Programming Language",
			[]string{
				"http://golang.org/pkg/",
				"http://golang.org/cmd/",
			},
		},
		"http://golang.org/pkg/": &fakeResult{
			"Packages",
			[]string{
				"http://golang.org/",
				"http://golang.org/cmd/",
				"http://golang.org/pkg/fmt/",
				"http://golang.org/pkg/os/",
			},
		},
		"http://golang.org/pkg/fmt/": &fakeResult{
			"Package fmt",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
		"http://golang.org/pkg/os/": &fakeResult{
			"Package os",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
	}

The fetcher's crawl results are hard-coded at initialization; the exercise asks you to implement two things: parallel crawling, and synchronization so no URL is fetched twice.

For parallel crawling, my idea is that each level spawns one goroutine per child URL to crawl recursively. A simple implementation is to launch a goroutine inside a for loop, with each goroutine executing the Crawl function recursively.

For synchronization, as the hint suggests, I use sync.Mutex as a lock to guard updates and lookups of the visited-path table.
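The mutex-plus-map pattern can be isolated into a small sketch. This is my own minimal illustration, not code from the exercise; the `visited` type and `TryVisit` helper are invented names.

```go
package main

import (
	"fmt"
	"sync"
)

// visited guards a set of crawled URLs with a mutex so that
// concurrent goroutines never fetch the same page twice.
type visited struct {
	m   map[string]bool
	mux sync.Mutex
}

// TryVisit marks url as seen and reports whether it was new.
func (v *visited) TryVisit(url string) bool {
	v.mux.Lock()
	defer v.mux.Unlock() // release the lock on every return path
	if v.m[url] {
		return false // already crawled
	}
	v.m[url] = true
	return true // first visit: the caller should fetch it
}

func main() {
	v := &visited{m: make(map[string]bool)}
	fmt.Println(v.TryVisit("http://golang.org/")) // true: first visit
	fmt.Println(v.TryVisit("http://golang.org/")) // false: already seen
}
```

Using `defer` for the unlock avoids having to call Unlock on every branch by hand, which is exactly where my own version below gets fiddly.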

My implementation is as follows:

	package main

	import (
		"fmt"
		"sync"
	)

	type Fetcher interface {
		// Fetch returns the body of the URL and
		// a slice of URLs found on that page.
		Fetch(url string) (body string, urls []string, err error)
	}

	// walk records which paths have been crawled, guarded by a mutex.
	type walk struct {
		m   map[string]int
		mux sync.Mutex
	}

	// hasWalk is the table of already-fetched paths.
	var hasWalk = walk{m: make(map[string]int)}

	// Crawl uses fetcher to recursively crawl
	// pages starting with url, to a maximum of depth.
	func Crawl(url string, depth int, fetcher Fetcher) {
		if depth <= 0 {
			return
		}
		var wg = sync.WaitGroup{}
		var body string
		var urls []string
		var err error
		// Lock before checking and updating the visited table.
		hasWalk.mux.Lock()
		if _, ok := hasWalk.m[url]; !ok {
			body, urls, err = fetcher.Fetch(url)
			hasWalk.m[url] = 1
			if err != nil {
				fmt.Println(err)
				hasWalk.mux.Unlock()
				return
			}
			fmt.Printf("found: %s %q\n", url, body)
		} else {
			hasWalk.mux.Unlock()
			return
		}
		hasWalk.mux.Unlock()
		for _, u := range urls {
			// Register one more goroutine to wait for.
			wg.Add(1)
			go func(target string, d int, fet Fetcher) {
				// Recursively crawl.
				Crawl(target, d, fet)
				wg.Done()
			}(u, depth-1, fetcher)
		}
		wg.Wait()
		return
	}

	func main() {
		Crawl("http://golang.org/", 4, fetcher)
	}

	// fakeFetcher is a Fetcher that returns canned results.
	type fakeFetcher map[string]*fakeResult

	type fakeResult struct {
		body string
		urls []string
	}

	func (f fakeFetcher) Fetch(url string) (string, []string, error) {
		if res, ok := f[url]; ok {
			return res.body, res.urls, nil
		}
		return "", nil, fmt.Errorf("not found: %s", url)
	}

	// fetcher is a populated fakeFetcher.
	var fetcher = fakeFetcher{
		"http://golang.org/": &fakeResult{
			"The Go Programming Language",
			[]string{
				"http://golang.org/pkg/",
				"http://golang.org/cmd/",
			},
		},
		"http://golang.org/pkg/": &fakeResult{
			"Packages",
			[]string{
				"http://golang.org/",
				"http://golang.org/cmd/",
				"http://golang.org/pkg/fmt/",
				"http://golang.org/pkg/os/",
			},
		},
		"http://golang.org/pkg/fmt/": &fakeResult{
			"Package fmt",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
		"http://golang.org/pkg/os/": &fakeResult{
			"Package os",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
	}

Completing this code took me half a day (I'm really too slow T_T), and there were two pitfalls that tripped me up for a long time.

First, the main goroutine ended prematurely, so the whole program exited after crawling only one level.

The fix is simple: use sync.WaitGroup as a counting-semaphore mechanism. Add to the counter whenever a new goroutine is created, and the current goroutine does not return until every goroutine it created has finished executing.
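That mechanism can be shown in isolation. This is my own sketch; the `squareAll` helper and its computation are invented purely to demonstrate the Add/Done/Wait protocol.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll computes k*k for each index in a separate goroutine and
// uses a WaitGroup so the caller returns only after all of them finish.
func squareAll(n int) []int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // register one more goroutine before starting it
		go func(k int) {
			defer wg.Done() // signal completion even if the body panics
			results[k] = k * k
		}(i)
	}
	wg.Wait() // block until every registered goroutine has called Done
	return results
}

func main() {
	fmt.Println(squareAll(4)) // [0 1 4 9]
}
```

Note that `wg.Add(1)` must happen before `go ...`, not inside the goroutine; otherwise `wg.Wait()` might run before any counter was incremented and return immediately, which is exactly the premature-exit symptom above.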

Second, for the parallel crawl itself, my initial version was:

	for _, u := range urls {
		// Register one more goroutine to wait for.
		wg.Add(1)
		go func() {
			// Recursively crawl.
			Crawl(u, depth-1, fetcher)
			wg.Done()
		}()
	}

This looks fine: the parameter u should take each entry of urls in turn. But when actually executed, every goroutine saw only the last entry in urls.

My guess is that by the time a child goroutine actually read the value of u, the loop had already advanced to the last record, so every fetch returned only the last record's result. The closures all capture the same loop variable, not a copy of its value at each iteration.

So I changed it to:

	for _, u := range urls {
		// Register one more goroutine to wait for.
		wg.Add(1)
		go func(target string, d int, fet Fetcher) {
			// Recursively crawl.
			Crawl(target, d, fet)
			wg.Done()
		}(u, depth-1, fetcher)
	}
	wg.Wait()

Once the goroutine's function takes those three values as parameters, each call receives its own copies, and the crawl behaves normally.
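Here is a minimal standalone sketch of that fix; the `crawlAll` helper and its URL list are invented for illustration. Passing the loop value as an argument copies it into the goroutine, which is safe on any Go version (Go 1.22 later changed `for` loops so each iteration gets a fresh variable, making the original closure form safe on newer toolchains too).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// crawlAll launches one goroutine per URL, passing the loop value as an
// argument so each goroutine receives its own copy of the string.
func crawlAll(urls []string) []string {
	var wg sync.WaitGroup
	var mux sync.Mutex
	var got []string
	for _, u := range urls {
		wg.Add(1)
		go func(target string) {
			defer wg.Done()
			mux.Lock() // the slice is shared, so guard the append
			got = append(got, target)
			mux.Unlock()
		}(u) // the argument pins u's current value for this goroutine
	}
	wg.Wait()
	sort.Strings(got) // goroutines finish in arbitrary order
	return got
}

func main() {
	fmt.Println(crawlAll([]string{"/pkg/", "/cmd/", "/doc/"})) // [/cmd/ /doc/ /pkg/]
}
```

Without the `(u)` argument, all three goroutines would (pre-1.22) read the shared variable after the loop had finished, and every one of them would see the final URL.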

I have finally finished the exercises in the Go Tour!!! Go's syntax is quite similar to C's, so it isn't too hard to pick up. But concurrent programming shows that my thinking still has problems; too immature! Writing this fake crawler, I found myself stepping into many pits I had stepped into before; all kinds of deadlock and synchronization problems could actually have been avoided~~~

I wrote this code in the GoLand IDE, and during debugging I found that a breakpoint only suspends the current goroutine; I could not find a way to suspend all goroutines at once T_T. If any expert knows how to set that up, please tell me!!!
