Golang Study Notes: Experiencing Concurrent Programming in Go (II)

Source: Internet
Author: User

The last exercise in A Tour of Go is a web crawler. Glancing at the table of contents, I really thought I would be writing a crawler; only after reading the code carefully did I realize the exercise just has you simulate a crawler using a channel or a mutex.

The initial code is as follows:

	package main

	import (
		"fmt"
	)

	type Fetcher interface {
		// Fetch returns the body of the URL and
		// a slice of URLs found on that page.
		Fetch(url string) (body string, urls []string, err error)
	}

	// Crawl uses fetcher to recursively crawl
	// pages starting with url, to a maximum of depth.
	func Crawl(url string, depth int, fetcher Fetcher) {
		// TODO: Fetch URLs in parallel.
		// TODO: Don't fetch the same URL twice.
		// This implementation does neither:
		if depth <= 0 {
			return
		}
		body, urls, err := fetcher.Fetch(url)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("found: %s %q\n", url, body)
		for _, u := range urls {
			Crawl(u, depth-1, fetcher)
		}
		return
	}

	func main() {
		Crawl("http://golang.org/", 4, fetcher)
	}

	// fakeFetcher is a Fetcher that returns canned results.
	type fakeFetcher map[string]*fakeResult

	type fakeResult struct {
		body string
		urls []string
	}

	func (f fakeFetcher) Fetch(url string) (string, []string, error) {
		if res, ok := f[url]; ok {
			return res.body, res.urls, nil
		}
		return "", nil, fmt.Errorf("not found: %s", url)
	}

	// fetcher is a populated fakeFetcher.
	var fetcher = fakeFetcher{
		"http://golang.org/": &fakeResult{
			"The Go Programming Language",
			[]string{
				"http://golang.org/pkg/",
				"http://golang.org/cmd/",
			},
		},
		"http://golang.org/pkg/": &fakeResult{
			"Packages",
			[]string{
				"http://golang.org/",
				"http://golang.org/cmd/",
				"http://golang.org/pkg/fmt/",
				"http://golang.org/pkg/os/",
			},
		},
		"http://golang.org/pkg/fmt/": &fakeResult{
			"Package fmt",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
		"http://golang.org/pkg/os/": &fakeResult{
			"Package os",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
	}

The fetcher's crawl results are hard-coded at initialization; the exercise asks you to implement two things: parallel crawling, and synchronization so no URL is fetched twice.

For parallel crawling, my idea is that each level spawns one goroutine per child URL to crawl recursively. A simple implementation is to launch a goroutine inside a for loop, with each goroutine executing the Crawl function recursively.

For synchronization, as the hint suggests, I use sync.Mutex as a lock to guard updates and lookups of the visited-path table.
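The mutex-plus-map pattern can be isolated into a small sketch. This is my own minimal illustration, not code from the exercise; the `visited` type and `TryVisit` helper are invented names.

```go
package main

import (
	"fmt"
	"sync"
)

// visited guards a set of crawled URLs with a mutex so that
// concurrent goroutines never fetch the same page twice.
type visited struct {
	m   map[string]bool
	mux sync.Mutex
}

// TryVisit marks url as seen and reports whether it was new.
func (v *visited) TryVisit(url string) bool {
	v.mux.Lock()
	defer v.mux.Unlock() // release the lock on every return path
	if v.m[url] {
		return false // already crawled
	}
	v.m[url] = true
	return true // first visit: the caller should fetch it
}

func main() {
	v := &visited{m: make(map[string]bool)}
	fmt.Println(v.TryVisit("http://golang.org/")) // true: first visit
	fmt.Println(v.TryVisit("http://golang.org/")) // false: already seen
}
```

Using `defer` for the unlock avoids having to call Unlock on every branch by hand, which is exactly where my own version below gets fiddly.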

My implementation is as follows:

	package main

	import (
		"fmt"
		"sync"
	)

	type Fetcher interface {
		// Fetch returns the body of the URL and
		// a slice of URLs found on that page.
		Fetch(url string) (body string, urls []string, err error)
	}

	// walk records which paths have been crawled, guarded by a mutex.
	type walk struct {
		m   map[string]int
		mux sync.Mutex
	}

	// hasWalk is the table of already-fetched paths.
	var hasWalk = walk{m: make(map[string]int)}

	// Crawl uses fetcher to recursively crawl
	// pages starting with url, to a maximum of depth.
	func Crawl(url string, depth int, fetcher Fetcher) {
		if depth <= 0 {
			return
		}
		var wg = sync.WaitGroup{}
		var body string
		var urls []string
		var err error
		// Lock before checking and updating the visited table.
		hasWalk.mux.Lock()
		if _, ok := hasWalk.m[url]; !ok {
			body, urls, err = fetcher.Fetch(url)
			hasWalk.m[url] = 1
			if err != nil {
				fmt.Println(err)
				hasWalk.mux.Unlock()
				return
			}
			fmt.Printf("found: %s %q\n", url, body)
		} else {
			hasWalk.mux.Unlock()
			return
		}
		hasWalk.mux.Unlock()
		for _, u := range urls {
			// Register one more goroutine to wait for.
			wg.Add(1)
			go func(target string, d int, fet Fetcher) {
				// Recursively crawl.
				Crawl(target, d, fet)
				wg.Done()
			}(u, depth-1, fetcher)
		}
		wg.Wait()
		return
	}

	func main() {
		Crawl("http://golang.org/", 4, fetcher)
	}

	// fakeFetcher is a Fetcher that returns canned results.
	type fakeFetcher map[string]*fakeResult

	type fakeResult struct {
		body string
		urls []string
	}

	func (f fakeFetcher) Fetch(url string) (string, []string, error) {
		if res, ok := f[url]; ok {
			return res.body, res.urls, nil
		}
		return "", nil, fmt.Errorf("not found: %s", url)
	}

	// fetcher is a populated fakeFetcher.
	var fetcher = fakeFetcher{
		"http://golang.org/": &fakeResult{
			"The Go Programming Language",
			[]string{
				"http://golang.org/pkg/",
				"http://golang.org/cmd/",
			},
		},
		"http://golang.org/pkg/": &fakeResult{
			"Packages",
			[]string{
				"http://golang.org/",
				"http://golang.org/cmd/",
				"http://golang.org/pkg/fmt/",
				"http://golang.org/pkg/os/",
			},
		},
		"http://golang.org/pkg/fmt/": &fakeResult{
			"Package fmt",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
		"http://golang.org/pkg/os/": &fakeResult{
			"Package os",
			[]string{
				"http://golang.org/",
				"http://golang.org/pkg/",
			},
		},
	}

Completing this code took me half a day (I'm really too slow T_T), and there were two pitfalls that tripped me up for a long time.

First, the main goroutine ended prematurely, so the whole program exited after crawling only one level.

The fix is simple: use sync.WaitGroup as a counting-semaphore mechanism. Add to the counter whenever a new goroutine is created, and the current goroutine does not return until every goroutine it created has finished executing.
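That mechanism can be shown in isolation. This is my own sketch; the `squareAll` helper and its computation are invented purely to demonstrate the Add/Done/Wait protocol.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll computes k*k for each index in a separate goroutine and
// uses a WaitGroup so the caller returns only after all of them finish.
func squareAll(n int) []int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1) // register one more goroutine before starting it
		go func(k int) {
			defer wg.Done() // signal completion even if the body panics
			results[k] = k * k
		}(i)
	}
	wg.Wait() // block until every registered goroutine has called Done
	return results
}

func main() {
	fmt.Println(squareAll(4)) // [0 1 4 9]
}
```

Note that `wg.Add(1)` must happen before `go ...`, not inside the goroutine; otherwise `wg.Wait()` might run before any counter was incremented and return immediately, which is exactly the premature-exit symptom above.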

Second, for the parallel crawl itself, my initial version was:

	for _, u := range urls {
		// Register one more goroutine to wait for.
		wg.Add(1)
		go func() {
			// Recursively crawl.
			Crawl(u, depth-1, fetcher)
			wg.Done()
		}()
	}

This looks fine: the parameter u should take each entry of urls in turn. But when actually executed, every goroutine saw only the last entry in urls.

My guess is that by the time a child goroutine actually read the value of u, the loop had already advanced to the last record, so every fetch returned only the last record's result. The closures all capture the same loop variable, not a copy of its value at each iteration.

So I changed it to:

	for _, u := range urls {
		// Register one more goroutine to wait for.
		wg.Add(1)
		go func(target string, d int, fet Fetcher) {
			// Recursively crawl.
			Crawl(target, d, fet)
			wg.Done()
		}(u, depth-1, fetcher)
	}
	wg.Wait()

Once the goroutine's function takes those three values as parameters, each call receives its own copies, and the crawl behaves normally.
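Here is a minimal standalone sketch of that fix; the `crawlAll` helper and its URL list are invented for illustration. Passing the loop value as an argument copies it into the goroutine, which is safe on any Go version (Go 1.22 later changed `for` loops so each iteration gets a fresh variable, making the original closure form safe on newer toolchains too).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// crawlAll launches one goroutine per URL, passing the loop value as an
// argument so each goroutine receives its own copy of the string.
func crawlAll(urls []string) []string {
	var wg sync.WaitGroup
	var mux sync.Mutex
	var got []string
	for _, u := range urls {
		wg.Add(1)
		go func(target string) {
			defer wg.Done()
			mux.Lock() // the slice is shared, so guard the append
			got = append(got, target)
			mux.Unlock()
		}(u) // the argument pins u's current value for this goroutine
	}
	wg.Wait()
	sort.Strings(got) // goroutines finish in arbitrary order
	return got
}

func main() {
	fmt.Println(crawlAll([]string{"/pkg/", "/cmd/", "/doc/"})) // [/cmd/ /doc/ /pkg/]
}
```

Without the `(u)` argument, all three goroutines would (pre-1.22) read the shared variable after the loop had finished, and every one of them would see the final URL.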

I have finally finished the exercises in the Go Tour!!! Go's syntax is quite similar to C's, so it isn't too hard to pick up. But concurrent programming shows that my thinking still has problems; too immature! Writing this fake crawler, I found myself stepping into many pits I had stepped into before; all kinds of deadlock and synchronization problems could actually have been avoided~~~

I wrote this code in the GoLand IDE, and during debugging I found that a breakpoint only suspends the current goroutine; I could not find a way to suspend all goroutines at once T_T. If any expert knows how to set that up, please tell me!!!
