Goroutine and channel usage in Go

As a Go novice I have been working through A Tour of Go, and I ran into some goroutine and channel issues while completing the tour's web crawler exercise.

The tour provides starter code for the exercise; the most important part is the Crawl function:

    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    func Crawl(url string, depth int, fetcher Fetcher) {
        // TODO: Fetch URLs in parallel.
        // TODO: Don't fetch the same URL twice.
        // This implementation doesn't do either:
        if depth <= 0 {
            return
        }
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        for _, u := range urls {
            Crawl(u, depth-1, fetcher)
        }
        return
    }
This code simply crawls all the URLs it finds, recursively. The exercise asks you to modify Crawl to fetch URLs in parallel, without fetching the same URL twice.

The difficulty is that if you keep fetching pages recursively, how does a child goroutine know whether a URL has already been crawled? At first I implemented it like this:

    // Crawl uses fetcher to recursively crawl
    // pages starting with url, to a maximum of depth.
    func Crawl(url string, depth int, fetcher Fetcher) {
        ch := make(chan int)
        count := 1
        urlsFetched := make(map[string]string)
        go crawlWithConcurrency(url, depth, fetcher, ch, urlsFetched)
        for count > 0 {
            count += <-ch
        }
        return
    }

    func crawlWithConcurrency(url string, depth int, fetcher Fetcher, ch chan int, urlsFetched map[string]string) {
        if depth <= 0 {
            ch <- -1
            return
        }
        body, urls, err := fetcher.Fetch(url)
        urlsFetched[url] = "true"
        if err != nil {
            fmt.Println(err)
            ch <- -1
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        for _, u := range urls {
            _, exists := urlsFetched[u]
            if exists {
                continue
            }
            // Send +1 before spawning the child so count cannot
            // reach 0 while a goroutine is still running.
            ch <- 1
            go crawlWithConcurrency(u, depth-1, fetcher, ch, urlsFetched)
        }
        ch <- -1
        return
    }
To put it simply:

1. The main goroutine in Crawl tracks a count of live goroutines so that Crawl returns only after every child goroutine has finished: +1 is sent over the channel for each goroutine spawned, and -1 when a goroutine finishes (see the sketch after this list).

2. All goroutines share the urlsFetched map so that pages are not crawled twice.
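
To isolate the counting pattern, here is a minimal, self-contained sketch of the same idea; the worker function and the fixed number of children are invented for illustration:

    package main

    import "fmt"

    func worker(id int, ch chan int) {
        fmt.Println("worker", id, "done")
        ch <- -1 // this goroutine has finished
    }

    func main() {
        ch := make(chan int)
        count := 1 // accounts for the root goroutine spawned below
        go func() {
            for id := 0; id < 2; id++ {
                ch <- 1 // count the child before it starts
                go worker(id, ch)
            }
            ch <- -1 // the root itself is done
        }()
        for count > 0 {
            count += <-ch
        }
        fmt.Println("all goroutines finished")
    }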

Go's concurrency model is based on CSP, and the language advocates the principle: "Do not communicate by sharing memory; instead, share memory by communicating."

The crawlWithConcurrency version communicates by sharing the urlsFetched map, which goes against the CSP principle. Every goroutine can read and modify urlsFetched with no synchronization, so the design hides a data race: two goroutines can both see a URL as unfetched and crawl it twice, and concurrent map writes are unsafe in Go. Sharing memory this way is dangerous.
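
As a minimal sketch of the hazard (the map and loop here are invented, not code from the crawler), the following program has many goroutines write one map with no synchronization. Running it with go run -race reports the conflicting accesses, and even without -race it may die with a "concurrent map writes" fatal error:

    package main

    import "sync"

    func main() {
        shared := make(map[string]string)
        var wg sync.WaitGroup
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                shared["key"] = "value" // unsynchronized write: a data race
            }()
        }
        wg.Wait()
    }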

To comply with the CSP principle, I revised the code as follows:

    type Result struct {
        depth int
        urls  []string
    }

    // The main goroutine tracks progress and decides when the program ends.
    func Crawl(url string, depth int, fetcher Fetcher) {
        ch := make(chan *Result)
        urlsFetched := make(map[string]string)
        count := 1
        urlsFetched[url] = "true"
        go crawlWorker(url, depth, fetcher, ch)
        for count > 0 {
            result := <-ch
            // At depth <= 1, the URLs in the result are not crawled.
            if result.depth > 1 {
                for _, u := range result.urls {
                    _, exists := urlsFetched[u]
                    if exists {
                        continue
                    } else {
                        count++
                        urlsFetched[u] = "true"
                        go crawlWorker(u, result.depth-1, fetcher, ch)
                    }
                }
            }
            count--
        }
        return
    }

    func crawlWorker(url string, depth int, fetcher Fetcher, ch chan *Result) {
        body, urls, err := fetcher.Fetch(url)
        if err != nil {
            fmt.Println(err)
            ch <- &Result{depth, urls}
            return
        }
        fmt.Printf("found: %s %q\n", url, body)
        ch <- &Result{depth, urls}
    }

The main changes in the revised code are:

1. Recursion is gone.

2. Only the main goroutine touches the urlsFetched map.

3. Worker goroutines send their results back to the main goroutine, and the main goroutine decides which new workers to start.

This version adheres to the CSP principle and is much safer.

The complete program is the Result, Crawl, and crawlWorker code above dropped into the Go Tour exercise's standard scaffolding (the Fetcher interface, the fakeFetcher test data, and main), which looks roughly like this:
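
    package main

    import "fmt"

    type Fetcher interface {
        // Fetch returns the body of URL and
        // a slice of URLs found on that page.
        Fetch(url string) (body string, urls []string, err error)
    }

    // ... insert the Result, Crawl, and crawlWorker definitions from above here ...

    func main() {
        Crawl("https://golang.org/", 4, fetcher)
    }

    // fakeFetcher is a Fetcher that returns canned results.
    type fakeFetcher map[string]*fakeResult

    type fakeResult struct {
        body string
        urls []string
    }

    func (f fakeFetcher) Fetch(url string) (string, []string, error) {
        if res, ok := f[url]; ok {
            return res.body, res.urls, nil
        }
        return "", nil, fmt.Errorf("not found: %s", url)
    }

    // fetcher is a populated fakeFetcher.
    var fetcher = fakeFetcher{
        "https://golang.org/": &fakeResult{
            "The Go Programming Language",
            []string{
                "https://golang.org/pkg/",
                "https://golang.org/cmd/",
            },
        },
        "https://golang.org/pkg/": &fakeResult{
            "Packages",
            []string{
                "https://golang.org/",
                "https://golang.org/cmd/",
                "https://golang.org/pkg/fmt/",
                "https://golang.org/pkg/os/",
            },
        },
        "https://golang.org/pkg/fmt/": &fakeResult{
            "Package fmt",
            []string{
                "https://golang.org/",
                "https://golang.org/pkg/",
            },
        },
        "https://golang.org/pkg/os/": &fakeResult{
            "Package os",
            []string{
                "https://golang.org/",
                "https://golang.org/pkg/",
            },
        },
    }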

