Golang Web Crawler Framework gocolly/colly (Part 1)



Gocolly has 3400+ stars on GitHub, ranking it at the top of Go crawler projects. gocolly is fast and elegant: it can issue more than 1K requests per second on a single core; it exposes a set of callback-based interfaces with which any kind of crawler can be implemented; and, thanks to its goquery dependency, elements can be selected much like with jQuery.

Gocolly's official website is http://go-colly.org/, which provides detailed documentation and sample code. Install colly:

go get -u github.com/gocolly/colly/...

  

Import the package in your code:

import "github.com/gocolly/colly"

The main body of colly is the Collector object, which manages network communication and is responsible for executing the attached callback functions while a job is running. To use colly, first initialize a Collector:

c := colly.NewCollector()
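
NewCollector also accepts functional options that configure the Collector at creation time. The following is a minimal sketch, assuming the colly.AllowedDomains, colly.MaxDepth and colly.UserAgent options from the colly package; the domain and User-Agent string are only placeholders:

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// Configure the Collector at creation time instead of setting its fields afterwards.
	c := colly.NewCollector(
		colly.AllowedDomains("go-colly.org"),   // only crawl this domain
		colly.MaxDepth(2),                      // follow links at most 2 levels deep
		colly.UserAgent("example-crawler/1.0"), // custom User-Agent header
	)

	// Print every URL the Collector is about to request.
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	c.Visit("http://go-colly.org/")
}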

You can attach various types of callback functions to the Collector to control a collection job or to retrieve information. Adding callback functions:

c.OnRequest(func(r *colly.Request) {
	fmt.Println("Visiting", r.URL)
})

c.OnError(func(_ *colly.Response, err error) {
	log.Println("Something went wrong:", err)
})

c.OnResponse(func(r *colly.Response) {
	fmt.Println("Visited", r.URL)
})

c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	e.Request.Visit(e.Attr("href"))
})

c.OnHTML("tr td:nth-of-type(1)", func(e *colly.HTMLElement) {
	fmt.Println("First column of a table row:", e.Text)
})

c.OnScraped(func(r *colly.Response) {
	fmt.Println("Finished", r.URL)
})

  

The callback functions are called in the following order:

1. OnRequest

Called before a request is made

2. OnError

Called if an error occurs during the request

3. OnResponse

Called after a response is received

4. OnHTML

Called right after OnResponse, if the received content is HTML

5. OnScraped

Called after OnHTML has finished
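
To see this order in practice, here is a minimal sketch that registers all five callbacks and crawls a single page. The target URL is only an example, and OnError only fires if the request fails:

package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("1. OnRequest:", r.URL)
	})
	c.OnError(func(_ *colly.Response, err error) {
		log.Println("2. OnError:", err)
	})
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("3. OnResponse:", r.URL)
	})
	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println("4. OnHTML, page title:", e.Text)
	})
	c.OnScraped(func(r *colly.Response) {
		fmt.Println("5. OnScraped:", r.URL)
	})

	// Crawling one page triggers the callbacks in the order shown above.
	c.Visit("http://go-colly.org/")
}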

The officially provided basic sample code:

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector()

	// Visit only domains: hackerspaces.org, wiki.hackerspaces.org
	c.AllowedDomains = []string{"hackerspaces.org", "wiki.hackerspaces.org"}

	// On every a element which has href attribute call callback
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		// Print link
		fmt.Printf("Link found: %q -> %s\n", e.Text, link)
		// Visit link found on page
		// Only those links are visited which are in AllowedDomains
		c.Visit(e.Request.AbsoluteURL(link))
	})

	// Before making a request print "Visiting ..."
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL.String())
	})

	// Start scraping on https://hackerspaces.org
	c.Visit("https://hackerspaces.org/")
}

  

The example program only visits links within the hackerspaces.org domains. The selector passed to the OnHTML callback is a[href], which matches a elements that have an href attribute; when such a link is found, the crawler continues by visiting it. Part of the output looks like this:

PS e:\mygo\src\github.com\gocolly\colly\_examples\basic> .\basic.exe
Visiting https://hackerspaces.org/
Link found: "Navigation" -> #column-one
Link found: "Search" -> #searchInput
Link found: "" -> /File:Cbase07.jpg
Visiting https://hackerspaces.org/File:Cbase07.jpg
Link found: "Navigation" -> #column-one
Link found: "Search" -> #searchInput
Link found: "File" -> #file
Link found: "File History" -> #filehistory
Link found: "File Usage" -> #filelinks
Link found: "" -> /images/e/ec/Cbase07.jpg
Visiting https://hackerspaces.org/images/e/ec/Cbase07.jpg
Link found: "800x600 pixels" -> /images/thumb/e/ec/Cbase07.jpg/800px-Cbase07.jpg
Visiting https://hackerspaces.org/images/thumb/e/ec/Cbase07.jpg/800px-Cbase07.jpg
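
The example keeps following every discovered link, so the crawl can fan out widely. A common way to bound it is a depth limit. The sketch below is a variation of the basic example, assuming the colly.MaxDepth option; it uses e.Request.Visit so that the depth of each follow-up request is tracked:

package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	// Same crawl as the basic example, but stop following links
	// more than two levels away from the start page.
	c := colly.NewCollector(
		colly.MaxDepth(2),
	)
	c.AllowedDomains = []string{"hackerspaces.org", "wiki.hackerspaces.org"}

	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Printf("Link found: %q -> %s\n", e.Text, link)
		// Relative links are resolved automatically; requests beyond
		// MaxDepth or outside AllowedDomains are skipped.
		e.Request.Visit(link)
	})

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL.String())
	})

	c.Visit("https://hackerspaces.org/")
}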
