A Go crawler for the Douban Movie Top 250

Source: Internet
Author: User

Crawling the Douban Movie Top 250

Crawlers are a staple project, and it is interesting to look at the data they bring back. Let's start by writing the simplest, most basic crawler.

Project address: https://github.com/go-crawler ...

Goal

Our target site is the Douban Movie Top 250, which is probably familiar to everyone.

This crawl collects 8 fields per movie for a simple summary analysis. The specific fields are as follows:

Simple analysis of the target source

    • Each page lists 25 movies
    • The list is paginated (10 pages in total), and the pagination follows a regular pattern
    • The fields of each item appear in a fixed, unchanging order
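
Because the pagination is regular, the page URLs could also be generated instead of read from the page. The following is only a sketch of that idea, not what the project does: the base URL and the "start" offset parameter are assumptions, and only the counts (25 items per page, 10 pages) come from the analysis above.

```go
package main

import "fmt"

// Counts taken from the analysis above: 25 movies per page, 10 pages.
const (
	perPage   = 25
	pageCount = 10
)

// PageURLs builds one URL per page by stepping an offset in increments
// of perPage. The "start" query parameter is an assumption made for
// illustration; check the real pagination links before relying on it.
func PageURLs(base string) []string {
	urls := make([]string, 0, pageCount)
	for i := 0; i < pageCount; i++ {
		urls = append(urls, fmt.Sprintf("%s?start=%d", base, i*perPage))
	}
	return urls
}

func main() {
	// The base URL is assumed, not taken from the article.
	for _, u := range PageURLs("https://movie.douban.com/top250") {
		fmt.Println(u)
	}
}
```

Generating the URLs saves one parsing step, but reading them from the page, as the project does, keeps working if the site changes its offsets.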

Begin

Because the amount of data is small, the crawl steps are simple:

    • Parse the first page to get all the pagination links
    • Parse each page and loop over all of its movie entries
    • Store the crawled movie information
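
The steps above can be sketched as one loop. This is a minimal illustration rather than the project's code: the Movie type is cut down to a single field, and fetch, parse and store are hypothetical functions passed in so that the flow can run without the network, goquery or a database.

```go
package main

import "fmt"

// Movie is a trimmed-down, hypothetical record used only in this sketch.
type Movie struct {
	Title string
}

// Crawl walks every page URL, parses the movies on each page, and hands
// every movie to store. It returns the number of movies stored.
func Crawl(
	pageURLs []string,
	fetch func(url string) (string, error),
	parse func(html string) []Movie,
	store func(m Movie) error,
) (int, error) {
	saved := 0
	for _, u := range pageURLs {
		html, err := fetch(u)
		if err != nil {
			return saved, fmt.Errorf("fetch %s: %w", u, err)
		}
		for _, m := range parse(html) {
			if err := store(m); err != nil {
				return saved, fmt.Errorf("store %q: %w", m.Title, err)
			}
			saved++
		}
	}
	return saved, nil
}

func main() {
	// Stubs standing in for the HTTP request, the HTML parser and the storage layer.
	fetch := func(url string) (string, error) { return "body of " + url, nil }
	parse := func(html string) []Movie { return []Movie{{Title: html}} }
	var db []Movie
	store := func(m Movie) error { db = append(db, m); return nil }

	n, err := Crawl([]string{"page-1", "page-2"}, fetch, parse, store)
	fmt.Println(n, err, len(db))
}
```

With only 10 pages a sequential loop is enough; nothing here needs goroutines or a queue.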

Installation

$ go get -u github.com/PuerkitoBio/goquery

Run

$ go run main.go

Code Snippets

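1. Get all pages

    // ParsePages collects the pagination links found on the list page.
    func ParsePages(doc *goquery.Document) (pages []Page) {
        // Page 1 is added by hand with an empty URL.
        pages = append(pages, Page{Page: 1, Url: ""})
        doc.Find("#content > div > div.article > div.paginator > a").Each(func(i int, s *goquery.Selection) {
            // The link text is the page number, the href is the page URL.
            page, _ := strconv.Atoi(s.Text())
            url, _ := s.Attr("href")
            pages = append(pages, Page{
                Page: page,
                Url:  url,
            })
        })
        return pages
    }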
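
The href values collected by ParsePages may be relative, and page 1 carries an empty URL, so each one has to be combined with the list URL before it can be requested. A minimal sketch using the standard net/url package follows; the helper name ResolvePageURL, the base URL and the sample hrefs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// ResolvePageURL turns an href taken from the paginator into an absolute URL.
// An empty href (the entry added by hand for page 1) resolves to the base itself.
func ResolvePageURL(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	base := "https://movie.douban.com/top250" // assumed list URL
	// Sample hrefs, assumed for illustration: page 1 and a query-only link.
	for _, href := range []string{"", "?start=25&filter="} {
		abs, err := ResolvePageURL(base, href)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```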

2. Parse the Douban movie information

    // ParseMovies extracts the fields of every movie listed on one page.
    func ParseMovies(doc *goquery.Document) (movies []Movie) {
        doc.Find("#content > div > div.article > ol > li").Each(func(i int, s *goquery.Selection) {
            title := s.Find(".hd a span").Eq(0).Text()
            ...
            // The description line has the form "year / area / tag".
            movieDesc := strings.Split(DescInfo[1], "/")
            year := strings.TrimSpace(movieDesc[0])
            area := strings.TrimSpace(movieDesc[1])
            tag := strings.TrimSpace(movieDesc[2])
            star := s.Find(".bd .star .rating_num").Text()
            // Keep only the digits of the rating-count label.
            comment := strings.TrimSpace(s.Find(".bd .star span").Eq(3).Text())
            compile := regexp.MustCompile("[0-9]")
            comment = strings.Join(compile.FindAllString(comment, -1), "")
            quote := s.Find(".quote .inq").Text()
            ...
            log.Printf("i: %d, movie: %v", i, movie)
            movies = append(movies, movie)
        })
        return movies
    }
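
The two string-handling steps in ParseMovies can be tried on their own without goquery. The sketch below is an illustration under assumptions: the helper names SplitDesc and CommentCount and the sample strings are made up here. It differs from the snippet in two small ways: it checks the number of parts before indexing, so a malformed line does not cause a panic, and it compiles the regular expression once instead of once per list item.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digits is compiled once at package level rather than inside the loop.
var digits = regexp.MustCompile(`[0-9]+`)

// SplitDesc splits a "year / area / tag" line into its three parts.
// ok is false when the line has fewer than three "/"-separated parts.
func SplitDesc(line string) (year, area, tag string, ok bool) {
	parts := strings.Split(line, "/")
	if len(parts) < 3 {
		return "", "", "", false
	}
	return strings.TrimSpace(parts[0]),
		strings.TrimSpace(parts[1]),
		strings.TrimSpace(parts[2]),
		true
}

// CommentCount keeps only the digits of a rating-count label.
func CommentCount(label string) string {
	return strings.Join(digits.FindAllString(label, -1), "")
}

func main() {
	// Sample inputs are invented for this sketch, not taken from the site.
	year, area, tag, ok := SplitDesc(" 1994 / USA / Crime Drama ")
	fmt.Println(year, area, tag, ok)
	fmt.Println(CommentCount("1,234 ratings"))
}
```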

Data

What do you make of the data once you see it? I'm really curious =)
