Golang crawler: crawling Autohome's used car product library

Crawling Autohome's used car product library

Project address: https://github.com/go-crawler ...

Goal

Autohome keeps coming up in conversation lately, and I was also curious about used car prices in China, so this time the target site is Autohome's used car product library.

Analysis of the target source:

    • Each page lists 24 entries
    • There is pagination, but in this old product library paging breaks after page 100, so we crawl only 99 pages
    • All cities are accessible
    • In total, more than 190,000 records can be crawled

Begin

Crawl steps

    • Get all the cities
    • Assemble the URLs of all cities and push them into the queue
    • Analyze the structure of a used car page
    • Push the next-page URL into the queue
    • Loop to pull the used car data from every page
    • Loop to pull the used car data for each city in the queue
    • Wait until there are no new URLs in the queue
    • Store the crawled used car data in the database
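
The steps above amount to draining a queue of URLs. The following is a minimal sketch of that loop using only the standard library, not the project's actual code: Page, Crawl and the fetch callback are hypothetical names, and the fake site in main stands in for real HTTP requests and HTML parsing.

```go
package main

import "fmt"

// Page is a hypothetical parsed result: the cars found on one page
// and the URL of the next page ("" when there is none).
type Page struct {
	Cars    []string
	NextURL string
}

// Crawl drains a FIFO queue of URLs. fetch is a stand-in for downloading
// and parsing one page; every next-page link it reports is queued in turn.
func Crawl(seeds []string, fetch func(url string) Page) []string {
	queue := append([]string(nil), seeds...)
	seen := make(map[string]bool)
	var cars []string

	for len(queue) > 0 {
		url := queue[0]
		queue = queue[1:]
		if seen[url] {
			continue // never visit the same URL twice
		}
		seen[url] = true

		page := fetch(url)
		cars = append(cars, page.Cars...)
		if page.NextURL != "" {
			queue = append(queue, page.NextURL)
		}
	}
	return cars // a real crawler would store these in a database
}

func main() {
	// A fake two-page site for one city; the URLs are invented.
	site := map[string]Page{
		"/2sc/hangzhou/p1": {Cars: []string{"car A", "car B"}, NextURL: "/2sc/hangzhou/p2"},
		"/2sc/hangzhou/p2": {Cars: []string{"car C"}},
	}
	fetch := func(url string) Page { return site[url] }
	fmt.Println(Crawl([]string{"/2sc/hangzhou/p1"}, fetch))
}
```

The loop ends when the queue is empty, which corresponds to the "no new URLs in the queue" step. A concurrent version would need a different way to detect that state.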

Get the cities

Viewing the page, you can find the list of all used car cities in the city filter area. Checking the code carefully, however, shows that the list is loaded by JS, with all the cities held together in one variable.

There are two ways to extract it:

    • Parse the JS variable and extract the cities
    • Copy areaJson directly and parse it as a variable

Here we can simply copy and paste it, because this value changes relatively rarely.
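
As a rough illustration of the copy-and-paste approach, the sketch below decodes a pasted city list and builds one start URL per city. The layout of the real areaJson is not shown in this article, so the JSON shape, the City field names and the "/2sc/<city>/" start URL are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// City mirrors one entry of the pasted city list. The field names are
// assumptions for illustration; the real areaJson layout may differ.
type City struct {
	Name   string `json:"name"`
	Pinyin string `json:"pinyin"`
}

// pastedCities stands in for the value copied from the page's JS variable.
const pastedCities = `[
	{"name": "Hangzhou", "pinyin": "hangzhou"},
	{"name": "Beijing", "pinyin": "beijing"}
]`

// CityURLs decodes the pasted list and builds one start URL per city.
// The "/2sc/<pinyin>/" form is assumed from the paging link example.
func CityURLs(raw string) ([]string, error) {
	var cities []City
	if err := json.Unmarshal([]byte(raw), &cities); err != nil {
		return nil, err
	}
	urls := make([]string, 0, len(cities))
	for _, c := range cities {
		urls = append(urls, "/2sc/"+c.Pinyin+"/")
	}
	return urls, nil
}

func main() {
	urls, err := CityURLs(pastedCities)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(urls)
}
```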

Get pagination

Analyzing the page shows that the paging links follow a regular pattern, for example /2sc/hangzhou/a0_0msdgscncgpi1ltocsp2exb4/ . The pattern is sp%d: sp is followed by the page number.

The conventional approach would be to predict all the paging links, push them into the queue, and pull them quickly with goroutines.

But this old product library has a problem: once you go past page 100, the next page is always page 101.

Therefore we take a more traditional approach and follow the next-page link taken from each page, which also adapts to possible changes in the paging links. What the pages show after page 100 is also odd, so we ignore it for now.
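
One way to apply the 99-page limit while following next-page links is to read the page number out of each link before queuing it. This is a sketch only: PageNumber, ShouldEnqueue and maxPage are hypothetical names, and anchoring the pattern on the trailing "ex" is an assumption based on the single example link above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// maxPage is the last page to crawl; paging misbehaves from page 100 onward.
const maxPage = 99

// pagePattern captures the number after "sp" in a paging link.
// Matching the following "ex" is an assumption taken from one example link.
var pagePattern = regexp.MustCompile(`sp(\d+)ex`)

// PageNumber extracts the page number from a paging link.
func PageNumber(link string) (int, bool) {
	m := pagePattern.FindStringSubmatch(link)
	if m == nil {
		return 0, false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return n, true
}

// ShouldEnqueue reports whether a next-page link is worth following.
func ShouldEnqueue(link string) bool {
	n, ok := PageNumber(link)
	return ok && n <= maxPage
}

func main() {
	n, ok := PageNumber("/2sc/hangzhou/a0_0msdgscncgpi1ltocsp2exb4/")
	fmt.Println(n, ok)
	fmt.Println(ShouldEnqueue("/2sc/hangzhou/a0_0msdgscncgpi1ltocsp101exb4/"))
}
```

Links whose page number cannot be read are dropped here. Whether that is the right default depends on how the site's links change over time.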

Get the used car data

The page structure is fairly fixed, so routine cleaning of the HTML is enough.

    // Requires the goquery, strings and strconv packages. QcCar, GetCityName
    // and compileNumber (presumably a package-level *regexp.Regexp that
    // matches numbers) are defined elsewhere in the project.
    func GetCars(doc *goquery.Document) (cars []QcCar) {
        cityName := GetCityName(doc)
        doc.Find(".piclist ul li:not(.line)").Each(func(i int, selection *goquery.Selection) {
            // Raw text of each field in one listing entry.
            title := selection.Find(".title a").Text()
            price := selection.Find(".detail .detail-r").Find(".colf8").Text()
            kilometer := selection.Find(".detail .detail-l").Find("p").Eq(0).Text()
            year := selection.Find(".detail .detail-l").Find("p").Eq(1).Text()

            // Keep only the numeric parts of the mileage and year strings.
            kilometer = strings.Join(compileNumber.FindAllString(kilometer, -1), "")
            year = strings.Join(compileNumber.FindAllString(strings.TrimSpace(year), -1), "")

            // Parse errors are ignored; a failed parse leaves the zero value.
            priceS, _ := strconv.ParseFloat(price, 64)
            kilometerS, _ := strconv.ParseFloat(kilometer, 64)
            yearS, _ := strconv.Atoi(year)

            cars = append(cars, QcCar{
                CityName:  cityName,
                Title:     title,
                Price:     priceS,
                Kilometer: kilometerS,
                Year:      yearS,
            })
        })
        return cars
    }
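
The cleaning step can be tried on its own, without goquery. In the sketch below, numberPattern is an assumed stand-in for the project's number regexp, whose definition is not shown in this article, and ParseNumber is a hypothetical helper. Unlike the function above, it returns the parse error instead of discarding it.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// numberPattern is an assumed stand-in for the project's number regexp:
// it matches runs of digits with an optional decimal part.
var numberPattern = regexp.MustCompile(`[0-9]+(\.[0-9]+)?`)

// ParseNumber keeps only the numeric parts of s and parses them as a float.
func ParseNumber(s string) (float64, error) {
	digits := strings.Join(numberPattern.FindAllString(strings.TrimSpace(s), -1), "")
	return strconv.ParseFloat(digits, 64)
}

func main() {
	// Sample strings are invented for illustration.
	for _, s := range []string{" 3.5 km ", "2015", "n/a"} {
		v, err := ParseNumber(s)
		fmt.Println(v, err)
	}
}
```

Returning the error makes it possible to skip or log listings whose fields contain no number, rather than storing them as zero.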

Data

In the comparison of average prices by city, three of the four first-tier cities (Beijing, Shanghai, Guangzhou, Shenzhen) make the list: Beijing, Shanghai and Shenzhen. Hangzhou, which has had strong momentum in recent years, takes the top spot outright, with some distance between it and the cities that follow.

The other cities generally show a stepwise downward trend. It seems used cars in first-tier cities are not cheap, though of course this is only the average price.
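
A per-city average like the one discussed above can be computed with a single pass over the crawled records. This is a sketch of one way to do it, not necessarily how the charts were produced: the QcCar struct here repeats only two fields, CityAverage and AveragePriceByCity are hypothetical names, and the sample prices are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// QcCar repeats only the fields needed here.
type QcCar struct {
	CityName string
	Price    float64
}

// CityAverage is one row of the per-city ranking.
type CityAverage struct {
	CityName string
	Average  float64
}

// AveragePriceByCity groups cars by city and returns the mean price
// per city, sorted from highest to lowest average.
func AveragePriceByCity(cars []QcCar) []CityAverage {
	sum := make(map[string]float64)
	count := make(map[string]int)
	for _, c := range cars {
		sum[c.CityName] += c.Price
		count[c.CityName]++
	}
	result := make([]CityAverage, 0, len(sum))
	for city, total := range sum {
		result = append(result, CityAverage{
			CityName: city,
			Average:  total / float64(count[city]),
		})
	}
	sort.Slice(result, func(i, j int) bool {
		return result[i].Average > result[j].Average
	})
	return result
}

func main() {
	// Invented sample prices, not crawled data.
	cars := []QcCar{
		{CityName: "Hangzhou", Price: 20},
		{CityName: "Hangzhou", Price: 10},
		{CityName: "Beijing", Price: 12},
	}
	for _, row := range AveragePriceByCity(cars) {
		fmt.Printf("%s %.2f\n", row.CityName, row.Average)
	}
}
```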

Comparing price against mileage, the gap is rather large for Shanghai, Chengdu and Zhengzhou, so it seems worth weighing price against mileage.

This chart is a bit interesting: it roughly totals the kilometers driven. The cities that ranked high on average price in the earlier charts are not here; instead Hohhot, Daqing, Zhongshan and others appear at the top.

Does this indirectly reflect that vehicles are replaced faster in first-tier cities and more slowly in lower-tier cities, where mileage is generally high?

Analyzing the titles shows that entries in the product library are named as brand + automatic/manual + xxxx + attributes, so the title alone gives an overview of the car.

Reference

Crawler project address

    • https://github.com/go-crawler ...