Crawling Taobao store rank and ratings with PHP


Taobao store content can only be crawled through the store's URL, so the first thing we need is the URL.


Once you have the URL, you can start crawling. Because Taobao shops and Tmall shops have different DOM structures, split the URLs into two groups by domain: Taobao shops and Tmall shops. I won't cover how to extract the domain from a URL here; it is easy to look up.
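A minimal sketch of that split, assuming Tmall shop pages are served from hosts ending in tmall.com and Taobao shop pages from hosts ending in taobao.com. The function name shopType and its return values are hypothetical; it relies on PHP's built-in parse_url() to pull out the host.

```php
<?php
// Hypothetical helper: pick which parser to use for a shop URL by its host.
// Assumption: Tmall shops use *.tmall.com hosts, Taobao shops use *.taobao.com.
function shopType(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component, e.g. a relative or malformed URL
    }
    $host = strtolower($host);
    $domains = ['tmall' => 'tmall.com', 'taobao' => 'taobao.com'];
    foreach ($domains as $type => $domain) {
        $suffix = '.' . $domain;
        // Accept the bare domain or any subdomain of it.
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return $type;
        }
    }
    return null; // neither a Taobao nor a Tmall host
}
```

Checking the suffix with a leading dot avoids matching unrelated hosts that merely end in the same letters, such as nottmall.com.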


First, the simpler case: Tmall.


The Tmall shop rank sits in an element with the class tm-shiop-age-content, so with phpQuery, pq('.tm-shiop-age-content')->text() returns the Tmall shop rank directly.
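phpQuery is a third-party library, so this sketch swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) to show the same class-based text lookup. The function name textByClass is hypothetical, it returns only the first matching element's text, and it assumes the class name contains no quote characters because the name is placed directly into the XPath query.

```php
<?php
// Hypothetical stand-in for pq('.some-class')->text() using PHP's DOM
// extension. Returns the trimmed text of the first element whose class
// list contains $class, or null if nothing matches.
function textByClass(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Shop pages are rarely valid HTML; keep libxml parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Standard XPath idiom for "the class attribute contains this class".
    // Assumes $class has no quote characters.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

Called with the Tmall page HTML and 'tm-shiop-age-content', it would return the rank text; the same helper applies to the main-info rating div.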


Next comes the store rating.


The Tmall store rating sits in a div with the class main-info, so pq('.main-info')->text() is again all it takes. It returns text such as "description 4.8 Service 4.7 Logistics 4.7".
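A sketch of turning that text into numbers, assuming the three scores always appear in the order description, service, logistics and that no other numbers appear in the text. Only the numbers are read, so it does not matter whether the labels come back in English or Chinese. The function name parseShopScores and the array keys are hypothetical.

```php
<?php
// Hypothetical parser for the rating text, e.g.
// "description 4.8 Service 4.7 Logistics 4.7".
// Assumption: the first three numbers are the description, service and
// logistics scores, in that order; the labels themselves are ignored.
function parseShopScores(string $text): ?array
{
    preg_match_all('/\d+(?:\.\d+)?/', $text, $matches);
    if (count($matches[0]) < 3) {
        return null; // the layout changed or the text was not captured
    }
    $scores = array_map('floatval', array_slice($matches[0], 0, 3));
    return array_combine(['description', 'service', 'logistics'], $scores);
}
```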


Now for Taobao, which is more troublesome.

Taobao shop levels come in several kinds, such as crowns and diamonds, and each level is preceded by a number. Let's first look at how the level is marked up on a Taobao store page.


In the level element's class attribute, the first and third class names are fixed, while the second takes the form tb-rank-xxx and maps to the level type as follows:

tb-rank-crown: golden crown

tb-rank-cap: crown

tb-rank-blue: diamond

tb-rank-red: heart


So we determine the Taobao store level from the value of this class.

First take the class value with pq('.shop-rank .rank-icon-v2')->attr('class'). Note that the selector starts from the parent element, shop-rank, because the shop level appears twice on the page and we only need one of them.
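As with the Tmall lookup, this sketch swaps phpQuery for PHP's bundled DOM extension to show both lookups on the badge: its class attribute and its inner HTML (the equivalent of attr('class') and html()). The function name rankBadge and the returned array keys are hypothetical, and only the first matching element is used.

```php
<?php
// Hypothetical stand-in for
//   pq('.shop-rank .rank-icon-v2')->attr('class')
//   pq('.shop-rank .rank-icon-v2')->html()
// built on PHP's DOM extension instead of phpQuery.
// Returns the first matching badge's class attribute and inner HTML, or null.
function rankBadge(string $html): ?array
{
    $doc = new DOMDocument();
    // Shop pages are rarely valid HTML; keep libxml parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // ".shop-rank .rank-icon-v2" as XPath: a rank-icon-v2 element nested
    // anywhere inside a shop-rank element, so a copy of the level outside
    // .shop-rank is ignored.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' shop-rank ')]"
           . "//*[contains(concat(' ', normalize-space(@class), ' '), ' rank-icon-v2 ')]";
    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $badge = $nodes->item(0);
    // DOMDocument has no innerHTML, so serialize each child node in turn.
    $inner = '';
    foreach ($badge->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return ['class' => $badge->getAttribute('class'), 'html' => $inner];
}
```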

The next step is to get the number in front of the level.


The number of i tags inside this a tag equals the number in front of the level, so we also fetch the a tag's inner HTML, which contains those i tags: pq('.shop-rank .rank-icon-v2')->html().

With these two values in hand, write a function that combines them and returns the store level in a readable form.
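A minimal sketch of such a function, not the author's original code. It takes the class attribute and inner HTML fetched above, maps the tb-rank-xxx suffix using the table above, and counts the i tags. The function name describeTaobaoRank and the English level names are hypothetical.

```php
<?php
// Hypothetical "readable rank" function. Inputs are the class attribute and
// the inner HTML of the .rank-icon-v2 badge.
function describeTaobaoRank(string $classAttr, string $innerHtml): ?string
{
    // tb-rank-xxx suffix => level name, following the mapping table.
    $names = [
        'crown' => 'golden crown',
        'cap'   => 'crown',
        'blue'  => 'diamond',
        'red'   => 'heart',
    ];
    // The level type is carried by the tb-rank-xxx class.
    if (!preg_match('/\btb-rank-([a-z]+)\b/', $classAttr, $m) || !isset($names[$m[1]])) {
        return null;
    }
    // One <i> tag per icon, so the tag count is the number in front of the
    // level. \b keeps tags such as <img> from being counted.
    $count = preg_match_all('/<i\b/i', $innerHtml);
    if (!$count) {
        return null; // no icons found (or a regex error)
    }
    $name = $names[$m[1]];
    return $count . ' ' . $name . ($count === 1 ? '' : 's');
}
```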


The trimAll function strips whitespace from the crawled HTML. It appeared in my earlier post on crawling Baidu data, so I won't repeat it here.
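Since trimAll() is not reproduced in this post, here is a hypothetical stand-in with the same stated purpose: removing all whitespace from a crawled UTF-8 string. Treating the non-breaking space (U+00A0) and the full-width space (U+3000) as whitespace is this sketch's own assumption.

```php
<?php
// Hypothetical stand-in for trimAll(): strip every whitespace character
// from a crawled UTF-8 string.
function trimAll(string $str): string
{
    // \s covers ASCII whitespace; NBSP (U+00A0) and the full-width space
    // (U+3000) common on Chinese pages are listed explicitly. The /u flag
    // makes the pattern work on UTF-8 characters rather than bytes.
    $result = preg_replace('/[\s\x{00A0}\x{3000}]+/u', '', $str);
    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return $result ?? $str;
}
```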

The final result is a string such as: 5 crowns.


Next, grab the Taobao store rating.


The Taobao store rating is in an a tag with the class mini-dsr, so we take that tag's text directly: pq('.mini-dsr')->text().

The crawled text turns out to be garbled. This is because Taobao pages are encoded in GBK, and phpQuery recognizes GB2312 but not GBK, so it decodes the page as ISO-8859-1 instead. The captured data therefore has to be converted back.

To fix it, first convert the string from UTF-8 back to ISO-8859-1, which restores the original GBK bytes, and then convert those bytes from GBK to UTF-8:

$str = mb_convert_encoding($str, 'iso-8859-1', 'utf-8');

$str = mb_convert_encoding($str, 'utf-8', 'GBK');
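The two conversion calls can be wrapped in a small helper. This sketch assumes the mbstring extension is loaded and that the garbled string is exactly the GBK bytes read as ISO-8859-1 and re-encoded as UTF-8, which is what those two calls reverse. The name fixGbkMojibake is hypothetical.

```php
<?php
// Undo GBK-read-as-ISO-8859-1 mojibake: every original GBK byte became one
// Latin-1 character, stored as UTF-8. Requires the mbstring extension.
function fixGbkMojibake(string $garbled): string
{
    // Step 1: UTF-8 -> ISO-8859-1 turns each character back into its original byte.
    $gbkBytes = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
    // Step 2: decode those bytes as GBK and re-encode them as UTF-8.
    return mb_convert_encoding($gbkBytes, 'UTF-8', 'GBK');
}
```

Because ISO-8859-1 maps every byte value to exactly one character, the first step is lossless, which is why the round trip recovers the GBK bytes intact.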

This yields the same kind of rating data as in the Tmall case above.


That is how I crawled these pages and the problems I ran into. If anyone has a better approach, I'd be glad to discuss it.
