Python Crawler Starter Example (Crawling the Top App APKs from the Xiaomi Store)


I. What is a crawler?

A crawler is a tool for fetching all kinds of resources and data from the web. For the details, you can Baidu it yourself.

II. How to write a simple crawler

1. Get Web Content

You can download the content of a web page with urllib, which is built into Python 3.x. It's easy to implement:

    import urllib.request

    url = "http://www.baidu.com"
    response = urllib.request.urlopen(url)
    html_content = response.read()
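Note that response.read() returns bytes rather than a string; if you want text, you generally decode it first:

    html_content = response.read().decode("utf-8")  # decode, assuming the page is UTF-8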

You can also use the third-party library requests, which is just as convenient. Before using it, you of course need to install it: pip install requests (pip ships with Python 3.4+, which makes this painless)

    import requests

    html_content = requests.get(url).text

2. Parse Web Content

The fetched html_content is really just HTML source code; we need to parse it to extract the parts we want.

There are many ways to parse a web page. The one I introduce here is BeautifulSoup. It is also a third-party library, so install it first: pip install bs4

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_content, "html.parser")
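To show what "getting what we need" looks like, here is a minimal sketch that continues the snippets above (baidu.com is used purely as an example page; which tags are worth extracting depends entirely on the page you are scraping):

    import requests
    from bs4 import BeautifulSoup

    html_content = requests.get("http://www.baidu.com").text
    soup = BeautifulSoup(html_content, "html.parser")
    print(soup.title.string)          # the text of the page's <title> tag
    for a in soup.find_all("a")[:5]:  # the first few <a> (link) tags
        print(a.get("href"))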

III. Case analysis

The best way to understand how crawlers work is to analyze some real examples; however much crawlers vary, the principle stays the same. Enough talk, on to the good stuff.

=================================== I'm a split line ===================================================

Requirement: crawl the top N apps from the Xiaomi app store

Open the Xiaomi App Store ranking page in a browser and press F12 to inspect the elements.
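What the inspector shows (an observation about the page's markup at the time of writing, not a guaranteed structure): each app on the ranking list is an <a> tag whose href points at a detail page like /details?id=<package name>, and each detail page carries a download link marked with the class "download". The sketch below runs the same kind of selector the full code uses, but against a hand-written fragment of that assumed shape:

    import re
    from bs4 import BeautifulSoup

    # A hand-written fragment mimicking the assumed ranking-page markup;
    # the real page may differ, so always re-check it with F12 first.
    sample_html = '<body><a href="/details?id=com.example.app">Example App</a></body>'
    soup = BeautifulSoup(sample_html, "html.parser")
    for link in soup.find_all("a", href=re.compile("/details?")):
        print(link["href"].split("=")[1])  # prints: com.example.app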

    # coding=utf-8
    import re
    import urllib.parse
    import urllib.request

    import requests
    from bs4 import BeautifulSoup


    def parser_apks(count=0):
        """Xiaomi application market"""
        _root_url = "http://app.mi.com"  # app market home page URL
        res_parser = {}
        # Start crawling from page 1; when a page is finished, move on to the next
        page_num = 1
        while count:
            # Fetch the HTML of the ranking page
            wbdata = requests.get("http://app.mi.com/topList?page=" + str(page_num)).text
            print("Start crawling page " + str(page_num))
            # Parse the page to collect the links to each app's detail page;
            # contents[3] picks out the list container found via F12
            # (for the specifics of BeautifulSoup usage, please Baidu it...)
            soup = BeautifulSoup(wbdata, "html.parser")
            links = soup.body.contents[3].find_all(
                "a", href=re.compile("/details?"), class_="", alt="")
            for link in links:
                detail_link = urllib.parse.urljoin(_root_url, str(link["href"]))
                package_name = detail_link.split("=")[1]
                # Get the apk download address from the detail page
                download_page = requests.get(detail_link).text
                soup1 = BeautifulSoup(download_page, "html.parser")
                download_link = soup1.find(class_="download")["href"]
                download_url = urllib.parse.urljoin(_root_url, str(download_link))
                # Parsing yields duplicate results, so deduplicate before storing
                if download_url not in res_parser.values():
                    res_parser[package_name] = download_url
                    count = count - 1
                if count == 0:
                    break
            if count > 0:
                page_num = page_num + 1
        print("The number of crawled apks is: " + str(len(res_parser)))
        return res_parser


    def craw_apks(count=1, save_path="d:\\apk\\"):
        res_dic = parser_apks(count)
        for apk in res_dic.keys():
            print("Downloading app: " + apk)
            urllib.request.urlretrieve(res_dic[apk], save_path + apk + ".apk")
            print("Download complete")


    if __name__ == "__main__":
        craw_apks(10)

Run result:

Downloading app: com.tencent.tmgp.sgame
Download complete
.
.
.
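One practical caveat: urllib.request.urlretrieve will fail if the destination folder does not exist, and the script assumes d:\apk\ is already there. A small variant of craw_apks (the os.makedirs call is my addition, not part of the original) creates the folder before downloading:

    import os
    import urllib.request

    def craw_apks(count=1, save_path="d:\\apk\\"):
        os.makedirs(save_path, exist_ok=True)  # added safeguard: create the folder if missing
        res_dic = parser_apks(count)  # parser_apks as defined above
        for apk in res_dic:
            print("Downloading app: " + apk)
            urllib.request.urlretrieve(res_dic[apk], save_path + apk + ".apk")
            print("Download complete")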

That's all for this simple crawler. In practice, crawler implementations can get quite complex: different web pages call for different parsing approaches, and that takes further study...
