This article shows how to batch-download Android apps. Downloading them one by one by hand is obviously impractical, so I wrote a semi-automated script in Python. "Semi-automated" means the script only captures the download links in batches; you then paste them all into Thunder (a batch download manager) to download them at once. If that sounds useful, read on.
Preparations
Python 2.7.11: download Python
PyCharm: download PyCharm
Python 2 and Python 3 are both being maintained at the moment; I use Python 2 as the environment here. PyCharm is a fairly efficient Python IDE, but it is paid software.
Basic idea of implementation
First, our target website: Android Market
Click [Application] to reach the listing page we care about:
def geteveryapp(self, source):
    # Each app sits in its own <li> block on the listing page;
    # re.S lets '.' span the newlines inside the block.
    everyapp = re.findall('(<li class="list_item".*?</li>)', source, re.S)
    return everyapp
Simple regular expressions are used here.
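To make the effect of re.findall with re.S concrete, here is a minimal, self-contained sketch. The HTML fragment is made up for illustration, and the list_item class name is an assumption about the target page's markup, not taken from the article:

```python
import re

# A tiny, made-up HTML fragment standing in for the real app-list page.
sample = '''
<li class="list_item"><a href="/appinfo/com.example.one">App One</a></li>
<li class="list_item"><a href="/appinfo/com.example.two">App Two</a></li>
'''

# re.S (DOTALL) lets '.' match newlines, so each <li>...</li> block is
# captured even if it spans several lines in the real page source.
blocks = re.findall('(<li class="list_item".*?</li>)', sample, re.S)
print(len(blocks))  # → 2
```

The non-greedy `.*?` is what keeps each match confined to a single `<li>` block instead of swallowing everything up to the last `</li>`.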
Extract the download link from the li tag:
def getinfo(self, eachclass):
    info = {}
    # Grab the first anchor tag, then pull the quoted href out of it.
    str1 = str(re.search('<a href=".*?">', eachclass).group(0))
    app_url = re.search('"(.*?)"', str1).group(1)
    # The detail page (/appinfo/...) becomes the direct download
    # link (/appdown/...) by swapping one path segment.
    appdown_url = app_url.replace('appinfo', 'appdown')
    info['app_url'] = appdown_url
    print appdown_url
    return info
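The two-step extraction above can be demonstrated on a single hypothetical `<li>` block (the tag and URL below are illustrative, not from the real site):

```python
import re

# One hypothetical <li> block, as produced by the geteveryapp step.
eachclass = '<li class="list_item"><a href="/appinfo/com.example.one">App One</a></li>'

# Step 1: grab the whole anchor tag; step 2: pull the quoted href out of it.
str1 = re.search('<a href=".*?">', eachclass).group(0)
app_url = re.search('"(.*?)"', str1).group(1)

# On this site the detail page (/appinfo/...) and the direct download
# (/appdown/...) differ only in that one path segment.
appdown_url = app_url.replace('appinfo', 'appdown')
print(appdown_url)  # → /appdown/com.example.one
```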
The tricky part is pagination. Click the page-turn button at the bottom of the listing and watch the address bar: only the page-number parameter (pi) in the URL changes, so we can generate all the page URLs ourselves:
def changepage(self, url, total_page):
    now_page = int(re.search('pi=(\d+)', url).group(1))
    page_group = []
    for i in range(now_page, total_page + 1):
        # Rewrite the pi= page parameter. Note that re.sub's optional
        # fourth positional argument is a count, not a flags field,
        # so re.S must not be passed there.
        link = re.sub('pi=\d+', 'pi=%s' % i, url)
        page_group.append(link)
    return page_group
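A standalone sketch of the same idea, using a sample URL in the site's format (the exact query string is an assumption based on the article's example):

```python
import re

def changepage(url, total_page):
    # pi=(\d+) also handles multi-digit page numbers.
    now_page = int(re.search(r'pi=(\d+)', url).group(1))
    page_group = []
    for i in range(now_page, total_page + 1):
        # re.sub's fourth positional argument is *count*, not flags,
        # so no flag is passed here.
        link = re.sub(r'pi=\d+', 'pi=%s' % i, url)
        page_group.append(link)
    return page_group

links = changepage('http://apk.hiapk.com/mediaandvideo?sort=5&pi=1', 3)
print(links[-1])  # → http://apk.hiapk.com/mediaandvideo?sort=5&pi=3
```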
Crawler effect
After the key points are completed, let's take a look at the final crawler effect:
#-*- coding: utf8 -*-
import requests
import re
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

class spider(object):
    def __init__(self):
        print u'start to crawl the content'

    def getsource(self, url):
        html = requests.get(url)
        return html.text

    def changepage(self, url, total_page):
        now_page = int(re.search('pi=(\d+)', url).group(1))
        page_group = []
        for i in range(now_page, total_page + 1):
            link = re.sub('pi=\d+', 'pi=%s' % i, url)
            page_group.append(link)
        return page_group

    def geteveryapp(self, source):
        everyapp = re.findall('(<li class="list_item".*?</li>)', source, re.S)
        return everyapp

    def getinfo(self, eachclass):
        info = {}
        str1 = str(re.search('<a href=".*?">', eachclass).group(0))
        app_url = re.search('"(.*?)"', str1).group(1)
        appdown_url = app_url.replace('appinfo', 'appdown')
        info['app_url'] = appdown_url
        print appdown_url
        return info

    def saveinfo(self, classinfo):
        f = open('info.txt', 'a')
        str2 = "http://apk.hiapk.com"
        for each in classinfo:
            f.write(str2)
            f.writelines(each['app_url'] + '\n')
        f.close()

if __name__ == '__main__':
    appinfo = []
    url = 'http://apk.hiapk.com/mediaandvideo?sort=5&pi=1'
    appurl = spider()
    all_links = appurl.changepage(url, 5)
    for link in all_links:
        print u'processing page ' + link
        html = appurl.getsource(link)
        every_app = appurl.geteveryapp(html)
        for each in every_app:
            info = appurl.getinfo(each)
            appinfo.append(info)
    appurl.saveinfo(appinfo)
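The last manual step is pasting the collected links into Thunder. A minimal sketch (not part of the original script; the file name and helper are hypothetical) of reading the saved links back, deduplicating them, and printing one newline-separated block ready to paste:

```python
def collect_links(path):
    # Deduplicate while preserving order, since pages can repeat apps.
    seen, links = set(), []
    with open(path) as f:
        for line in f:
            link = line.strip()
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links

# Demo with a throwaway file standing in for the crawler's info.txt.
with open('demo_info.txt', 'w') as f:
    f.write('http://apk.hiapk.com/appdown/a\n'
            'http://apk.hiapk.com/appdown/b\n'
            'http://apk.hiapk.com/appdown/a\n')

links = collect_links('demo_info.txt')
print('\n'.join(links))
```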
Summary
The target webpage has a clear, simple structure, so this is a basic crawler. Please forgive the rough code. That is all for this article; I hope it helps you in your study or work. If you have any questions, please leave a message.