This article shows how to batch-download Android apps. Downloading them one by one by hand is obviously impractical, so I wrote a semi-automated script in Python. "Semi-automated" means the script only captures the download links in batches; you then paste them all into Thunder (a batch download manager) to download them at once. If that sounds useful, read on.
Preparations
Python 2.7.11: download Python
PyCharm: download PyCharm
Python 2 and Python 3 are both being maintained at the moment; I use Python 2 as the environment here. PyCharm is a fairly efficient Python IDE, but it is paid software.
Basic idea of implementation
First, our target website: Android Market
Click [Application] to reach the listing page we care about:
def geteveryapp(self, source):
    # Each app sits in its own <li> block on the listing page;
    # re.S lets '.' span the newlines inside the block.
    everyapp = re.findall('(<li class="list_item".*?</li>)', source, re.S)
    return everyapp
Simple regular expressions are used here.
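To make the effect of re.findall with re.S concrete, here is a minimal, self-contained sketch. The HTML fragment is made up for illustration, and the list_item class name is an assumption about the target page's markup, not taken from the article:

```python
import re

# A tiny, made-up HTML fragment standing in for the real app-list page.
sample = '''
<li class="list_item"><a href="/appinfo/com.example.one">App One</a></li>
<li class="list_item"><a href="/appinfo/com.example.two">App Two</a></li>
'''

# re.S (DOTALL) lets '.' match newlines, so each <li>...</li> block is
# captured even if it spans several lines in the real page source.
blocks = re.findall('(<li class="list_item".*?</li>)', sample, re.S)
print(len(blocks))  # → 2
```

The non-greedy `.*?` is what keeps each match confined to a single `<li>` block instead of swallowing everything up to the last `</li>`.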
Extract the download link from the li tag:
def getinfo(self, eachclass):
    info = {}
    # Grab the first anchor tag, then pull the quoted href out of it.
    str1 = str(re.search('<a href=".*?">', eachclass).group(0))
    app_url = re.search('"(.*?)"', str1).group(1)
    # The detail page (/appinfo/...) becomes the direct download
    # link (/appdown/...) by swapping one path segment.
    appdown_url = app_url.replace('appinfo', 'appdown')
    info['app_url'] = appdown_url
    print appdown_url
    return info
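The two-step extraction above can be demonstrated on a single hypothetical `<li>` block (the tag and URL below are illustrative, not from the real site):

```python
import re

# One hypothetical <li> block, as produced by the geteveryapp step.
eachclass = '<li class="list_item"><a href="/appinfo/com.example.one">App One</a></li>'

# Step 1: grab the whole anchor tag; step 2: pull the quoted href out of it.
str1 = re.search('<a href=".*?">', eachclass).group(0)
app_url = re.search('"(.*?)"', str1).group(1)

# On this site the detail page (/appinfo/...) and the direct download
# (/appdown/...) differ only in that one path segment.
appdown_url = app_url.replace('appinfo', 'appdown')
print(appdown_url)  # → /appdown/com.example.one
```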
The tricky part is pagination. Click the page-turn button at the bottom of the listing and watch the address bar: only the page-number parameter (pi) in the URL changes, so we can generate all the page URLs ourselves:
def changepage(self, url, total_page):
    now_page = int(re.search('pi=(\d+)', url).group(1))
    page_group = []
    for i in range(now_page, total_page + 1):
        # Rewrite the pi= page parameter. Note that re.sub's optional
        # fourth positional argument is a count, not a flags field,
        # so re.S must not be passed there.
        link = re.sub('pi=\d+', 'pi=%s' % i, url)
        page_group.append(link)
    return page_group
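A standalone sketch of the same idea, using a sample URL in the site's format (the exact query string is an assumption based on the article's example):

```python
import re

def changepage(url, total_page):
    # pi=(\d+) also handles multi-digit page numbers.
    now_page = int(re.search(r'pi=(\d+)', url).group(1))
    page_group = []
    for i in range(now_page, total_page + 1):
        # re.sub's fourth positional argument is *count*, not flags,
        # so no flag is passed here.
        link = re.sub(r'pi=\d+', 'pi=%s' % i, url)
        page_group.append(link)
    return page_group

links = changepage('http://apk.hiapk.com/mediaandvideo?sort=5&pi=1', 3)
print(links[-1])  # → http://apk.hiapk.com/mediaandvideo?sort=5&pi=3
```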
Crawler effect
After the key points are completed, let's take a look at the final crawler effect:
#-*- coding: utf8 -*-
import requests
import re
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

class spider(object):
    def __init__(self):
        print u'start to crawl the content'

    def getsource(self, url):
        html = requests.get(url)
        return html.text

    def changepage(self, url, total_page):
        now_page = int(re.search('pi=(\d+)', url).group(1))
        page_group = []
        for i in range(now_page, total_page + 1):
            link = re.sub('pi=\d+', 'pi=%s' % i, url)
            page_group.append(link)
        return page_group

    def geteveryapp(self, source):
        everyapp = re.findall('(<li class="list_item".*?</li>)', source, re.S)
        return everyapp

    def getinfo(self, eachclass):
        info = {}
        str1 = str(re.search('<a href=".*?">', eachclass).group(0))
        app_url = re.search('"(.*?)"', str1).group(1)
        appdown_url = app_url.replace('appinfo', 'appdown')
        info['app_url'] = appdown_url
        print appdown_url
        return info

    def saveinfo(self, classinfo):
        f = open('info.txt', 'a')
        str2 = "http://apk.hiapk.com"
        for each in classinfo:
            f.write(str2)
            f.writelines(each['app_url'] + '\n')
        f.close()

if __name__ == '__main__':
    appinfo = []
    url = 'http://apk.hiapk.com/mediaandvideo?sort=5&pi=1'
    appurl = spider()
    all_links = appurl.changepage(url, 5)
    for link in all_links:
        print u'processing page ' + link
        html = appurl.getsource(link)
        every_app = appurl.geteveryapp(html)
        for each in every_app:
            info = appurl.getinfo(each)
            appinfo.append(info)
    appurl.saveinfo(appinfo)
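The last manual step is pasting the collected links into Thunder. A minimal sketch (not part of the original script; the file name and helper are hypothetical) of reading the saved links back, deduplicating them, and printing one newline-separated block ready to paste:

```python
def collect_links(path):
    # Deduplicate while preserving order, since pages can repeat apps.
    seen, links = set(), []
    with open(path) as f:
        for line in f:
            link = line.strip()
            if link and link not in seen:
                seen.add(link)
                links.append(link)
    return links

# Demo with a throwaway file standing in for the crawler's info.txt.
with open('demo_info.txt', 'w') as f:
    f.write('http://apk.hiapk.com/appdown/a\n'
            'http://apk.hiapk.com/appdown/b\n'
            'http://apk.hiapk.com/appdown/a\n')

links = collect_links('demo_info.txt')
print('\n'.join(links))
```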
Summary
The target webpage has a clear, simple structure, so this is a basic crawler. Please forgive the rough code. That is all for this article; I hope it helps you in your study or work. If you have any questions, please leave a message.