Crawling an APP's download links with Python

This article implements batch downloading of Android apps. Downloading them one by one by hand is obviously impractical, so I wrote a semi-automated Python script. "Semi-automated" means the script captures the download links in batches; you then paste them into Thunder together so the downloads run in one batch. If you need this, read on.

Preparations

Python 2.7.11: Download Python

PyCharm: Download PyCharm

Python 2 and Python 3 are both current releases at the moment; I use Python 2 as the environment here. PyCharm is a fairly efficient Python IDE, but it is paid software.

Basic idea of implementation

First, our target website: Android Market

Click [Application] to reach the page we care about:

Each app sits in its own li tag, so the page source is first cut into one chunk per app. The HTML patterns inside these regular expressions were lost when the article was republished; the li pattern below is a reconstruction:

def geteveryapp(self, source):
    # Split the page into per-app li chunks (pattern reconstructed)
    everyapp = re.findall('(<li>.*?</li>)', source, re.S)
    # everyapp2 = re.findall('(...)', everyapp, re.S)  # alternative pattern, lost when republished
    return everyapp

Simple regular expressions are used here.
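To see the splitting in action, here is a self-contained sketch (Python 3 syntax; the HTML snippet and the li pattern are invented for illustration, since the site's real markup did not survive republishing):

```python
import re

# A made-up fragment standing in for the app-list page source
source = (
    '<ul>'
    '<li><a href="/appinfo/111">App One</a></li>'
    '<li><a href="/appinfo/222">App Two</a></li>'
    '</ul>'
)

def geteveryapp(source):
    # re.S lets '.' match newlines, so the same pattern still works
    # when each li block spans several lines of source
    return re.findall('(<li>.*?</li>)', source, re.S)

for chunk in geteveryapp(source):
    print(chunk)
```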

    Extract the download link from the li tag:


    def getinfo(self, eachclass):
        info = {}
        # The tag pattern in the first search was lost when the article was
        # republished; it matched the a tag carrying the app's page link
        str1 = str(re.search('<a href=.*?>', eachclass).group(0))
        app_url = re.search('"(.*?)"', str1).group(1)
        # The detail page and the download endpoint differ only by this word
        appdown_url = app_url.replace('appinfo', 'appdown')
        info['app_url'] = appdown_url
        print appdown_url
        return info
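The appinfo-to-appdown replacement is the whole trick: on this site, the detail page and the download endpoint share the same path apart from that one word. A minimal sketch of the extraction (Python 3; the sample li chunk is invented):

```python
import re

# Invented per-app chunk, mimicking one element returned by geteveryapp()
eachclass = '<li><a href="/appinfo/com.example.player">Example Player</a></li>'

def getinfo(eachclass):
    info = {}
    # Pull out the first a tag, then grab whatever sits between its quotes
    str1 = re.search('<a href=.*?>', eachclass).group(0)
    app_url = re.search('"(.*?)"', str1).group(1)
    # Rewriting the path turns the detail link into a download link
    info['app_url'] = app_url.replace('appinfo', 'appdown')
    return info

print(getinfo(eachclass)['app_url'])  # -> /appdown/com.example.player
```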


    The tricky part is paging. Click the page-turn button at the bottom and watch how the address bar changes:

    def changepage(self, url, total_page):
        now_page = int(re.search('pi=(\d)', url).group(1))
        page_group = []
        for i in range(now_page, total_page + 1):
            # re.sub's fourth positional argument is a count, not a flags
            # field, so the re.S that appeared here originally is dropped
            link = re.sub('pi=\d', 'pi=%s' % i, url)
            page_group.append(link)
        return page_group
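Exercising that function against the listing URL shows the page links it generates (Python 3 here; note that re.sub's fourth positional argument is a replacement count, not a flags field, so no flag is passed):

```python
import re

def changepage(url, total_page):
    # Read the current page number out of the pi= query parameter
    now_page = int(re.search(r'pi=(\d)', url).group(1))
    page_group = []
    for i in range(now_page, total_page + 1):
        # Rewrite pi=<digit> to each page number in turn
        page_group.append(re.sub(r'pi=\d', 'pi=%s' % i, url))
    return page_group

url = 'http://apk.hiapk.com/#/mediaandvideo?sort=5&pi=1'
for link in changepage(url, 3):
    print(link)
```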


    Crawler effect

    With the key pieces done, let's look at the finished crawler:

    # -*- coding: utf8 -*-
    import requests
    import re
    import sys
    reload(sys)
    sys.setdefaultencoding("utf-8")

    class spider(object):

        def __init__(self):
            print u'Starting to crawl'

        def getsource(self, url):
            html = requests.get(url)
            return html.text

        def changepage(self, url, total_page):
            now_page = int(re.search('pi=(\d)', url).group(1))
            page_group = []
            for i in range(now_page, total_page + 1):
                link = re.sub('pi=\d', 'pi=%s' % i, url)
                page_group.append(link)
            return page_group

        def geteveryapp(self, source):
            # The li pattern is reconstructed; the original markup was lost
            # when this article was republished
            everyapp = re.findall('(<li>.*?</li>)', source, re.S)
            return everyapp

        def getinfo(self, eachclass):
            info = {}
            # The first search pattern is also a reconstruction
            str1 = str(re.search('<a href=.*?>', eachclass).group(0))
            app_url = re.search('"(.*?)"', str1).group(1)
            appdown_url = app_url.replace('appinfo', 'appdown')
            info['app_url'] = appdown_url
            print appdown_url
            return info

        def saveinfo(self, classinfo):
            f = open('info.txt', 'a')
            str2 = "http://apk.hiapk.com"
            for each in classinfo:
                f.write(str2)
                f.writelines(each['app_url'] + '\n')
            f.close()

    if __name__ == '__main__':
        appinfo = []
        url = 'http://apk.hiapk.com/#/mediaandvideo?sort=5&pi=1'
        appurl = spider()
        all_links = appurl.changepage(url, 5)
        for link in all_links:
            print u'Processing page ' + link
            html = appurl.getsource(link)
            every_app = appurl.geteveryapp(html)
            for each in every_app:
                info = appurl.getinfo(each)
                appinfo.append(info)
        appurl.saveinfo(appinfo)
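The saveinfo step is where the semi-automation pays off: every captured link lands in info.txt as one absolute URL per line, ready to paste into Thunder. A self-contained sketch of just that step (Python 3; the sample results and the temporary file path are invented so it can run anywhere):

```python
import os
import tempfile

def saveinfo(classinfo, path):
    # Write one absolute download URL per line, ready for a batch downloader
    prefix = "http://apk.hiapk.com"
    with open(path, 'a') as f:
        for each in classinfo:
            f.write(prefix + each['app_url'] + '\n')

# Invented results standing in for what getinfo() would collect
appinfo = [{'app_url': '/appdown/1'}, {'app_url': '/appdown/2'}]
path = os.path.join(tempfile.mkdtemp(), 'info.txt')
saveinfo(appinfo, path)
print(open(path).read())
```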

    Summary

    The target page has a clear, simple structure, so this is a basic crawler. Please forgive the messy code. That is all for this article; I hope it helps you in your study or work, and if you have any questions, please leave a message.

