Using a Python crawler to get data from the OA Lucky Airship platform

Source: Internet
Author: User
Tags: gettext, sublime text

Installing BeautifulSoup and requests

Open a Windows Command Prompt (cmd) window, run the command pip install requests, and wait for the installation to finish.

The BeautifulSoup library is installed the same way, with pip install beautifulsoup4.

The editor I'm using is Sublime Text 3, which I find quite handy for writing code.

Other tools: Chrome browser

Python version: Python3.6

Running platform: Windows

1. First, we look up the ranking page of the OA Lucky Airship platform: "xxx.com/h5"

Get the code for the Web page:

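    import requests

    def gethtmltext(url, k):
        """Download one page of the ranking; k is the value of the offset parameter."""
        try:
            # The first page is requested without an offset parameter
            if k == 0:
                a = {}
            else:
                a = {'offset': k}
            r = requests.get(url, params=a, headers={'user-agent': 'mozilla/4.0'})
            r.raise_for_status()  # raise an exception for 4xx/5xx responses
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            # On failure the function prints a message and returns None
            print("failed!")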

Looking at the site shows that the pages differ only in the offset value in the URL, so changing offset=k is enough to get the information on each page.
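
To make the offset scheme concrete, here is a minimal sketch that only builds the page URLs, without downloading anything. The function name page_urls and the address http://example.com/board are placeholders invented for this illustration; the step of 10 and the upper limit of 100 are taken from the loop in the article. requests appends the params dictionary to the URL as a query string in the same way.

```python
from urllib.parse import urlencode

def page_urls(base_url, last_offset=100, step=10):
    """Return one URL per page, adding ?offset=k for every page after the first."""
    urls = []
    for k in range(0, last_offset + 1, step):
        # Offset 0 is the first page, which is requested without a query string
        query = urlencode({"offset": k}) if k else ""
        urls.append(base_url + ("?" + query if query else ""))
    return urls

# http://example.com/board is a placeholder, not the site from the article
for url in page_urls("http://example.com/board", last_offset=20):
    print(url)
```

With the default arguments the function returns 11 URLs, for offsets 0 through 100.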

Change the URL by using the main function:

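    def main():
        # Placeholder address from the article; a real URL must include
        # the scheme (http:// or https://)
        basicurl = 'xxx.com/h5'
        k = 0
        while k <= 100:
            html = gethtmltext(basicurl, k)
            k += 10
            if html:  # skip pages that failed to download
                getname(html)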
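
A failed download leaves gethtmltext without a page to return, so the loop has to decide what to do with that offset. One possible arrangement is sketched below. The names crawl, fetch, parse and delay are hypothetical and not part of the article's code, and the pause between requests is an added courtesy to the server, not something the original does. A stand-in fetcher is used so the sketch runs without network access.

```python
import time

def crawl(fetch, parse, last_offset=100, step=10, delay=1.0):
    """Fetch and parse each page in turn, skipping pages that failed to download."""
    failed = []
    for k in range(0, last_offset + 1, step):
        html = fetch(k)
        if not html:
            # fetch returned None or an empty string, so there is nothing to parse
            failed.append(k)
        else:
            parse(html)
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return failed

# Stand-in fetcher for demonstration: pretend the page at offset 10 failed
pages = []
failed = crawl(lambda k: None if k == 10 else "<html>%d</html>" % k,
               pages.append, last_offset=20, delay=0)
print(failed)  # [10]
```

With the article's functions, the call would look like crawl(lambda k: gethtmltext(basicurl, k), getname).
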

The information inside the tags is extracted layer by layer with BeautifulSoup's find methods, and then printed in a for loop.
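
The layer-by-layer lookup is easier to follow on a small inline document first. The sketch below is an illustration under assumptions: the markup in SAMPLE is invented and only mimics the nesting (dl, dd, div, p) and the class names used in the article's function, and the helper name extract is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the nesting the article's code expects
SAMPLE = """
<dl class="board-wrapper">
  <dd>
    <div class="board-item-main">
      <div class="movie-item-info">
        <p class="name">First title</p>
        <p class="star"> Cast: A, B </p>
      </div>
    </div>
  </dd>
  <dd>
    <div class="board-item-main">
      <div class="movie-item-info">
        <p class="name">Second title</p>
        <p class="star"> Cast: C </p>
      </div>
    </div>
  </dd>
</dl>
"""

def extract(html):
    """Return (name, star) pairs by walking dl -> dd -> div -> p."""
    soup = BeautifulSoup(html, "html.parser")
    board = soup.find("dl", attrs={"class": "board-wrapper"})
    rows = []
    for item in board.find_all("dd"):
        # Each find call narrows the search to the element found one level up
        info = item.find("div", attrs={"class": "movie-item-info"})
        name = info.find("p", attrs={"class": "name"}).get_text(strip=True)
        star = item.find("p", attrs={"class": "star"}).get_text(strip=True)
        rows.append((name, star))
    return rows

print(extract(SAMPLE))
```

The function used on the real page follows the same pattern: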

    from bs4 import BeautifulSoup

    def getname(html):
        soup = BeautifulSoup(html, "html.parser")
        # Outer layer: the <dl> element that wraps the whole ranking
        paihanglist = soup.find('dl', attrs={'class': 'board-wrapper'})
        mov = []
        actor = []
        # Each <dd> element holds one entry of the ranking
        for movlist in paihanglist.find_all('dd'):
            movitem = movlist.find('div', attrs={'class': 'movie-item-info'})
            movname = movitem.find('p', attrs={'class': 'name'}).getText()
            actors = movlist.find('div', attrs={'class': 'board-item-main'})
            actorname = actors.find('p', attrs={'class': 'star'}).getText()
            # Strip line breaks and spaces from the extracted text
            b = actorname.replace('\n', '')
            c = b.replace(' ', '')
            actor.append(c)
            mov.append(movname)
        mode = "{0:<30}\t{1:<50}"
        for i, j in zip(mov, actor):
            # chr(12288) is passed but not referenced by this format string
            print(mode.format(i, j, chr(12288)))

    if __name__ == "__main__":
        main()
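
The final print passes chr(12288), the full-width space U+3000, as a third argument to format, but the format string never refers to it, so it has no effect. One plausible intent, and this is only an assumption, is to use it as the fill character so that columns of Chinese text line up. A minimal sketch of that idea follows; the helper name format_row, the column width and the sample strings are made up for illustration.

```python
FULLWIDTH_SPACE = chr(12288)  # U+3000, as wide as one CJK character

def format_row(name, star, width=10):
    """Left-align both columns, padding with the full-width space."""
    # {2} inside the format spec is replaced by the third argument (the fill
    # character) and {3} by the fourth (the column width)
    return "{0:{2}<{3}}\t{1:{2}<{3}}".format(name, star, FULLWIDTH_SPACE, width)

# Made-up sample values
print(format_row("电影甲", "演员乙"))
print(format_row("电影甲乙丙", "演员丁"))
```

Padding with the ordinary ASCII space tends to misalign such columns in a monospaced terminal, because a CJK character is usually drawn twice as wide as the space.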
