Installing BeautifulSoup and requests
Open a Windows command prompt (cmd), enter the command pip install requests, and wait for the installation to finish.
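The BeautifulSoup library is installed the same way; its package name on PyPI is beautifulsoup4, so the command is pip install beautifulsoup4.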
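Once both installs finish, one way to confirm that the packages are importable is to look them up with importlib. Note that BeautifulSoup is imported as bs4 even though the pip package is called beautifulsoup4. This is a minimal sketch, and the helper name missing_packages is hypothetical:

```python
import importlib.util


def missing_packages(modules=("requests", "bs4")):
    """Return the module names that cannot be found (hypothetical helper)."""
    # find_spec returns None for a top-level module that is not installed
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing if missing else "none")
```

If the list is non-empty, rerun the corresponding pip install command.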
The editor I use is Sublime Text 3, which I find quite handy for writing code.
Other tools: Chrome browser
Python version: Python3.6
Running platform: Windows
1. First, we locate the ranking page of the OA Lucky airship platform: "XXX.com/h5".
Fetch the HTML of the page:
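import requests
from bs4 import BeautifulSoup


def gethtmltext(url, k):
    try:
        # The first page needs no parameters; later pages add ?offset=k
        if k == 0:
            a = {}
        else:
            a = {'offset': k}
        r = requests.get(url, params=a, headers={'user-agent': 'Mozilla/4.0'}, timeout=10)
        r.raise_for_status()  # raise an exception for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        print("failed!")
        return ""  # empty page, so the caller can keep going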
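The try/except above works because raise_for_status() turns an error status into requests.HTTPError, which is a subclass of requests.RequestException. The sketch below checks this offline by building a Response object by hand instead of contacting a server; the helper name status_is_ok is hypothetical:

```python
import requests


def status_is_ok(status_code):
    """Report whether raise_for_status() accepts a given status code."""
    r = requests.Response()
    r.status_code = status_code  # set by hand; no network request is made
    try:
        r.raise_for_status()
    except requests.HTTPError:
        return False
    return True


if __name__ == "__main__":
    for code in (200, 404, 503):
        print(code, status_is_ok(code))
```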
Comparing the URLs shows that the pages differ only in their offset query parameter, so changing offset=k is enough to retrieve every page.
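To visualize what gethtmltext actually requests, the sketch below builds the same URLs with the standard library's urlencode. requests performs its own encoding of the params dict internally; this helper (the name page_url is hypothetical) only mirrors the k == 0 branch to show the resulting addresses, using the placeholder base URL from this post:

```python
from urllib.parse import urlencode


def page_url(base, k):
    """Mirror gethtmltext: page 0 has no query string, later pages add ?offset=k."""
    if k == 0:
        return base
    return base + "?" + urlencode({"offset": k})


if __name__ == "__main__":
    for k in (0, 10, 20):
        print(page_url("http://xxx.com/h5", k))
```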
The main function varies the offset to walk through the pages:
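def main():
    basicurl = 'http://xxx.com/h5'  # placeholder URL; requests needs the http:// scheme
    k = 0
    while k <= 100:
        html = gethtmltext(basicurl, k)
        k += 10  # each page is offset by 10 entries
        getname(html)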
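Because k starts at 0, grows by 10 and the loop runs while k <= 100, main() issues 11 requests. The generator below (the name page_offsets is hypothetical) produces exactly the offsets the while loop visits, which makes the count easy to check:

```python
def page_offsets(last=100, step=10):
    """Yield the offsets visited by main(): 0, 10, ..., last."""
    k = 0
    while k <= last:
        yield k
        k += step


if __name__ == "__main__":
    offsets = list(page_offsets())
    print(offsets, len(offsets))
```

The same sequence can also be written as range(0, last + 1, step).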
BeautifulSoup then extracts the information layer by layer, descending through the nested tags, and a for loop prints the results:
def getname(html):
    soup = BeautifulSoup(html, "html.parser")
    paihanglist = soup.find('dl', attrs={'class': 'board-wrapper'})  # the ranking list
    if paihanglist is None:  # page failed to load or the layout changed
        return
    mov = []
    actor = []
    for movlist in paihanglist.find_all('dd'):  # one <dd> per entry
        movitem = movlist.find('div', attrs={'class': 'movie-item-info'})
        movname = movitem.find('p', attrs={'class': 'name'}).get_text()
        actors = movlist.find('div', attrs={'class': 'board-item-main'})
        actorname = actors.find('p', attrs={'class': 'star'}).get_text()
        b = actorname.replace('\n', '')  # drop line breaks
        c = b.replace(' ', '')  # drop spaces
        actor.append(c)
        mov.append(movname)
    # {2} makes chr(12288), the full-width space, the padding character
    mode = "{0:{2}<30}\t{1:{2}<50}"
    for i, j in zip(mov, actor):
        print(mode.format(i, j, chr(12288)))


if __name__ == "__main__":
    main()
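The two steps of getname can be tried without any network access. The sketch below parses a small hypothetical HTML sample that only imitates the class names used above (the real page's markup may differ) and then formats a row. Padding with chr(12288), the ideographic space U+3000, only takes effect when the format string references it as the fill character via {2}; full-width padding tends to line up better next to CJK text. The names SAMPLE_HTML, parse_board and format_row are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the class names used by getname()
SAMPLE_HTML = """
<dl class="board-wrapper">
  <dd>
    <div class="board-item-main">
      <div class="movie-item-info">
        <p class="name"><a>Title A</a></p>
        <p class="star">
          Star: X, Y
        </p>
      </div>
    </div>
  </dd>
</dl>
"""

FULLWIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE


def parse_board(html):
    """Return (name, star) pairs, descending dl -> dd -> div -> p."""
    soup = BeautifulSoup(html, "html.parser")
    board = soup.find("dl", attrs={"class": "board-wrapper"})
    if board is None:
        return []
    rows = []
    for dd in board.find_all("dd"):
        info = dd.find("div", attrs={"class": "movie-item-info"})
        name = info.find("p", attrs={"class": "name"}).get_text(strip=True)
        main = dd.find("div", attrs={"class": "board-item-main"})
        star = main.find("p", attrs={"class": "star"}).get_text()
        # Same clean-up as getname: remove line breaks, then spaces
        rows.append((name, star.replace("\n", "").replace(" ", "")))
    return rows


def format_row(name, star, fill=FULLWIDTH_SPACE):
    """Left-align both columns, padding with the given fill character."""
    return "{0:{2}<30}\t{1:{2}<50}".format(name, star, fill)


if __name__ == "__main__":
    for name, star in parse_board(SAMPLE_HTML):
        print(format_row(name, star))
```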
A Python crawler that scrapes data from the OA Lucky airship platform