```python
import re        # regular expressions, used to extract data from the HTML
import requests  # downloads the page source

# Install the requests module with: pip install requests
# Reference: https://www.cnblogs.com/jamespan23/p/5526311.html

for m in range(1, 5):  # list pages 1 to 4
    url = 'http://www.dytt8.net/html/gndy/dyzz/list_23_' + str(m) + '.html'
    html = requests.get(url, timeout=10)  # the pages are static, so a plain GET is enough
    html.encoding = 'GB2312'  # charset found by viewing the page source
    # Extract every detail-page link; findall returns a list.
    # (.*?) matches anything between '<a href="' and '" class="ulink">'
    data = re.findall(r'<a href="(.*?)" class="ulink">', html.text)
    for n in data:
        url2 = 'http://www.dytt8.net' + n
        html2 = requests.get(url2, timeout=10)
        html2.encoding = 'GB2312'
        ftp = re.findall(r'<a href="(.*?)">.*?</a></td>', html2.text)
        try:
            # UTF-8 should handle everything; if it does not, use gb2312 instead
            with open(r'F:\python\mov.txt', 'a', encoding='utf-8') as f:
                # ftp is a list, which cannot be written to a file directly,
                # so write its first element
                f.write(ftp[0] + '\n')
        except (IndexError, OSError):  # no link found, or the file could not be written
            print('This page cannot be downloaded')
```
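The heart of the script is the two `re.findall` calls and their non-greedy `(.*?)` group. The sketch below runs the same patterns against small sample snippets so you can see what they capture without touching the network. The sample HTML, the `ftp://example.com` link, and the helper names `detail_urls` and `first_download_link` are illustrative assumptions, not taken from the live site, whose markup may differ. It also contrasts greedy `(.*)` with non-greedy `(.*?)`, and uses `urllib.parse.urljoin` in place of string concatenation.

```python
import re
from urllib.parse import urljoin

# Hypothetical sample markup standing in for a list page and a detail page.
# Both list links sit on one line to show why the non-greedy group matters.
LIST_HTML = ('<td><a href="/html/gndy/dyzz/20180101/1.html" class="ulink">A</a></td>'
             '<td><a href="/html/gndy/dyzz/20180102/2.html" class="ulink">B</a></td>')
DETAIL_HTML = '<td><a href="ftp://example.com/movie_a.mkv">ftp://example.com/movie_a.mkv</a></td>'

# Same patterns as the script: (.*?) stops at the first closing quote it can.
LIST_PATTERN = re.compile(r'<a href="(.*?)" class="ulink">')
DETAIL_PATTERN = re.compile(r'<a href="(.*?)">.*?</a></td>')


def detail_urls(list_html, base='http://www.dytt8.net'):
    """Return absolute URLs of the detail pages linked from a list page (hypothetical helper)."""
    # urljoin resolves relative hrefs against base and leaves absolute ones unchanged.
    return [urljoin(base, href) for href in LIST_PATTERN.findall(list_html)]


def first_download_link(detail_html):
    """Return the first download link on a detail page, or None if none is found (hypothetical helper)."""
    links = DETAIL_PATTERN.findall(detail_html)
    return links[0] if links else None


if __name__ == '__main__':
    print(detail_urls(LIST_HTML))
    print(first_download_link(DETAIL_HTML))
    # Greedy (.*) runs to the LAST '" class="ulink">' on the line, so both
    # links collapse into a single wrong match.
    print(re.findall(r'<a href="(.*)" class="ulink">', LIST_HTML))
```

Returning `None` instead of indexing `[0]` directly lets the caller skip pages with no download link, which is the case the original script handles with its `except` branch.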
Python topic 4: Crawling movies