Python Question 4: Crawling Movies
The script walks the movie list pages of www.dytt8.net, follows each movie link, and appends the first download link found on each detail page to a text file.

```python
import re        # regular expressions, used to extract data
import requests  # downloads the webpage source code

# Install the requests module with: pip install requests
# Reference document: https://www.cnblogs.com/jamespan23/p/5526311.html

# The original page range was lost in extraction; range(1, 3) is only a
# placeholder. Set it to the list pages you actually want to crawl.
for m in range(1, 3):
    url = 'http://www.dytt8.net/html/gndy/dyzz/list_23_' + str(m) + '.html'
    html = requests.get(url, timeout=10)  # the list pages are static HTML
    html.encoding = 'gb2312'  # the site's encoding; check the page source to confirm it
    # Extract every detail-page link; findall returns a list of the captured groups.
    # (.*?) matches whatever lies between '<a href="' and '" class="ulink">'.
    data = re.findall('<a href="(.*?)" class="ulink">', html.text)
    for n in data:
        url2 = 'http://www.dytt8.net' + n  # links on the list page are relative paths
        html2 = requests.get(url2, timeout=10)
        html2.encoding = 'gb2312'
        ftp = re.findall('<a href="(.*?)">.*?</a></td>', html2.text)
        try:
            # UTF-8 can represent every character, so it is a safe choice for the output file
            with open(r'f:\python\mov.txt', 'a', encoding='utf-8') as f:
                # findall returns a list, and write() only accepts a string,
                # so write the first match, ftp[0]
                f.write(ftp[0] + '\n')
        except (IndexError, OSError):  # no download link found, or the file could not be written
            print('This page cannot be downloaded')
```
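The two regular expressions are the heart of the crawler, and they can be tried offline before hitting the site. The sketch below runs them against small hypothetical HTML snippets (the markup is an assumption modeled on the patterns, not copied from dytt8.net). It also swaps the string concatenation for `urllib.parse.urljoin`, and returns `None` instead of raising when a detail page has no link; the function names `detail_links` and `first_download_link` are illustrative only.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://www.dytt8.net'

# Hypothetical markup shaped like what the crawler's patterns expect;
# the real pages may contain more attributes or different spacing.
LIST_PAGE = '''
<td><a href="/html/gndy/dyzz/20200101/11111.html" class="ulink">Movie A</a></td>
<td><a href="/html/gndy/dyzz/20200102/22222.html" class="ulink">Movie B</a></td>
'''
DETAIL_PAGE = '<td><a href="ftp://example.com/movie_a.mkv">movie_a.mkv</a></td>'


def detail_links(list_html):
    """Return absolute URLs of every detail page linked with class="ulink"."""
    # Non-greedy (.*?) stops at the first '" class="ulink">' after each href.
    paths = re.findall(r'<a href="(.*?)" class="ulink">', list_html)
    return [urljoin(BASE_URL, p) for p in paths]


def first_download_link(detail_html):
    """Return the first link that closes a table cell, or None if there is none."""
    links = re.findall(r'<a href="(.*?)">.*?</a></td>', detail_html)
    return links[0] if links else None


print(detail_links(LIST_PAGE))
print(first_download_link(DETAIL_PAGE))
print(first_download_link('<p>no links here</p>'))
```

Returning `None` lets the caller decide whether to skip or log a page, rather than relying on an `IndexError` from `ftp[0]`.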
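Setting `html.encoding = 'gb2312'` matters because requests derives `Response.encoding` from the HTTP headers, and for text responses without a declared charset it may fall back to ISO-8859-1, which turns Chinese text into mojibake. The sketch below reproduces the effect with plain bytes, no network needed; the sample title is hypothetical.

```python
# The same bytes decode to readable text only with the right codec.
title = '最新电影'                # hypothetical title text
raw = title.encode('gb2312')     # bytes as a gb2312-encoded page would send them

right = raw.decode('gb2312')     # what html.text gives after html.encoding = 'gb2312'
wrong = raw.decode('iso-8859-1') # what a header-based fallback may give instead

print(right)
print(wrong)
```

ISO-8859-1 maps every byte to some character, so the wrong decode never raises an error; it silently produces garbage, which is why the encoding has to be set explicitly.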