My first web crawler written in Python
Today I tried writing a web crawler in Python. The goal was to visit a website, pick out the information I was interested in, and save it to an Excel file in a fixed layout.
The code mainly relies on the following Python features. Since I am not yet familiar with Python, I am pasting the code below for reference.
1. Open a web page from a URL
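import urllib2  # Python 2 module; in Python 3 the same function lives in urllib.request
# string_full_link holds the full URL of the page to fetch (defined elsewhere)
data = urllib2.urlopen(string_full_link).read().decode('utf8')
print data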
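The snippet above is Python 2. As a minimal sketch of the same step in Python 3, the helper below wraps the fetch in a function. The name fetch_page, the timeout value, and the example URL in the comment are my own assumptions, not part of the original code.

```python
from urllib.request import urlopen


def fetch_page(url, encoding="utf-8", timeout=10):
    """Return the body at `url` decoded to text (hypothetical helper)."""
    # The response object works as a context manager, so the
    # connection is closed as soon as the body has been read.
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not yet decoded
    # errors="replace" avoids an exception on pages that are not clean UTF-8.
    return raw.decode(encoding, errors="replace")


# Usage (placeholder URL, not from the original post):
# data = fetch_page("http://example.com/")
```

Decoding with errors="replace" is a design choice: a crawler usually prefers a slightly damaged page over a crash halfway through a run.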
2. Match text with regular expressions
import re
# Matching ordinary English (ASCII) text
reg = "a href=\S* target='_blank' title=\S*"
dicList = re.compile(reg).findall(data)
print dicList
# To match Chinese text, write the pattern as a unicode string
reg = u"\u5730\u5740\S*"  # \u5730\u5740 is the unicode code for the Chinese word for "address"
# sub_data is the decoded text of another page, obtained the same way as data
addrList = re.compile(reg).findall(sub_data)
print addrList
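To see what these patterns actually return, here is a small Python 3 sketch run against a made-up fragment. SAMPLE and the extract helper are assumptions for illustration only; the real page markup will differ. The link pattern adds capture groups and uses [^\s>]* for the title so the match stops at the closing bracket, which is a slight tightening of the pattern above.

```python
import re

# Hypothetical fragment standing in for the downloaded page text.
SAMPLE = (
    "<a href=/shop/1 target='_blank' title=CafeOne>Cafe One</a>\n"
    "<p>\u5730\u5740:Central,HK tel:12345678</p>\n"
)

# Capture groups make findall return (href, title) tuples.
LINK_RE = re.compile(r"a href=(\S*) target='_blank' title=([^\s>]*)")
# \u5730\u5740 is the Chinese word for "address"; \S* runs to the next whitespace.
ADDR_RE = re.compile("\u5730\u5740" + r"\S*")


def extract(text):
    """Return (links, addresses) found in `text`."""
    return LINK_RE.findall(text), ADDR_RE.findall(text)


links, addrs = extract(SAMPLE)
print(links)  # [('/shop/1', 'CafeOne')]
print(addrs)  # the address label plus ':Central,HK'
```

Note that \S* stops at the first whitespace character, so an address containing spaces would be cut short by this pattern.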
3. Write data to an Excel file
import xlwt
file = xlwt.Workbook()
table = file.add_sheet('hk', cell_overwrite_ok=True)
# index is the row number; name, addr and tel are the values extracted for that row
print index, name, addr, tel
table.write(index, 0, name)
table.write(index, 1, addr)
table.write(index, 2, tel)
file.save("""D:\\test.xls""")
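The snippet above writes a single row. Below is a rough Python 3 sketch of how several matched records could be written in a loop. The helpers build_rows and save_rows are hypothetical names of mine, and the sketch assumes the names, addresses and phone numbers arrive as three parallel lists, which the original code does not show.

```python
def build_rows(names, addrs, tels):
    """Pair three parallel lists into (index, name, addr, tel) rows."""
    # zip stops at the shortest list, so unmatched trailing items are dropped.
    return [(i, n, a, t) for i, (n, a, t) in enumerate(zip(names, addrs, tels))]


def save_rows(rows, path):
    """Write rows to an .xls file, one record per spreadsheet row."""
    import xlwt  # third-party package, imported here so build_rows works without it

    book = xlwt.Workbook()
    sheet = book.add_sheet('hk', cell_overwrite_ok=True)
    for index, name, addr, tel in rows:
        sheet.write(index, 0, name)  # column A
        sheet.write(index, 1, addr)  # column B
        sheet.write(index, 2, tel)   # column C
    book.save(path)


# Usage (made-up values):
# save_rows(build_rows(["Cafe One"], ["Central"], ["12345678"]), "test.xls")
```

Keeping the row building separate from the file writing makes it easy to print or check the rows before anything is saved.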