A small example of capturing HTML information from a web page with a Python program.
There are many ways to capture web page data: sending HTTP requests directly from code, simulating a browser's requests (usually needed when the site requires login verification), or driving a real browser to capture the data. This example covers only the simplest case, reading data from a plain web page, and leaves the more complex cases aside:
Target Data
Save the hyperlinks to all the players listed on the ranking pages of the ITTF website.
Data Request
I really like libraries that match the way people think, such as requests. If you want to fetch the text of a web page directly, one line is enough:
doc = requests.get(url).text
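The one-liner above assumes the request always succeeds. A slightly more defensive sketch might look like the following; the helper name fetch_text and its session and timeout parameters are made up for illustration, while get, raise_for_status and text are the regular requests calls.

```python
import requests


def fetch_text(url, session=None, timeout=10):
    """Return the body of url as text, or raise on an HTTP error.

    fetch_text is a hypothetical helper, not part of requests.
    """
    # A requests.Session reuses connections across calls; without one,
    # fall back to the module-level requests.get.
    client = session if session is not None else requests
    # Without a timeout, a request can wait indefinitely on a stalled server.
    response = client.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

With this helper, doc = fetch_text(url) replaces the one-liner, and passing a requests.Session() as session is one way to reuse a connection when fetching many pages.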
Parse HTML to get data
Take BeautifulSoup as an example: it supports obtaining tags, obtaining links, and traversing the document based on the HTML hierarchy. For more information, see the BeautifulSoup documentation. The following snippet collects the links at a specified position on a specified page of the ITTF website.
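import requests
from bs4 import BeautifulSoup

# page, linkfile and links are assumed to be defined earlier in the program.
url = 'http://www.ittf.com/ittf_ranking/WR_Table_3_A2.asp?Age_category_1=&Age_category_2=&Age_category_3=&Age_category_4=&Age_category_5=&Category=100W&Cont=&Country=&Gender=W&Month1=4&Year1=2015&s_Player_Name=&Formv_WR_Table_3_Page=' + str(page)
doc = requests.get(url).text
soup = BeautifulSoup(doc, 'html.parser')  # name the parser explicitly
atags = soup.find_all('a')
rank_link_pre = 'http://www.ittf.com/ittf_ranking/'
# Append to the link file; the with-block closes it even if an error occurs.
with open(linkfile, 'a') as mlfile:
    for atag in atags:
        # Skip anchors that have no href attribute.
        if atag is not None and atag.get('href') is not None:
            # Keep only the links that point to player detail pages.
            if 'WR_Table_3_A2_Details.asp' in atag['href']:
                link = rank_link_pre + atag['href']
                links.append(link)
                mlfile.write(link + '\n')
                print('fetch link: ' + link)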
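To try the link-extraction logic without touching the network, the same idea can be sketched against an inline HTML sample. Everything here beyond the ITTF base URL and the WR_Table_3_A2_Details.asp filter is an assumption made for illustration: the function names, the sample markup, and the Player_ID parameter are invented, and ranking_url keeps only the non-empty query parameters from the original URL, which the server may or may not accept.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://www.ittf.com/ittf_ranking/'
DETAILS_PAGE = 'WR_Table_3_A2_Details.asp'


def ranking_url(page):
    """Build the ranking-table URL for one page number (hypothetical helper)."""
    # Assumption: the empty parameters of the original URL can be dropped.
    params = {
        'Category': '100W',
        'Gender': 'W',
        'Month1': 4,
        'Year1': 2015,
        'Formv_WR_Table_3_Page': page,
    }
    return BASE_URL + 'WR_Table_3_A2.asp?' + urlencode(params)


def extract_detail_links(html):
    """Return absolute URLs of all player detail links found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True makes find_all skip <a> tags that have no href attribute.
    for atag in soup.find_all('a', href=True):
        if DETAILS_PAGE in atag['href']:
            # urljoin resolves the relative href against the base URL.
            links.append(urljoin(BASE_URL, atag['href']))
    return links


# Invented sample markup; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><td><a href="WR_Table_3_A2_Details.asp?Player_ID=1">Player One</a></td></tr>
  <tr><td><a href="index.asp">Home</a></td></tr>
  <tr><td><a name="top">No href</a></td></tr>
</table>
"""

print(ranking_url(2))
print(extract_detail_links(SAMPLE_HTML))
```

Separating URL construction from parsing keeps each step testable on its own: extract_detail_links can be run on saved HTML, and only the fetch step needs a live connection.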