A small example of fetching HTML data from a web page with a Python program

Source: Internet
Author: User

This article walks through a small example of using a Python program to crawl the HTML of a web page. The techniques it uses are the basics of writing web crawlers in Python, so readers who need them may find it a useful reference.

There are several ways to crawl web data. Generally they are: requesting pages over HTTP directly from code, simulating a browser's requests (which usually requires login verification), and controlling a real browser to capture the data. This article ignores the more complex cases and presents a small example of reading data from a simple web page.

Target data

Save the hyperlinks to all of the players listed on this page of the ITTF website.

Data request

Libraries designed around the way people think, such as requests, make this easy. To fetch the page text directly, a single line is enough:


    import requests

    doc = requests.get(url).text
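If you would rather not install requests, a rough standard-library equivalent of this one-liner can be written with urllib.request. This is a minimal sketch: the helper name fetch_text and the UTF-8 fallback are assumptions of the sketch, not part of the original code. Note one behavioral difference: urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, while requests.get does not raise unless you call raise_for_status().

```python
import urllib.request


def fetch_text(url, timeout=10.0):
    """Download url and return the response body as text (hypothetical helper).

    The charset is taken from the Content-Type header. When the server does
    not declare one, this sketch assumes UTF-8, which may be wrong for some
    older pages.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        # errors="replace" keeps a stray bad byte from aborting the crawl
        return resp.read().decode(charset, errors="replace")
```

Usage: doc = fetch_text(url) gives a string you can hand to an HTML parser in the same way as requests.get(url).text.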

Parse HTML to get data

Take BeautifulSoup as an example: it can retrieve tags and links, and traverse a document along its HTML hierarchy. See here for reference. The following fragment, taken from the ITTF crawler, collects the links at a specific position on a specific page.
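BeautifulSoup is a third-party package. To show the link-collecting part of this step without it, here is a minimal sketch built on the standard library's html.parser.HTMLParser instead. The names LinkCollector and extract_links are hypothetical, and unlike BeautifulSoup this parser does not build a tree, so the sketch covers links only, not hierarchy traversal.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value is None for bare attributes.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html):
    """Return every non-empty href found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs
```

HTMLParser also decodes entities in attribute values, so an href written as id=1&amp;x=2 in the page comes back as id=1&x=2.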


    import requests
    from bs4 import BeautifulSoup

    # page (current ranking page number), linkfile (output file path) and
    # links (a list) are assumed to be defined by the surrounding code.
    url = 'http://www.ittf.com/ittf_ranking/WR_Table_3_A2.asp?Age_category_1=&age_category_2=&age_category_3=&age_category_4=&age_category_5=&category=100w&cont=&country=&gender=w&month1=4&year1=2015&s_player_name=&formv_wr_table_3_Page=' + str(page)
    doc = requests.get(url).text
    soup = BeautifulSoup(doc, 'html.parser')
    atags = soup.find_all('a')
    rank_link_pre = 'http://www.ittf.com/ittf_ranking/'

    # Append every player-detail link to the output file.
    with open(linkfile, 'a') as mlfile:
        for atag in atags:
            href = atag.get('href')
            # Case-insensitive match, since the capitalization of the page name may vary.
            if href is not None and 'wr_table_3_a2_details.asp' in href.lower():
                link = rank_link_pre + href
                links.append(link)
                mlfile.write(link + '\n')
                print('Fetch link: ' + link)
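The filtering in the fragment above can also be pulled out into functions that run without network access. In this sketch the caller passes in fetch(url) and page_url(page), both hypothetical, and html.parser stands in for BeautifulSoup. Links are resolved with urllib.parse.urljoin rather than string concatenation, duplicates are dropped, and the crawl stops at the first page that yields no new detail links. That stopping rule is an assumption; the original code only shows a single page being processed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Page name from the original filter, matched case-insensitively.
DETAIL_PAGE = "wr_table_3_a2_details.asp"


class _HrefParser(HTMLParser):
    """Collect href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def detail_links(html, base_url):
    """Return absolute URLs of the player-detail links on one ranking page."""
    parser = _HrefParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, h) for h in parser.hrefs if DETAIL_PAGE in h.lower()]


def crawl_pages(fetch, page_url, base_url, max_pages=100):
    """Visit pages 1..max_pages and return unique detail links in order.

    Hypothetical helper: stops early at the first page that adds no new
    links (an assumed stopping rule).
    """
    seen, ordered = set(), []
    for page in range(1, max_pages + 1):
        added = 0
        for link in detail_links(fetch(page_url(page)), base_url):
            if link not in seen:
                seen.add(link)
                ordered.append(link)
                added += 1
        if added == 0:
            break
    return ordered
```

With the code above, page_url would build the ranking URL from the page number, and fetch could simply be lambda u: requests.get(u).text.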
