Notes on crawling a soccer match schedule with Python

Source: Internet
Author: User

Goal: crawl the match schedule from a website. The page is dynamic, so you need to find the corresponding Ajax request (for details, see: 53399949).

# -*- coding: utf-8 -*-
import sys
import re
import urllib.request

link = "https://***"
r = urllib.request.Request(link)
r.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36')
html = urllib.request.urlopen(r, timeout=500).read()
html = html.decode("gbk")

# The response is a large block of JSON, so the data has to be extracted.
# Find every string in the returned JSON that matches the regular expression.
js = re.findall('"n":"(.*?)"', html)
i = 0
# Loop over the results and print the match information.
try:
    while True:
        # Convert the Unicode escape strings to Chinese and print them.
        print(js[i].encode('utf-8').decode('unicode_escape'),
              js[i+1].encode('utf-8').decode('unicode_escape'),
              "VS",
              js[i+2].encode('utf-8').decode('unicode_escape'))
        i = i + 3
# Once every fixture has been printed, "IndexError: list index out of range"
# is raised, so it is handled here.
except IndexError:
    print("finished")
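The loop above stops by running off the end of the list and catching IndexError. As a minimal sketch of an alternative, assuming (as the original loop does) that the extracted strings always come in groups of three, the list can be sliced into triples up front. The function name `group_fixtures` and the sample payload are hypothetical, since the real response is not shown in these notes.

```python
import re


def group_fixtures(html):
    """Return the extracted "n" values as a list of 3-tuples.

    Assumes the values arrive in groups of three, as in the original
    loop; any incomplete trailing group is dropped.
    """
    names = re.findall(r'"n":"(.*?)"', html)
    usable = len(names) - len(names) % 3
    return [tuple(names[i:i + 3]) for i in range(0, usable, 3)]


# Hypothetical sample standing in for the decoded response body.
sample_html = (
    '[{"n":"League A"},{"n":"Home 1"},{"n":"Away 1"},'
    '{"n":"League B"},{"n":"Home 2"},{"n":"Away 2"}]'
)

for first, left, right in group_fixtures(sample_html):
    print(first, left, "VS", right)
print("finished")
```

With this shape, no exception is needed to end the loop, and a response whose length is not a multiple of three does not produce a half-printed row.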

Points to note:

1. Python 3 uses import urllib.request

This is because Python 2's urllib and urllib2 were merged into the urllib package in Python 3.
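As a rough illustration of where things live after the merge: in Python 3, opening URLs is in urllib.request, query-string helpers are in urllib.parse, and the exceptions are in urllib.error. The sketch below only builds a request and does not touch the network. The URL, the `date` parameter, and the `fetch` helper are hypothetical, because the original link is elided.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical URL and query parameter, for illustration only.
query = urllib.parse.urlencode({"date": "2018-03-01"})
url = "https://example.com/schedule?" + query

# Headers can be passed at construction time instead of calling add_header().
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})

print(req.full_url)
# Request stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))


def fetch(request, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError:
        return None
```

`fetch` is defined but deliberately not called here, so the example runs without network access.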

2. Converting a Unicode escape string to Chinese is written differently in Python 3 and Python 2:

  python3: print(string.encode('utf-8').decode('unicode_escape'))

  python2: print string.decode('unicode_escape')
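A small Python 3 sketch of the conversion above, using a made-up two-character sample. It also shows one caveat worth knowing: `unicode_escape` treats any non-ASCII bytes as Latin-1, so a string that mixes real non-ASCII characters with escape sequences comes out garbled. Parsing the value as a JSON string with `json.loads` is one alternative that avoids this, offered here as a suggestion rather than as the original author's method.

```python
import json

# Backslash-u sequences stored as plain text (12 characters, not 2).
escaped = '\\u4e2d\\u56fd'

# The approach from the notes: str -> bytes -> str via unicode_escape.
decoded = escaped.encode('utf-8').decode('unicode_escape')  # '中国'

# Alternative: let the JSON parser interpret the escapes.
via_json = json.loads('"' + escaped + '"')  # '中国'

# Caveat: a real non-ASCII character in front of the escapes.
mixed = 'é' + escaped
garbled = mixed.encode('utf-8').decode('unicode_escape')  # 'Ã©中国'
intact = json.loads('"' + mixed + '"')  # 'é中国'
```

For the pure-ASCII escape strings that this crawler extracts, both approaches give the same result.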

3. re.findall()

For the rules governing this function's output, see my earlier post: http://www.cnblogs.com/4wheel/p/8497121.html

"n":"(.*?)"    This expression returns only the (.*?) part (for the reason, again see the earlier post). The question mark after the * selects non-greedy mode rather than greedy mode.
As an aside, greedy mode is easiest to understand through practice.
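Both points can be checked on a short string. The sketch below uses a made-up sample, not the site's real response, and compares the pattern with and without the capture group, and then with and without the question mark.

```python
import re

text = '"n":"A","n":"B"'

# With a capture group, findall returns only the captured part.
with_group = re.findall(r'"n":"(.*?)"', text)
print(with_group)      # ['A', 'B']

# Without a group, findall returns the whole match.
without_group = re.findall(r'"n":".*?"', text)
print(without_group)   # ['"n":"A"', '"n":"B"']

# Greedy: .* runs to the last possible closing quote, so the two
# entries are swallowed into a single result.
greedy = re.findall(r'"n":"(.*)"', text)
print(greedy)          # ['A","n":"B']
```

The greedy result is why the crawler needs `.*?`: with `.*`, everything between the first `"n":"` and the last quote in the response would come back as one string.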

Summary: non-greedy mode matches as few characters as possible while still satisfying the regular expression.

Greedy mode, by contrast, matches as many characters as possible while still satisfying the regular expression.
