For urllib, see the following URL:
http://blog.51cto.com/shangdc/2090763
Python is convenient for writing crawlers because it has a variety of ready-made tools. Can Java do it too? Of course: with the HttpClient tool, or the webmagic framework written by a community expert, you can build crawlers as well. The difference is that Python's libraries let you crawl in just a few lines, while Java needs more lines to implement the same thing; the end result is the same.
The following is a brief introduction to the requests library:
#!/usr/bin/env python
# coding: utf-8
import requests

# A brief introduction to using requests. The environment is Python 3;
# the following URL is used as a reference:
# http://www.sse.com.cn/market/bonddata/data/tb/

request_param = {
    'jsonCallBack': 'jsonpCallback6588',
    'isPagination': 'true',
    'sqlId': 'COMMON_BOND_XXPL_ZQXX_L',
    'BONDTYPE': '地方政府债券',  # bond type filter; restored from a garbled translation ("local government bonds")
    'pageHelp.pageSize': '25',
    'pageHelp.pageNo': '2',
    'pageHelp.beginPage': '2',
    'pageHelp.cacheSize': '1',
    'pageHelp.endPage': ''
}

user_agent = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/65.0.3325.146 Safari/537.36')
referer = 'http://www.sse.com.cn/market/bonddata/data/ltb/'

# set the headers
headers = {'User-Agent': user_agent, 'Referer': referer}

# set a proxy
proxy = {"http": "http://113.214.13.1:8000"}

# the URL that needs to be requested
request_url = 'http://query.sse.com.cn/commonQuery.do?'

# send the request
response = requests.get(request_url, headers=headers, proxies=proxy, params=request_param)

print(response.status_code)
# text response content
print(response.text)
# JSON response content
print(response.json())
# binary response content
print(response.content)
# raw response
print(response.raw)
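Since the request above depends on a live endpoint and a working proxy, here is a small offline sketch of what the `params` and `headers` arguments do: when requests prepares a GET request, it URL-encodes the params dict into the query string and merges the headers. The parameter names and values below are illustrative only, not the real SSE query.

```python
import requests

# Build a request object without sending it; the parameter names here
# are made up for illustration.
req = requests.Request(
    'GET',
    'http://query.sse.com.cn/commonQuery.do?',
    params={'isPagination': 'true', 'pageHelp.pageSize': '25'},
    headers={'User-Agent': 'demo-agent', 'Referer': 'http://www.sse.com.cn/'},
)
prepared = req.prepare()

# The params dict has been encoded into the URL's query string.
print(prepared.url)
# The headers are carried on the prepared request.
print(prepared.headers['User-Agent'])
```

Inspecting the prepared request like this is a handy way to debug a crawler before pointing it at the real site.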
Exercises:
1. How would you use cookies to crawl a website that requires them?
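As a hint for the exercise, here is an offline sketch: requests accepts cookies either per-request via the `cookies` argument or on a `Session`, which also stores any Set-Cookie values returned by the server. The cookie name, value, and URL below are made up for illustration.

```python
import requests

# Create a session and attach a cookie to it; 'sessionid'/'abc123' and
# example.com are hypothetical values for this sketch.
session = requests.Session()
session.cookies.set('sessionid', 'abc123', domain='example.com')

# Prepare (but do not send) a request through the session, so the
# session's cookies are merged into the outgoing headers.
req = requests.Request('GET', 'http://example.com/protected')
prepared = session.prepare_request(req)

# The Cookie header that requests would send with this request.
print(prepared.headers.get('Cookie'))
```

In a real crawl you would typically log in once with the session (so the server's Set-Cookie responses are captured automatically) and then reuse the same session for subsequent requests.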
"Python3~ Crawler Tool" uses requests library