Python3 Web Crawler Learning: Using requests (1)


The requests library offers many convenient methods. For example, to fetch a web page you send a GET request, and in requests that is simply the get() method. On to the code:

import requests

r = requests.get('https://www.baidu.com')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text)
print(r.cookies)

Just like urllib's urlopen() method, this gives us a response object; we then print its type, the status code, the type of the response body, the body content, and the cookies.

requests has many other methods too, such as post(), put(), delete(), head(), and options(), each sending the HTTP request of the same name; a quick sketch follows.
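A minimal sketch of these methods (using httpbin.org's matching test endpoints, which simply echo back what they receive):

import requests

# Each call sends the HTTP request of the same name.
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')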

Since the most common request in HTTP is GET, let's build a GET request example:

    • Basic example
import requests

data = {
    'name': 'germey',
    'age': '22'
}
# You could also append "?name=germey&age=22" to the URL yourself,
# but building a params dictionary is obviously less trouble.
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
# The page returns a str in JSON format; the json() method converts it into a dictionary.
print(r.json())

Running it produces:

{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "182.108.3.27",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
{'args': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '182.108.3.27', 'url': 'http://httpbin.org/get?name=germey&age=22'}

Note, however, that if the response body is not in JSON format, calling json() raises a json.decoder.JSONDecodeError.
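A minimal sketch of guarding against this (Baidu's homepage is used here only as an example of a page that returns HTML rather than JSON):

import json

import requests

r = requests.get('https://www.baidu.com')  # the body is HTML, not JSON
try:
    print(r.json())
except json.decoder.JSONDecodeError:
    print('The response body is not valid JSON')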

    • Crawling binary data

So far we have crawled pages, that is, HTML files. To crawl images, audio, video, and other files, we need to fetch their binary data and then write it out in the appropriate format.

Import= Requests.get ("https://github.com/favicon.ico")  Print(r.text)print(r.content)

Running this, we can see two attributes at work. The text attribute is a Unicode string: English letters display normally, but the icon's binary data shows up as garbled characters. The content attribute is the binary form of the body; the b prefix at the start of its output marks it as a bytes object.

Now add code to save the file:

with open('favicon.ico', 'wb') as f:
    f.write(r.content)

The first argument of the open() function is the file name, and the second argument opens it in binary write mode. After running the code, an icon named favicon.ico appears in the current folder.

One small doubt remains: why not store the file as a .txt containing these binary codes? My guess is that when reading and writing files, the computer performs the binary encoding conversion; giving the file the .ico type tells the system how to interpret the bytes, whereas saving them as a string in a .txt file would mean the file itself had to carry the information about what kind of data the binary codes represent.

Sometimes a website refuses our visits. In that case adding a User-Agent header is usually enough: just call requests.get(url, headers=headers), as in the sketch below.
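A minimal sketch (the User-Agent string is one possible browser value, and the target URL is only an illustration):

import requests

# Pretend to be an ordinary browser; any real browser's User-Agent string works.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
r = requests.get('https://www.baidu.com', headers=headers)
print(r.status_code)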

2. POST request

import requests

data = {
    'name': 'germy',
    'age': '22'
}
r = requests.post('https://httpbin.org/post', data=data)
print(r.text)
print(r.content)

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germy"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "17",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "218.64.33.30",
  "url": "https://httpbin.org/post"
}

b'{\n  "args": {},\n  "data": "",\n  "files": {},\n  "form": {\n    "age": "22",\n    "name": "germy"\n  },\n  "headers": {\n    "Accept": "*/*",\n    "Accept-Encoding": "gzip, deflate",\n    "Connection": "close",\n    "Content-Length": "17",\n    "Content-Type": "application/x-www-form-urlencoded",\n    "Host": "httpbin.org",\n    "User-Agent": "python-requests/2.18.4"\n  },\n  "json": null,\n  "origin": "218.64.33.30",\n  "url": "https://httpbin.org/post"\n}\n'

The result comes back with form holding the data we submitted, which proves that the POST request was submitted successfully.

You can get the status code via r.status_code, and the response object also exposes history, url, cookies, and headers attributes.
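A quick sketch printing these attributes (Baidu's homepage is used purely as an example):

import requests

r = requests.get('https://www.baidu.com')
print(r.status_code)  # e.g. 200
print(r.headers)      # response headers
print(r.cookies)      # cookies set by the server
print(r.url)          # final URL after any redirects
print(r.history)      # list of redirect responses; empty if there were none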

Import= Requests.get ("http://www.jianshu.com"if  Else Print ("Request successfully")

This displays "Request successfully".

Of course, requests.codes provides the ok condition code, which lets us refer to status code 200 by name. And ok is not the only one; there are many such condition codes, as sketched below.
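A minimal sketch of a few of them (these attribute names are defined by requests.codes; the numeric values are shown in comments):

import requests

print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
print(requests.codes.forbidden)  # 403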
