Python3 Web Crawler Learning: Using requests (1)


The requests library offers many convenient methods. For example, to fetch a web page you send a GET request, and in requests that is simply the get() method. On to the code:

import requests

r = requests.get('https://www.baidu.com')
print(type(r))
print(r.status_code)
print(type(r.text))
print(r.text)
print(r.cookies)

Just like urllib's urlopen() method, this gives us a response object; we then print its type, the status code, the type of the response body, the body content, and the cookies.

requests has many other methods too, such as post(), put(), delete(), head(), and options(), each sending the HTTP request of the same name; a quick sketch follows.
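A minimal sketch of these methods (using httpbin.org's matching test endpoints, which simply echo back what they receive):

import requests

# Each call sends the HTTP request of the same name.
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')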

Since the most common request in HTTP is GET, let's build a GET request example:

    • Basic example
import requests

data = {
    'name': 'germey',
    'age': '22'
}
# You could also append "?name=germey&age=22" to the URL yourself,
# but building a params dictionary is obviously less trouble.
r = requests.get('http://httpbin.org/get', params=data)
print(r.text)
# The page returns a str in JSON format; the json() method converts it into a dictionary.
print(r.json())

Running it produces:

{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "origin": "182.108.3.27",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
{'args': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '182.108.3.27', 'url': 'http://httpbin.org/get?name=germey&age=22'}

Note, however, that if the response body is not in JSON format, calling json() raises a json.decoder.JSONDecodeError.
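A minimal sketch of guarding against this (Baidu's homepage is used here only as an example of a page that returns HTML rather than JSON):

import json

import requests

r = requests.get('https://www.baidu.com')  # the body is HTML, not JSON
try:
    print(r.json())
except json.decoder.JSONDecodeError:
    print('The response body is not valid JSON')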

    • Crawling binary data

So far we have crawled pages, that is, HTML files. To crawl images, audio, video, and other files, we need to fetch their binary data and then write it out in the appropriate format.

Import= Requests.get ("https://github.com/favicon.ico")  Print(r.text)print(r.content)

Running this, we can see two attributes at work. The text attribute is a Unicode string: English letters display normally, but the icon's binary data shows up as garbled characters. The content attribute is the binary form of the body; the b prefix at the start of its output marks it as a bytes object.

Now add code to save the file:

with open('favicon.ico', 'wb') as f:
    f.write(r.content)

The first argument of the open() function is the file name, and the second argument opens it in binary write mode. After running the code, an icon named favicon.ico appears in the current folder.

One small doubt remains: why not store the file as a .txt containing these binary codes? My guess is that when reading and writing files, the computer performs the binary encoding conversion; giving the file the .ico type tells the system how to interpret the bytes, whereas saving them as a string in a .txt file would mean the file itself had to carry the information about what kind of data the binary codes represent.

Sometimes a website refuses our visits. In that case adding a User-Agent header is usually enough: just call requests.get(url, headers=headers), as in the sketch below.
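A minimal sketch (the User-Agent string is one possible browser value, and the target URL is only an illustration):

import requests

# Pretend to be an ordinary browser; any real browser's User-Agent string works.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
r = requests.get('https://www.baidu.com', headers=headers)
print(r.status_code)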

2. POST request

import requests

data = {
    'name': 'germy',
    'age': '22'
}
r = requests.post('https://httpbin.org/post', data=data)
print(r.text)
print(r.content)

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germy"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "17",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "218.64.33.30",
  "url": "https://httpbin.org/post"
}

b'{\n  "args": {},\n  "data": "",\n  "files": {},\n  "form": {\n    "age": "22",\n    "name": "germy"\n  },\n  "headers": {\n    "Accept": "*/*",\n    "Accept-Encoding": "gzip, deflate",\n    "Connection": "close",\n    "Content-Length": "17",\n    "Content-Type": "application/x-www-form-urlencoded",\n    "Host": "httpbin.org",\n    "User-Agent": "python-requests/2.18.4"\n  },\n  "json": null,\n  "origin": "218.64.33.30",\n  "url": "https://httpbin.org/post"\n}\n'

The result comes back with form holding the data we submitted, which proves that the POST request was submitted successfully.

You can get the status code via r.status_code, and the response object also exposes history, url, cookies, and headers attributes.
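A quick sketch printing these attributes (Baidu's homepage is used purely as an example):

import requests

r = requests.get('https://www.baidu.com')
print(r.status_code)  # e.g. 200
print(r.headers)      # response headers
print(r.cookies)      # cookies set by the server
print(r.url)          # final URL after any redirects
print(r.history)      # list of redirect responses; empty if there were none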

Import= Requests.get ("http://www.jianshu.com"if  Else Print ("Request successfully")

This displays "Request successfully".

Of course, requests.codes provides the ok condition code, which lets us refer to status code 200 by name. And ok is not the only one; there are many such condition codes, as sketched below.
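A minimal sketch of a few of them (these attribute names are defined by requests.codes; the numeric values are shown in comments):

import requests

print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
print(requests.codes.forbidden)  # 403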
