Sending HTTP(S) Requests with Python's socket Module

Source: Internet
Author: User
This article describes how to send HTTP(S) requests with Python's socket module. The sample code is explained in detail, and it may be a useful reference for anyone learning or using Python.

Preface

I ran into this problem while working on an assignment for a computer networks course. It had me stuck for a whole day, so I am writing up what I learned.

I had written Python crawlers with requests before, but the networking course required a lower-level implementation. Then I happened to read this article [1] and found that it sends its requests directly through a socket, so I decided to learn that approach.

It shouldn't be difficult; after all, it just means establishing a TCP connection.

The example from that article is as follows:

import socket

def fetch(url):
    sock = socket.socket()                  # create a socket
    sock.connect(('xkcd.com', 80))          # connect to the remote host
    request = 'GET {} HTTP/1.0\r\nHost: xkcd.com\r\n\r\n'.format(url)  # build the request
    sock.send(request.encode('ascii'))      # send the data over the socket
    response = b''
    chunk = sock.recv(4096)                 # receive data from the socket
    while chunk:
        response += chunk
        chunk = sock.recv(4096)
    # Page is now downloaded.
    # parse_links() and q are defined elsewhere in the original article.
    links = parse_links(response)
    q.add(links)
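Because fetch() sends an HTTP/1.0 request, the server closes the connection after answering, so the loop ends with the whole response in `response`. Those are raw bytes: a status line, header lines, an empty line, then the body. The sketch below splits them apart; `split_response` is a hypothetical helper (not from the original article), and it assumes the complete response has already been read.

```python
def split_response(raw):
    """Split a complete raw HTTP response into (status_line, headers, body).

    Hypothetical helper: header names are lower-cased, and a repeated
    header (e.g. Set-Cookie) keeps only its last value in this sketch.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    if not sep:
        raise ValueError("no empty line found between headers and body")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Usage with a small made-up response:
raw = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
status_line, headers, body = split_response(raw)
print(status_line)              # HTTP/1.0 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'hello'
```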

The site I chose to crawl is Lianjia (lianjia.com, a Chinese real-estate listing site). I also looked at many other examples, captured the traffic with Fiddler, and copied the complete set of headers. First, I referred to this article: https://segmentfault.com/a/1190000005126160, which describes the following:

Sending HTTP requests through a socket in Python

As an example, we use a socket to send an HTTP request for the Baidu home page.

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('www.baidu.com', 80))
# Headers copied from a Fiddler capture. The lines of this triple-quoted
# string end with a bare \n, whereas HTTP specifies \r\n.
s.send("""GET https://www.baidu.com/ HTTP/1.1
Host: www.baidu.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36
Accept-Language: zh-CN,zh;q=0.8

""".encode())
buf = s.recv(1024)
# With Connection: keep-alive, this loop only ends once the server closes the idle connection.
while len(buf):
    print(buf)
    buf = s.recv(1024)

Socket-based HTTP programming gives you finer control over the request parameters, but it is correspondingly harder. The data sent above was copied directly from a Fiddler capture.

Based on that, I wrote the following code:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('www.baidu.com', 80))
s.send("""GET / HTTP/1.1
Host: zh.lianjia.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Referer: https://www.baidu.com/link?url=4j5kx--gldlfesjhkfrepu8ac_0agntcotb-b3kfnx8vndz_6tpqoyjgkvxktczg&ck=6140.3.83.296.315.287.208.155&shh=www.baidu.com&sht=94886267_hao_pg&wd=&eqid=af98b98700060b77000000065aef0524
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en-CA;q=0.8,en;q=0.7
Cookie: lianjia_uuid=ce61c41c-25b0-46d6-a0a0-d57a75ee8706; um_distinctid=1631f588055f9-0286722badd3ec-b34356b-1fa400-1631f58805657f; _ga=ga1.2.43397143.1525239286; _smt_uid=5ae94e02.558be516; _jzqx=1.1525248800.1525335927.1.jzqsr=zh%2elianjia%2ecom|jzqct=/ershoufang/xiangzhouqu/.-; _jzqc=1; _jzqckmp=1; _gid=ga1.2.1028411676.1525594529; select_city=440400; all-lj=c60bf575348a3bc08fb27ee73be8c666; _qzjc=1; cnzzdata1254525948=963210960-1525238218-https%253a%252f%252fwww.lianjia.com%252f%7c1525608956; cnzzdata1255633284=1054798284-1525238580-https%253a%252f%252fwww.lianjia.com%252f%7c1525608969; LIANJIA_SSID=C046DDB3-3E66-4809-998A-52ADE335FDFC; _qzja=1.1070225156.1525239298260.1525603274282.1525613866775.1525609113492.1525613866775.0.0.0.92.9; _qzjto=29.3.0; _jzqa=1.3750161754444366000.1525239284.1525603274.1525613867.9; _jzqy=1.1525239284.1525613867.3.jzqsr=baidu.jzqsr=baidu; hm_lvt_9152f8221cb6243a53c83b956842be8a=1525607433,1525607626,1525609113,1525613867; hm_lpvt_9152f8221cb6243a53c83b956842be8a=1525613867; _qzjb=1.1525613866775.1.0.0.0; _jzqb=1.1.10.1525613867.1; cnzzdata1255604082=964175865-1525237915-https%253a%252f%252fwww.lianjia.com%252f%7c1525612833

""".encode())

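The server always responded with 400 (Bad Request), and I was stuck on this for a long time. The eventual fix was to send the request line by line, ending each line with \r\n. HTTP/1.1 expects the request line and every header line to end with CRLF (\r\n), and the header block to end with an empty line, and the request above lacked these terminators.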
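What matters is the \r\n terminator rather than the number of send() calls: TCP delivers a byte stream, so one sendall() with the same bytes is equivalent to several line-by-line sends. A minimal sketch, with hypothetical helper names, that builds a correctly terminated request from a dict of headers, or normalizes request text pasted from Fiddler (assumed to contain one header per line):

```python
def build_request(method, path, headers):
    """Return request bytes: CRLF after every line plus a final empty line."""
    lines = ["{} {} HTTP/1.1".format(method, path)]
    lines += ["{}: {}".format(name, value) for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def normalize_pasted(text):
    """Turn a request copied as plain text (one header per line) into CRLF form."""
    lines = [line.strip() for line in text.strip().splitlines()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/ershoufang/", {
    "Host": "zh.lianjia.com",
    "Connection": "close",
})
print(request)
# b'GET /ershoufang/ HTTP/1.1\r\nHost: zh.lianjia.com\r\nConnection: close\r\n\r\n'
```

With a connected socket `sock`, a single `sock.sendall(request)` then sends the whole request. `Connection: close` is used here instead of `keep-alive` so that the server closes the connection after answering, which lets a simple read-until-empty loop finish.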

import socket

sock = socket.socket()
sock.connect(('zh.lianjia.com', 80))
sock.send('GET /ershoufang/ HTTP/1.1\r\n'.encode())
sock.send('Host: zh.lianjia.com\r\n'.encode())
sock.send('Connection: keep-alive\r\n'.encode())
sock.send('Cache-Control: no-cache\r\n'.encode())
sock.send('Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8\r\n'.encode())
sock.send('Upgrade-Insecure-Requests: 1\r\n'.encode())
sock.send('User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36\r\n'.encode())
sock.send('Accept-Encoding: gzip, deflate, br\r\n'.encode())
# The trailing \r\n\r\n ends the Cookie line and then the whole header block.
sock.send('Cookie: lianjia_uuid=ce61c41c-25b0-46d6-a0a0-d57a75ee8706; um_distinctid=1631f588055f9-0286722badd3ec-b34356b-1fa400-1631f58805657f; _ga=ga1.2.43397143.1525239286; _smt_uid=5ae94e02.558be516; _jzqx=1.1525248800.1525335927.1.jzqsr=zh%2elianjia%2ecom|jzqct=/ershoufang/xiangzhouqu/.-; _jzqc=1; _jzqy=1.1525239284.1525594526.2.jzqsr=baidu.jzqsR=baidu|jzqct=%e9%93%be%e5%ae%b6; _jzqckmp=1; _gid=ga1.2.1028411676.1525594529; hm_lvt_9152f8221cb6243a53c83b956842be8a=1525594526,1525594536,1525594804,1525595210; select_city=440400; all-lj=c60bf575348a3bc08fb27ee73be8c666; _qzjc=1; LIANJIA_SSID=99306D63-8EE5-A53C-A740-2D3021F3DB2F; cnzzdata1255604082=964175865-1525237915-https%253a%252f%252fwww.lianjia.com%252f%7c1525602095; _jzqa=1.3750161754444366000.1525239284.1525594526.1525603274.8; cnzzdata1254525948=963210960-1525238218-https%253a%252f%252fwww.lianjia.com%252f%7c1525603556; cnzzdata1255633284=1054798284-1525238580-https%253a%252f%252fwww.lianjia.com%252f%7c1525603557; hm_lpvt_9152f8221cb6243a53c83b956842be8a=1525606057; _jzqb=1.9.10.1525603274.1; _qzja=1.1070225156.1525239298260.1525597069547.1525603274282.1525605398368.1525606071025.0.0.0.86.8; _qzjb=1.1525603274282.9.0.0.0; _qzjto=23.2.0\r\n\r\n'.encode())
print(sock.recv(4096))   # start of the response, beginning with the status line
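The request above sends `Accept-Encoding: gzip, deflate, br`, so the server may compress the body, and the bytes after the headers are then not readable HTML as-is. One option is to ask for an uncompressed body with `Accept-Encoding: identity`; another is to decode it. A minimal standard-library sketch follows. `decode_body` is a hypothetical helper; it assumes lower-cased header names and a complete body that has already been de-chunked if the server used `Transfer-Encoding: chunked`. Brotli (`br`) is not in the standard library, so it is only flagged here.

```python
import gzip
import zlib


def decode_body(headers, body):
    """Undo gzip/deflate Content-Encoding; return other bodies unchanged."""
    encoding = headers.get("content-encoding", "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        # "deflate" is meant to be zlib-wrapped, but some servers send raw deflate.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if encoding == "br":
        raise NotImplementedError("Brotli needs a third-party package")
    return body


compressed = gzip.compress(b"<html>...</html>")
print(decode_body({"content-encoding": "gzip"}, compressed))  # b'<html>...</html>'
```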

Now every response was a redirect with status code 301! I searched for a long time without finding the reason. When I typed the URL directly into the browser's address bar and captured the traffic with Fiddler, no 301 response showed up either. Finally, I entered http://zh.lianjia.com/ershoufang in Fiddler's Composer and captured both a 301 and a 200, where the 200 came from https://zh.lianjia.com/ershoufang, as shown in the figure.

Now I knew the reason: it was the difference between HTTP and HTTPS. (In fact, this could have been seen in the Location header of the 301 response, but the single extra "s" was so inconspicuous that I missed it, which is why I was stuck for so long.)
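The clue is in the response itself: a 301 carries the new address in its `Location` header, and the scheme of that URL shows whether the server wants HTTPS. A small sketch; the helper `redirect_target` and the sample response head are illustrative, not captured from the site.

```python
from urllib.parse import urlsplit


def redirect_target(head):
    """Return (status_code, location) parsed from the header bytes of a response."""
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split()[1])   # e.g. "HTTP/1.1 301 Moved Permanently"
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status_code, location


# Illustrative response head, not a real capture:
head = (b"HTTP/1.1 301 Moved Permanently\r\n"
        b"Location: https://zh.lianjia.com/ershoufang\r\n")
code, location = redirect_target(head)
if code in (301, 302, 307, 308) and location:
    target = urlsplit(location)
    print(code, target.scheme, target.hostname, target.path)  # 301 https zh.lianjia.com /ershoufang
```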

Next, I only needed to work out how to send an HTTPS request. The code is below; the main changes are in how the socket is created and connected. Note that the port number is 443. I referred to an article for this part.

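import socket
import ssl

# Original: sock = ssl.wrap_socket(socket.socket())
# ssl.wrap_socket() was deprecated in Python 3.7 and removed in 3.12; an
# SSLContext does the same job and also verifies the server certificate.
context = ssl.create_default_context()
sock = context.wrap_socket(socket.socket(), server_hostname='zh.lianjia.com')
sock.connect(('zh.lianjia.com', 443))   # HTTPS uses port 443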
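Putting the pieces together, a minimal HTTPS GET over a raw socket might look like the sketch below. It assumes Python 3, uses `socket.create_connection` plus an `ssl.SSLContext` (certificate and hostname checks on), and sends `Connection: close` so that reading until the server closes the connection terminates. `https_get` and `read_all` are hypothetical names, and the response is returned raw, without handling chunked transfer encoding or compression.

```python
import socket
import ssl


def read_all(sock, bufsize=4096):
    """Read from `sock` until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:            # b'' means the other side closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)


def https_get(host, path="/", port=443, timeout=10):
    """Send a minimal GET over TLS and return the raw response bytes."""
    request = (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"  # ask the server to close, so read_all() ends
        "\r\n"
    ).format(path, host).encode("ascii")
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        # server_hostname enables SNI and hostname verification
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            tls_sock.sendall(request)
            return read_all(tls_sock)
```

For example, `https_get('zh.lianjia.com', '/ershoufang/')` returns the raw response bytes, whose first line is the status line; `read_all` works the same way on a plain socket.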

I feel there is still a lot here that I don't understand in depth, and our course did not cover the application layer. I will study it further when I have time. Corrections of any errors or omissions are welcome.
