The principle of a proxy service is simple. Take a browser and a web server as an example: A is the browser, B is the proxy, and C is the web server.
A sends its request to B, B forwards the request to C, and C's response travels back C -> B -> A.
Writing a web proxy service requires some understanding of the HTTP protocol, though not in great depth unless you want powerful features such as rewriting parts of a message
or load balancing. An HTTP request consists of three parts: the request line, the message headers, and the request body.
Details are easy to find online if you want them. Here is a normal GET request (captured on Windows 7; I left the Cookie section out of the screenshot):
In the first line, GET is the request method, / is the path, and the protocol version follows. From the second line on come the request headers, each in key-value form.
A GET request normally has no body, while a POST request has one. Apart from that, the headers look much the same for every request method, and each line ends with \r\n.
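The structure just described (request line, headers, optional body, every line ending in \r\n) can be illustrated with a small parser. This is only a sketch for Python 3; the function name parse_request and the sample request are made up for illustration, and repeated header names simply overwrite each other here.

```python
def parse_request(raw):
    """Split a raw HTTP request (bytes) into request line parts, headers and body."""
    # A blank line (\r\n\r\n) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: method, request target, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (b"GET / HTTP/1.1\r\n"
       b"Host: www.a.com\r\n"
       b"Connection: keep-alive\r\n"
       b"\r\n")
method, target, version, headers, body = parse_request(raw)
print(method, target, version)   # GET / HTTP/1.1
print(headers["host"])           # www.a.com
```

For a POST request the same function returns whatever follows the blank line as the body.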
The basic request methods are as follows:
GET requests the resource identified by the Request-URI
POST submits new data to the resource identified by the Request-URI
HEAD requests only the response headers for the resource identified by the Request-URI
PUT asks the server to store a resource and use the Request-URI as its identifier
DELETE asks the server to delete the resource identified by the Request-URI
TRACE asks the server to echo back the request it received, mainly for testing or diagnostics
CONNECT asks a proxy to open a tunnel to the given host and port, typically for HTTPS
OPTIONS queries the capabilities of the server, or the options and requirements associated with a resource
Once the browser is configured to use a proxy, however, the request that the proxy service receives looks like this:
Compare it with the first request: what is different? The resource path on the first line has changed. When a proxy is set in the browser, the entire URL is used as the resource path, so the proxy has to strip the scheme and domain name and then send the modified request to the target
web server. It is that simple. The CONNECT method is a special case that needs separate handling, so the other methods are covered first.
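As an illustration of that rewrite, here is a sketch for Python 3 that uses the standard urllib.parse.urlsplit rather than manual string splitting. The function name rewrite_request_line is hypothetical, and the sketch assumes a plain http:// URL in the request line.

```python
from urllib.parse import urlsplit


def rewrite_request_line(line):
    """Turn 'GET http://host:port/path HTTP/1.1' into 'GET /path HTTP/1.1'.

    Returns (new_line, host, port) so the caller knows where to connect.
    """
    method, target, version = line.split(" ", 2)
    parts = urlsplit(target)
    path = parts.path or "/"          # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    port = parts.port or 80           # default HTTP port when none is given
    return "%s %s %s" % (method, path, version), parts.hostname, port


print(rewrite_request_line("GET http://www.a.com:8081/index.html HTTP/1.1"))
# ('GET /index.html HTTP/1.1', 'www.a.com', 8081)
```

Returning the host and port together with the new line keeps the parsing in one place; the proxy then only has to connect and forward.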
The basic idea:
1. The proxy server listens for connections; when a request from a client browser arrives, accept() returns the client handle (also called a descriptor).
2. Use the client descriptor to receive the request from the browser and split off the first line, which is the line to modify, to obtain the method and the target.
The part to remove, the http://... prefix, is referred to as targetHost.
3. Step 2 yields the method, the request and targetHost. At this point different methods can be handled differently.
GET, POST, PUT, DELETE and the rest, apart from CONNECT, are handled in essentially the same way: rewrite the first line. For example:
GET http://www.a.com/ HTTP/1.1
is replaced by
GET / HTTP/1.1
Here targetHost is the part that was removed, http://www.a.com. The default port is 80; if the host includes a port (such as www.a.com:8081),
that port is used instead, here 8081. The proxy then connects to the target server using the host and port. The implementation is as follows:
def getTargetInfo(self, host):   # parse the host part of targetHost; returns (site, port)
    port = 0
    site = None
    if ':' in host:
        tmp = host.split(':')
        site = tmp[0]
        port = int(tmp[1])
    else:
        site = host
        port = 80
    return site, port

def commonMethod(self, request):   # handle every method except CONNECT
    tmp = self.targetHost.split('/')
    net = tmp[0] + '//' + tmp[2]
    # strip the unnecessary part of the first line (first occurrence only)
    request = request.replace(net.encode('iso-8859-1'), b'', 1)
    targetAddr = self.getTargetInfo(tmp[2])   # call the function above
    try:
        (fam, _, _, _, addr) = socket.getaddrinfo(targetAddr[0], targetAddr[1])[0]
    except Exception as e:
        print(e)
        return
    self.target = socket.socket(fam)
    self.target.connect(addr)   # connect to the target web server
4. This step is easy: the request processed in step 3 can now be sent to the web server over self.target.
5. The web server's response is passed straight back to the client through the proxy service. I used select for non-blocking I/O; you could try epoll instead.
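For anyone who wants to try epoll in step 5, one option is Python 3's standard selectors module, whose DefaultSelector uses epoll on Linux and falls back to other mechanisms elsewhere. The following is a sketch only; the function name relay and its parameters are illustrative and not part of the proxy code in this post.

```python
import selectors
import socket


def relay(client, target, bufsize=4096, idle_timeout=3):
    """Copy bytes in both directions until one side closes or the link goes idle."""
    peer = {client: target, target: client}
    sel = selectors.DefaultSelector()   # epoll on Linux, otherwise kqueue/poll/select
    sel.register(client, selectors.EVENT_READ)
    sel.register(target, selectors.EVENT_READ)
    try:
        while True:
            events = sel.select(timeout=idle_timeout)
            if not events:              # nothing readable within the timeout
                return
            for key, _ in events:
                data = key.fileobj.recv(bufsize)
                if not data:            # an empty read means that side closed
                    return
                peer[key.fileobj].sendall(data)
    finally:
        sel.close()
```

The caller stays responsible for closing both sockets once relay returns. Calling it with the browser socket and the web server socket forwards traffic both ways, the same job the select loop does.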
Those are the basic steps. The functions can be improved, for example by running the main loop with multiple threads or multiple processes, whichever you prefer,
but the idea stays much the same. To test, install the SwitchySharp extension in Chrome and set the proxy port to 8083;
for Firefox, use the AutoProxy extension.
Handling of CONNECT is still unsolved (help from readers is welcome), so for now this proxy does not support HTTPS.
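For reference, the usual pattern for CONNECT is: the request line carries host:port instead of a URL, the proxy opens a TCP connection to that address, answers the client with a 2xx status, and from then on relays raw bytes in both directions without looking at them. Below is a minimal sketch of the first half for Python 3. The names parse_connect_target and open_tunnel are hypothetical, and the sketch has not been wired into the proxy in this post.

```python
import socket


def parse_connect_target(request_line):
    """Extract (host, port) from a line such as 'CONNECT www.a.com:443 HTTP/1.1'."""
    method, authority, _version = request_line.split(" ", 2)
    if method != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, _, port = authority.rpartition(":")
    return host.strip("[]"), int(port)   # brackets appear around IPv6 literals


def open_tunnel(client, request_line, timeout=10):
    """Connect to the requested host and tell the client the tunnel is ready.

    Returns the connected target socket; the caller then relays raw bytes
    between client and target.
    """
    host, port = parse_connect_target(request_line)
    target = socket.create_connection((host, port), timeout=timeout)
    client.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    return target
```

After open_tunnel returns, the same bidirectional forwarding loop used for ordinary requests can carry the encrypted traffic, since the proxy never needs to decode it.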
A proxy service sees all of the HTTP traffic that passes through it, so writing one is a good way to learn HTTP.
The complete code follows:
# -*- coding: utf-8 -*-
import socket, select
import sys
import threading
from multiprocessing import Process


class Proxy:
    def __init__(self, soc):
        self.client, _ = soc.accept()   # blocks until a browser connects
        self.target = None
        self.request_url = None
        self.BUFSIZE = 4096
        self.method = None
        self.targetHost = None

    def getClientRequest(self):
        request = self.client.recv(self.BUFSIZE)
        if not request:
            return None
        cn = request.find(b'\n')
        firstLine = request[:cn].decode('iso-8859-1')
        print(firstLine[:len(firstLine) - 9])   # request line without the trailing " HTTP/1.x"
        line = firstLine.split()
        self.method = line[0]
        self.targetHost = line[1]
        return request

    def commonMethod(self, request):
        tmp = self.targetHost.split('/')
        net = tmp[0] + '//' + tmp[2]
        # strip "http://host" from the first line only, leaving headers untouched
        request = request.replace(net.encode('iso-8859-1'), b'', 1)
        targetAddr = self.getTargetInfo(tmp[2])
        try:
            (fam, _, _, _, addr) = socket.getaddrinfo(targetAddr[0], targetAddr[1])[0]
        except Exception as e:
            print(e)
            return
        self.target = socket.socket(fam)
        self.target.connect(addr)
        self.target.sendall(request)
        self.nonblocking()

    def connectMethod(self, request):   # CONNECT handling can be added here
        pass

    def run(self):
        request = self.getClientRequest()
        if request:
            if self.method in ['GET', 'POST', 'PUT', 'DELETE', 'HEAD']:
                self.commonMethod(request)
            elif self.method == 'CONNECT':
                self.connectMethod(request)

    def nonblocking(self):
        inputs = [self.client, self.target]
        done = False
        while not done:
            readable, writeable, errs = select.select(inputs, [], inputs, 3)
            if errs:
                break
            for soc in readable:
                data = soc.recv(self.BUFSIZE)
                if data:
                    if soc is self.client:
                        self.target.sendall(data)
                    elif soc is self.target:
                        self.client.sendall(data)
                else:
                    done = True   # one side closed the connection
                    break
        self.client.close()
        self.target.close()

    def getTargetInfo(self, host):
        port = 0
        site = None
        if ':' in host:
            tmp = host.split(':')
            site = tmp[0]
            port = int(tmp[1])
        else:
            site = host
            port = 80
        return site, port


if __name__ == '__main__':
    host = '127.0.0.1'
    port = 8083
    backlog = 5
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(backlog)
    while True:
        # Proxy(server) waits in accept(); each accepted client gets its own thread
        threading.Thread(target=Proxy(server).run).start()
        # p = Process(target=Proxy(server).run, args=())   # multi-process
        # p.start()