Proxy Fundamentals
A proxy here means a proxy server. Its job is to fetch network information on behalf of the user; it acts as a relay point for that information.
The normal request process is: the local machine sends a request to the web server ==> the web server sends the response back.
After you set up a proxy server: the local machine sends the request to the proxy server ==> the proxy server forwards the request to the web server ==> the web server returns the response to the proxy server ==> the proxy server passes the response back to the local machine.
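As a minimal sketch of the proxied flow, the standard library's `urllib.request` can be told to send requests through a proxy. The proxy address `http://127.0.0.1:8080` and the helper name `build_proxy_opener` are assumptions for illustration only; no request is actually sent here.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address; replace it with a proxy you actually run or may use.
opener = build_proxy_opener("http://127.0.0.1:8080")

# With a working proxy, the call below would travel:
# local machine -> proxy server -> web server -> proxy server -> local machine
# response = opener.open("http://example.com", timeout=10)
```

Without the `ProxyHandler`, the same `open` call would go straight from the local machine to the web server.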
Role
Bypass access restrictions on your own IP and reach sites you normally cannot access.
Access the internal resources of certain organizations or groups: for example, a free proxy server inside the education network's address range can be used for FTP uploads and downloads, and for the various data query and sharing services open to the education network.
Increase access speed: a proxy server usually has a large disk cache. When outside information passes through, the proxy also saves it to the cache; when other users request the same information, it is served directly from the cache, which improves access speed.
Hide the real IP: Internet users can also hide their IP this way to avoid attacks. For crawlers, we use proxies to hide our own IP and keep it from being blocked.
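For crawlers, one common way to use proxies so that no single IP is exposed too often is to rotate through several of them. The sketch below assumes a small pool of proxy addresses; the addresses and the name `proxy_cycle` are hypothetical. It only shows the selection logic and sends no requests.

```python
import itertools

# Hypothetical proxy addresses; replace them with proxies you are allowed to use.
PROXY_POOL = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:3128",
    "http://10.0.0.3:8080",
]


def proxy_cycle(pool):
    """Yield one proxies mapping per request, cycling through the pool forever."""
    for address in itertools.cycle(pool):
        yield {"http": address, "https": address}


proxies = proxy_cycle(PROXY_POOL)
first = next(proxies)   # uses the first address in the pool
second = next(proxies)  # uses the second address in the pool
```

Each yielded mapping has the scheme-to-URL shape that `urllib.request.ProxyHandler` accepts, so a crawler could build a fresh opener from it before each request.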
Proxy classification
1. By protocol, proxies can be divided into the following categories.
- FTP proxy server: mainly used to access FTP servers; generally supports upload, download, and caching. The port is generally 21, 2121, and so on.
- HTTP proxy server: mainly used to access web pages; generally supports content filtering and caching. The port is generally 80, 8080, 3128, and so on.
- SSL/TLS proxy: mainly used to access encrypted websites; generally supports SSL or TLS encryption (up to 128-bit encryption strength). The port is generally 443.
- RTSP proxy: mainly used to access Real streaming media servers; generally supports caching. The port is generally 554.
- Telnet proxy: mainly used for Telnet remote control (often used by hackers to hide their identity when breaking into computers). The port is generally 23.
- POP3/SMTP proxy: mainly used to send and receive mail over POP3/SMTP; generally supports caching. The ports are generally 110 (POP3) and 25 (SMTP).
- SOCKS proxy: simply relays packets without caring about the specific protocol or usage, so it is much faster; generally supports caching. The port is generally 1080. The SOCKS proxy protocol is divided into SOCKS4 and SOCKS5: the former supports only TCP, while the latter supports both TCP and UDP, as well as various authentication mechanisms and server-side domain name resolution. In short, anything SOCKS4 can do, SOCKS5 can do too, but not everything SOCKS5 can do is possible with SOCKS4.
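The typical ports for each proxy type can be collected into a small lookup table. This is only a sketch: the names `TYPICAL_PORTS` and `guess_proxy_types` are hypothetical, POP3 and SMTP are listed separately with their standard ports (110 and 25), and a real proxy may listen on any port, so a match is a hint rather than proof.

```python
# Typical ports per proxy type; real deployments may use other ports.
TYPICAL_PORTS = {
    "FTP": (21, 2121),
    "HTTP": (80, 8080, 3128),
    "SSL/TLS": (443,),
    "RTSP": (554,),
    "Telnet": (23,),
    "POP3": (110,),
    "SMTP": (25,),
    "SOCKS": (1080,),
}


def guess_proxy_types(port):
    """Return the proxy types whose typical ports include the given port."""
    return [name for name, ports in TYPICAL_PORTS.items() if port in ports]


print(guess_proxy_types(1080))  # ['SOCKS']
print(guess_proxy_types(9999))  # []
```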
2. By degree of anonymity, proxies can be divided into the following categories.
This article is excerpted from Cui Qingcai's "Python3 Web Crawler Development in Practice".
Python Crawler Knowledge Points: Proxies