Python Crawler Knowledge Point: Proxies

Source: Internet
Author: User

Proxy Fundamentals

A proxy here means a proxy server: it obtains network information on behalf of a network user and acts as a transit point for that information.

The normal request process is: the client sends a request to the web server ==> the web server sends the response back.

After you set up a proxy server, the process becomes: the client sends the request to the proxy server ==> the proxy server forwards the request to the web server ==> the web server returns the response to the proxy server ==> the proxy server forwards the response to the local machine.
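As a minimal sketch of this flow in Python, the standard library's `urllib.request` can route requests through a proxy. The address `127.0.0.1:8080` and the helper name `build_proxy_opener` are placeholders chosen for illustration, not values from the original text; substitute a proxy you actually control.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    # Map each URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({
        "http": proxy_url,
        "https": proxy_url,
    })
    return urllib.request.build_opener(handler)


# Hypothetical local proxy; nothing is contacted until open() is called.
opener = build_proxy_opener("http://127.0.0.1:8080")

# With a proxy actually listening at that address, the request would travel
# client -> proxy -> web server, and the response would return the same way:
# with opener.open("http://example.com", timeout=10) as response:
#     print(response.status)
```

The network call is left commented out so the snippet runs without a live proxy.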

Role

Break through your own IP access restrictions and access some sites that you don't normally have access to.

Access internal resources of certain organizations or groups: for example, by using a free proxy server within the education network's address range, you can use the education network's FTP upload and download services, as well as its various data query and sharing services.

Increase access speed: a proxy server usually has a large hard-disk buffer. When outside information passes through, it is also saved to the buffer; when other users request the same information, it is served directly from the buffer, which improves access speed.

Hide the real IP: Internet users can hide their IP in this way to protect themselves from attack. For crawlers, we use proxies to hide the crawler's own IP and prevent it from being blocked.
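For a crawler, one common way to make blocking less likely is to spread requests over several proxies. The following is only a sketch: the pool addresses are made-up documentation addresses, and `next_proxies` is a hypothetical helper name. The mapping it returns has the scheme-to-proxy shape accepted by `urllib.request.ProxyHandler` and by the `proxies` argument of the `requests` library.

```python
import itertools

# Hypothetical proxy addresses (documentation address range), for illustration only.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:3128",
    "http://203.0.113.12:8080",
]

# cycle() walks the pool in order and starts over after the last entry.
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies():
    """Return a scheme-to-proxy mapping built from the next proxy in the pool."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}


# Consecutive requests would leave through different proxy addresses.
first = next_proxies()
second = next_proxies()
```

Simple round-robin rotation is an assumption here; a real crawler would typically also drop proxies that stop responding.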

Proxy classification

1. By protocol: according to the protocol they use, proxies can be divided into the following categories.
    • FTP proxy server: mainly used to access FTP servers; generally provides upload, download, and caching functions. The port is generally 21, 2121, and so on.
    • HTTP proxy server: mainly used to access web pages; generally provides content filtering and caching functions. The port is generally 80, 8080, 3128, and so on.
    • SSL/TLS proxy: mainly used to access encrypted websites; generally provides SSL or TLS encryption (up to 128-bit encryption strength). The port is generally 443.
    • RTSP proxy: mainly used to access Real streaming media servers; generally provides a caching function. The port is generally 554.
    • Telnet proxy: mainly used for telnet remote control (hackers often use it to hide their identity when breaking into computers). The port is generally 23.
    • POP3/SMTP proxy: mainly used to send and receive mail over POP3/SMTP; generally provides a caching function. The ports are generally 110 and 25.
    • SOCKS proxy: simply passes packets along without caring about the specific protocol or usage, so it is much faster; generally provides a caching function. The port is generally 1080. The SOCKS proxy protocol is divided into SOCKS4 and SOCKS5: the former supports only TCP, while the latter supports both TCP and UDP, as well as various authentication mechanisms and server-side domain name resolution. In short, whatever SOCKS4 can do, SOCKS5 can do too, but not everything SOCKS5 can do is possible with SOCKS4.
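In crawler code, the proxy protocol usually shows up as the scheme of the proxy URL. The sketch below parses such a URL with the standard library and fills in a typical port when none is given. The function name `describe_proxy`, the restriction to four schemes, and the choice of 8080 as the HTTP fallback are assumptions made for this example; real deployments may use other ports.

```python
from urllib.parse import urlsplit

# Typical ports only; treating 8080 as the HTTP fallback is an arbitrary choice.
DEFAULT_PORTS = {
    "http": 8080,
    "https": 443,
    "socks4": 1080,
    "socks5": 1080,
}


def describe_proxy(proxy_url):
    """Split a proxy URL into scheme, host, and port, with a fallback port."""
    parts = urlsplit(proxy_url)
    scheme = parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    return {
        "scheme": scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[scheme],
        # Of the schemes handled here, only SOCKS5 can carry UDP traffic.
        "supports_udp": scheme == "socks5",
    }


info = describe_proxy("socks5://127.0.0.1")
```

Here `info["port"]` falls back to 1080 because the URL does not name a port.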

2. By degree of anonymity: according to how anonymous they are, proxies can be divided into the following categories.
    • Highly anonymous proxy: forwards packets intact, so to the server the request looks like it comes from an ordinary client, and the IP it records is the proxy's IP.
    • Ordinary anonymous proxy: makes some changes to the packet, so the server may detect that a proxy server is in use, and there is some chance of tracing the client's real IP. The HTTP headers that the proxy server usually adds are HTTP_VIA and HTTP_X_FORWARDED_FOR.
    • Transparent proxy: not only changes the packet but also tells the server the client's real IP. Apart from using caching to speed up browsing and content filtering to improve security, this kind of proxy serves no other significant purpose. The most common example is a hardware firewall on an intranet.
    • Spy proxy: a proxy server set up by an organization or individual to record the data users transmit, for purposes such as research and monitoring.
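The first three levels can be told apart, roughly, by what the target server sees in the request. The sketch below is a simplified heuristic and not a complete detector: `classify_anonymity` is a hypothetical helper, the header names follow the HTTP_VIA and HTTP_X_FORWARDED_FOR spelling used above, and a spy proxy cannot be recognized this way.

```python
def classify_anonymity(server_seen_headers, real_client_ip):
    """Guess a proxy's anonymity level from the headers the server received."""
    via = server_seen_headers.get("HTTP_VIA")
    forwarded = server_seen_headers.get("HTTP_X_FORWARDED_FOR", "")
    # The forwarded-for value may hold several comma-separated addresses.
    forwarded_ips = [ip.strip() for ip in forwarded.split(",") if ip.strip()]

    if real_client_ip in forwarded_ips:
        # The proxy revealed the client's real address.
        return "transparent"
    if via or forwarded_ips:
        # Proxy headers are present, but the real address is not among them.
        return "anonymous"
    # Nothing gives the proxy away; the request looks like a direct one.
    return "highly anonymous"


level = classify_anonymity(
    {"HTTP_VIA": "1.1 proxy", "HTTP_X_FORWARDED_FOR": "198.51.100.7"},
    real_client_ip="198.51.100.7",
)
```

In this usage example the forwarded address matches the client's real IP, so `level` is `"transparent"`.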

Common proxy settings

    • Free proxies on the Internet
    • Paid proxy services
    • ADSL dial-up

This article is excerpted from Cui Qingcai's "Python 3 Web Crawler Development in Practice".
