Python crawler preparation

Source: Internet
Author: User

1. HTTP programming knowledge

  • Working mode of the client and server in HTTP

A reliable TCP connection is established between the client and the server. In HTTP/1.1 this connection is persistent by default and is closed after an idle timeout.

The client communicates with the server through a socket: it sends a request and receives the response.

The HTTP protocol is stateless: each request is independent of the others, and the protocol itself keeps no record of what the client did in earlier requests.

The client adds headers to the HTTP request to tell the server what it is requesting and which response formats it can accept.
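The request/response cycle described above can be sketched with a plain socket. This is only an illustration in Python 3: the `EchoHeadersHandler` class, the `raw_get` helper, and the `demo-crawler/0.1` user agent are hypothetical names, and a throwaway local server stands in for a real website so that nothing leaves the machine.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHeadersHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: replies with the User-Agent it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def raw_get(host, port, path, user_agent):
    """Send one GET request over a TCP socket and return the raw response text."""
    request = (
        "GET {path} HTTP/1.1\r\n"
        "Host: {host}:{port}\r\n"
        "User-Agent: {ua}\r\n"
        "Accept: text/plain\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path=path, host=host, port=port, ua=user_agent)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")


# Start the local server on a free port, send one request, then stop it.
server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
response = raw_get("127.0.0.1", server.server_port, "/", "demo-crawler/0.1")
server.shutdown()
server.server_close()

status_line = response.split("\r\n", 1)[0]
body = response.split("\r\n\r\n", 1)[1]
print(status_line)
print(body)
```

The request is just text: a request line, a few headers, and a blank line. Because the server keeps no state, a second call to `raw_get` would have to send every header again.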

 

  • Common request methods include GET and POST.

GET: the client requests a resource, such as a file.

POST: the client sends data for the server to process.
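The difference between the two methods can be sketched as follows. This is a minimal example that assumes Python 3, where the `urllib2.Request` class described next lives on as `urllib.request.Request`; the endpoint and form fields are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, used only for illustration.
base_url = "http://www.example.com/search"
params = {"q": "python crawler", "page": "1"}

# GET: the parameters travel in the URL's query string and there is no body.
get_req = Request(base_url + "?" + urlencode(params))

# POST: the same parameters travel in the request body, encoded as bytes.
post_req = Request(base_url, data=urlencode(params).encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

The method is not chosen explicitly here: the request is a GET when no data is given and becomes a POST as soon as data is supplied.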

 class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

url: a string containing the URL to request.

data: a string encoded with urllib.urlencode(). When data is supplied, the request is sent as a POST.

headers: a dictionary of request headers, used here to spoof the User-Agent so that access from a script looks like access from a browser.

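Sample code (Python 2):

import urllib
import urllib2

# Placeholder URL of the script that processes the form
url = 'http://www.someserver.com/cgi-bin/register.cgi'
# Pretend to be a browser instead of a script
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'why',
          'location': 'sdu',
          'language': 'python'}
headers = {'User-Agent': user_agent}

data = urllib.urlencode(values)            # encode the form fields
req = urllib2.Request(url, data, headers)  # data is set, so this is a POST
response = urllib2.urlopen(req)            # send the request
the_page = response.read()                 # read the response body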
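The sample above targets Python 2. In Python 3, `urllib2` was merged into `urllib.request` and `urlencode` moved to `urllib.parse`, so a rough equivalent might look like the sketch below. It only builds and inspects the request, because the server name is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://www.someserver.com/cgi-bin/register.cgi"  # placeholder from the sample
user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
values = {"name": "why", "location": "sdu", "language": "python"}
headers = {"User-Agent": user_agent}

# In Python 3 the request body must be bytes, so the urlencoded string is encoded.
data = urlencode(values).encode("ascii")
req = Request(url, data=data, headers=headers)

# Inspect the request instead of sending it.
print(req.get_method())
print(req.data)
print(req.get_header("User-agent"))  # Request stores header names in this capitalization
```

To send the request to a real server, pass `req` to `urllib.request.urlopen()` and call `read()` on the result, as in the Python 2 version.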

Reference blog: http://blog.csdn.net/pleasecallmewhy/article/details/8923067

3. Save the following code as an HTML file and open it in a browser to see that browser's version information (its user agent).

<Html>

User_agent of Sogou browser

User_agent of Baidu browser

User_agent of Google Chrome
