[Python network programming] geventhttpclient and web page encoding

Source: Internet
Author: User

A while ago I came across the geventhttpclient project, https://github.com/gwik/geventhttpclient. Its official documentation says it is very fast because responses are parsed in C, so I had long wanted to use it in a project.

I have spent the last two days wrestling with it. In short, it is rather awkward to use and its wrapper API is weak. The biggest shortcomings are:

1. Redirects are not supported. You have to write the redirect handling yourself, which is tedious.

2. A newly created HTTPClient object can only send requests to a single domain.
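The second limitation means that a crawler needs one client per host. A minimal sketch of that idea, using only the standard library and written for Python 3: the names `client_key` and `ClientRegistry` are made up for illustration and are not part of geventhttpclient, and `factory` stands in for whatever actually builds a client.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def client_key(url):
    """Return the (host, port) pair that identifies which client serves a URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


class ClientRegistry(object):
    """Cache one client per (host, port); `factory` is a hypothetical builder."""

    def __init__(self, factory):
        self._factory = factory
        self._clients = {}

    def get(self, url):
        key = client_key(url)
        if key not in self._clients:
            # First request to this host: build a client and remember it.
            self._clients[key] = self._factory(*key)
        return self._clients[key]
```

URLs that differ only in path, or that spell out the default port, map to the same client.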

This was annoying enough that I spent a little time writing a wrapper that works around both problems and also adds automatic decoding. The code is as follows:

    #!/usr/bin/env python
    # -*- encoding: utf-8 -*-
    import re

    from geventhttpclient.url import URL
    from geventhttpclient.client import HTTPClient, HTTPClientPool

    try:
        from urlparse import urljoin        # Python 2
    except ImportError:
        from urllib.parse import urljoin    # Python 3
    # from core.common import urljoin

    # Default headers, added to a request unless the caller already set them.
    HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0',
    }

    DEFAULT_METHOD = "GET"
    MAX_REDIRECT_TIME = 10
    DEFAULT_PAGE_ENCODING = "utf8"


    class DifferDomainException(Exception):
        """
        Raised when a redirect points to a different domain; geventhttpclient
        itself would fail there, see its client code:
        'raise ValueError("Invalid host in URL")'
        """
        def __init__(self, uri):
            Exception.__init__(self, uri)
            self.uri = uri


    class MaxRedirectException(Exception):
        def __init__(self, response):
            Exception.__init__(self, response)
            self.response = response


    class HTTP(HTTPClient):
        def request(self, request_uri, method=DEFAULT_METHOD, body=b"", headers=None,
                    follow_redirect=True, redirects=MAX_REDIRECT_TIME):
            # A request with a body is sent as POST unless a method was given.
            if body and method == DEFAULT_METHOD:
                method = "POST"
            # Copy the caller's headers (avoids a shared mutable default),
            # then fill in the defaults that are not already present.
            headers = dict(headers or {})
            h = [k.title() for k in headers]
            headers.update((k, v) for k, v in HEADERS.items() if k not in h)
            response = super(HTTP, self).request(method, request_uri, body, headers)
            if (follow_redirect and response.status_code in (301, 302, 303, 307)
                    and response.method in ("GET", "POST")):
                if redirects:
                    location = (response.get('location')
                                or response.get('content-location')
                                or response.get('uri'))
                    if location:
                        location = urljoin(request_uri, location)
                        # This client is bound to one host, so it cannot follow
                        # a redirect to another domain; let the caller decide.
                        if not location.startswith(self._base_url_string):
                            raise DifferDomainException(location)
                        return self.request(location, method, body, headers,
                                            follow_redirect, redirects - 1)
                else:
                    raise MaxRedirectException(response)
            return response


    class HTTPPool(HTTPClientPool):
        def get_client(self, url):
            if not isinstance(url, URL):
                url = URL(url)
            # One client per (host, port).
            client_key = url.host, url.port
            try:
                return self.clients[client_key]
            except KeyError:
                client = HTTP.from_url(url, **self.client_args)
                self.clients[client_key] = client
                return client


    _poll = HTTPPool(network_timeout=100, connection_timeout=100)

    # meta_charset_regex = re.compile(r'(SI) ...
    # (the listing is cut off at this point in the source)
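The redirect handling in the listing can be tried out without any network access. The following is only a sketch of the same idea, written for Python 3, not the wrapper itself: `follow_redirects` and `TooManyRedirects` are hypothetical names, and `fetch` is an assumed callable that returns a `(status, location)` pair for a URL. Unlike the wrapper, it follows redirects across hosts, which a real implementation would do by asking the pool for the client of the new host.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307)


class TooManyRedirects(Exception):
    """Raised when the redirect budget is used up (hypothetical name)."""


def follow_redirects(fetch, url, max_redirects=10):
    """Call fetch(url) -> (status, location) until a non-redirect answer.

    Returns the final (url, status). A relative Location is resolved
    against the URL that produced it, as urljoin does in the listing.
    """
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or not location:
            return url, status
        url = urljoin(url, location)
    raise TooManyRedirects(url)


# A fake "site" standing in for real HTTP responses.
PAGES = {
    "http://a.example/": (302, "/b"),
    "http://a.example/b": (301, "http://b.example/c"),
    "http://b.example/c": (200, None),
}

final_url, final_status = follow_redirects(PAGES.__getitem__, "http://a.example/")
```

Counting down a budget and raising when it runs out mirrors the `redirects - 1` recursion and `MaxRedirectException` in the listing.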
While testing, I ran into some problems with page encoding:

Because the response headers arrive first, the natural assumption is that the encoding declared in the header takes precedence, and that the encoding declared in the page itself is consulted only when the header gives none.
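That precedence rule can be written down as a small function. This is a sketch under assumptions, for Python 3: `pick_encoding` and `META_CHARSET_RE` are illustrative names, the regex is a simplified stand-in (the wrapper's own meta charset regex is not shown in full), and only the first few kilobytes of the body are scanned.

```python
import re
from email.message import Message

# Simplified pattern for <meta charset="..."> and the http-equiv form.
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.I)


def charset_from_header(content_type):
    """Extract the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def pick_encoding(content_type, body, default="utf-8"):
    """Header charset first, then the page's meta charset, then a default."""
    if content_type:
        header_charset = charset_from_header(content_type)
        if header_charset:
            return header_charset
    match = META_CHARSET_RE.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

For a response whose header says GBK and whose page says gb2312, this returns the header's value.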

Take NetEase: the header says GBK and the page says gb2312, yet decoding with gb2312 actually fails. I find this puzzling for a site that large.

Decoding with the header's GBK works fine, which supports the idea that the header encoding takes precedence. Supposedly the page encoding tells the browser to display the page as gb2312, but that clearly causes problems, so how does the browser handle it?
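One likely explanation is that GBK is a superset of GB2312: a page labelled gb2312 may contain characters that exist only in GBK. Browsers that follow the WHATWG Encoding Standard treat the label gb2312 as GBK, which is why the page still displays. The sketch below, for Python 3, reproduces the failure and applies the same label mapping; `CODEC_OVERRIDES` and `decode_page` are illustrative names, and the sample text is an assumption, not taken from the NetEase page.

```python
# Labels that are decoded with the wider, compatible GBK codec.
CODEC_OVERRIDES = {"gb2312": "gbk", "gb_2312-80": "gbk"}


def decode_page(raw, declared):
    """Decode raw bytes, widening a declared gb2312 label to gbk."""
    codec = CODEC_OVERRIDES.get(declared.lower(), declared)
    return raw.decode(codec)


# The middle character exists in GBK but not in GB2312.
sample = "朱镕基".encode("gbk")

try:
    sample.decode("gb2312")
    strict_gb2312_ok = True
except UnicodeDecodeError:
    # Python's gb2312 codec is strict and rejects the GBK-only character.
    strict_gb2312_ok = False

text = decode_page(sample, "GB2312")
```

Mapping the label before decoding keeps the header-first rule intact while tolerating pages that under-declare their character set.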


Now look at Sina, which left me even more confused. Can anyone help me out?


