1. Open a web page and retrieve its entire content
# Python 2: urlopen lives directly in the urllib module
from urllib import urlopen
doc = urlopen("http://www.baidu.com").read()
print doc
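The snippet above is Python 2. In Python 3, `urlopen` moved to `urllib.request`, and `read()` returns bytes that must be decoded before they can be printed as text. A minimal Python 3 sketch, assuming the page is UTF-8 encoded (the `fetch_text` helper and its defaults are illustrative, not part of urllib):

```python
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8", timeout=10):
    """Download the whole body at `url` and decode it to a str.

    Assumes the body uses `encoding`; undecodable bytes are replaced
    rather than raising an error.
    """
    # The with-block closes the connection once the body has been read.
    with urlopen(url, timeout=timeout) as response:
        body = response.read()  # bytes in Python 3
    return body.decode(encoding, errors="replace")


# Example (needs network access):
# print(fetch_text("http://www.baidu.com")[:200])
```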
2. Get the HTTP headers
from urllib import urlopen
doc = urlopen("http://www.baidu.com")
# info() returns the response headers as a mimetools.Message
print doc.info()
# Look up a single header by name
print doc.info().getheader('Content-Type')
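In Python 3, `info()` still returns the headers, but as an `email.message.Message`-style object that offers `get()` rather than `getheader()`. A small sketch under that assumption (the helper names `fetch_headers` and `content_type` are hypothetical):

```python
from urllib.request import urlopen


def fetch_headers(url, timeout=10):
    """Return the response headers for `url` as a Message-like object."""
    with urlopen(url, timeout=timeout) as response:
        # The headers object remains usable after the response is closed.
        return response.info()


def content_type(url, timeout=10):
    """Return the Content-Type header of `url`, or None if it is absent."""
    # Header lookup is case-insensitive, so "content-type" would also work.
    return fetch_headers(url, timeout).get("Content-Type")


# Example (needs network access):
# print(fetch_headers("http://www.baidu.com"))
# print(content_type("http://www.baidu.com"))
```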
3. Use a proxy
3.1 View environment variables
import os
# Print every environment variable as NAME=value, one per line
print "\n".join(["%s=%s" % (k, v) for k, v in os.environ.items()])
print os.getenv("http_proxy")
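The same inspection in Python 3, where `print` is a function. This sketch sorts the variables by name for readability and, as an assumption about common conventions, falls back to uppercase `HTTP_PROXY` when lowercase `http_proxy` is unset (both helper names are illustrative):

```python
import os


def format_environ(environ=None):
    """Render an environment mapping as NAME=value lines, sorted by name."""
    if environ is None:
        environ = os.environ
    return "\n".join("%s=%s" % (k, v) for k, v in sorted(environ.items()))


def http_proxy_setting():
    """Return the HTTP proxy configured in the environment, or None."""
    # Lowercase http_proxy is the usual convention; fall back to HTTP_PROXY.
    return os.environ.get("http_proxy") or os.environ.get("HTTP_PROXY")


# Example:
# print(format_environ())
# print(http_proxy_setting())
```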
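3.2 Set environment variables
import os
# Assign through os.environ: it calls putenv() for you and also updates the
# mapping that os.getenv() and urllib read. os.putenv() alone does not.
os.environ["http_proxy"] = "http://proxyaddr:<port>"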
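A Python 3 sketch of the same step, showing that a value assigned through `os.environ` is picked up by `urllib.request.getproxies()`, while a bare `os.putenv()` call is not reflected in `os.environ`. The proxy address below is a placeholder, and the helper names are hypothetical. urllib generally reads these variables when it first builds its default opener, so it is safest to set them before the first request.

```python
import os
import urllib.request


def set_http_proxy(proxy_url):
    """Set http_proxy for this process and any child processes it starts."""
    # Item assignment on os.environ calls putenv() under the hood AND
    # updates the os.environ mapping that urllib consults.
    os.environ["http_proxy"] = proxy_url


def current_http_proxy():
    """Return the HTTP proxy urllib would take from the environment."""
    # getproxies() builds a {scheme: proxy_url} dict from *_proxy variables.
    return urllib.request.getproxies().get("http")


# Example (placeholder address):
# set_http_proxy("http://proxy.example:3128")
# print(current_http_proxy())
```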
3.3 Pass proxies to urlopen
import urllib
some_url = "http://www.baidu.com"  # the page to fetch
# Use http://www.someproxy.com:3128 for HTTP proxying
proxies = {'http': 'http://www.someproxy.com:3128'}
filehandle = urllib.urlopen(some_url, proxies=proxies)
# Don't use any proxies
filehandle = urllib.urlopen(some_url, proxies={})
# Use proxies from the environment; both versions are equivalent
filehandle = urllib.urlopen(some_url, proxies=None)
filehandle = urllib.urlopen(some_url)