Python Web Programming: Web Client Programming

Last Update:2014-11-05 Source: Internet

Author: User


Web apps also follow the client-server architecture.

The browser is the basic web client. It performs two basic functions: downloading files from a web server and rendering them.

Modules such as urllib and urllib2 (the latter can open web pages that require a login) provide browser-like functionality for simple web clients.
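As a minimal sketch of such a simple client, the example below downloads a document by URL. It assumes Python 3, where the download functions of urllib and urllib2 live in urllib.request. To stay runnable without a network connection it fetches a temporary local file through a file: URL. The names fetch, demo, page.html, and copy.html are made up for illustration.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen, urlretrieve


def fetch(url):
    """Return the raw bytes behind a URL (http, https, ftp, or file)."""
    with urlopen(url) as response:
        return response.read()


def demo():
    """Create a local page, then read it and copy it through its file: URL."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, 'page.html')      # hypothetical file name
    with open(source, 'w') as f:
        f.write('<html><body>hello</body></html>')

    url = Path(source).as_uri()                      # file:///.../page.html
    body = fetch(url)                                # read into memory

    # urlretrieve saves the document to a file and returns (filename, headers)
    target = os.path.join(workdir, 'copy.html')      # hypothetical file name
    saved_as, _headers = urlretrieve(url, target)
    with open(saved_as, 'rb') as f:
        return body, f.read()


if __name__ == '__main__':
    body, copied = demo()
    print(len(body), 'bytes read,', len(copied), 'bytes saved')
```

Replacing the file: URL with an http: URL should work the same way, given network access.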

Many web clients do more than download files from the web; they also perform other, more complex tasks. A typical example is the crawler.

Python also has framework modules for implementing crawlers, such as Scrapy.

Create a simple Web client using Python

Keep in mind that the browser is only one kind of web client, and its functionality is limited. Any application that makes requests over the web is a web client, for example curl and Python's urllib. Why urllib rather than httplib? Read on.

What is a URL? Its composition is very important

A URL is used to locate a document on the web, or to invoke a CGI program that generates a document for the client. CGI-generated documents are similar to what some web frameworks, Python's in particular, produce. A web client is essentially doing file transfer, and the most direct way to do that is to use a URL to locate and fetch the file. Most clients rely on this, so you should first learn how a URL is composed:

http://zh.wikipedia.org/zh/%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6

Python URL modules: urllib and urlparse

Python supports two different modules that handle URLs, each with different functionality and compatibility: urlparse and urllib. urlparse is used to parse and compose URLs, and working with it is also a good way to learn how a URL is put together. For details on its usage, consult help().

urllib is a high-level module. It provides all the features you need unless you plan to write a lower-level network client. It offers a high-level web communication library that supports the HTTP, FTP, and Gopher protocols, as well as access to local files. Its particular purpose is to download data (from the Internet, a LAN, or the local host) using these protocols.
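To make the composition of a URL concrete, here is a small sketch that splits a URL into its parts, resolves a relative link, and builds a query string. It assumes Python 3, where the functions of the Python 2 urlparse module and the quoting helpers of urllib are both found in urllib.parse. The example.com URLs are invented for illustration; the percent-encoded string is the last path segment of the Wikipedia link above.

```python
from urllib.parse import quote, unquote, urlencode, urljoin, urlparse, urlunparse

# Split a URL into its six components
url = 'http://www.example.com:8080/docs/index.html;v=1?lang=en&page=2#intro'
parts = urlparse(url)
print(parts.scheme, parts.netloc, parts.path)
print(parts.params, parts.query, parts.fragment)
print(parts.hostname, parts.port)

# Resolve a relative link against the page it was found on
base = 'http://www.example.com/docs/index.html'
joined = urljoin(base, '../img/logo.png')

# Compose a URL from components, encoding the query parameters
query = urlencode([('q', 'web client'), ('lang', 'zh')])
rebuilt = urlunparse(('http', 'www.example.com', '/search', '', query, ''))

# Percent-decoding and re-encoding of non-ASCII text (UTF-8 by default)
encoded = '%E7%BB%9F%E4%B8%80%E8%B5%84%E6%BA%90%E5%AE%9A%E4%BD%8D%E7%AC%A6'
decoded = unquote(encoded)
print(joined, rebuilt, decoded, sep='\n')
```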
Using this module spares you from working with httplib, ftplib, and gopherlib directly, unless you need their lower-level functionality. The main job of urllib is to download files from URLs. To understand the module, start with the following functions:

urllib.urlopen()
urllib.urlretrieve()
urllib.quote() and urllib.quote_plus()
urllib.unquote() and urllib.unquote_plus()
urllib.urlencode()

urllib2

If you plan to access more complex URLs or handle more complex situations such as authentication, redirection, and cookies, the urllib2 module is recommended. It is especially useful for fetching data that requires logging in.

Advanced Web Client

A browser is really just a simple web client: a basic web client downloads files from a server, and urllib, urllib2, and the other modules described above implement that kind of functionality. An advanced web client does more than just download. One example of an advanced web client is a web crawler (also known as a spider or robot). These programs explore and download pages on the Internet for different purposes, including:

Indexing for large search engines such as Google and Yahoo!
Offline browsing: downloading documents locally, rewriting the hyperlinks, and creating a mirror for local browsing (a common use is downloading an entire set of online help documents)
Downloading and storing pages for historical or archival purposes
Caching web pages to save the time of downloading them again on later visits

Here's a crawler implementation.

#!/usr/bin/env python
# Note: this listing targets Python 2 (htmllib, urlparse, cStringIO, print statement).

from sys import argv
from os import makedirs, unlink, sep
from os.path import isdir, exists, dirname, splitext
from string import replace, find, lower
from htmllib import HTMLParser
from urllib import urlretrieve
from urlparse import urlparse, urljoin
from formatter import DumbWriter, AbstractFormatter
from cStringIO import StringIO


class Retriever(object):                # download Web pages

    def __init__(self, url):
        self.url = url
        self.file = self.filename(url)

    def filename(self, url, deffile='index.htm'):
        parsedurl = urlparse(url, 'http:', 0)   # parse path
        path = parsedurl[1] + parsedurl[2]
        ext = splitext(path)
        if ext[1] == '':                # no file name, use the default
            if path[-1] == '/':
                path += deffile
            else:
                path += '/' + deffile
        ldir = dirname(path)            # local directory
        if sep != '/':                  # os-indep. path separator
            ldir = replace(ldir, '/', sep)
        if not isdir(ldir):             # create archive dir if nec.
            if exists(ldir):
                unlink(ldir)
            makedirs(ldir)
        return path

    def download(self):                 # download Web page
        try:
            retval = urlretrieve(self.url, self.file)
        except IOError:
            retval = ('*** ERROR: invalid URL "%s"' % self.url,)
        return retval

    def parseAndGetLinks(self):         # parse HTML, save links
        self.parser = HTMLParser(AbstractFormatter(
            DumbWriter(StringIO())))
        self.parser.feed(open(self.file).read())
        self.parser.close()
        return self.parser.anchorlist


class Crawler(object):                  # manage entire crawling process

    count = 0                           # static downloaded page counter

    def __init__(self, url):
        self.q = [url]                  # queue of URLs still to fetch
        self.seen = []                  # URLs already downloaded
        self.dom = urlparse(url)[1]     # domain of the starting URL

    def getPage(self, url):
        r = Retriever(url)
        retval = r.download()
        # on error, download() returns a 1-tuple holding a '*** ERROR' string
        if retval[0].startswith('***'): # error situation, do not parse
            print retval, '... skipping parse'
            return
        Crawler.count = Crawler.count + 1
        print '\n(', Crawler.count, ')'
        print 'URL:', url
        print 'FILE:', retval[0]
        self.seen.append(url)

        links = r.parseAndGetLinks()    # get and process links
        for eachLink in links:
            if eachLink[:4] != 'http' and \
                    find(eachLink, '://') == -1:
                eachLink = urljoin(url, eachLink)
            print '* ', eachLink,

            if find(lower(eachLink), 'mailto:') != -1:
                print '... discarded, mailto link'
                continue

            if eachLink not in self.seen:
                if find(eachLink, self.dom) == -1:
                    print '... discarded, not in domain'
                else:
                    if eachLink not in self.q:
                        self.q.append(eachLink)
                        print '... new, added to Q'
                    else:
                        print '... discarded, already in Q'
            else:
                print '... discarded, already processed'

    def go(self):                       # process links in queue
        while self.q:
            url = self.q.pop()
            self.getPage(url)


def main():
    if len(argv) > 1:
        url = argv[1]
    else:
        try:
            url = raw_input('Enter starting URL: ')
        except (KeyboardInterrupt, EOFError):
            url = ''

    if not url:
        return
    robot = Crawler(url)
    robot.go()


if __name__ == '__main__':
    main()
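The crawler above relies on htmllib, which no longer exists in Python 3. As a rough sketch of how its link-handling step might look there, the example below collects anchors with html.parser and keeps only links in the starting domain. LinkParser and same_domain_links are names made up for this illustration, and the domain check compares the host part exactly, which is stricter than the substring search used in the listing.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # relative links become absolute; absolute ones pass through
                self.links.append(urljoin(self.base_url, value))


def same_domain_links(html, base_url):
    """Return the http(s) links in html that stay on base_url's host."""
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    domain = urlparse(base_url).netloc
    kept = []
    for link in parser.links:
        parsed = urlparse(link)
        # drops mailto: and other schemes, as well as foreign hosts
        if parsed.scheme in ('http', 'https') and parsed.netloc == domain:
            kept.append(link)
    return kept


if __name__ == '__main__':
    page = '<a href="/a.html">A</a> <a href="mailto:x@example.com">mail</a>'
    print(same_domain_links(page, 'http://www.example.com/docs/index.html'))
```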


There are also several ready-made crawler libraries, which are not covered in detail here.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
