This article is an example of the Python implementation of crawling Web pages and parsing functions. The main analytical questions and answers and Baidu's homepage. Share for everyone to use for reference.
The main functional code is as follows:
#!/usr/bin/python #coding =utf-8 Import sys import re import urllib2 from urllib import urlencode from urllib import quo Te import Time maxline = Wenda = Re.compile ("href=\" http://wenda.so.com/q/.+\?src= (. +?) \ "") Baidu = Re.compile ("<a href=\" http://www.baidu.com/link\?url=.+\ ".*?> more aware of related issues .*?</a>") F1 = Open (" Baidupage.txt "," w ") F2 = Open (" Wendapage.txt "," W ") for line in sys.stdin:if maxline = = 0:break query = Line.str
IP ();
Time.sleep (1);
Recall_url = "http://www.so.com/s?&q=" + query;
Response = Urllib2.urlopen (Recall_url);
html = Response.read ();
F1.write (HTML) m = wenda.search (HTML);
If M:if m.group (1) = = "": Print Query + "\twenda\t0";
Else:print query + "\TWENDA\T1";
Else:print query + "\twenda\t0";
Recall_url = "http://www.baidu.com/s?wd=" + query + "&ie=utf-8";
Response = Urllib2.urlopen (Recall_url);
html = Response.read ();
F2.write (HTML) m = baidu.search (HTML);
If M:print query + "\TBAIDU\T1";
Else:print query + "\tbaidu\t0";
Maxline = maxline-1;
F1.close () F2.close ()
I hope this article will help you with your study of Python programming.