Use the Python standard library to modify search engine results (1)

Source: Internet
Author: User

The Python standard library is large and takes sustained practice to master. This article works through one practical use of it: fetching a search engine's result page and presenting the results in a page of your own.

When a keyword is passed to the search engine as a URL parameter, the engine returns a page made of three parts: the top (logo and search UI), the results, and the bottom (copyright information). What we need is the middle part, the results. We can use the urlopen method of urllib in the Python standard library to fetch the whole page as a string, parse that string to extract the result part, and then add our own header, top, and bottom. With that, the prototype of a "search thief" is roughly complete. Next, let's write some test code.

 
 
[code]
# Search Thief
# creator: Singo
# date: 2007-8-24
import urllib
import re

# module-level settings shared by the class below
path = "pages\\"
# targetURL = "http://www.google.cn/search?complete=1&hl=zh-CN&q="
targetURL = "http://www.baidu.com/s?wd="

class SearchThief:
    """the google thief"""

    def __init__(self, key):
        self.key = key

    def getPage(self):
        # get the page string from the url
        webStr = urllib.urlopen(targetURL + self.key).read()
        self.setPageToFile(webStr)

    def setPageToFile(self, webStr):
        reSetStr = re.compile("\r")
        self.key = reSetStr.sub("", self.key)  # strip any "\r" from the keyword
        targetFile = open(path + self.key + ".html", "w")  # open the file for writing
        targetFile.write(webStr)
        targetFile.close()
        print "done"

inputKey = raw_input("Enter what you want to search for --> ")
obj = SearchThief(inputKey)
obj.getPage()
[/code]
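The listing above targets Python 2 (`urllib.urlopen`, `raw_input`, the `print` statement). As a rough sketch only, the same fetch-and-save steps might look like this in Python 3. It assumes the Baidu URL prefix from the listing still applies, which is not verified here. The helper names `build_search_url`, `safe_filename`, and `save_page` are illustrative and not part of the original, and percent-encoding the keyword is an addition.

```python
import os
import re
import urllib.parse
import urllib.request

TARGET_URL = "http://www.baidu.com/s?wd="  # URL prefix taken from the listing above
PAGE_DIR = "pages"                          # output directory, as in the listing above


def build_search_url(key):
    """Append the percent-encoded keyword to the search URL prefix."""
    return TARGET_URL + urllib.parse.quote(key)


def safe_filename(key):
    """Drop line breaks and characters that are invalid in file names (hypothetical helper)."""
    return re.sub(r'[\r\n\\/:*?"<>|]', "", key) + ".html"


def save_page(key, fetch=None):
    """Fetch the result page for `key` and save the raw bytes under PAGE_DIR."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET through the standard library.
        def fetch(url):
            with urllib.request.urlopen(url) as response:
                return response.read()
    page = fetch(build_search_url(key))
    os.makedirs(PAGE_DIR, exist_ok=True)
    target = os.path.join(PAGE_DIR, safe_filename(key))
    with open(target, "wb") as target_file:
        target_file.write(page)
    return target
```

Calling `save_page("python")` would request the page and write it to `pages/python.html`. The optional `fetch` argument exists only so the network call can be swapped out, for example when trying the function offline.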

Here the user only enters a keyword; the program submits a request to the search engine and saves the returned page to a directory. This is just a test example. To build a real search thief, you would not save the page. Instead, you would insert the extracted strings into a pre-designed template and serve the result to the client as a web page. In this way you can take the results of a search engine and render them in a new page of your own.

Let's look at the source of the Baidu search result page. In front of the table tag that holds the search results there is a <DIV id=Div> </DIV> tag. Based on this tag, the result set is found two lines below it, so we add a method, getResultStr():

[code]
def getResultStr(self, webStr):
    # webStr is the page string returned by urlopen(...).read()
    webStrList = webStr.split("\r\n")
    # the result set sits two lines below the "<DIV id=Div> </DIV>" line
    line = webStrList.index("<DIV id=Div> </DIV>") + 2
    resultStr = webStrList[line]
    return resultStr
[/code]
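The marker-based extraction above can also be sketched as a standalone Python 3 function. This is a minimal illustration, assuming the marker sits on a line of its own and the result block is exactly two lines below it, as the text describes. The function name `get_result_str`, the `offset` parameter, and the sample page are made up for the example.

```python
MARKER = "<DIV id=Div> </DIV>"  # marker line as quoted in the text


def get_result_str(page, marker=MARKER, offset=2):
    """Return the line `offset` lines below the marker line, or None if it is absent."""
    lines = page.splitlines()  # handles "\r\n" as well as "\n" line endings
    for index, line in enumerate(lines):
        if line.strip() == marker:
            target = index + offset
            return lines[target] if target < len(lines) else None
    return None


# Made-up sample page, only to exercise the function.
sample_page = "\r\n".join([
    "<html><body>",
    "<DIV id=Div> </DIV>",
    "<!-- spacer -->",
    "<table><tr><td>result 1</td></tr></table>",
    "</body></html>",
])
print(get_result_str(sample_page))
```

Unlike `list.index`, which raises `ValueError` when the marker is missing, this version returns `None`, so the caller can decide what to do if the page layout changes.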

Now that we have the result list, we need to put it into a custom page. We call this page a template:

 
 
[code]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>SuperSingo search-%title%</title>
<link href="default/css/global.css" type=text/css rel=stylesheet>
</head>
<body>
<div id="top">
<div id="logo"></div>
<div id="searchUI">
<input type="text" style="width: 300px;" />
<input type="submit" value="Search" />
</div>
<div class="clear" />
</div>
<div id="result_info">
Found ××× records, time taken ××× seconds
</div>
<div id="result">%result%</div>
<div id="foot">
[/code]

All the search results here come from Baidu! Here, %title% and %result% are placeholders awaiting replacement. To replace them, we will add another method.
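The replacement method itself is not shown in this part. As a rough sketch of what such a step might look like in Python 3, assuming the placeholders are written exactly as `%title%` and `%result%`, plain `str.replace` is enough. The function name `fill_template` and the shortened template are hypothetical, and escaping the title with `html.escape` is an addition.

```python
import html

# Shortened stand-in for the template above, for illustration only.
TEMPLATE = (
    "<html><head><title>SuperSingo search-%title%</title></head>"
    '<body><div id="result">%result%</div></body></html>'
)


def fill_template(template, title, result):
    """Substitute the %title% and %result% placeholders in the template."""
    page = template.replace("%title%", html.escape(title))
    # The result block is already HTML taken from the search page, so it is inserted as-is.
    return page.replace("%result%", result)


page = fill_template(TEMPLATE, "python <urllib>", "<table><tr><td>hit</td></tr></table>")
print(page)
```

The title is escaped because it comes from user input, while the result string is trusted here only because it is copied from the search engine's own page.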

