Logging in to Zhihu with Python to fetch personal favorites and save them as a Word file

Source: Internet
Author: User

I actually finished this program a long time ago but never published it. Since I'm not very busy lately, I'm sharing it now.
It uses the BeautifulSoup and urllib2 modules to fetch and parse the pages, and the Python docx module to save the result as a Word file. Installation instructions are easy to find online, so I won't repeat them here.

The program logs in to Zhihu and then saves the questions and answers in your personal favorites to a Word document, so that you can read them when you have no network access. Pictures in an answer are fetched as well, although that part still has some problems. I'll revise it when I have time.
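The login step described above comes down to two things: an opener that remembers cookies, and a form-encoded POST body. Here is a minimal sketch of that idea, written for Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The helper names make_opener and build_login_request are hypothetical, and the form fields simply mirror the ones this script posts. Nothing is sent over the network here.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_URL = "http://www.zhihu.com/login"  # the login URL used by the script in this post


def make_opener():
    """Return an opener that stores cookies, plus the jar it writes to."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(email, password, xsrf):
    """Build (but do not send) the form-encoded login POST."""
    form = {
        "_xsrf": xsrf,
        "email": email,
        "password": password,
        "rememberme": "y",
    }
    # urlencode percent-escapes the values; the body must be bytes in Python 3.
    body = urllib.parse.urlencode(form).encode("utf-8")
    headers = {"User-Agent": "Mozilla/5.0"}
    return urllib.request.Request(LOGIN_URL, data=body, headers=headers)


if __name__ == "__main__":
    opener, jar = make_opener()
    request = build_login_request("me@example.com", "secret", "token")
    print(request.get_method())  # a request with a body is sent as POST
    print(request.data)
    # opener.open(request) would send it and keep the session cookies in `jar`.
```

Every later request made through the same opener carries the cookies from the login response, which is what lets the collections page be fetched afterwards.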

Regular expressions are used in a few places, and they are written about as crudely as possible... I despise myself a little for that...
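One possible alternative to stripping tags with regular expressions is to let an HTML parser walk the fragment and keep only the text. The sketch below uses the standard library's html.parser in Python 3. It is not what the script does; the class name TextExtractor and the function html_to_text are made up for illustration. It drops everything inside noscript elements and turns br tags into newlines, which roughly matches what the script's chained replace calls aim for.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, skipping <noscript> content and mapping <br> to newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a <noscript> element

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.skip_depth += 1
        elif tag == "br" and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "noscript" and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(fragment):
    """Return the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p>Hello <b>world</b><br/>next</p><noscript><img src="x.png"></noscript>'
    print(repr(html_to_text(sample)))
```

Because the parser tracks where elements open and close, two noscript blocks in one answer do not swallow the text between them, which a greedy pattern such as `<noscript>[\s\S]*</noscript>` would do.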

One more issue: at the moment every answer to a question is saved. When I have time I'll change it to save only the first answer, or only the answer shown on the collection page. Otherwise, if you have a lot of favorites, the size of the Word file will scare you. O(∩_∩)O haha~
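The change mentioned above, keeping only the first answer, could be as simple as capping the loop over the answer blocks. A small sketch in Python 3, where limit_answers and max_answers are hypothetical names and the answers are represented by plain strings:

```python
from itertools import islice


def limit_answers(answer_blocks, max_answers=1):
    """Return at most max_answers items from any iterable of answers.

    Passing max_answers=None keeps everything, which matches the
    script's current behaviour.
    """
    if max_answers is None:
        return list(answer_blocks)
    return list(islice(answer_blocks, max_answers))


if __name__ == "__main__":
    answers = ["first answer", "second answer", "third answer"]
    for answer in limit_answers(answers, max_answers=1):
        print(answer)
```

With BeautifulSoup itself, passing limit=1 to findAll (or calling find instead) should have the same effect without a helper.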

Logging in may require a verification code (captcha). If the program prompts you for one, you will find the captcha image (checkcode.gif) in the program's folder. Type in what it shows and you're done.
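The captcha handling is: resolve the image address found in the login page, download it, write it next to the program, and ask the user to type what they see. The sketch below (Python 3) covers the two parts that need no network: resolving the address and saving the bytes. The path /captcha.gif?r=123 is a made-up example, and captcha_url and save_captcha are hypothetical helpers.

```python
import os
from urllib.parse import urljoin

BASE_URL = "http://www.zhihu.com"


def captcha_url(src):
    """Resolve the <img src> from the login page against the site root.

    urljoin leaves an already absolute URL untouched, so this is safe
    whether the page uses a relative or an absolute address.
    """
    return urljoin(BASE_URL, src)


def save_captcha(image_bytes, path="checkcode.gif"):
    """Write the downloaded captcha bytes to disk and return the full path."""
    with open(path, "wb") as handle:
        handle.write(image_bytes)
    return os.path.abspath(path)


if __name__ == "__main__":
    print(captcha_url("/captcha.gif?r=123"))  # hypothetical captcha path
    # After downloading the image bytes:
    #   saved = save_captcha(image_bytes)
    #   code = input("Open %s and type the code: " % saved)
```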

# -*- coding: utf-8 -*-
# Log in to Zhihu, fetch personal favorites, then save them as a Word file.
# Python 2 script (uses urllib2, cookielib and raw_input).
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import urllib
import urllib2
import cookielib
import re
import os
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches

# A SOCKS proxy is needed when going online from the office
#import socks
#import socket
#socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 8088)
#socket.socket = socks.socksocket

loginurl = 'http://www.zhihu.com/login'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36',
}

postdata = {
    '_xsrf': 'acab9d276ea217226d9cc94a84a231f7',
    'email': '',
    'password': '',
    'rememberme': 'y',
}

# Start from a clean state: image folder present, old output files removed
if not os.path.exists('myimg'):
    os.mkdir('myimg')
if os.path.exists('123.docx'):
    os.remove('123.docx')
if os.path.exists('checkcode.gif'):
    os.remove('checkcode.gif')

mydoc = Document()
questiontitle = ''

#----------------------------------------------------------------------
def dealimg(imgcontent):
    """Download the images in an answer, add them to the document,
    then strip the leftover markup from the answer text."""
    soup = BeautifulSoup(imgcontent)
    try:
        for imglink in soup.findAll('img'):
            myimg = imglink.get('src')
            #print myimg
            # Skip <img> tags without a src or with a non-http address
            if myimg and myimg.find('http') >= 0:
                imgsrc = urllib2.urlopen(myimg).read()
                # Drop everything up to the last slash to get the file name
                imgnamere = re.compile(r'http\S*/')
                imgname = imgnamere.sub('', myimg)
                #print imgname
                with open(u'myimg' + '/' + imgname, 'wb') as code:
                    code.write(imgsrc)
                mydoc.add_picture(u'myimg/' + imgname, width=Inches(1.25))
    except:
        pass
    # Non-greedy matches, so text between two matching blocks is kept
    strinfo = re.compile(r'<noscript>[\s\S]*?</noscript>')
    imgcontent = strinfo.sub('', imgcontent)
    # The pattern on the next line is missing in the source text;
    # an empty pattern leaves imgcontent unchanged.
    strinfo = re.compile(r'')
    imgcontent = strinfo.sub('', imgcontent)
    #show all
    strinfo = re.compile(r'<a class="toggle-expand[\s\S]*?</a>')
    imgcontent = strinfo.sub('', imgcontent)
    strinfo = re.compile(r'<a class="wrap external"[\s\S]*?rel="nofollow noreferrer" target="_blank">')
    imgcontent = strinfo.sub('', imgcontent)
    imgcontent = imgcontent.replace('<i class="icon-external"></i></a>', '')
    imgcontent = imgcontent.replace('</b>', '').replace('</p>', '').replace('<p>', '').replace('<br>', '')
    return imgcontent


def enterquestionpage(pageurl):
    """Open a question page and write its title and answers to the document."""
    html = urllib2.urlopen(pageurl).read()
    soup = BeautifulSoup(html)
    questiontitle = soup.title.string
    mydoc.add_heading(questiontitle, level=3)
    for div in soup.findAll('div', {'class': 'fixed-summary zm-editable-content clearfix'}):
        #print div
        conent = str(div).replace('<div class="fixed-summary zm-editable-content clearfix">', '').replace('</div>', '')
        conent = conent.decode('utf-8')
        conent = conent.replace('<br/>', '\n')
        conent = dealimg(conent)
        ### This part is overly complicated; when there is time, look for a module that handles HTML
        conent = conent.replace('<div class="fixed-summary-mask">', '').replace('<blockquote>', '').replace('<b>', '').replace('<strong>', '').replace('</strong>', '').replace('<em>', '').replace('</em>', '').replace('</blockquote>', '')
        mydoc.add_paragraph(conent, style='BodyText3')
        # Debug output, disabled:
        #file = open('222.txt', 'a')
        #file.write(str(conent))
        #file.close()


def entercollectpage(pageurl):
    """Open one collection page and visit every question listed on it."""
    html = urllib2.urlopen(pageurl).read()
    soup = BeautifulSoup(html)
    for div in soup.findAll('div', {'class': 'zm-item'}):
        h2content = div.find('h2', {'class': 'zm-item-title'})
        #print h2content
        if h2content is not None:
            link = h2content.find('a')
            mylink = link.get('href')
            quectionlink = 'http://www.zhihu.com' + mylink
            enterquestionpage(quectionlink)
            print quectionlink


def loginzhihu():
    """Post the login form, then walk through all collections."""
    postdatastr = urllib.urlencode(postdata)
    # The cookie-aware opener is installed once in __main__
    h = urllib2.urlopen(loginurl)
    request = urllib2.Request(loginurl, postdatastr, headers)
    response = urllib2.urlopen(request)
    #print response.geturl()
    text = response.read()

    collecturl = 'http://www.zhihu.com/collections'
    req = urllib2.urlopen(collecturl)
    # Being redirected back to the front page means the login failed
    if str(req.geturl()) == 'http://www.zhihu.com/?next=%2Fcollections':
        print 'login fail!'
        return
    txt = req.read()

    soup = BeautifulSoup(txt)
    count = 0
    divs = soup.findAll('div', {'class': 'zm-item'})
    # findAll returns an empty list, never None, when nothing matches
    if not divs:
        print 'login fail!'
        return
    print 'login ok!\n'
    for div in divs:
        link = div.find('a')
        mylink = link.get('href')
        collectlink = 'http://www.zhihu.com' + mylink
        entercollectpage(collectlink)
        print collectlink
        # Used for testing at the time: fetch only one collection
        #count += 1
        #if count == 1:
        #    return


def getcheckcode(thehtml):
    """Save the captcha image if the login page shows one.
    Returns True when checkcode.gif was written."""
    soup = BeautifulSoup(thehtml)
    div = soup.find('div', {'class': 'js-captcha captcha-wrap'})
    if div is not None:
        #print div
        imgsrc = div.find('img')
        imglink = imgsrc.get('src')
        if imglink is not None:
            imglink = 'http://www.zhihu.com' + imglink
            imgcontent = urllib2.urlopen(imglink).read()
            with open('checkcode.gif', 'wb') as code:
                code.write(imgcontent)
            return True
        else:
            return False
    return False


if __name__ == '__main__':
    import getpass

    username = raw_input('input username:')
    password = getpass.getpass('Enter password: ')

    postdata['email'] = username
    postdata['password'] = password
    postdatastr = urllib.urlencode(postdata)

    # Install a global opener that keeps cookies between requests
    cj = cookielib.LWPCookieJar()
    cookie_support = urllib2.HTTPCookieProcessor(cj)
    opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
    urllib2.install_opener(opener)

    h = urllib2.urlopen(loginurl)
    request = urllib2.Request(loginurl, postdatastr, headers)
    response = urllib2.urlopen(request)
    txt = response.read()

    # If a captcha is shown, ask for it and add it to the login form
    if getcheckcode(txt):
        checkcode = raw_input('input checkcode:')
        postdata['captcha'] = checkcode
    loginzhihu()
    mydoc.save('123.docx')

    print 'the end'
    raw_input()
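The image step in the script gets a file name by deleting everything up to the last slash of the image URL, so a query string would end up in the file name. A slightly more careful sketch in Python 3 parses the URL first. The function image_filename and its default value are hypothetical, and the URLs in the example are invented.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def image_filename(url, default="image"):
    """Return the last path component of an image URL.

    The query string and fragment are ignored, percent-escapes are
    decoded, and `default` is returned when the path ends in a slash.
    """
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    return name or default


if __name__ == "__main__":
    print(image_filename("http://pic1.zhimg.com/ab/cd_b.jpg?size=big"))
    print(image_filename("http://example.com/"))
```

Decoding before taking the base name means an escaped slash in the URL cannot smuggle a directory separator into the file name that is later joined to the myimg folder.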

Well, that's about it. If you have any good suggestions, leave a message below and I'll reply as soon as I can. My contact details are also on the site's page, so you can reach me directly.
