Using Python to back up Sina blogs

Source: Internet
Author: User

This article describes how to back up a Sina blog using Python. The details are as follows:

The script was written for Python 2.7.2, and running it from an IDE is recommended.

# -*- coding: utf-8 -*-
'''
Created on 2011-12-18

@author: ahan
'''
import re
import sys
import os
import time
import socket
import locale
import datetime
import codecs
from urllib import urlopen

# Regular expression definitions.
# NOTE: literal text such as "Blog directory" and "next page" appears in
# translated form in this listing; each pattern must contain the text
# exactly as it appears in the blog's HTML.

# match the blog directory link
pattern1 = u"""<a href="(http:.*?)">Blog directory</a>"""
prog1 = re.compile(pattern1)
# match the blog title link
pattern2 = u"""<a title="(.*?)" target="_blank" href="(.*?)">.*?</a>"""
prog2 = re.compile(pattern2)
# match the next page url
pattern3 = u"""<a href="([^"]+)" title="[^"]+">next page"""
prog3 = re.compile(pattern3)
# match the body part
pattern4 = u"""<!-- Blog body begin -->[\s\S]*?<!-- Text ends -->"""
prog4 = re.compile(pattern4)
# match the body image link: group 1 is the whole match, group 2 the
# real_src attribute, group 3 the real image url
pattern5 = u"""(src="[^"]+"(\s*real_src\s*=\s*"([^"]+)"))"""
prog5 = re.compile(pattern5)


def read_date_from_url(url):
    """Return all data read from the url as a Unicode string."""
    request = None  # so the finally clause is safe if urlopen() fails
    try:
        data = ""
        request = urlopen(url)
        while True:
            s = request.read(1024)
            if not s:
                break
            data += s
        # NOTE: unicode(data) decodes with the default encoding (ASCII
        # unless changed); pages with non-ASCII text need an explicit
        # encoding, e.g. unicode(data, 'utf-8').
        return unicode(data)
    except Exception:
        print 'An error occurred while reading data'
        print "Unexpected error:", sys.exc_info()[0], sys.exc_info()[1]
        return None
    finally:
        if request:
            request.close()


def save_to_file(url, filename, blog_address):
    """url is the blog post address and filename is the file name to save;
    the default suffix is html.
    """
    # if the folder does not exist, create it
    if not os.path.exists(blog_address):
        os.makedirs(blog_address)
    # remove invalid characters from the file name
    # (ReplaceBadCharOfFileName is not defined in the part of the listing
    # shown here)
    filename = ReplaceBadCharOfFileName(filename)
    file_no = 0
    while os.path.isfile(blog_address + '/' + filename + '.html'):
        filename = filename + '(' + str(file_no) + ')'
        file_no += 1
    text = read_date_from_url(url)
    text = _filter(text)
    # save the images locally
    result = prog5.findall(text)
    i = 1
    for pic in result:
        folder = blog_address + '/' + filename + '/'
        pic_name = 'image' + str(i) + '.gif'
        if not os.path.exists(folder):
            os.makedirs(folder)
        try:
            url_file = urlopen(pic[2])
            pic_file = codecs.open(folder + pic_name, 'wb')
            while True:
                s = url_file.read(1024)
                if not s:
                    break
                pic_file.write(s)
            pic_file.close()
            url_file.close()
        except Exception:
            print 'An error occurred while saving the image; skipping it...'
            print "Unexpected error:", sys.exc_info()[0], sys.exc_info()[1]
        else:
            print 'The image was saved successfully...'
            # replace the image address in the body
            text = text.replace(
                pic[0],
                unicode("src=\"" + filename + "/" + pic_name + "\"" + pic[1]),
                1)
        i = i + 1
    blog_file = codecs.open(blog_address + '/' + filename + '.html', 'wb')
    blog_file.write(text)
    blog_file.close()


# extract the body part
def _filter(t):
    """Extract the body from the text and return it as a Unicode string."""
    result = prog4.search(t)
    if result is not None:
        # The source listing is cut off at this point, in the middle of
        # a statement that begins: return u'
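The image handling in save_to_file is the least obvious step: each match of the image pattern is downloaded, and its src attribute is then pointed at the local copy while the real_src attribute is kept. The following is a minimal sketch of just the link-rewriting part, with the download left out so it runs offline. It is written for Python 3 rather than the Python 2.7.2 of the listing, and the names IMG_RE and rewrite_image_links, the tolerant whitespace in the pattern, and the example.com URLs are assumptions made for illustration, not part of the original script.

```python
import re

# Matches src="..." followed by real_src="...".
# Group 1 is the whole match, group 2 the real_src attribute,
# group 3 the real image URL (same grouping as pattern5 in the listing).
IMG_RE = re.compile(r'(src="[^"]+"(\s*real_src\s*=\s*"([^"]+)"))')


def rewrite_image_links(html, folder_name):
    """Point each image's src at a local file name.

    Returns (new_html, downloads), where downloads maps each local file
    name to the remote URL that would have to be fetched for it.
    """
    downloads = {}
    matches = IMG_RE.findall(html)
    for index, (whole, real_src_attr, url) in enumerate(matches, start=1):
        # Hypothetical naming scheme mirroring the listing: image1.gif, ...
        local_name = "image%d.gif" % index
        downloads[local_name] = url
        replacement = 'src="%s/%s"%s' % (folder_name, local_name, real_src_attr)
        # Replace only the first remaining occurrence, as the listing does.
        html = html.replace(whole, replacement, 1)
    return html, downloads


sample = ('<img src="http://example.com/placeholder.gif" '
          'real_src ="http://example.com/a.jpg">')
new_html, downloads = rewrite_image_links(sample, "my-post")
print(new_html)
print(downloads)
```

Separating the rewrite from the download makes the regular expression easy to check against a saved page before any network access is involved.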
