Code notes: migrating a WordPress blog to Wiz or Evernote with C#


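Note: the method here is very simple. Export the MySQL tables to XML, then mail each post to your Wiz or Evernote mail-in address. The one problem I have not solved is that the SMTP server randomly stops responding. Gmail's SMTP also limits how many messages you can send (around a thousand a day, I believe), so if you have more posts than that you may need to switch accounts and try again. You could also improve the program by keeping an array of SMTP accounts and moving on to the next one when a quota error comes back. The advantage of going through mail is that the images in the posts are imported into the notes as well: they are stored in the note itself rather than left as links to the original images.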
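The account-rotation idea in the note above could look roughly like the sketch below. It is only an illustration, not part of the original program: the class `SmtpFailover`, its methods, and the port parameter are hypothetical names introduced here. Each account is wrapped as a send action that opens a fresh `SmtpClient`, and the loop moves to the next account after a fixed number of failed attempts. It does not try to tell a quota error from other SMTP errors, and it leaves the message's From address unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Mail;

public static class SmtpFailover
{
    // Hypothetical helper: wraps one SMTP account as a "send" action.
    // A new SmtpClient per call means every attempt starts on a new connection.
    public static Action<MailMessage> ForAccount(string host, int port, string user, string password)
    {
        return message =>
        {
            using (SmtpClient client = new SmtpClient(host, port))
            {
                client.EnableSsl = true;
                client.Credentials = new NetworkCredential(user, password);
                client.Send(message);
            }
        };
    }

    // Tries each sender in order, up to attemptsPerSender times each.
    // Returns the index of the sender that succeeded, or -1 if all of them failed.
    public static int Send(IList<Action<MailMessage>> senders, MailMessage message, int attemptsPerSender)
    {
        for (int i = 0; i < senders.Count; i++)
        {
            for (int attempt = 1; attempt <= attemptsPerSender; attempt++)
            {
                try
                {
                    senders[i](message);
                    return i;
                }
                catch (SmtpException ex)
                {
                    Console.WriteLine("account {0}, attempt {1} failed: {2}", i, attempt, ex.Message);
                }
                catch (IOException ex)
                {
                    Console.WriteLine("account {0}, attempt {1} failed: {2}", i, attempt, ex.Message);
                }
            }
        }
        return -1;
    }
}
```

A caller would build the list once, for example with `SmtpFailover.ForAccount("smtp.gmail.com", 587, user, password)` for each account (the port is an assumption), and treat a return value of -1 as "stop and resume later from this post".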

[Screenshot: the posts after import into Wiz]

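I. Purpose

Simple: I worry that a blog hosted online may one day disappear, so I have long wanted a local copy. There are far too many posts to copy by hand.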

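  1. Over the years I have written more than a thousand WordPress posts, about 80% of them private. Export tools that read the feed cannot see private or password-protected posts. WordPress also has export plugins; I did not try many of them. One plugin that streams the export as XML never produced a complete file in more than ten attempts: sometimes I got about 100 KB, sometimes a few MB. In the end I gave up on WP plugins.
  2. Data source: an XML export from MySQL.
    First, my hosting provider does not allow remote MySQL connections. (If you can export SQL and import it into a local MySQL server, this discussion does not apply to you.) I originally wanted to export SQL and convert it to MSSQL, but every converter I found online has to log in to the remote MySQL server, so that was a dead end. The CSV export came out garbled; I tried UTF8 and GB2312 and neither worked, so that was a dead end too. Finally I found that only the XML export had no encoding problems, which made things much simpler.
  3. When reading the XML with C#, these are the tables I consider essential:
    1. The post table: wp_posts
    2. The category/tag tables: wp_term_taxonomy, wp_terms, wp_term_relationships
      SQL:
      -- list all posts --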
      create view v_post as
      SELECT p.*, tax.term_taxonomy_id,tax.term_id,category.name, tax.count
      FROM
      wp_term_relationships relation,
      wp_terms category,
      wp_term_taxonomy tax,
      wp_posts p
      WHERE
      category.term_id = tax.term_id and
      tax.term_taxonomy_id = relation.term_taxonomy_id and
      relation.object_id =  p.id
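The same join can be done in memory once the tables are exported to XML. The sketch below is an illustration only: the class `PostView` and the tiny `SampleXml` document are made up for the example, and the element layout (one `<table name="...">` element per row, with `<column name="...">` children) is assumed from the XPath expressions used in the program later in this post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PostView
{
    // Made-up sample in the assumed export layout: one <table> element per row.
    public const string SampleXml = @"<pma_xml_export><database name='blog'>
<table name='wp_posts'><column name='ID'>1</column><column name='post_title'>Hello</column></table>
<table name='wp_terms'><column name='term_id'>5</column><column name='name'>diary</column></table>
<table name='wp_term_taxonomy'><column name='term_taxonomy_id'>7</column><column name='term_id'>5</column></table>
<table name='wp_term_relationships'><column name='object_id'>1</column><column name='term_taxonomy_id'>7</column></table>
</database></pma_xml_export>";

    // All rows that belong to one exported table.
    static IEnumerable<XElement> Rows(XDocument doc, string table)
    {
        return doc.Descendants("table").Where(t => (string)t.Attribute("name") == table);
    }

    // Trimmed text of one column in a row; empty string if the column is missing.
    static string Col(XElement row, string name)
    {
        XElement c = row.Elements("column").FirstOrDefault(e => (string)e.Attribute("name") == name);
        return c == null ? "" : c.Value.Trim();
    }

    // Returns (post title, term name) pairs, following the same joins as v_post.
    public static List<KeyValuePair<string, string>> TitlesWithTerms(XDocument doc)
    {
        var query =
            from rel in Rows(doc, "wp_term_relationships")
            join tax in Rows(doc, "wp_term_taxonomy")
                on Col(rel, "term_taxonomy_id") equals Col(tax, "term_taxonomy_id")
            join term in Rows(doc, "wp_terms")
                on Col(tax, "term_id") equals Col(term, "term_id")
            join p in Rows(doc, "wp_posts")
                on Col(rel, "object_id") equals Col(p, "ID")
            select new KeyValuePair<string, string>(Col(p, "post_title"), Col(term, "name"));
        return query.ToList();
    }
}
```

Calling `PostView.TitlesWithTerms(XDocument.Parse(PostView.SampleXml))` pairs each post title with each of its term names. As in the view, a post with no category or tag does not appear in the result.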
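II. Steps
  1. Log in to phpMyAdmin on your host, select the blog database, and choose Export. Pay attention to the options that were boxed in the original screenshot: do not include the SQL statements that create the TABLE/VIEW structures, and be sure to select UTF-8, otherwise the output will be garbled.

    The exported XML is our data source. Make sure the XML you downloaded is valid. (I just tried again, and this kind of export can also arrive incomplete; I have no fix for that.)
  2. Output of an import run: [screenshot: console log]
  3. Code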
    using System;
    using System.Collections.Generic;
    using System.Configuration;   // needs a reference to System.Configuration.dll
    using System.Linq;
    using System.Net;
    using System.Net.Mail;
    using System.Text;
    using System.Threading.Tasks;
    using System.Xml.Linq;
    using System.Xml.XPath;

    namespace WordPressExport
    {
        class Program
        {
            static SmtpClient smtpClient;
            static MailMessage mailMessage;
            static IOrderedEnumerable<post> list;
            static bool smtpConnected;

            // All settings come from app.config.
            static readonly string CONFIG_smtp_addr = ConfigurationManager.AppSettings["smtp_addr"];
            static readonly string CONFIG_smtp_acct_name = ConfigurationManager.AppSettings["smtp_acct_name"];
            static readonly string CONFIG_smtp_acct_pwd = ConfigurationManager.AppSettings["smtp_acct_pwd"];
            static readonly string CONFIG_xml_path = ConfigurationManager.AppSettings["xml_path"];
            static readonly string CONFIG_evernote_folder = ConfigurationManager.AppSettings["evernote_folder"];
            static readonly string CONFIG_post_scope_start_date = ConfigurationManager.AppSettings["post_scope_start_date"];
            static readonly string CONFIG_post_scope_end_date = ConfigurationManager.AppSettings["post_scope_end_date"];
            static readonly string CONFIG_notebook_email = ConfigurationManager.AppSettings["notebook_email"];
            static readonly string CONFIG_blog_addr = ConfigurationManager.AppSettings["blog_addr"];

            static void SendMail(post post)
            {
                if (post == null) return;
                bool isSuccess = true;

                DateTime post_date;
                string str_post_date = DateTime.TryParse(post.post_date, out post_date)
                    ? post_date.ToString("yyyy-MM-dd")
                    : "";

                // Subject: "[date] title @notebook"
                mailMessage.SubjectEncoding = Encoding.UTF8;
                mailMessage.Subject = string.Format("[{0}] {1} {2}", str_post_date, post.post_title, CONFIG_evernote_folder);

                // Body: post details first, then the post content.
                StringBuilder body = new StringBuilder();
                body.Append("<b>Created:</b> " + post.post_date + "<br/>");
                body.Append("<b>Original categories/tags:</b> " + post.post_tagcat + "<br/>");
                body.AppendFormat("<b>Original post:</b> <a href=\"{0}?p={1}\">{0}?p={1}</a><br/><br/><br/>", CONFIG_blog_addr, post.ID);
                body.Append(post.post_content);

                try
                {
                    mailMessage.BodyEncoding = Encoding.UTF8;
                    mailMessage.Priority = MailPriority.High;
                    mailMessage.IsBodyHtml = true;
                    mailMessage.Body = body.ToString().Replace("\n", "<br/>");
                    Console.WriteLine(DateTime.Now + " sending mail... id = " + post.ID + " " + post.post_title);
                    // An asynchronous send is possible too, but then the mails arrive out of order.
                    // We want the natural order of the imported notes (their IDs) to follow the post dates.
                    smtpClient.Send(mailMessage);
                }
                catch (SmtpException ex)
                {
                    // Typical failure seen here:
                    // System.IO.IOException: Unable to write data to the transport connection:
                    // An established connection was aborted by the software in your host machine.
                    // A reconnect and retry would go here; I have not found a version that works.
                    isSuccess = false;
                    Console.WriteLine("failed in SMTP connection: " + ex.Message);
                }
                catch (Exception ex2)
                {
                    isSuccess = false;
                    Console.WriteLine(ex2.Message);
                }

                // On failure, start the next run from this timestamp.
                Console.WriteLine(DateTime.Now + (isSuccess ? " completed" : " completed with error(s); post date: " + post.post_date));
            }

            static void Main(string[] args)
            {
                smtpConnected = false;
                Console.WriteLine(DateTime.Now + " start...");

                // Set up SMTP and parse the XML side by side. Invoke returns only when both are done,
                // so no polling loop is needed afterwards.
                Parallel.Invoke(() => connectSmtp(), () => HandleWPExport());

                if (!smtpConnected) throw new Exception("The SMTP client was not set up.");
                if (!list.Any()) throw new Exception("No posts found; check xml_path and the date range.");

                foreach (var p in list)
                {
                    SendMail(p);
                }
            }

            private static void connectSmtp()
            {
                smtpClient = new SmtpClient();
                smtpClient.DeliveryMethod = SmtpDeliveryMethod.Network;
                smtpClient.Host = CONFIG_smtp_addr;
                smtpClient.EnableSsl = true;
                smtpClient.Credentials = new NetworkCredential(CONFIG_smtp_acct_name, CONFIG_smtp_acct_pwd);
                smtpClient.Timeout = 100000;
                mailMessage = new MailMessage(CONFIG_smtp_acct_name, CONFIG_notebook_email);
                smtpConnected = true;
            }

            // Text of one <column name="..."> in an exported row, without surrounding newlines.
            private static string Col(XElement row, string name)
            {
                XElement c = row.XPathSelectElement("column[@name='" + name + "']");
                return c == null ? "" : c.Value.Trim('\n');
            }

            private static void HandleWPExport()
            {
                Console.WriteLine(DateTime.Now + " starting to read wp_posts.xml...");
                XDocument xmlDoc = XDocument.Load(CONFIG_xml_path);

                string[] wanted =
                {
                    TaxonomyEnum.category.ToString(),
                    TaxonomyEnum.link_category.ToString(),
                    TaxonomyEnum.post_tag.ToString()
                };
                var queryTax = (from tax in xmlDoc.XPathSelectElements(".//table[@name='wp_term_taxonomy']")
                                where wanted.Contains(Col(tax, "taxonomy"))
                                select new { term_id = Col(tax, "term_id"), tax_id = Col(tax, "term_taxonomy_id") }).ToList();
                var queryCat = (from cat in xmlDoc.XPathSelectElements(".//table[@name='wp_terms']")
                                select new { term_id = Col(cat, "term_id"), name = Col(cat, "name") }).ToList();
                var queryRel = (from rel in xmlDoc.XPathSelectElements(".//table[@name='wp_term_relationships']")
                                select new { object_id = Col(rel, "object_id"), tax_id = Col(rel, "term_taxonomy_id") }).ToList();
                Console.WriteLine(DateTime.Now + " continuing ... ");

                // Category/tag names grouped by post ID.
                var tagsByPost = (from tax in queryTax
                                  join cat in queryCat on tax.term_id equals cat.term_id
                                  join rel in queryRel on tax.tax_id equals rel.tax_id
                                  select new { name = cat.name, id = rel.object_id })
                                 .ToLookup(t => t.id, t => t.name);

                IEnumerable<post> query =
                    from p in xmlDoc.XPathSelectElements(".//table[@name='wp_posts']")
                    where Col(p, "post_type") == "post"   // keep real posts; skip attachments and other row types
                    select new post
                    {
                        ID = Col(p, "ID"),
                        post_author = Col(p, "post_author"),
                        post_date = Col(p, "post_date"),
                        post_date_gmt = Col(p, "post_date_gmt"),
                        post_content = Col(p, "post_content"),
                        post_title = Col(p, "post_title"),
                        post_excerpt = Col(p, "post_excerpt"),
                        post_status = Col(p, "post_status"),
                        comment_status = Col(p, "comment_status"),
                        ping_status = Col(p, "ping_status"),
                        post_password = Col(p, "post_password"),
                        post_name = Col(p, "post_name"),
                        to_ping = Col(p, "to_ping"),
                        pinged = Col(p, "pinged"),
                        post_modified = Col(p, "post_modified"),
                        post_modified_gmt = Col(p, "post_modified_gmt"),
                        post_content_filtered = Col(p, "post_content_filtered"),
                        post_parent = Col(p, "post_parent"),
                        guid = Col(p, "guid"),
                        menu_order = Col(p, "menu_order"),
                        post_type = Col(p, "post_type"),
                        post_mime_type = Col(p, "post_mime_type"),
                        comment_count = Col(p, "comment_count"),
                        post_tagcat = string.Join(" ", tagsByPost[Col(p, "ID")])
                    };
                Console.WriteLine(DateTime.Now + " done with reading the xml... ");

                // Optional date range from app.config.
                if (!string.IsNullOrEmpty(CONFIG_post_scope_start_date))
                {
                    DateTime start = DateTime.Parse(CONFIG_post_scope_start_date);
                    query = query.Where(o => Convert.ToDateTime(o.post_date) >= start);
                }
                if (!string.IsNullOrEmpty(CONFIG_post_scope_end_date))
                {
                    DateTime end = DateTime.Parse(CONFIG_post_scope_end_date);
                    query = query.Where(o => Convert.ToDateTime(o.post_date) <= end);
                }
                list = query.ToList().OrderBy(o => o.post_date);
            }
        }
    }

    DTO classes

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Xml.Serialization;

    namespace WordPressExport
    {
        [Serializable]
        public class column
        {
            public string name { get; set; }
            public string text { get; set; }
        }

        [Serializable]
        public class database
        {
            [XmlElement(ElementName = "database")]
            public List<table> tables { get; set; }
        }

        [Serializable]
        public class table
        {
            public string name { get; set; }
            public column column { get; set; }
        }

        // One row of wp_posts, plus the joined category/tag names.
        [Serializable]
        public class post
        {
            public string ID { get; set; }
            public string post_author { get; set; }
            public string post_date { get; set; }
            public string post_date_gmt { get; set; }
            public string post_content { get; set; }
            public string post_title { get; set; }
            public string post_excerpt { get; set; }
            public string post_status { get; set; }
            public string comment_status { get; set; }
            public string ping_status { get; set; }
            public string post_password { get; set; }
            public string post_name { get; set; }
            public string to_ping { get; set; }
            public string pinged { get; set; }
            public string post_modified { get; set; }
            public string post_modified_gmt { get; set; }
            public string post_content_filtered { get; set; }
            public string post_parent { get; set; }
            public string guid { get; set; }
            public string menu_order { get; set; }
            public string post_type { get; set; }
            public string post_mime_type { get; set; }
            public string comment_count { get; set; }
            public string post_tagcat { get; set; }
        }

        public enum TaxonomyEnum
        {
            category,
            link_category,
            post_format,
            post_tag,
            series
        }
    }

    app.config

    <?xml version="1.0"?>
    <configuration>
      <startup>
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/>
      </startup>
      <appSettings>
        <add key="smtp_addr" value="smtp.gmail.com"/>
        <add key="smtp_acct_name" value="@gmail.com"/>
        <add key="smtp_acct_pwd" value=""/>
        <add key="evernote_folder" value="@folder..."/>
        <add key="xml_path" value="xxxxx\wp_posts.xml"/>
        <add key="notebook_email" value="xxxx@mywiz.cn"/>
        <add key="post_scope_start_date" value="2013-04-11 22:25:50"/>
        <add key="post_scope_end_date" value=""/>
        <add key="blog_addr" value="http://www.xxxx.com/"/>
      </appSettings>
    </configuration>
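Since step 1 warns that the downloaded XML may be incomplete, it can help to check the file before sending anything. The sketch below is one possible way, not something from the original program: `ExportCheck` is a hypothetical name. It reads the whole document with `XmlReader`; a download that was cut off part way fails because the closing tags never arrive. It cannot detect an export that is well-formed but is missing rows on the server side.

```csharp
using System.IO;
using System.Xml;

public static class ExportCheck
{
    // Reads the document to the end. Returns false and an error message
    // if the XML is not well-formed (for example, a truncated download).
    public static bool IsWellFormed(TextReader source, out string error)
    {
        error = null;
        try
        {
            using (XmlReader reader = XmlReader.Create(source))
            {
                while (reader.Read())
                {
                    // Nothing to do: we only care whether reading reaches the end.
                }
            }
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

For a file on disk, open it with `File.OpenText(path)` inside a `using` block, pass the reader to `ExportCheck.IsWellFormed`, and stop before the import if it returns false.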
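III. Possible improvements
  1. At the top of the mail body I add the post details (creation time, original categories/tags, link to the original post).
  2. I did not turn the categories into note tags. You can change the code to append tags to the subject (for Evernote, add #tag1 #tag2 and so on; look up the exact rules yourself).
  3. The one problem I have not solved is that SMTP randomly stops responding. If you find a fix (for example, catching the IOException and reconnecting), please let me know.
  4. Gmail's SMTP limits how many messages you can send (around a thousand a day, I believe), so if you have more posts than that you may need to switch accounts and try again. You could also keep an array of SMTP accounts and move on to the next one when a quota error comes back.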
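For item 2, a subject line with tags could be built along the lines of the sketch below. The helper `SubjectBuilder` is hypothetical, and the format it produces (title, then the @notebook, then one #tag per tag) is my understanding of Evernote's mail-in convention rather than something verified here, so check the current rules before relying on it. Titles that themselves contain @ or # may confuse the parsing on the receiving side.

```csharp
using System.Collections.Generic;

public static class SubjectBuilder
{
    // Builds "[date] title @notebook #tag1 #tag2".
    // notebook is passed with its leading '@', as in the evernote_folder setting.
    public static string Build(string date, string title, string notebook, IEnumerable<string> tags)
    {
        List<string> parts = new List<string>();
        parts.Add(string.Format("[{0}] {1}", date, title));
        if (!string.IsNullOrEmpty(notebook))
        {
            parts.Add(notebook);
        }
        foreach (string tag in tags)
        {
            string t = tag.Trim();
            if (t.Length > 0)
            {
                parts.Add("#" + t);   // skip empty names, prefix the rest with '#'
            }
        }
        return string.Join(" ", parts);
    }
}
```

To use it, the program would need to keep each post's category and tag names as a list instead of the single space-joined post_tagcat string, because names that contain spaces cannot be split apart again afterwards.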
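IV. About Wiz
  1. I think Wiz is a highly customizable Chinese note-taking application that still has a lot of room to improve. If you use Wiz, remember to encrypt any entry you do not want others to see. Wiz stores notes as HTML files in a directory tree, so nothing on your computer is secret: anyone who knows the directory can open every note without logging in.

  2. If you do not have a Wiz account, here is an invitation code: 6d485186 (http://www.wiz.cn/i/6d485186). I am told that registering with it gives you a VIP trial for the first month. You can of course register without it.
  3. Compared with Evernote, the advantages are the larger storage for notes and the plugin system, which makes it more fun. For example, it can export CHM, so you can turn all your old posts into one good-looking CHM file. I also find its search better than Evernote's; the full-text index makes searching very fast. I will not go into more detail; if you are interested, look into it yourself.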