Backup csdn blog text to local archive continued

Last Update:2018-12-04 Source: Internet

Author: User

The article "Back up the csdn blog text to local archives" backed up csdn blog posts in a hard-coded way. It works well, but many people ran into encoding problems, which are easy to solve. They come down to two causes. First, the browser's encoding setting is wrong, so the locally saved file shows garbled characters when it is opened in the browser. Second, the operating system's encoding setting is wrong, so the file names of the saved articles are garbled. Configuring both correctly fixes the problem. I use the Chinese version of Mac OS X 10.7 with Firefox and have not run into any encoding problems. With Safari, however, you need to set the encoding manually to gb2312 or gb18030.
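The garbled-content half of the problem can also be reduced in code by never relying on the platform's default charset. The following is a minimal sketch, not the original tool's code: it assumes the downloaded pages are encoded as GB18030 (a superset of gb2312) and that saving the local copy as UTF-8 is acceptable. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original backup tool.
final class EncodingSketch {
    // Assumption: the blog pages are encoded as GB18030 (a superset of gb2312).
    static final Charset PAGE_CHARSET = Charset.forName("GB18030");

    // Decode raw page bytes with an explicit charset instead of the platform default.
    static String decodePage(byte[] rawBytes) {
        return new String(rawBytes, PAGE_CHARSET);
    }

    // Write the text as UTF-8 so the saved file does not depend on OS settings.
    static void saveAsUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Four Chinese characters written as Unicode escapes, standing in for a download.
        byte[] raw = "\u535a\u5ba2\u5907\u4efd".getBytes(PAGE_CHARSET);
        Path out = Files.createTempFile("article", ".html");
        saveAsUtf8(out, decodePage(raw));
        System.out.println("Saved " + Files.size(out) + " bytes to " + out);
    }
}
```

If the saved HTML still declares gb2312 in a meta tag, that declaration would need to be adjusted as well, otherwise the browser may decode the UTF-8 file with the wrong charset.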
Besides encoding, another limitation is that the tool does not support other blog sites. This is not difficult to solve either: the general framework of the source code is already in place, and what remains is to define some tags and callback functions. This is easy in Java, for example by defining a superclass like the one below:

abstract class Site {
    public String atl_name;
    public String atl_value;
    public abstract int ParseIMG(NodeList nlist, int index);
    public abstract int ParseTITLE(NodeList nlist, int index);
    public abstract int ParseAUTHOR(NodeList nlist, int index);
    public abstract int ParseMonthArticle(NodeList nlist, int index);
    public abstract int ParsePerArticle(NodeList nlist, int index);
    public abstract int ParsePAGE(NodeList nlist, int index);
}

Then, make an adjustment similar to the following in the source code:

public static int parseImg(NodeList nlist, int index, Site st) {
    return st.ParseIMG(nlist, index);
}

At the same time, modify the logic of the handletext method. For example, checks like the following should be implemented differently depending on the site subclass:

if (node instanceof Div)
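One possible way to make that check site-specific is to move it behind a method that each site overrides. The sketch below is only an illustration under stated assumptions: StubNode, StubDiv and StubTable stand in for the HTML parser's node classes so the code compiles on its own, and TextRules, isTextContainer and the two site classes are hypothetical names that do not appear in the original source.

```java
// Stand-ins for the HTML parser's node classes so the sketch compiles on its own.
class StubNode {}
class StubDiv extends StubNode {}
class StubTable extends StubNode {}

// Hypothetical hook: each site decides which node type wraps the article text.
abstract class TextRules {
    abstract boolean isTextContainer(StubNode node);
}

// A site whose article text lives in a div.
class DivLayoutSite extends TextRules {
    @Override
    boolean isTextContainer(StubNode node) {
        return node instanceof StubDiv;
    }
}

// A site whose article text lives in a table.
class TableLayoutSite extends TextRules {
    @Override
    boolean isTextContainer(StubNode node) {
        return node instanceof StubTable;
    }
}

final class TextHandlerSketch {
    // The shared method asks the site instead of hard-coding "instanceof Div".
    static boolean handleText(StubNode node, TextRules site) {
        if (site.isTextContainer(node)) {
            // ... extract and save the text here ...
            return true;
        }
        return false;
    }
}
```

With this shape, supporting a new layout means adding one subclass rather than editing the shared method.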

Finally, we can define a Site subclass for each website and implement the abstract methods in each one. The key point is that we have abstracted six elements for each blog website:
1. Blog title, such as "dawn of harvest"
2. List of links archived by month
3. List of links to the articles archived in each month
4. Each article archived in each month
5. Each image in each article
6. Paging display handling for category 1
As long as these six elements can be processed, a superclass Site can declare how they are handled, and each blog site supplies its own implementation of the corresponding methods. Note that blogs that cannot be indexed by month are currently not supported.
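To illustrate how a site-specific subclass plugs into a shared walk, here is a reduced sketch. It is not the original implementation: it keeps only two of the six hooks, replaces the parser's NodeList with a List<String> so it compiles on its own, and assumes that each hook returns the index of the next node to visit (the source does not say what the int return value means). SiteSketch, PrefixSite and ArchiveDriver are hypothetical names, and the "title:" and "img:" prefixes are a toy node format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reduced stand-in for the Site superclass: two hooks only,
// with List<String> replacing the parser's NodeList.
abstract class SiteSketch {
    String title;
    final List<String> imageUrls = new ArrayList<>();

    // Assumption: each hook consumes the node at `index` and returns the next index.
    abstract int parseTitle(List<String> nodes, int index);
    abstract int parseImg(List<String> nodes, int index);
}

// Hypothetical site for the toy node format "title:..." and "img:...".
class PrefixSite extends SiteSketch {
    @Override
    int parseTitle(List<String> nodes, int index) {
        title = nodes.get(index).substring("title:".length());
        return index + 1;
    }

    @Override
    int parseImg(List<String> nodes, int index) {
        imageUrls.add(nodes.get(index).substring("img:".length()));
        return index + 1;
    }
}

final class ArchiveDriver {
    // The walk is shared by all sites; only the hook bodies differ per subclass.
    static void walk(List<String> nodes, SiteSketch site) {
        int i = 0;
        while (i < nodes.size()) {
            String node = nodes.get(i);
            if (node.startsWith("title:")) {
                i = site.parseTitle(nodes, i);
            } else if (node.startsWith("img:")) {
                i = site.parseImg(nodes, i);
            } else {
                i++; // not one of the abstracted elements; skip it
            }
        }
    }

    public static void main(String[] args) {
        SiteSketch site = new PrefixSite();
        walk(Arrays.asList("title:dawn of harvest", "text", "img:a.png"), site);
        System.out.println(site.title + " " + site.imageUrls);
    }
}
```

The remaining four hooks would follow the same pattern, each one handling a single element from the list above.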
The whole process simply uses htmlparser to parse each page, then downloads the content to the local machine and saves it according to certain rules.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
