The previous article, "Sharing a well-made knowledge management system: the blog backup program Site Rebuild", showed how to use Site Rebuild to download your favorite blog articles. However, the downloaded files could not be imported into a database, and file data stored on a server could not be browsed online. This article helps you create a document database and then download, store, and browse blog articles.
Open the Data Loader program and run the Setting program. The interface is displayed as follows:
Click the button next to ConnectionString and enter the correct database configuration, as shown below:
As shown in the figure, click OK to return to the main program form.
Open the SQL Server database management program, create a new database named document, and run the following SQL script to create the table structure.
/****** Object: Table [dbo].[category]  Script Date: 11/04/2011 10:27:21 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[category] (
    [recnum] [int] IDENTITY(1, 1) NOT NULL,
    [name] [nvarchar](200) NULL,
    CONSTRAINT [PK_category] PRIMARY KEY CLUSTERED ([recnum] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

/****** Object: Table [dbo].[document]  Script Date: 11/04/2011 16:46:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[document] (
    [recnum] [int] IDENTITY(1, 1) NOT NULL,
    [subject] [nvarchar](2000) NULL,
    [body_type] [nvarchar](50) NULL,
    [body] [ntext] NULL,
    [create_date] [datetime] NULL,
    [category] [int] NULL,
    [create_by] [nvarchar](50) NULL,
    [computer] [nvarchar](200) NULL,
    [path] [nvarchar](2000) NULL,
    CONSTRAINT [PK_document] PRIMARY KEY CLUSTERED ([recnum] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object: Table [dbo].[settings]  Script Date: 11/07/2011 00:04:28 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[settings] (
    [recnum] [int] IDENTITY(1, 1) NOT NULL,
    [downloaded_path] [nvarchar](800) NULL,
    [connection_string] [nvarchar](800) NULL,
    [failed_cleanup_file] [nvarchar](400) NULL,
    CONSTRAINT [PK_settings] PRIMARY KEY CLUSTERED ([recnum] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

INSERT dbo.settings (downloaded_path, connection_string, failed_cleanup_file)
VALUES (
    'G:\document', -- downloaded_path - nvarchar(800)
    NULL,          -- connection_string - nvarchar(800)
    NULL           -- failed_cleanup_file - nvarchar(400)
)
The last statement inserts the row that holds the configuration options. Open the Setting program again and the contents of this table are displayed.
That is all the setup you need to do. Following the previous article, you can easily achieve the following result:
The articles in the figure are taken from TerryLee's design pattern series. Collected into one system and read together, they are very convenient to study.
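To make the role of the settings row concrete, here is a minimal sketch in Python. It is not part of Data Loader: it uses the standard-library sqlite3 module purely as a stand-in for SQL Server so that it runs anywhere, the column types are simplified, and the helper names create_settings and read_settings are hypothetical.

```python
import sqlite3


def create_settings(conn: sqlite3.Connection, downloaded_path: str) -> None:
    """Create a simplified settings table and insert the single configuration row."""
    # Assumption: SQLite types stand in for the nvarchar columns of the real table.
    conn.execute(
        """
        CREATE TABLE settings (
            recnum INTEGER PRIMARY KEY AUTOINCREMENT,
            downloaded_path TEXT,
            connection_string TEXT,
            failed_cleanup_file TEXT
        )
        """
    )
    # Only the download path is filled in; the other two options stay NULL,
    # as in the INSERT statement of the article.
    conn.execute(
        "INSERT INTO settings (downloaded_path, connection_string, failed_cleanup_file) "
        "VALUES (?, NULL, NULL)",
        (downloaded_path,),
    )
    conn.commit()


def read_settings(conn: sqlite3.Connection):
    """Return the configuration row as a tuple, or None if the table is empty."""
    return conn.execute(
        "SELECT recnum, downloaded_path, connection_string, failed_cleanup_file "
        "FROM settings"
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_settings(conn, r"G:\document")
    print(read_settings(conn))
```

Reading the row back this way mirrors what the Setting program does when it shows the contents of the table.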
Download the latest Data Loader program from epn.codeplex.com (http://epn.codeplex.com/releases/view/68647) to try reading blogs offline. All articles and data documents are stored on your own computer, where you can edit, process, refine, and study them.
Data Loader still needs improvement in the following areas:
1. Articles often include a code download. Besides automatically downloading the document itself, the program should also download its attachments.
2. Enhance the search capability by creating a new Index Builder program, so that you can find the documents you need in a massive document database.
3. Add a progress bar to document download and import to display the current status. This will make the interface friendlier.
4. Add multi-threaded processing to the analysis, download, and import modules to increase processing speed.
5. Add new applications. For example, PPT download would search for and fetch the right PPT files, and image download would fetch images from a website. If you often come across very good interfaces, this batch method lets you collect them and concentrate on UI design.
6. Compression and decompression. For articles with the same content, the RTF format is generally three to four times the size of the DOC format, so RTF consumes a lot of hard disk space. Take my local machine as an example: 2723 DOC files total 745 MB, while the same documents in RTF format reach 5 GB, and once they are saved to the database, the database grows to about 8 GB. The documents need to be compressed, using the familiar ICSharpCode.SharpZipLib.dll and the ZIP format.
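The idea behind item 6 can be sketched briefly. The article names ICSharpCode.SharpZipLib for the real program; the Python below swaps in the standard-library zlib module instead, only as an illustration. The function names compress_document and decompress_document are hypothetical, and the sample text is synthetic RTF-like markup, not a real downloaded article.

```python
import zlib


def compress_document(body: str) -> bytes:
    """Encode the document text as UTF-8 and deflate it at the highest level."""
    return zlib.compress(body.encode("utf-8"), 9)


def decompress_document(blob: bytes) -> str:
    """Inverse of compress_document: inflate the bytes and decode the text."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    # Synthetic, highly repetitive RTF-like text; real articles will compress less.
    sample = r"{\rtf1\ansi " + r"\par Design patterns: Abstract Factory. " * 500 + "}"
    blob = compress_document(sample)

    # The round trip must return exactly the original text.
    assert decompress_document(blob) == sample
    print(len(sample.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

Storing the compressed bytes instead of the raw RTF text would require a binary column for the body rather than ntext, which is a design change the article does not cover.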
I hope this helps. Your comments are welcome.