The previous two articles presented the design goals of Data Solution and the applications designed to achieve them. This article further describes how Data Solution scans, converts, stores, and retrieves files.
The file formats commonly found on a local disk are DOC/DOCX, PDF, PST/OST/EML, HTM/MHT, and TXT/RTF. To edit and organize them in a single editor, you need to pick one common format, convert the other formats to it, and edit in that format. The DOC/DOCX format is powerful, and many open-source libraries can read and write it. The RTF format can also hold rich content and, importantly, it is an open format: you can download the Microsoft Office Word 2003 Rich Text Format (RTF) Specification from the Microsoft website to familiarize yourself with it. One advantage of an open format is that a lot of existing functionality and code, including open-source code, is available for it. Data Solution therefore uses RTF as its standard format for file storage.
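To give a feel for what converting to RTF involves, here is a minimal sketch that wraps plain text (the TXT case) in an RTF document. The class and method names are hypothetical and this is not Data Solution's actual converter; it only covers escaping, paragraph breaks, and non-ASCII characters.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Data Solution: wraps plain text in a minimal RTF document.
public static class PlainTextToRtf
{
    public static string Convert(string text)
    {
        var sb = new StringBuilder();
        // Header: RTF version 1, ANSI character set, one font (f0), 10 pt text (\fs is in half-points).
        sb.Append(@"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs20 ");
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);      // escape RTF's own control characters
            }
            else if (c == '\n')
            {
                sb.Append(@"\par ");            // line break becomes a paragraph break
            }
            else if (c == '\r')
            {
                continue;                       // drop the CR of a CRLF pair
            }
            else if (c > 127)
            {
                // \uN takes a signed 16-bit decimal; '?' is the fallback for old readers.
                sb.Append(@"\u").Append(unchecked((short)c)).Append('?');
            }
            else
            {
                sb.Append(c);
            }
        }
        sb.Append('}');
        return sb.ToString();
    }
}
```

The returned string can be stored as is or assigned to the Rtf property of a RichTextBox for editing.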
The next goal is to find an editing tool for the RTF format. A Google search turns up many editors that can edit, save, and convert RTF directly. Because RTF is an open format, Microsoft's .NET Framework ships with the RichTextBox control, which can be used to edit it. The control still needs to be enhanced, though; you can find tools and articles for that on codeproject.com.
With these two key steps solved, what follows is the design of the database and of the data read/write code. Create a Document database. The script for the Document table is as follows:
CREATE TABLE [dbo].[Document] (
    [Recnum]       [int] IDENTITY(1, 1) NOT NULL,
    [Subject]      [nvarchar](2000) NULL,
    [Body_type]    [nvarchar](50) NULL,
    [Body]         [nvarchar](max) NULL,
    [Create_date]  [datetime] NULL,
    [Create_by]    [nvarchar](50) NULL,
    [Revised_date] [datetime] NULL,
    [Revised_by]   [nvarchar](50) NULL,
    [Category]     [int] NULL,
    [Computer]     [nvarchar](200) NULL,
    [Path]         [nvarchar](2000) NULL,
    CONSTRAINT [PK_Document] PRIMARY KEY CLUSTERED ([Recnum] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Body is the column that stores the document itself; it is typed nvarchar(max). Category is used to search documents by classification. Subject can hold keywords or a title. Body_type holds the file type extension and is used for full-text search. Computer and Path are filled in during local scanning; they record where the original document came from so that it can be traced.
The Data Solution system uses the LLBLGen framework as the code generator for the database access layer; the generated solution is shown in the figure.
Let's look at the code that saves a document. It is routine code for the LLBLGen framework.
public DocumentEntity SaveDocument(DocumentEntity doc)
{
    using (DataAccessAdapter adapter = GetDataAccessAdapter())
    {
        try
        {
            adapter.StartTransaction(IsolationLevel.ReadCommitted, "SaveDocument");
            adapter.SaveEntity(doc, true, false);   // refetch after save, no recursive save
            adapter.Commit();
        }
        catch
        {
            adapter.Rollback();
            throw;                                  // rethrow, keeping the original stack trace
        }
        return doc;
    }
}
Routine code is code that a template could generate; like clocking in and out at work, it is common and simple.
Note that the rethrow is deliberately not written like this:
catch (Exception ex) { adapter.Rollback(); throw ex; }
The book ".NET Framework Program Design" explains the difference between the two forms of throw: they produce different stack traces with different starting points. A bare throw keeps the stack trace from where the exception was first raised, while throw ex restarts the trace at the rethrow.
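The difference is easy to observe. The following is a minimal sketch, independent of LLBLGen, that returns the stack trace produced by each form; the class and method names are made up for the illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class: compares the stack traces of "throw;" and "throw ex;".
public static class RethrowDemo
{
    // NoInlining keeps Fail as a separate frame so that it can show up in a stack trace.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Fail()
    {
        throw new InvalidOperationException("boom");
    }

    // Rethrows with "throw;": the trace still starts in Fail.
    public static string TraceWithThrow()
    {
        try
        {
            try { Fail(); }
            catch { throw; }
        }
        catch (Exception outer)
        {
            return outer.StackTrace ?? string.Empty;
        }
    }

    // Rethrows with "throw ex;": the trace restarts here and the Fail frame is lost.
    public static string TraceWithThrowEx()
    {
        try
        {
            try { Fail(); }
            catch (Exception inner) { throw inner; }
        }
        catch (Exception outer)
        {
            return outer.StackTrace ?? string.Empty;
        }
    }
}
```

Printing both strings shows that only the first one still mentions Fail, which is why the save routine uses the bare throw after rolling back.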
With the foundation layer in place, the following applications import documents into the database in different ways.
Batch Import imports the files in a specified directory into the database.
Doc Batch imports files of a specified format into the database.
PDF Watcher converts and imports PDF files. Because it is a watcher, you will have guessed that it is built on FileSystemWatcher.
Doc Loader converts and imports a single document; it processes only one document at a time.
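For the PDF Watcher, the core of a FileSystemWatcher-based tool could look roughly like the sketch below. The class name, method name, and callback are assumptions made for the illustration; the real conversion and import steps are not shown.

```csharp
using System;
using System.IO;

// Hypothetical sketch, not the actual PDF Watcher: reports every new PDF in a folder.
public static class PdfWatcherSketch
{
    public static FileSystemWatcher Watch(string folder, Action<string> onPdfCreated)
    {
        // The filter restricts notifications to files whose name ends in .pdf.
        var watcher = new FileSystemWatcher(folder, "*.pdf")
        {
            IncludeSubdirectories = true
        };

        // Created is raised when the file appears, possibly before it is fully written,
        // so a real importer should retry or wait until the file can be opened.
        watcher.Created += (sender, e) => onPdfCreated(e.FullPath);

        watcher.EnableRaisingEvents = true;     // start watching
        return watcher;
    }
}
```

The caller keeps the returned watcher alive for as long as the folder should be monitored and disposes it afterwards.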
Now let's look at how documents are presented in the database. Document Explorer displays the original files imported into the database. Here you can preview, delete, and classify them. Classification works like a seal of approval: a classified document can be used by later programs, while unclassified files just stay here and cannot go on to later steps. This is a policy of the document workflow; if you do not like this step, you can remove it.
Document Browser: viewing documents by category
The category tree is on the left and the documents in the selected category are on the right. To classify files that have been scanned into the database, proceed as follows.
Select one or more files in Document Explorer, right-click, and choose Category Document.
In the tree on the left side of Document Browser, right-click and choose Paste Document.
The result is then visible: under the .NET node, the pasted documents and their attributes are displayed.
The tree on the left of Document Browser holds the document categories. You can add child nodes to it. The effect of adding a new category is as follows:
Categories are taken from the Category table in the database. Its script is as follows:
CREATE TABLE [dbo].[Category] (
    [Recnum] [int] IDENTITY(1, 1) NOT NULL,
    [Name]   [nvarchar](200) NULL,
    CONSTRAINT [PK_Category] PRIMARY KEY CLUSTERED ([Recnum] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The structure of this classification tree is defined in a local XML file. The tree shown in the figure is defined in the following format:
As shown, the Tag holds the category identifier obtained from the database; the documents of a category are retrieved by this identifier. When the Document Browser form is closed it persists the tree structure, and when it is opened it loads the structure again. The code is as follows:
protected override void OnClosed(EventArgs e)
{
    base.OnClosed(e);
    // Persist the current tree structure to the local XML file.
    TreeViewSerializer serializer = new TreeViewSerializer();
    serializer.SerializeTreeView(this.treeView, treeFile);
}

protected override void OnLoad(EventArgs e)
{
    treeView.ImageList = this.imageList;
    TreeNode root = treeView.Nodes[0];
    if (File.Exists(treeFile))
    {
        // Replace the design-time nodes with the persisted structure.
        treeView.Nodes.Clear();
        TreeViewSerializer serializer = new TreeViewSerializer();
        serializer.DeserializeTreeView(this.treeView, treeFile);
        treeView.ExpandAll();
    }
}
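The idea behind persisting the tree can be sketched without WinForms. The example below is not the TreeViewSerializer from CodeProject, and the element and attribute names are assumptions rather than the file format Data Solution actually uses; CategoryNode is a hypothetical stand-in for a TreeNode whose Tag holds the category identifier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for a TreeNode: display text plus the category id kept in Tag.
public class CategoryNode
{
    public string Text = "";
    public int Tag;
    public List<CategoryNode> Children = new List<CategoryNode>();
}

public static class CategoryTreeXml
{
    // Recursively turns a node and its children into nested <node> elements.
    public static XElement ToXml(CategoryNode node)
    {
        return new XElement("node",
            new XAttribute("text", node.Text),
            new XAttribute("tag", node.Tag),
            node.Children.Select(child => ToXml(child)));
    }

    // Rebuilds the node hierarchy from nested <node> elements.
    public static CategoryNode FromXml(XElement element)
    {
        return new CategoryNode
        {
            Text = (string)element.Attribute("text"),
            Tag = (int)element.Attribute("tag"),
            Children = element.Elements("node").Select(child => FromXml(child)).ToList()
        };
    }
}
```

Calling ToXml(root).Save(path) writes the file, and FromXml(XElement.Load(path)) reads it back, which corresponds to what the form does on close and on load.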
If you are interested in how the tree node definitions are saved to the file system, search CodeProject for the keyword TreeViewSerializer. The code used here comes from one of the articles there.
The tree-structured classification is used in one more place: the editor's Open File dialog box, as shown in the figure.
The effect here is exactly the same as in Document Browser. The original idea was to use the places bar of the standard Open File dialog box for this, that is, to put the classification tree on the left, in the area outlined in red.
However, Windows only accepts specified paths for the red area and validates them, so overriding it failed and a custom dialog box is used instead. Among the software I have seen, SharePoint Designer has rewritten the places bar area. Unfortunately, I have not figured out how it does this.
The article is not finished yet. One more piece requires a little knowledge of Office add-in development: writing a plug-in for the Office applications that imports the document currently being viewed directly into my document database, as shown in the figure.
Two plug-ins, Nitro PDF Professional and Acrobat, are already installed; they convert the current DOC/DOCX document to a PDF file. In the same way, you need to write a plug-in that sends the current DOC/DOCX file to the document database.
Converting PDF files into editable RTF causes a lot of trouble. For converting DOC/DOCX to PDF there is plenty of source code to borrow, but a component that converts PDF to DOC/DOCX is hard to find. One solution is to convert the PDF file into a TIFF file and then convert that into a DOC/DOCX file with OCR software (ABBYY FineReader 9). We are programmers and do not have the money for an expensive SDK license. We turned to cracks and patches, with no result. For these industry-leading technologies even trial versions do not appear on the Internet, so there is no chance to try them out, or to put them in a virtual machine and keep using them in trial mode. Some components, such as PDF Focus .Net, do offer a trial, but the converted files either carry a trial watermark or contain only the first three pages, with nothing after that converted. Others are ActiveX controls that have to be registered on the system as OCX files. It makes you think about how much we depend on Microsoft .NET Framework and SQL Server; judging from this experience, we could hardly get by without them. Although I have solved all the problems here, I still do not want to face them again. Outside database development, technology and knowledge are very valuable. In database development, which we are used to, technology and knowledge are worth nothing and only finished products have value, which is tragic.