Knowledge Management System Data Solution R&D Diary 2: Application Series

Source: Internet
Author: User

The previous article, on scenario design and the requirements list in the Knowledge Management System Data Solution R&D diary, mentioned many requirements. This article describes the software that implements them. These programs have already been written, in C# on .NET with SQL Server 2005.

Let's look at the Data Solution system as a whole. Data Solution consists of two major solutions: the file solution, which handles documents, and the segment solution, which handles segment data. There are many applications, so let's go through them.

Data Loader

Data Loader is a plug-in framework, and its code is open source. You can get the code from the article "Management console tool management software general development framework (open source code)".

[Figure: the Data Loader main interface]

The TAS List window shows that Data Loader consists of 15 applications.

User ID blog, URL blog, text blog, and default blog: these four applications download and process webpages. They are already described in the articles ".NET programmers who have worked for many years: have you built your own development knowledge base?" and the one sharing my e-book production experience, so I will not repeat them here.


Batch import scans the specified paths and imports the documents it finds into the database system.

Doc export also scans the specified path and imports the documents into the database system. Unlike batch import, it lets you specify the file format to be processed.

In this example, only DOC files in the specified directory e:\document\tecnhologies are processed.

The path setting also accepts keywords. For example, "all" means all local disks, and "C" means only the folders on drive C. The last option is to select a specific path.
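The scanning behavior described above can be sketched in a few lines. This is only an illustration in Python, not the author's C# code: the function names `resolve_roots` and `find_documents`, the default drive list, and the way keywords map to drive roots are all assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, List


def resolve_roots(path_or_keyword: str,
                  drives: Iterable[str] = ("C", "D", "E")) -> List[str]:
    """Turn a path keyword into the list of root folders to scan.

    "all" expands to every drive in `drives` (a hypothetical list),
    a single letter such as "C" means that drive only,
    and anything else is treated as an explicit path.
    """
    key = path_or_keyword.strip()
    if key.lower() == "all":
        return [f"{d}:\\" for d in drives]
    if len(key) == 1 and key.isalpha():
        return [f"{key.upper()}:\\"]
    return [key]


def find_documents(root: str,
                   extensions: Iterable[str] = (".doc",)) -> Iterator[Path]:
    """Walk `root` recursively and yield files whose extension matches."""
    wanted = {ext.lower() for ext in extensions}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Compare extensions case-insensitively so "A.DOC" also matches.
            if Path(name).suffix.lower() in wanted:
                yield Path(folder) / name
```

A caller would loop over `resolve_roots(...)` and pass each root to `find_documents`, handing every yielded path to whatever import step stores the file in the database.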


The PDF watcher monitors the specified folder. If a PDF file changes, it converts the file and imports it into the database.
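The article does not say how the monitoring is implemented. One simple way to detect changes, sketched here in Python as an assumption, is to take periodic snapshots of the folder and compare modification times. The names `snapshot` and `changed_files` are hypothetical, and the convert-and-import step is left out.

```python
import os
from typing import Dict, List


def snapshot(folder: str) -> Dict[str, float]:
    """Map each PDF file name in `folder` to its last-modified time."""
    result: Dict[str, float] = {}
    for entry in os.scandir(folder):
        if entry.is_file() and entry.name.lower().endswith(".pdf"):
            result[entry.name] = entry.stat().st_mtime
    return result


def changed_files(before: Dict[str, float],
                  after: Dict[str, float]) -> List[str]:
    """Return PDFs that are new or whose modification time differs."""
    return sorted(name for name, mtime in after.items()
                  if before.get(name) != mtime)
```

A watcher loop would call `snapshot` at a fixed interval, pass the previous and current results to `changed_files`, and process each returned file. An event-based API, such as the .NET `FileSystemWatcher` class, avoids polling but is not shown here.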


Document explorer displays the documents that have been scanned in. You can delete the documents you do not need and classify the rest.

Select one or more documents on the left, right-click, and choose category document.

Open the document browser program, right-click the required category node, and choose paste document.

When you open the document browser again, you can see that the documents have been placed in the specified category.

Double-click a file item in the document view to open it in the View window.

The View window is read-only: documents can be viewed but not edited. The editor tool, introduced later, is used to edit documents in the database.


The doc loader program imports the specified file into the database. Note that it processes only one specified file at a time, rather than the entire directory.
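As a rough illustration of importing a single file, the sketch below stores one file's name and bytes in a table. It is written in Python and uses SQLite as a stand-in for SQL Server. The table name `documents`, its columns, and the function names are assumptions, not the schema used by the author.

```python
import sqlite3
from pathlib import Path


def create_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table that holds one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, content BLOB NOT NULL)"
    )


def load_document(conn: sqlite3.Connection, file_path: str) -> int:
    """Import exactly one file and return the id of the new row."""
    path = Path(file_path)
    if not path.is_file():
        # A directory is rejected: this loader handles one file at a time.
        raise ValueError(f"not a file: {file_path}")
    cursor = conn.execute(
        "INSERT INTO documents (name, content) VALUES (?, ?)",
        (path.name, path.read_bytes()),
    )
    conn.commit()
    return cursor.lastrowid
```

Rejecting directories keeps the single-file behavior explicit; scanning a whole folder belongs to the import tools described earlier.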


Database cleanup clears the database. While testing or experimenting with the programs, the database files (MDF/LDF) often grow to several GB, so a tool is needed to clean them up and remove unnecessary data. Use this function with caution in real deployments. It is normally hidden; an option that displays it is mentioned later.

The programs above process and convert files and put them into the database. The following programs process segment data.


Form Designer

The biggest difference with segment data is that its format is flexible and user-defined. As mentioned in the previous article, recruitment information is divided into four fields: company name, job requirements, technical requirements, and contact information. As another example, a quick-lookup database of IT companies uses five fields: company name, company introduction, address, comment, and portal website. These formats are flexible and can be changed at any time.

Data in these formats can be captured with regular expressions, but displaying it requires a form customized to each format. That is why the Form Designer is needed: it adapts flexibly to different data formats.

Let's take a look at the segment explorer interface in Data Loader. It parses the form design file and displays it.

The design, display, and binding of forms are complex topics. They are covered in detail in later articles.


Rule editor is used to capture data from the Internet. Any webpage data that follows a regular pattern can be downloaded to the local machine in the background by creating rules.

To put it bluntly, rule editor is a regular expression-based tool that focuses on defining what to collect. Segment run is a background program that analyzes the rules and downloads the data to a local disk. Form Designer focuses on displaying the data. These three tools work together to capture Internet data.
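To make the idea of a regular-expression rule concrete, here is a small Python sketch that pulls the four recruitment fields out of a page fragment. The HTML layout, the class names, and the `RULE` pattern are invented for the example; a real rule would have to match the markup of the site being captured.

```python
import re
from typing import Dict, List

# A hypothetical rule: one named group per segment field.
RULE = re.compile(
    r'<div class="posting">\s*'
    r'<h2>(?P<company_name>.*?)</h2>\s*'
    r'<p class="requirements">(?P<job_requirements>.*?)</p>\s*'
    r'<p class="tech">(?P<technical_requirements>.*?)</p>\s*'
    r'<p class="contact">(?P<contact_information>.*?)</p>\s*'
    r'</div>',
    re.DOTALL,
)


def extract_segments(html: str) -> List[Dict[str, str]]:
    """Apply the rule to a page and return one dict per matched record."""
    return [
        {field: value.strip() for field, value in match.groupdict().items()}
        for match in RULE.finditer(html)
    ]
```

Because the field names live in the rule itself, changing the segment format only means editing the pattern, which fits the point that these formats can change at any time.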


Site rebuild downloads web pages to the database system. It is a special program dedicated to downloading articles from good sites, so you don't have to worry about those sites changing or upgrading, or about search being inconvenient. After all, data stored on your own computer is the most reassuring.

For cnblogs, data is fetched by blogger ID; bloggers near the top of the ranking produce articles of assured quality. For CodeProject, data is fetched by category, and the quality is also good. For MSDN, CSDN, and dotblogs, when you see a good article, copy its address or a piece of HTML source code; the program automatically parses the article path and downloads the article to the local SQL Server database.
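The article does not explain how the article path is parsed from a pasted piece of HTML. One plausible approach, sketched in Python purely as an assumption, is to collect the link targets and keep those that look like article pages. The class `LinkCollector`, the function `article_links`, and the `.html` suffix filter are all hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags, without duplicates."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                url = urljoin(self.base_url, value)
                if url not in self.links:
                    self.links.append(url)


def article_links(html: str, base_url: str, suffix: str = ".html") -> List[str]:
    """Return links that end with `suffix`, assumed here to mark articles."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if url.endswith(suffix)]
```

Each returned URL could then be downloaded and stored, one article per database row.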



The editor can edit local files or documents in the database system. It supports the DOC/DOCX and RTF formats and is used to edit and organize documents stored in the SQL Server database. This function greatly improves the usability of the system.

Click the Open button in the editor to open the file browsing dialog box. You can open files on the local disk or files on the SQL Server server. The structure of the server node is exactly the same as in the document browser. Select the file and click Open.

The editor looks and works much like WordPad. Files opened from the database are saved back to the SQL Server database after editing. You can also use the export function to export documents to disk files; the supported formats are DOC, DOCX, PDF, and RTF.


As you can see, Data Solution is a combination of programs that together make it powerful at document scanning, importing, editing, and classification. Permission checks, finer-grained document classification, full-text retrieval, document version management, and document workflow have been left out. If they were added, the amount of code and the workload would more than double. The more refined and powerful the system, the more cost and patience it requires. Suggestions for improvement are welcome.
