Import and Index Microsoft Word documents by using SQL Server

Source: Internet
Author: User
Tags microsoft sql server ole
server|word| Index

Q: I need to import Microsoft Word documents into SQL Server and index the documents so that they can be used in relational queries. How do I import and index documents?

A: SQL Server allows you to import Word documents in a variety of ways. Let's take a look at some of the most common methods. Note that before you import a document into SQL Server, you need to create an image data type column to hold the data. You can then use the Textcopy.exe command-line tool to read the image file into the database to complete the document import operation. If you need a basic documentation for the tool, type textcopy/? In the command prompt state. Another way to import Word documents into SQL Server is to write import code using the Microsoft ActiveX Directory Object (ADO) stream interface. You can find sample code in Microsoft Product Support Services (PSS) by using the ADO Stream object to access and modify SQL Server BLOB data.

In addition, you can move binary data to SQL Server. For a detailed description of this method, see the article using ADO to retrieve and update SQL Server text fields in PSS. Moving binary data allows you to store part of the data in a database, which is especially useful when you need to control the data format. For example, if you only need bytes between 1,000 and 1,010 of the data, importing binary data is much faster than using the ADO stream interface because the amount of data that SQL Server retrieves from disk is significantly reduced. This technique is commonly used to store bit masks, which are used to represent the opening or closing flags of an application.

SQL Server 2000 brings up sample code that describes how to move binary data. To view this code, simply select the \program Files\Microsoft SQL Server\80\tools\devtools\samples\ado Path on the drive where the code sample is installed on the SQL Server 2000 CD. Expand the executable file to locate the Samples subdirectory in the Visual Basic directory. In the employee example, notice how the code uses the FillDataFields () function.

If you want to index Word documents, both SQL Server 7.0 and SQL Server 2000 provide Full-text search components. This component blends a variety of techniques for indexing large text and image columns. When you perform a full-text search, you need to specify the file types that the image column contains, and the filters (filter) that are required to extract information from the binary data. For more information about using Full-text indexing, see the topics in SQL Server online books, and read the article by David Jones, published on the SQL Server Magazine website in July 2000, called Building a better search engine. Note that indexing a Word document does not automatically generate a set of relational tables that contain keywords in the document. However, the index file lets you include these Word documents in your search. Here are some possible ways to extract keywords from your data:

Use OLE to automate the reading of user-defined keywords from a document. These keywords are saved in the relational table when the document is loaded.
Use OLE to automatically process open documents and save them as text (. txt) format. To extract important words, you can use your own Word analyzer to traverse the text file. The text parser looks at each word in the document, removes some unwanted text, and stores each unique word with the number of occurrences.
Search for specific keywords in the new Full-text indexing document and enter them into the relational table.
SQL Server 2000 provides a number of powerful tools and interfaces to help users achieve fast loading, searching, and retrieval of binary Microsoft Office documents.



Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.