Three solutions for searching Word document content with SQL Server full-text search

Besides using the APIs provided by Office to search Word documents, this article briefly summarizes three solutions for searching Word files based on SQL Server full-text search.

1. Full-text search with the Windows Indexing Service

Summary:

1. Rename each file when it is stored;

2. Configure Indexing Service and link it to Microsoft SQL Server as a linked server;

3. Modify the SQL statement so that its query condition includes a full-text query against the Indexing Service catalog.
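Steps 2 and 3 can be sketched in T-SQL as follows. This is an illustration under stated assumptions, not the method from the linked article: it assumes an Indexing Service catalog named Web that covers the folder holding the renamed .doc files, and a hypothetical table dbo.Documents whose StoredFileName column records the name each file was saved under.

```sql
-- Assumptions (hypothetical): Indexing Service catalog 'Web' covers the .doc folder,
-- and dbo.Documents.StoredFileName holds the name each file was saved under.

-- Step 2: link SQL Server to Indexing Service through the MSIDXS OLE DB provider.
EXEC sp_addlinkedserver
    @server     = N'FileSystem',
    @srvproduct = N'Index Server',
    @provider   = N'MSIDXS',
    @datasrc    = N'Web';
GO

-- Step 3: Indexing Service evaluates the full-text condition; the matching
-- file names are joined back to the table rows.
SELECT d.*
FROM dbo.Documents AS d
JOIN OPENQUERY(FileSystem,
        'SELECT FileName FROM SCOPE()
         WHERE CONTAINS(''"full-text search"'') AND FileName LIKE ''%.doc''') AS q
    ON q.FileName = d.StoredFileName;
```

Renaming the files in step 1 is what makes this join possible: the file name returned by Indexing Service is the only link back to the database row.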

For a detailed example, see: http://database.ctocio.com.cn/51/11440551.shtml

Advantages: The files can be stored physically in directories, in their native DOC format.

Disadvantages: Through SQL Server the files can only be read, not written.

2. Full-text search with BLOB data

Solution summary: Store each DOC file in a database table as BLOB data in a varbinary(max) column, then build a full-text index on that column and search the table. This is the most common solution.

A simple example of creating the table and inserting files:

------- Binary file query example
USE master
GO
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'blobdatademodb')
    DROP DATABASE blobdatademodb
GO
CREATE DATABASE blobdatademodb
GO
USE blobdatademodb
GO
-------- Enable full-text search for the current database
-- (deprecated and kept only for backward compatibility; from SQL Server 2008 on,
-- user databases are always full-text enabled)
EXECUTE sp_fulltext_database 'enable'
GO
-- Create a table that contains a BLOB column
IF OBJECT_ID('sampleblobtable') IS NOT NULL
    DROP TABLE sampleblobtable
GO
CREATE TABLE sampleblobtable
(
    [pkid]        INT IDENTITY(1, 1) PRIMARY KEY,
    [filetype]    NVARCHAR(32) NULL,     -- file extension, e.g. 'doc'
    [filename]    NVARCHAR(255) NULL,
    [filecontent] VARBINARY(MAX) NULL,   -- raw bytes of the Word document
    [addtime]     DATETIME DEFAULT (GETDATE())
)
GO
IF EXISTS (SELECT * FROM sys.objects
           WHERE object_id = OBJECT_ID(N'[dbo].[cpp_insertoneblobdatatotable]')
             AND type IN (N'P', N'PC'))
    DROP PROCEDURE [dbo].[cpp_insertoneblobdatatotable]
GO
-- Create a stored procedure that inserts one file into the table
CREATE PROCEDURE cpp_insertoneblobdatatotable
(
    @filetype    NVARCHAR(32),
    @filename    NVARCHAR(255),
    @filecontent VARBINARY(MAX)
)
AS
    INSERT sampleblobtable ([filetype], [filename], [filecontent], [addtime])
    VALUES (@filetype, @filename, @filecontent, GETDATE())
GO

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

namespace blobdatasearchdemo
{
    class Program
    {
        // Connection string to the demo database
        const string conn = @"Server=workshop\agronet09;Database=blobdatademodb;Uid=sa;Pwd=As;";

        static void Main(string[] args)
        {
            savedoc2sqlserver(@"D:\2008 data\streamdata\doc\dancing.doc", conn);
            savedoc2sqlserver(@"D:\2008 data\streamdata\doc\tianlong babu.doc", conn);
            savedoc2sqlserver(@"D:\2008 data\streamdata\doc\english.doc", conn);
            Console.ReadKey();
        }

        // Reads one file and inserts it through the cpp_insertoneblobdatatotable procedure
        private static void savedoc2sqlserver(string filepath, string connectionString)
        {
            FileInfo fi = new FileInfo(filepath);
            if (!fi.Exists)
                return;

            // Read the whole file at once; a single Stream.Read call is not
            // guaranteed to fill the buffer
            byte[] b = File.ReadAllBytes(filepath);

            using (SqlConnection connection = new SqlConnection(connectionString))
            using (SqlCommand cmduploaddoc = new SqlCommand("cpp_insertoneblobdatatotable", connection))
            {
                cmduploaddoc.CommandType = CommandType.StoredProcedure;
                cmduploaddoc.Parameters.Add("@filename", SqlDbType.NVarChar, 255).Value = fi.Name;
                // Size -1 maps to varbinary(max)
                cmduploaddoc.Parameters.Add("@filecontent", SqlDbType.VarBinary, -1).Value = b;
                // Extension without the dot (e.g. "doc"); used as the full-text type column
                cmduploaddoc.Parameters.Add("@filetype", SqlDbType.NVarChar, 32).Value = fi.Extension.Replace(".", "");
                connection.Open();
                cmduploaddoc.ExecuteNonQuery();
            }
        }
    }
}
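Once the full-text index described in the notes below exists and has been populated, the stored documents can be searched with ordinary T-SQL predicates. The following is a minimal sketch, not the author's original query; the search terms are only examples, and the CONTAINSTABLE join assumes the full-text key is the pkid column.

```sql
USE blobdatademodb;
GO
-- Documents whose text contains the phrase "SQL Server"
SELECT pkid, filetype, filename, addtime
FROM sampleblobtable
WHERE CONTAINS(filecontent, N'"SQL Server"');

-- Prefix search: any word starting with "index"
SELECT pkid, filename
FROM sampleblobtable
WHERE CONTAINS(filecontent, N'"index*"');

-- Ranked results: CONTAINSTABLE returns [KEY] (the full-text key value) and [RANK]
SELECT t.pkid, t.filename, ft.[RANK]
FROM CONTAINSTABLE(sampleblobtable, filecontent, N'"SQL Server"') AS ft
JOIN sampleblobtable AS t ON t.pkid = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```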

Query results:

Notes:

1. When you create the full-text index, one column must hold the document type (the file extension, such as doc). SQL Server full-text search uses this type column to pick the matching filter (IFilter) that extracts text from each document.

2. You must also set the language of the full-text index: 2052 for Simplified Chinese, 1033 for English.
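The two notes correspond to the TYPE COLUMN and LANGUAGE clauses of CREATE FULLTEXT INDEX. A minimal sketch for the demo table follows; the catalog name blobdemo_catalog and the index name ux_sampleblobtable_pkid are hypothetical. A separate unique index is created only so that KEY INDEX can name it explicitly, because the primary key's index has a system-generated name.

```sql
USE blobdatademodb;
GO
-- A full-text index needs a unique, single-column, non-nullable key index.
CREATE UNIQUE INDEX ux_sampleblobtable_pkid ON sampleblobtable (pkid);
GO
-- Hypothetical catalog name
CREATE FULLTEXT CATALOG blobdemo_catalog AS DEFAULT;
GO
CREATE FULLTEXT INDEX ON sampleblobtable
(
    filecontent
        TYPE COLUMN filetype   -- note 1: extension such as 'doc' selects the IFilter
        LANGUAGE 2052          -- note 2: Simplified Chinese; use 1033 for English
)
KEY INDEX ux_sampleblobtable_pkid
ON blobdemo_catalog
WITH CHANGE_TRACKING AUTO;     -- keep the index updated as rows are inserted
GO
```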

Advantages: Once imported into the SQL Server database, DOC files are easy to read and to search with full-text queries, and they can also be written if necessary.

Disadvantages: A varbinary(max) value is limited to 2 GB, and storing a large amount of BLOB data makes the database grow very large and can slow retrieval considerably.

3. Full-text search with FILESTREAM

Solution summary: Similar to solution 2, except that each DOC file is stored in a varbinary(max) FILESTREAM column, so SQL Server keeps the actual bytes as files in the NTFS file system rather than inside the database data files. Full-text search is then performed on the table as before.
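A sketch of a table for this solution, adapted from the solution 2 table. The names sampleblobtable_fs and fs_group are hypothetical, and the script assumes FILESTREAM is already enabled and the database has a FILESTREAM filegroup. The main differences from solution 2 are the FILESTREAM attribute on the varbinary(max) column and the ROWGUIDCOL column that every FILESTREAM table requires.

```sql
USE blobdatademodb;
GO
-- Assumes a FILESTREAM filegroup named fs_group (hypothetical) already exists.
CREATE TABLE sampleblobtable_fs              -- hypothetical table name
(
    [pkid]        INT IDENTITY(1, 1) PRIMARY KEY,
    -- A FILESTREAM table needs a non-null, unique UNIQUEIDENTIFIER ROWGUIDCOL column.
    [rowguid]     UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    [filetype]    NVARCHAR(32) NULL,         -- extension, usable as the full-text TYPE COLUMN
    [filename]    NVARCHAR(255) NULL,
    -- Same varbinary(max) type as solution 2, but the bytes are kept as NTFS files.
    [filecontent] VARBINARY(MAX) FILESTREAM NULL,
    [addtime]     DATETIME DEFAULT (GETDATE())
)
FILESTREAM_ON fs_group;
GO
```

The insert procedure and the full-text index from solution 2 can be pointed at this table with only the table and index names changed.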

Limitations in SQL Server 2008:
- FILESTREAM data cannot be stored on remote storage.
- Database snapshots and database mirroring are not supported.
- Some other SQL Server 2008 (code name Katmai) features do not support FILESTREAM, for example:
• SQL Server encryption
• Table-valued parameters

Prerequisites: The Full-Text Search component must be installed, and FILESTREAM must be enabled.
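The prerequisites can be sketched as follows. Enabling FILESTREAM for the Windows service itself is done in SQL Server Configuration Manager and is not shown; the filegroup name fs_group and the folder D:\fsdata are hypothetical.

```sql
-- Instance level: 2 = allow both T-SQL and Win32 streaming access
-- (FILESTREAM must already be enabled for the service in Configuration Manager).
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Database level: add a FILESTREAM filegroup and its container folder.
-- The parent folder D:\fsdata must exist; the last folder must not.
ALTER DATABASE blobdatademodb
    ADD FILEGROUP fs_group CONTAINS FILESTREAM;
GO
ALTER DATABASE blobdatademodb
    ADD FILE (NAME = N'blobdatademodb_fs', FILENAME = N'D:\fsdata\blobdatademodb_fs')
    TO FILEGROUP fs_group;
GO

-- Check that full-text search is installed (1 = installed).
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS is_fulltext_installed;
```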

References:

http://msdn.microsoft.com/zh-cn/library/bb933993.aspx

http://www.cnblogs.com/downmoon/archive/2010/05/06/1727546.html

http://www.cnblogs.com/downmoon/archive/2010/05/08/1730044.html

Advantages: As in solution 2, DOC files imported into the SQL Server database are easy to read and to search with full-text queries, and they can be written if necessary. This solution also overcomes the drawbacks of solution 2: the row holds only a reference to the varbinary(max) FILESTREAM data, while the actual content is stored as files in the NTFS file system outside the database data files, so the size is limited only by the capacity of the NTFS volume.
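Because the row holds only a reference, client code that wants to read or write a file directly asks SQL Server for the file's logical path and a transaction context. A minimal sketch, assuming a hypothetical FILESTREAM table named sampleblobtable_fs with the same columns as the solution 2 table; a .NET client would pass both values to System.Data.SqlTypes.SqlFileStream.

```sql
USE blobdatademodb;
GO
BEGIN TRANSACTION;
-- GET_FILESTREAM_TRANSACTION_CONTEXT() returns NULL outside a transaction.
SELECT filename,
       filecontent.PathName()               AS logical_path,   -- path for Win32 streaming access
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS tx_context,
       DATALENGTH(filecontent)              AS size_in_bytes
FROM sampleblobtable_fs
WHERE filetype = N'doc';
-- A client opens logical_path with tx_context to stream the file, then commits.
COMMIT TRANSACTION;
```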

 

Summary: This article has briefly summarized how SQL Server full-text search can be used to search the content of Word files. In my view, both solution 1 and solution 3 are workable. Discussion is welcome: 3w@live.cn

Note from Yaoyue (downmoon): The copyright of this article is jointly held by the author and CSDN. Please credit the source when reprinting.
Helping others is helping yourself! 3w@live.cn
