Besides using the APIs that Office provides to search Word documents, you can search them with SQL Server's full-text search. This article briefly summarizes three such solutions.
1. Full-text search with the Windows Indexing Service
Summary:
1. Rename the file when you store it.
2. Configure the Indexing Service and link it to Microsoft SQL Server.
3. Modify the SQL statement so that its query condition includes the full-text query.
For a detailed example, see: http://database.ctocio.com.cn/51/11440551.shtml
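Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration, not the code from the linked article. It assumes the Indexing Service OLE DB provider (MSIDXS) is available on the server. The linked server name FileSystem, the catalog name Web, the folder D:\docs, the table fileinfotable with its storedname column, and the search phrase are all hypothetical.

```sql
-- Step 2: register an Indexing Service catalog as a linked server.
-- 'FileSystem' (linked server) and 'Web' (catalog) are assumed names.
EXEC sp_addlinkedserver
    @server = N'FileSystem',
    @srvproduct = N'Index Server',
    @provider = N'MSIDXS',
    @datasrc = N'Web';
GO

-- Step 3: combine file metadata kept in SQL Server with the full-text hits
-- that the Indexing Service returns. Table and column names are hypothetical.
SELECT f.storedname, q.Path
FROM fileinfotable AS f
INNER JOIN OPENQUERY(FileSystem,
    'SELECT FileName, Path
     FROM SCOPE('' "D:\docs" '')
     WHERE CONTAINS(''"SQL Server"'')') AS q
    ON f.storedname = q.FileName;
```

The join on the stored file name is why step 1 matters: each file needs a name that identifies exactly one row on the SQL Server side.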
Advantages: The files stay in file system directories in their original DOC format.
Disadvantages: The files can only be read this way, not written.
2. Full-text search with BLOB data
Solution summary: Store the DOC file in a database table as BLOB data in a varbinary(max) column, then run full-text search against that table. This is the most common solution.
A simple example that creates the table and inserts documents:
    -- Binary file query example
    USE master
    GO
    IF EXISTS (SELECT name FROM sys.databases WHERE name = N'blobdatademodb')
        DROP DATABASE blobdatademodb
    GO
    CREATE DATABASE blobdatademodb
    GO
    USE blobdatademodb
    GO
    -- Enable full-text search for the new database
    EXECUTE sp_fulltext_database 'enable'
    GO
    -- Create a table that contains a BLOB column
    IF OBJECT_ID('sampleblobtable') IS NOT NULL
        DROP TABLE sampleblobtable
    GO
    CREATE TABLE sampleblobtable
    (
        [pkid] INT IDENTITY(1, 1) PRIMARY KEY,
        [filetype] NVARCHAR(32) NULL,
        [filename] NVARCHAR(255) NULL,
        [filecontent] VARBINARY(MAX) NULL,
        [addtime] DATETIME DEFAULT (GETDATE())
    )
    GO
    IF EXISTS (SELECT * FROM sys.objects
               WHERE object_id = OBJECT_ID(N'[dbo].[cpp_insertoneblobdatatotable]')
                 AND type IN (N'P', N'PC'))
        DROP PROCEDURE [dbo].[cpp_insertoneblobdatatotable]
    GO
    -- Create a stored procedure that inserts one document into the table
    CREATE PROCEDURE cpp_insertoneblobdatatotable
    (
        @filetype NVARCHAR(32),
        @filename NVARCHAR(255),
        @filecontent VARBINARY(MAX)
    )
    AS
    INSERT sampleblobtable ([filetype], [filename], [filecontent], [addtime])
    VALUES (@filetype, @filename, @filecontent, GETDATE())
    GO

The C# client that loads the files through the stored procedure:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    namespace BlobDataSearchDemo
    {
        class Program
        {
            const string ConnectionString = @"Server=workshop/agronet09;Database=blobdatademodb;uid=sa;pwd=As;";

            static void Main(string[] args)
            {
                SaveDoc2SqlServer(@"D:/2008 data/streamdata/doc/dancing.doc", ConnectionString);
                SaveDoc2SqlServer(@"D:/2008 data/streamdata/doc/tianlong Babu.doc", ConnectionString);
                SaveDoc2SqlServer(@"D:/2008 data/streamdata/doc/english.doc", ConnectionString);
                Console.ReadKey();
            }

            private static void SaveDoc2SqlServer(string filePath, string connectionString)
            {
                FileInfo fi = new FileInfo(filePath);
                if (!fi.Exists)
                    return;

                // Read the whole document into memory in a single call.
                byte[] content = File.ReadAllBytes(filePath);

                // The using blocks close the connection even if the insert throws.
                using (SqlConnection conn = new SqlConnection(connectionString))
                using (SqlCommand cmdUploadDoc = new SqlCommand("cpp_insertoneblobdatatotable", conn))
                {
                    cmdUploadDoc.CommandType = CommandType.StoredProcedure;
                    cmdUploadDoc.Parameters.Add("@filename", SqlDbType.NVarChar, 255).Value = fi.Name;
                    // A size of -1 maps the parameter to varbinary(max).
                    cmdUploadDoc.Parameters.Add("@filecontent", SqlDbType.VarBinary, -1).Value = content;
                    // Store the extension without the leading dot, e.g. "doc".
                    cmdUploadDoc.Parameters.Add("@filetype", SqlDbType.NVarChar, 32).Value = fi.Extension.Replace(".", "");
                    conn.Open();
                    cmdUploadDoc.ExecuteNonQuery();
                }
            }
        }
    }

[Figure: query results]
Note:
1. When you set up full-text search on the table, one column must hold the document type. SQL Server full-text search uses this value to load the matching filter (here, the one for DOC files) and extract the text.
2. You must set the language for full-text search. The language ID is 2052 for Simplified Chinese and 1033 for English.
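The two notes correspond to the TYPE COLUMN and LANGUAGE clauses of CREATE FULLTEXT INDEX. Below is a minimal sketch against the sampleblobtable from the example. The catalog name ft_blobdemo, the index name ux_sampleblobtable_pkid, and the search phrase are assumptions. The extra unique index is created only because the table's primary key has a system-generated name.

```sql
USE blobdatademodb;
GO
-- A full-text index needs a named, unique, single-column, non-nullable key index.
CREATE UNIQUE INDEX ux_sampleblobtable_pkid ON sampleblobtable ([pkid]);
GO
CREATE FULLTEXT CATALOG ft_blobdemo;
GO
CREATE FULLTEXT INDEX ON sampleblobtable
(
    [filecontent]
        TYPE COLUMN [filetype]  -- note 1: the column that holds the document type
        LANGUAGE 1033           -- note 2: 1033 = English, 2052 = Simplified Chinese
)
KEY INDEX ux_sampleblobtable_pkid
ON ft_blobdemo
WITH CHANGE_TRACKING AUTO;
GO
-- Search the document bodies once the index has been populated.
SELECT [pkid], [filename], [addtime]
FROM sampleblobtable
WHERE CONTAINS([filecontent], N'"SQL Server"');
```

For documents that are mostly Chinese, the same statement would use LANGUAGE 2052 instead.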
Advantages: With the DOC files imported into the SQL Server database, they are easy to read and to search with full-text queries. The files can also be written when necessary.
Disadvantages: A varbinary(max) value is limited to 2 GB, and storing a large amount of BLOB data bloats the database and significantly slows retrieval.
3. Full-text search with FILESTREAM
Solution summary: Similar to solution 2, except that FILESTREAM stores each DOC file as a physical file outside the database files, while the column is still declared as varbinary(max). Full-text search is then run against the table.
Limitations in SQL Server 2008:
- FILESTREAM data does not support remote storage.
- Database snapshots and database mirroring are not supported.
- Some SQL Server 2008 (code name Katmai) features do not support FILESTREAM, for example:
• SQL encryption
• Table-valued parameters
Prerequisites: You must install full-text search and enable FILESTREAM.
References:
http://msdn.microsoft.com/zh-cn/library/bb933993.aspx
http://www.cnblogs.com/downmoon/archive/2010/05/06/1727546.html
http://www.cnblogs.com/downmoon/archive/2010/05/08/1730044.html
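The FILESTREAM prerequisite and table layout can be sketched as follows. This is an illustration under stated assumptions, not the setup from the referenced posts. The database name filestreamdemodb, the D:\fsdemo paths, and the table and constraint names are hypothetical. FILESTREAM must also be enabled for the service in SQL Server Configuration Manager.

```sql
-- Allow both T-SQL and Win32 streaming access to FILESTREAM data.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO
-- Hypothetical database with a dedicated FILESTREAM filegroup.
-- D:\fsdemo must already exist; D:\fsdemo\docs must not (SQL Server creates it).
CREATE DATABASE filestreamdemodb
ON PRIMARY
    (NAME = fsdemo_data, FILENAME = 'D:\fsdemo\fsdemo_data.mdf'),
FILEGROUP fsdemo_fg CONTAINS FILESTREAM
    (NAME = fsdemo_docs, FILENAME = 'D:\fsdemo\docs')
LOG ON
    (NAME = fsdemo_log, FILENAME = 'D:\fsdemo\fsdemo_log.ldf');
GO
USE filestreamdemodb;
GO
-- A table with a FILESTREAM column needs a unique, non-null ROWGUIDCOL column.
CREATE TABLE samplefilestreamtable
(
    [docid] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT uq_samplefilestreamtable_docid UNIQUE DEFAULT NEWID(),
    [filetype] NVARCHAR(32) NULL,
    [filename] NVARCHAR(255) NULL,
    [filecontent] VARBINARY(MAX) FILESTREAM NULL
);
```

The named unique constraint can then serve as the KEY INDEX when a full-text index is created on [filecontent], in the same way as in solution 2.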
Advantages: As in solution 2, the DOC files are managed through the SQL Server database, so they are easy to read, to search with full-text queries, and to write when necessary, while the drawbacks of solution 2 are avoided. The varbinary(max) column holds only a reference, and the actual content is stored outside the database files. The size is limited only by the space available on the NTFS volume.
Summary: This article briefly summarized how to use SQL Server full-text search to search the content of Word files. In my view, both solution 1 and solution 3 are workable. Discussion is welcome. 3w@live.cn
Note from Invitation Month: The copyright of this article is jointly owned by Invitation Month and CSDN. Please cite the source when reprinting.