First, Background
Word documents are part of everyday work. In practice, when a document server holds a large number of Word documents, say several thousand, it is hard to find the documents that contain specified keywords, and there is no good ready-made solution for this. My earlier approach was to use Apache POI on the server side to extract each document's text and store it in the database, and then run an SQL query to determine whether a document contains the keywords. That approach has several drawbacks. First, in my experience POI's support for Word documents is limited: its Word interfaces are narrow and not very stable, and it places high demands on the format of the Word documents. Second, extracting the text of thousands of documents with POI on the server and storing it in the database puts a heavy load on the server. PageOffice provides a property that returns the full text content of a Word document, so the plain text of the whole Word file can be saved to the database, and an SQL query can then check whether a document contains the keywords.
Second, the main implementation code
When the file is saved, the plain text of the Word document is retrieved at the same time and written to the database. Because the database is synchronized with the text content of the Word file on every save, a full-text search of all Word files on the server only requires an SQL query against the text stored in the database.
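FileSaver fs = new FileSaver(request, response);
// The source line is cut off after the variable name. getDocumentText() is assumed
// here to be the PageOffice accessor for the document's plain text.
String strDocumentText = fs.getDocumentText();
// Write strDocumentText to the database at this point. The file itself can also be saved:
//fs.saveToFile(request.getSession().getServletContext().getRealPath("doc/") + "/" + fs.getFileName());
fs.close();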
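The database side of this step is not shown in the article. As a minimal sketch, assuming a hypothetical table word_documents(file_name, content) and a JDBC Connection obtained elsewhere, the text captured at save time could be stored and searched roughly as follows. The class name, method names, table, and columns are all illustrative and not part of PageOffice.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; table and column names are assumptions for illustration.
class DocumentTextIndex {

    // Escapes the LIKE wildcards (% and _) with '!' so the keyword is matched
    // literally, then wraps it in %...% for a "contains" search.
    static String likePattern(String keyword) {
        String escaped = keyword
                .replace("!", "!!")
                .replace("%", "!%")
                .replace("_", "!_");
        return "%" + escaped + "%";
    }

    // Keeps the stored text in sync with the file on every save:
    // update the existing row, or insert one if the file is new.
    static void saveText(Connection conn, String fileName, String text) throws SQLException {
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE word_documents SET content = ? WHERE file_name = ?")) {
            update.setString(1, text);
            update.setString(2, fileName);
            if (update.executeUpdate() > 0) {
                return;
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO word_documents (file_name, content) VALUES (?, ?)")) {
            insert.setString(1, fileName);
            insert.setString(2, text);
            insert.executeUpdate();
        }
    }

    // Returns the names of all files whose stored text contains the keyword.
    static List<String> findFiles(Connection conn, String keyword) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement query = conn.prepareStatement(
                "SELECT file_name FROM word_documents WHERE content LIKE ? ESCAPE '!'")) {
            query.setString(1, likePattern(keyword));
            try (ResultSet rows = query.executeQuery()) {
                while (rows.next()) {
                    names.add(rows.getString("file_name"));
                }
            }
        }
        return names;
    }
}
```

In the save handler, saveText(conn, fs.getFileName(), strDocumentText) would be called before fs.close(), and a search page would call findFiles(conn, keyword). A LIKE query scans every row, so for very large collections a database full-text index may be a better fit, but that goes beyond what the article describes.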
Third, the advantages of this solution
With PageOffice, the full text of the document is extracted on the client, which greatly reduces the load on the server and improves the server's responsiveness to client requests.
A solution for full-text retrieval of all Word files on the server-java