How to install the Indexing Service on a stripped-down XP system:
Find an original system image online that matches the installed system, run the installer directly from the image, and select "Install update".
"Install update" keeps the existing programs and automatically installs any missing system components or services (so you can still use tools to trim extra services and startup items, and the system patches do not need to be reinstalled afterwards).
I needed the Indexing Service because I recently used IFilter in .NET to read the text of Word files.
For details, see the open-source project "Using IFilter in C#".
This method reads the text of a Word file without Office being installed. It extracts Chinese characters correctly and reads quickly, even when the document contains tables and images.
Excel, PPT, and TXT files are also supported. (Test environment: XP Professional SP3, Server 2003 Enterprise SP1, Server 2008 Enterprise SP2)
In testing, I also found that extraction works even when the Indexing Service is not started. However, the Indexing Service must at least be installed on the system (on systems later than XP, its counterpart is the Windows Search service; see the MSDN documentation).
Another blogger in this community previously wrote an article on using IFilter to extract Word text, but his program does not support Chinese punctuation. Many people have since discussed and tested the approach; see: "How to extract text from a document file (such as Word or PDF) under .NET".
I used this text-extraction method to count the words in a document without the Word COM component (for the COM approach, see Microsoft.Office.Interop.Word), that is, to get the word count without installing Word.
Regular expression: Regex.Matches(temstr, @"(?i)[A-Z_'0-9-]+").Count + Regex.Matches(temstr, @"[\u0391-\uffe5]").Count + Regex.Matches(temstr, @"(?i)[^A-Z_'0-9\u0391-\uffe5-]+").Count; // number of words + number of Chinese characters (including Chinese punctuation) + others. See related CSDN posts for the regular expressions.
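The counting expression above can be wrapped in a small helper for easier reuse. This is only a minimal sketch: the class name WordCounter and the method name Count are made up for illustration, and the input string is assumed to be text that has already been extracted (for example via IFilter).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the original post.
public static class WordCounter
{
    // A run of Latin letters, digits, underscores, apostrophes, or hyphens counts as one word.
    private static readonly Regex LatinWords =
        new Regex(@"[A-Z_'0-9-]+", RegexOptions.IgnoreCase);

    // Every character in U+0391..U+FFE5 (Chinese characters and Chinese punctuation,
    // among others) counts individually.
    private static readonly Regex WideChars =
        new Regex(@"[\u0391-\uffe5]");

    // A run of anything else (spaces, ASCII punctuation, ...) counts once per run.
    private static readonly Regex Others =
        new Regex(@"[^A-Z_'0-9\u0391-\uffe5-]+", RegexOptions.IgnoreCase);

    public static int Count(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }
        return LatinWords.Matches(text).Count
             + WideChars.Matches(text).Count
             + Others.Matches(text).Count;
    }

    public static void Main()
    {
        Console.WriteLine(WordCounter.Count("Hello world, 你好。"));
    }
}
```

For the sample string, the three parts contribute 2 (Hello, world), 3 (你, 好, 。), and 2 (the space, and the comma plus space), so the program prints 7. Note that the third pattern also counts runs of whitespace, so the total may be higher than the figure Word itself reports; drop or tighten that pattern if only words and Chinese characters should be counted.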