In a previous C# project, one requirement was to extract the content of Word documents page by page. The requirement was later dropped, but a working approach came out of the effort. The program can currently read Word files in .doc and .docx format in most cases; only unusual files are likely to fail.
To work with Word documents, you must reference Microsoft.Office.Interop.Word.dll, which appears directly in the Add Reference dialog in VS2010. The version used in this program is 14.0.0.0.
Using the reference as added produces an error.
Setting the "Embed Interop Types" property of the reference to "False" resolves it.
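If the error in question is the compiler complaint that the interop type ApplicationClass cannot be embedded, another option is to leave "Embed Interop Types" unchanged and create Word through the Application interface instead. The following is only a minimal sketch under that assumption; the class name WordStartup is hypothetical, and it needs Word installed to run.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordStartup
{
    static void Main()
    {
        // Application is the interface; ApplicationClass is the concrete
        // class that the compiler refuses to embed.
        Word.Application app = new Word.Application();
        try
        {
            Console.WriteLine(app.Version);
        }
        finally
        {
            // Cast to _Application so Quit resolves to the method, not the event.
            ((Word._Application)app).Quit();
        }
    }
}
```

The rest of this article keeps ApplicationClass, so the property change described above is still required for the listing as written.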
The program code is as follows:
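using System.IO;

// Returns the text of page pageNum (1-based). The types, literals, and enum
// values that the listing lost are filled in here as assumptions.
public static string getWordContentByPage(string filepath, int pageNum)
{
    FileInfo f = new FileInfo(filepath);
    if (!f.Exists) return null;
    string file_name = f.Name;
    string file_path = f.FullName;
    int pageCount = 0;
    Microsoft.Office.Interop.Word.Document doc = null;
    Microsoft.Office.Interop.Word.ApplicationClass app = new Microsoft.Office.Interop.Word.ApplicationClass();
    object missing = System.Reflection.Missing.Value;
    object FileName = file_path;
    object readOnly = true;
    object isVisible = false;
    // Open the document read-only in a hidden window.
    doc = app.Documents.Open(ref FileName, ref missing, ref readOnly, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref isVisible, ref missing, ref missing, ref missing, ref missing);
    Microsoft.Office.Interop.Word.WdStatistic stat = Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages;
    pageCount = doc.ComputeStatistics(stat, ref missing);
    object What = Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage;
    object Which = Microsoft.Office.Interop.Word.WdGoToDirection.wdGoToAbsolute;
    object page = pageNum + 1;
    // ran1 is the start of the following page, ran2 the start of the requested page.
    Microsoft.Office.Interop.Word.Range ran1 = doc.GoTo(ref What, ref Which, ref page, ref missing);
    Microsoft.Office.Interop.Word.Range ran2 = ran1.GoToPrevious(Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage);
    object objStart = ran2.End;
    object objEnd = ran1.Start;
    if (page.Equals(pageCount + 1))
    {
        // Last page: assuming Word stops at the last page when asked for one
        // beyond it, ran1 is already its start; read to the end of the document.
        objStart = ran1.Start;
        objEnd = doc.Content.End;
    }
    Microsoft.Office.Interop.Word.Range r3 = doc.Range(ref objStart, ref objEnd);
    String content = r3.Text;
    object saveOption = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;
    doc.Close(ref saveOption, ref missing, ref missing);
    app.Quit(ref saveOption, ref missing, ref missing);
    return content;
}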
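The listing above starts and quits Word for a single page, which becomes slow when every page of a document is needed. As a rough sketch only, the same GoTo-based idea can be applied in one session. The names WordPageReader, ReadAllPages, and PageStart are hypothetical, and the sketch assumes C# 4 or later so that COM interface calls can omit "ref" and optional arguments. It has not been checked against unusual documents.

```csharp
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class WordPageReader
{
    // Hypothetical helper: returns the text of every page, opening Word once.
    public static List<string> ReadAllPages(string path)
    {
        var pages = new List<string>();
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Word resolves relative paths against its own working directory,
            // so pass an absolute path.
            doc = app.Documents.Open(Path.GetFullPath(path), ReadOnly: true, Visible: false);
            int pageCount = doc.ComputeStatistics(Word.WdStatistic.wdStatisticPages);
            for (int n = 1; n <= pageCount; n++)
            {
                int start = PageStart(doc, n);
                // The last page runs to the end of the document body.
                int end = n < pageCount ? PageStart(doc, n + 1) : doc.Content.End;
                pages.Add(doc.Range(start, end).Text);
            }
        }
        finally
        {
            // Always close without saving and quit, even if reading failed.
            // The casts make Close and Quit resolve to the methods, not the events.
            if (doc != null)
                ((Word._Document)doc).Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            ((Word._Application)app).Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
        }
        return pages;
    }

    // Character offset at which the given 1-based page begins.
    static int PageStart(Word.Document doc, int pageNumber)
    {
        return doc.GoTo(Word.WdGoToItem.wdGoToPage,
                        Word.WdGoToDirection.wdGoToAbsolute,
                        pageNumber).Start;
    }
}
```

A call such as WordPageReader.ReadAllPages("report.docx") would then return one string per page. The try/finally block is a deliberate choice: without it, an exception while reading can leave a hidden WINWORD.EXE process running.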