While I was studying Java IO streams recently, a reader who had seen my blog asked why reading his .doc file with the method I described produced garbled text.
At first I thought he had used the method incorrectly, but when I tried it myself my output was garbled as well.
After searching online for a while, I learned that Word .doc files and Excel files cannot be read with ordinary IO streams; they need POI. After some experimenting, I finally managed to read the .doc file with its text correctly decoded.
While searching I also found that .doc and .docx files are read in different ways, so I learned a simple way to read .docx files as well.
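The reason the two formats need different readers is that they are different binary containers: a legacy .doc is an OLE2 compound file, while a .docx is a ZIP archive. As a rough sketch, the container can be guessed from the first few bytes of the file. The class name `WordFormatSniffer` and its `sniff` method are made up for this illustration and are not part of POI. The signature only identifies the container, so an .xls file would also report OLE2 and any ZIP file would report ZIP.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format
// of a file from its leading bytes.
final class WordFormatSniffer {
    // OLE2 compound-file signature, used by legacy .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature ("PK" 0x03 0x04), used by .docx files.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Returns "OLE2", "ZIP", or "UNKNOWN" for the given leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2";
        if (startsWith(head, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[8];
            int n = in.read(head); // -1 if the file is empty
            System.out.println(sniff(Arrays.copyOf(head, Math.max(n, 0))));
        }
    }
}
```

An "OLE2" result suggests the .doc reader (poi-scratchpad), and a "ZIP" result suggests the .docx reader (poi-ooxml).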
First, import the packages:
To read .doc files, you need the poi-scratchpad jar and its dependencies.
To read .docx files, you need the poi-ooxml jar and its dependencies.
I build the project with Maven, so the dependent packages are imported automatically. The Maven dependency configuration is as follows:
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.8</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.8</version>
</dependency>
Second, the code to read the files:
1. A simple example of reading a .doc file:
// Required imports
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

public static void readAndWriterTest3() throws IOException {
    File file = new File("C:\\users\\tuzongxun123\\desktop\\aa.doc");
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fis = new FileInputStream(file)) {
        HWPFDocument doc = new HWPFDocument(fis);
        // Three ways to get the document text
        String doc1 = doc.getDocumentText();
        System.out.println(doc1);
        StringBuilder doc2 = doc.getText();
        System.out.println(doc2);
        Range rang = doc.getRange();
        String doc3 = rang.text();
        System.out.println(doc3);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
2. A simple example of reading a .docx file:
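// Required imports
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static void readAndWriterTest4() throws IOException {
    File file = new File("C:\\users\\tuzongxun123\\desktop\\aa.docx");
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fis = new FileInputStream(file)) {
        XWPFDocument xdoc = new XWPFDocument(fis);
        // The extractor returns the whole document text as one string
        XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
        String doc1 = extractor.getText();
        System.out.println(doc1);
    } catch (Exception e) {
        e.printStackTrace();
    }
}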
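To see why a plain IO stream prints garbage for a .docx file, it may help to look inside one without POI. A .docx is a ZIP archive whose main text lives in the entry word/document.xml, inside `<w:t>` elements. The sketch below uses only the JDK. The class name `DocxPeek` and the method `rawText` are made up for this illustration, and the code assumes the XML is UTF-8. It joins text runs without paragraph breaks and does not decode XML entities, so it is only meant to show the file layout and is no substitute for XWPFWordExtractor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of POI): pulls the raw text runs out of a .docx.
final class DocxPeek {
    // Matches <w:t>...</w:t> and <w:t xml:space="preserve">...</w:t>,
    // but not other tags that merely start with "w:t", such as <w:tbl>.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    // Returns the concatenated text runs, or "" if word/document.xml is missing.
    static String rawText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) continue;
                // Read the whole entry into memory.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) buf.write(chunk, 0, n);
                // Assumption: the XML is UTF-8 encoded.
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                StringBuilder text = new StringBuilder();
                Matcher m = TEXT_RUN.matcher(xml);
                while (m.find()) text.append(m.group(1));
                return text.toString();
            }
        }
        return "";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxPeek <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(rawText(in));
        }
    }
}
```

The legacy .doc format has no such readable XML inside it, which is why it needs the separate HWPFDocument reader.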
Java reads doc and docx files using POI