Java reads doc and docx files using POI

Source: Internet
Author: User

While I was studying Java IO streams recently, a reader of my blog asked why reading his .doc file with the method I had described produced garbled text.

At first I assumed the problem was in his code, but when I tried it myself I got the same garbled output.

After some searching online, I found that Word .doc files and Excel files cannot be read with ordinary IO streams, because they are binary formats rather than plain text. They need to be read with Apache POI. After trying it, I was finally able to read the .doc file with the correct encoding.

During the search I also found that .doc and .docx files are read in different ways, so I learned a simple way to read .docx files as well.
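One way to see why the two formats need different readers is to look at the first bytes of each file. A legacy .doc file is an OLE2 compound document (a binary container), while a .docx file is a ZIP package of XML parts, so neither is plain text that a character stream can decode directly. The sketch below uses only the JDK; the class name `WordFormatSniffer` and its `detect` methods are hypothetical helpers for illustration and are not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses the container format from the leading bytes.
class WordFormatSniffer {
    // Signature of an OLE2 compound file, the container used by legacy .doc and .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Signature of a ZIP local file header; a .docx file is a ZIP package.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns "OLE2", "ZIP" or "UNKNOWN" for the given leading bytes.
    static String detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return "OLE2";
        if (startsWith(header, ZIP_MAGIC)) return "ZIP";
        return "UNKNOWN";
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break; // file is shorter than the signature
                read += n;
            }
            return detect(Arrays.copyOf(header, read));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A ZIP signature only says that the file is some ZIP archive (.xlsx and ordinary .zip files match too), so the result is a hint for choosing between the .doc and .docx readers rather than a full validation.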

First, import the packages:
Reading .doc files requires the poi-scratchpad jar and its dependencies.

Reading .docx files requires the poi-ooxml jar and its dependencies.

I build the project with Maven, so the dependencies are pulled in automatically. The Maven dependency configuration is as follows:

    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-ooxml</artifactId>
        <version>3.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.8</version>
    </dependency>

Second, the code to read the files:
1. A simple example of reading a .doc file:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Range;

    public static void readAndWriterTest3() throws IOException {
        File file = new File("C:\\Users\\tuzongxun123\\Desktop\\aa.doc");
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(file)) {
            HWPFDocument doc = new HWPFDocument(fis);
            // Three ways to get the document text
            String doc1 = doc.getDocumentText();
            System.out.println(doc1);
            StringBuilder doc2 = doc.getText();
            System.out.println(doc2);
            Range rang = doc.getRange();
            String doc3 = rang.text();
            System.out.println(doc3);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

2. A simple example of reading a .docx file:

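    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public static void readAndWriterTest4() throws IOException {
        File file = new File("C:\\Users\\tuzongxun123\\Desktop\\aa.docx");
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(file)) {
            XWPFDocument xdoc = new XWPFDocument(fis);
            // The extractor returns the text of the whole document
            XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
            String doc1 = extractor.getText();
            System.out.println(doc1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }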
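To give a rough idea of what a .docx reader has to deal with, the following sketch opens a .docx with only the JDK: it unzips the package, finds the `word/document.xml` part and collects the text of the `w:t` elements paragraph by paragraph. This is a swapped-in illustration, not how POI is implemented and not a replacement for `XWPFWordExtractor`. The class name `DocxPeek` and the method `extractText` are hypothetical, and the sketch ignores tabs, line breaks, headers, footers and footnotes, and may repeat text that sits in nested paragraphs such as text boxes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration (not a POI class): reads paragraph text from a .docx package.
class DocxPeek {
    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) continue;
                // Copy the part into memory so the XML parser cannot close the ZIP stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) buf.write(chunk, 0, n);
                return textOf(buf.toByteArray());
            }
        }
        throw new IOException("word/document.xml not found, probably not a .docx package");
    }

    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid external entity tricks in untrusted files.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n'); // one line per paragraph
        }
        return out.toString();
    }
}
```

It could be called with `DocxPeek.extractText(new FileInputStream(file))`. For real documents the POI extractor shown above remains the simpler and more complete choice.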
