How to manipulate Word, Excel, PDF files in Java

Source: Internet
Author: User

Reading data from Office and PDF documents is a common feature in ordinary applications, especially web applications. So today we'll take a quick look at how to read Word, Excel, and PDF files in Java. This post covers only simple usage; readers who want a deeper understanding of how the libraries work should study the relevant source code.

First, let's get to know the jar packages needed to read each kind of document:

1. Use the POI package to read the contents of a Word document

poi.jar (the HWPF classes used for Word files ship in the poi-scratchpad jar inside the same archive)

http://apache.freelamp.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://apache.etoak.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://labs.renren.com/apache-mirror/poi/release/bin/poi-bin-3.6-20091214.zip

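2. Use the JXL package to read the contents of an Excel document

jxl.jar

http://nchc.dl.sourceforge.net/project/jexcelapi/CSharpJExcel/CSharpJExcel.zip

Note that the archive name in this link, CSharpJExcel, appears to refer to the C# port hosted by the same SourceForge project; the Java code in this post needs jxl.jar from the JExcelApi distribution.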

3. Use PDFBox to read the contents of a PDF document

pdfbox.jar

http://labs.renren.com/apache-mirror/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/pdfbox-1.1.0.jar

fontbox.jar

http://apache.etoak.com/pdfbox/1.1.0/fontbox-1.1.0.jar
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/fontbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/fontbox-1.1.0.jar

jempbox.jar

http://labs.renren.com/apache-mirror/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/jempbox-1.1.0.jar

Now let's take a quick look at examples that use these jar packages to read each kind of document:

1. Use the POI package to read the contents of a Word document

    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.usermodel.Range;

    /**
     * Reads the text of a Word (.doc) file with POI's HWPF classes.
     *
     * @author Cao Shenhuan
     */
    public class DocFile {
        public String getContent(File f) throws Exception {
            // try-with-resources closes the stream even if parsing fails
            try (FileInputStream fis = new FileInputStream(f)) {
                HWPFDocument doc = new HWPFDocument(fis);
                Range range = doc.getRange();
                return range.text();
            }
        }
    }

2. Use the JXL package to read the contents of an Excel document

    import java.io.File;
    import java.io.FileInputStream;
    import jxl.Cell;
    import jxl.Sheet;
    import jxl.Workbook;

    /**
     * Reads every cell of an Excel (.xls) file with JXL.
     *
     * @author Cao Shenhuan
     */
    public class XlsFile {
        public String getContent(File f) throws Exception {
            StringBuilder sb = new StringBuilder();
            // A read-only Workbook can be built directly from a local file
            // or, as here, from an input stream.
            try (FileInputStream fis = new FileInputStream(f)) {
                Workbook rwb = Workbook.getWorkbook(fis);
                try {
                    // Once the workbook exists, its sheets (worksheets) are
                    // available as an array; getSheet(i) returns a single one.
                    Sheet[] sheets = rwb.getSheets();
                    for (int i = 0; i < sheets.length; i++) {
                        Sheet rs = rwb.getSheet(i);
                        for (int j = 0; j < rs.getRows(); j++) {
                            Cell[] cells = rs.getRow(j);
                            for (int k = 0; k < cells.length; k++) {
                                sb.append(cells[k].getContents());
                            }
                        }
                    }
                } finally {
                    // Release the memory held for the sheet data
                    rwb.close();
                }
            }
            return sb.toString();
        }
    }
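The Excel listing appends cell contents one after another with no separators, so the row and column boundaries are lost in the returned string. If that structure matters, one option is to join the cells of a row with tabs and end each row with a newline. The following is only a sketch of that idea: CellJoiner and its join method are hypothetical names, and a plain String[][] stands in for the per-row cell contents so that the example runs without jxl.jar.

```java
class CellJoiner {
    // rows[i][k] stands in for the contents of cell k in row i.
    static String join(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            // Tab between cells, newline after each row
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] rows = { {"Name", "Qty"}, {"Pen", "12"} };
        System.out.print(join(rows));
    }
}
```

In the JXL loop, the same effect could be had by appending a tab after each cell's contents and a newline after each row.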

Once you have a Sheet, you can also access individual cells directly. Note that getCell takes the column index first and the row index second:

    // Value in the first column of the first row
    Cell c00 = rs.getCell(0, 0);
    String strc00 = c00.getContents();

    // Value in the second column of the first row
    Cell c10 = rs.getCell(1, 0);
    String strc10 = c10.getContents();

When you have finished working with the spreadsheet data, be sure to call close() on the Workbook to free the memory used while reading the sheet; this is especially important when reading large amounts of data.
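The advice to always call close() is easy to forget when an exception interrupts the read. On Java 7 and later, try-with-resources makes the cleanup automatic for any object that implements AutoCloseable. The sketch below illustrates the ordering only: FakeWorkbook is a hypothetical stand-in, not a JXL class, and a library object that does not implement AutoCloseable would need a try/finally block or a small wrapper instead.

```java
import java.util.ArrayList;
import java.util.List;

class CloseDemo {
    // Records the order of events so the behaviour can be inspected.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a workbook-like object with a close() method.
    static class FakeWorkbook implements AutoCloseable {
        String read() {
            LOG.add("read");
            throw new IllegalStateException("corrupt sheet");
        }

        @Override
        public void close() {
            LOG.add("close");
        }
    }

    static String readAll() {
        // The resource is closed before the catch block runs.
        try (FakeWorkbook wb = new FakeWorkbook()) {
            return wb.read();
        } catch (IllegalStateException e) {
            LOG.add("caught");
            return "";
        }
    }

    public static void main(String[] args) {
        readAll();
        System.out.println(LOG); // prints [read, close, caught]
    }
}
```

Even though read() fails, close() still runs, which is the guarantee the paragraph above asks for.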

3. Use PDFBox to read the contents of a PDF document

    import java.io.File;
    import java.io.FileInputStream;
    // PDFBox 1.x, the version linked in this post, uses the org.apache.pdfbox packages
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;

    /**
     * Extracts the text of a PDF file with PDFBox.
     *
     * @author Cao Shenhuan
     */
    public class PdfFile {
        public String getContent(File f) throws Exception {
            try (FileInputStream fis = new FileInputStream(f)) {
                PDFParser p = new PDFParser(fis);
                p.parse();
                PDDocument pdd = p.getPDDocument();
                try {
                    PDFTextStripper ts = new PDFTextStripper();
                    return ts.getText(pdd);
                } finally {
                    // Always release the parsed document
                    pdd.close();
                }
            }
        }
    }
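All three classes expose the same getContent(File) method, which suggests they could sit behind a common interface and be selected by file extension. The following is a minimal sketch of that idea, not something taken from the original post: ContentReader, ReaderRegistry, register, and extensionOf are hypothetical names, and the readers registered in main are stubs so that the example runs without the POI, JXL, or PDFBox jars.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class ReaderRegistry {
    // Hypothetical interface matching the signature used by the three classes.
    interface ContentReader {
        String getContent(File f) throws Exception;
    }

    private final Map<String, ContentReader> readers = new HashMap<>();

    void register(String extension, ContentReader reader) {
        readers.put(extension.toLowerCase(Locale.ROOT), reader);
    }

    // Returns the lower-cased extension without the dot, or "" if there is none.
    static String extensionOf(File f) {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    String read(File f) throws Exception {
        ContentReader reader = readers.get(extensionOf(f));
        if (reader == null) {
            throw new IllegalArgumentException("No reader for: " + f.getName());
        }
        return reader.getContent(f);
    }

    public static void main(String[] args) throws Exception {
        ReaderRegistry registry = new ReaderRegistry();
        // Stubs; a real project might register f -> new DocFile().getContent(f), etc.
        registry.register("doc", f -> "word:" + f.getName());
        registry.register("xls", f -> "excel:" + f.getName());
        registry.register("pdf", f -> "pdf:" + f.getName());
        System.out.println(registry.read(new File("report.PDF")));
    }
}
```

Keeping the lookup in one place means that supporting another format only requires registering one more reader.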
