Reading data from Office and PDF documents is a common feature in everyday applications, especially Web applications. So today we'll take a quick look at reading Word, Excel, and PDF files in Java. This post only covers simple usage; readers who want a deeper understanding of the underlying principles should study the relevant source code.
First, let's get to know the jar packages used to read each kind of document:
1. Use the POI package to read Word document content
poi.jar
http://apache.freelamp.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://apache.etoak.com/poi/release/bin/poi-bin-3.6-20091214.zip
http://labs.renren.com/apache-mirror/poi/release/bin/poi-bin-3.6-20091214.zip
2. Use the JXL package to read Excel document content
jxl.jar
http://nchc.dl.sourceforge.net/project/jexcelapi/CSharpJExcel/CSharpJExcel.zip
3. Use PDFBox to read PDF document content
pdfbox.jar
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/pdfbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/pdfbox-1.1.0.jar
fontbox.jar
http://apache.etoak.com/pdfbox/1.1.0/fontbox-1.1.0.jar
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/fontbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/fontbox-1.1.0.jar
jempbox.jar
http://labs.renren.com/apache-mirror/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.etoak.com/pdfbox/1.1.0/jempbox-1.1.0.jar
http://apache.freelamp.com/pdfbox/1.1.0/jempbox-1.1.0.jar
Now let's take a quick look at examples that use these jar packages to read documents:
1. Use the POI package to read Word document content
- import java.io.File;
- import java.io.FileInputStream;
- import org.apache.poi.hwpf.HWPFDocument;
- import org.apache.poi.hwpf.usermodel.Range;
- /**
-  *
-  * @author Cao Shenhuan
-  */
- public class DocFile {
-     public String getContent(File f) throws Exception {
-         FileInputStream fis = new FileInputStream(f);
-         // Parse the binary .doc file
-         HWPFDocument doc = new HWPFDocument(fis);
-         // The range covering the whole document
-         Range range = doc.getRange();
-         String text = range.text();
-         fis.close();
-         return text;
-     }
- }
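The string returned by the Word example is raw text, so it may need a little tidying before it is displayed or indexed. Here is a minimal sketch, assuming the raw text marks paragraph ends with '\r' and table cells with '\u0007' (an assumption about the extracted text that is worth checking against your own files). The WordTextCleaner class and its normalize method are hypothetical names, not part of POI.

```java
/** Hypothetical helper, not part of POI: tidies raw text extracted from a .doc file. */
class WordTextCleaner {
    /**
     * Assumes paragraph ends arrive as carriage returns and table cell marks
     * as the BEL character (code point 7); both are assumptions of this sketch.
     * Converts cell marks to tabs and all line endings to '\n'.
     */
    static String normalize(String raw) {
        return raw
                .replace('\u0007', '\t')   // cell marks become tabs
                .replace("\r\n", "\n")     // Windows line endings
                .replace('\r', '\n');      // bare carriage returns
    }
}
```

It could be applied to the return value of the Word reader's getContent method before the text is used elsewhere.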
2. Use the JXL package to read Excel document content
- import java.io.File;
- import java.io.FileInputStream;
- import jxl.Cell;
- import jxl.Sheet;
- import jxl.Workbook;
- /**
-  *
-  * @author Cao Shenhuan
-  */
- public class XlsFile {
-     public String getContent(File f) throws Exception {
-         // Build a read-only Workbook object.
-         // A workbook can be created directly from a local file
-         // or, as here, from an input stream.
-         FileInputStream fis = new FileInputStream(f);
-         StringBuilder sb = new StringBuilder();
-         Workbook rwb = Workbook.getWorkbook(fis);
-         // Once the workbook is created, its sheets (worksheets)
-         // are available as an array; getSheet can also be called
-         // to fetch a specific worksheet.
-         Sheet[] sheets = rwb.getSheets();
-         for (int i = 0; i < sheets.length; i++) {
-             Sheet rs = rwb.getSheet(i);
-             for (int j = 0; j < rs.getRows(); j++) {
-                 Cell[] cells = rs.getRow(j);
-                 for (int k = 0; k < cells.length; k++)
-                     sb.append(cells[k].getContents());
-             }
-         }
-         // Release the memory held by the workbook, then the stream
-         rwb.close();
-         fis.close();
-         return sb.toString();
-     }
- }
From the code above we can see that once we have a Sheet, we can access individual Excel cells. Note that getCell takes the column index first and the row index second. Refer to the following code snippet:
- // Get the value of the first row, first column
- Cell c00 = rs.getCell(0, 0);
- String strc00 = c00.getContents();
- // Get the value of the first row, second column
- Cell c10 = rs.getCell(1, 0);
- String strc10 = c10.getContents();
When you have finished working with the Excel data, be sure to call the close() method on the Workbook object created earlier to free the memory used while reading the sheets. This is especially important when reading large amounts of data.
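The advice about close() matters most when reading fails halfway, because a plain call at the end of the method is skipped if an exception is thrown first. Here is a minimal sketch of one way to guard against that with try/finally. FakeWorkbook is a hypothetical stand-in that only mimics an object exposing read and close operations; it is not a JXL class.

```java
/** Hypothetical stand-in for a workbook-like object that exposes close(). */
class FakeWorkbook {
    private boolean closed = false;
    private final boolean failOnRead;

    FakeWorkbook(boolean failOnRead) {
        this.failOnRead = failOnRead;
    }

    String read() {
        if (failOnRead) {
            throw new IllegalStateException("simulated read failure");
        }
        return "cell data";
    }

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class CloseInFinally {
    /** Reads from the workbook and always closes it, even if read() throws. */
    static String readAndClose(FakeWorkbook wb) {
        try {
            return wb.read();
        } finally {
            wb.close(); // runs on both the normal and the exceptional path
        }
    }
}
```

The same shape should carry over to the Excel reader: create the workbook, do the reading inside try, and close it in finally.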
3. Use PDFBox to read PDF document content
- import java.io.File;
- import java.io.FileInputStream;
- // Package names for Apache PDFBox 1.x; pre-Apache releases used org.pdfbox.*
- import org.apache.pdfbox.pdfparser.PDFParser;
- import org.apache.pdfbox.pdmodel.PDDocument;
- import org.apache.pdfbox.util.PDFTextStripper;
- /**
-  *
-  * @author Cao Shenhuan
-  */
- public class PdfFile {
-     public String getContent(File f) throws Exception {
-         FileInputStream fis = new FileInputStream(f);
-         // Parse the PDF from the input stream
-         PDFParser p = new PDFParser(fis);
-         p.parse();
-         PDDocument pdd = p.getPDDocument();
-         // Extract the plain text of the whole document
-         PDFTextStripper ts = new PDFTextStripper();
-         String c = ts.getText(pdd);
-         pdd.close();
-         fis.close();
-         return c;
-     }
- }
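All three examples expose the same method shape: a file goes in and a string comes out. A caller could therefore pick the right reader from the file extension. Here is a minimal sketch of that idea. The ContentReader interface and the ReaderRegistry class are hypothetical names introduced for illustration; they are not part of POI, JXL, or PDFBox.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical common interface matching the method shape used in the examples. */
interface ContentReader {
    String getContent(File f) throws Exception;
}

/** Hypothetical registry that maps file extensions to readers. */
class ReaderRegistry {
    private final Map<String, ContentReader> readers = new HashMap<>();

    /** Registers a reader for an extension such as "doc", "xls", or "pdf". */
    void register(String extension, ContentReader reader) {
        readers.put(extension.toLowerCase(Locale.ROOT), reader);
    }

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Delegates to the reader registered for the file's extension. */
    String read(File f) throws Exception {
        ContentReader reader = readers.get(extensionOf(f.getName()));
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered for " + f.getName());
        }
        return reader.getContent(f);
    }
}
```

A reader can be registered with a lambda that delegates to one of the classes above, for example one that creates the Word reader and calls its content method for the "doc" extension.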
How to manipulate Word, Excel, PDF files in Java