Use the Apache POI and OpenOffice APIs to count the pages of Office documents in Linux

Source: Internet
Author: User

Introduction to Apache POI

Apache POI is a set of Java APIs for accessing Microsoft Office format documents (Word, Excel, and PowerPoint). Excel files are handled by the HSSF API, Word files by the HWPF API, and PowerPoint files by the HSLF API.

POI's official website is http://poi.apache.org, where you can download the latest version (3.6 at the time of writing). After unpacking the download you will find three jar packages: poi-3.6-20091214.jar, poi-contrib-3.6-20091214.jar, and poi-scratchpad-3.6-20091214.jar. Copy all three into the lib directory of your Eclipse project, then refresh the project so that the POI class library is loaded.
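After copying the jars, one quick way to confirm that your code can actually see them is to try loading one class from each. The following is a minimal, standard-library-only sketch; the probed class names are real POI classes, but the class-to-jar mapping in the comments is an assumption based on the POI 3.x layout (POIFS and HSSF in the main jar, HWPF in the scratchpad jar), and the class name PoiClasspathCheck is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch: reports whether representative POI classes can be loaded,
 * i.e. whether the jars copied into the project's lib directory are on the classpath.
 */
public class PoiClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // false = locate the class without running its static initializers
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class -> jar it is assumed to come from (POI 3.x layout)
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.poi.poifs.filesystem.POIFSFileSystem", "poi");
        probes.put("org.apache.poi.hssf.usermodel.HSSFWorkbook", "poi");
        probes.put("org.apache.poi.hwpf.HWPFDocument", "poi-scratchpad");
        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getKey()) ? "found" : "MISSING";
            System.out.println(probe.getValue() + " / " + probe.getKey() + ": " + status);
        }
    }
}
```

If any line prints MISSING, the corresponding jar has not been added to the project's build path yet.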

Main components of POI

POIFS for OLE 2 file operations: POIFS is the oldest and most stable part of the project. It is a pure Java implementation of the OLE 2 Compound Document format, it supports both reading and writing, and all of the other components ultimately depend on it. For more information, see the POIFS project page.
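Because all of the formats listed below are stored inside an OLE 2 compound document, a cheap first check before handing a file to POI is to look for the 8-byte OLE 2 signature D0 CF 11 E0 A1 B1 1A E1 at the start of the file. The sketch below does this with the standard library only; it is not POIFS itself, and the class name Ole2Signature is hypothetical. Note that the newer Office 2007 formats (.xlsx, .docx, .pptx) are ZIP archives and will not match.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Minimal sketch: detect the OLE 2 compound document signature. */
public class Ole2Signature {

    // First 8 bytes of every OLE 2 compound document (.doc, .xls, .ppt, .vsd, ...)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the array starts with the OLE 2 signature. */
    public static boolean hasOle2Header(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads up to 8 bytes from the stream and checks them against the signature. */
    public static boolean hasOle2Header(InputStream in) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int n = 0;
        while (n < buf.length) {
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break; // file shorter than the signature
            }
            n += r;
        }
        return n == buf.length && hasOle2Header(buf);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            try (InputStream in = Files.newInputStream(Paths.get(name))) {
                System.out.println(name + ": " + (hasOle2Header(in) ? "OLE 2" : "not OLE 2"));
            }
        }
    }
}
```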

HSSF for Excel file operations: HSSF is a pure Java implementation of the Microsoft Excel 97-2003 file format (BIFF8). It supports both reading and writing. For more information, see the HSSF project page.

HWPF for Word file operations: HWPF is a pure Java interface for the Microsoft Word 97 file format. This component is still at an early stage of development: its ability to read and write Word documents is limited, and it can handle only simple Word files. For more information, see the HWPF project page.

HSLF for PowerPoint file operations: HSLF is a pure Java interface for the Microsoft PowerPoint 97-2003 file format. It supports both reading and writing. For more information, see the HSLF project page.

HDGF for Visio file operations: POI also provides HDGF, a pure Java interface for the Microsoft Visio 97-2003 file format. It currently supports reading only, at a very low level, and is limited to simple text extraction. For more information, see the HDGF project page.

HPSF for document properties: HPSF is a pure Java implementation of the OLE 2 property set format. Property sets are mainly used to store a document's properties (such as its title, author, and last-modified date), which applications can use for their own purposes. For more information, see the HPSF project page.

Here's a quick introduction to the interfaces that are often used in projects to manipulate Excel and Word format files:

HSSF interface

At present the most mature part of POI is the HSSF interface, which handles MS Excel (97-2003) objects. Unlike generating CSV, which only yields unformatted data that Excel can import, HSSF produces real Excel objects, so you can control properties such as cells and sheets. HSSF does have drawbacks, however: for example, it does not directly support Excel charts, and its packages and their dependencies are fairly complex.

When it comes to counting the pages (sheets) of a workbook, the HSSF interface makes the job easy. Here is a brief introduction to the HSSF interface:
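A minimal sketch of counting sheets with HSSF follows. It relies on HSSFWorkbook's InputStream constructor together with getNumberOfSheets() and getSheetName(int); the class name SheetCounter and the default file name sample.xls are hypothetical. It only handles the binary .xls format that HSSF reads.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/** Minimal sketch: count the sheets ("pages") of an Excel 97-2003 (.xls) workbook. */
public class SheetCounter {

    /** Parses the workbook from the stream and returns its sheet names in order. */
    public static List<String> sheetNames(InputStream in) throws IOException {
        HSSFWorkbook workbook = new HSSFWorkbook(in); // reads the whole BIFF8 workbook
        List<String> names = new ArrayList<>();
        for (int i = 0; i < workbook.getNumberOfSheets(); i++) {
            names.add(workbook.getSheetName(i));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "sample.xls"; // hypothetical default file
        try (InputStream in = new FileInputStream(path)) {
            List<String> names = sheetNames(in);
            System.out.println(path + " has " + names.size() + " sheet(s): " + names);
        }
    }
}
```

The sheet count is simply the size of the returned list; keeping the names as well makes it easy to report which sheets were counted.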

The objects HSSF provides are in the org.apache.poi.hssf.usermodel package. They cover Excel objects, styles and formatting, and auxiliary operations. The main object types are:

HSSFWorkbook: an Excel document (workbook)

HSSFSheet: an Excel worksheet

HSSFRow: an Excel row

HSSFCell: an Excel cell

HSSFFont: an Excel font

HSSFName: an Excel name (named range)

HSSFDataFormat: a cell's data format, such as a date format

HSSFHeader: a sheet's header

HSSFFooter: a sheet's footer

HSSFCellStyle: a cell style
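To show how these objects fit together, the following sketch builds a one-sheet workbook that touches most of them: a workbook, a sheet with a header and footer, rows and cells, a font, a cell style, and a data format. The report layout (file names and page counts), the class name WorkbookSketch, and the output file report.xls are illustrative assumptions; calls whose signatures changed between POI versions (such as making a font bold) are deliberately avoided.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFCellStyle;
import org.apache.poi.hssf.usermodel.HSSFDataFormat;
import org.apache.poi.hssf.usermodel.HSSFFont;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/** Minimal sketch: build a small .xls report using the main HSSF object types. */
public class WorkbookSketch {

    /** Writes one row per file with its page count and returns the workbook as .xls bytes. */
    public static byte[] buildReport(String[] files, int[] pages) throws IOException {
        HSSFWorkbook workbook = new HSSFWorkbook();          // HSSFWorkbook: the document
        HSSFSheet sheet = workbook.createSheet("Summary");   // HSSFSheet: one worksheet
        sheet.getHeader().setCenter("Page count report");    // header printed on each page
        sheet.getFooter().setRight("Generated with POI");    // footer printed on each page

        HSSFFont font = workbook.createFont();               // HSSFFont
        font.setItalic(true);
        HSSFCellStyle style = workbook.createCellStyle();    // HSSFCellStyle
        style.setFont(font);
        HSSFDataFormat format = workbook.createDataFormat(); // HSSFDataFormat
        style.setDataFormat(format.getFormat("0"));          // whole-number display format

        HSSFRow title = sheet.createRow(0);                  // HSSFRow
        title.createCell(0).setCellValue("File");            // HSSFCell with a string value
        title.createCell(1).setCellValue("Pages");
        for (int i = 0; i < files.length; i++) {
            HSSFRow row = sheet.createRow(i + 1);
            row.createCell(0).setCellValue(files[i]);
            HSSFCell count = row.createCell(1);
            count.setCellValue((double) pages[i]);           // numeric cells store doubles
            count.setCellStyle(style);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        workbook.write(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] xls = buildReport(new String[] {"a.doc", "b.xls"}, new int[] {12, 3});
        try (OutputStream file = new FileOutputStream("report.xls")) { // hypothetical path
            file.write(xls);
        }
    }
}
```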

Auxiliary operations include:

HSSFDateUtil: date handling

HSSFPrintSetup: print settings

HSSFErrorConstants: table of error codes
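As an example of the auxiliary classes, an HSSFPrintSetup is obtained from a sheet with getPrintSetup(). The sketch below switches a sheet to landscape A4 printing and returns the workbook as .xls bytes; the class name PrintSetupSketch, the helper landscapeA4Sheet, and the output file print-setup.xls are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.hssf.usermodel.HSSFPrintSetup;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/** Minimal sketch: configure how a sheet is printed via HSSFPrintSetup. */
public class PrintSetupSketch {

    /** Creates a one-sheet workbook set to print landscape on A4 paper. */
    public static byte[] landscapeA4Sheet() throws IOException {
        HSSFWorkbook workbook = new HSSFWorkbook();
        HSSFSheet sheet = workbook.createSheet("Print");
        sheet.createRow(0).createCell(0).setCellValue("Printed in landscape on A4");

        HSSFPrintSetup printSetup = sheet.getPrintSetup();    // print settings of this sheet
        printSetup.setLandscape(true);                        // landscape orientation
        printSetup.setPaperSize(HSSFPrintSetup.A4_PAPERSIZE); // paper size constant for A4

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        workbook.write(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        try (OutputStream file = new FileOutputStream("print-setup.xls")) { // hypothetical path
            file.write(landscapeA4Sheet());
        }
    }
}
```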
