Poi Excel reading tutorial

Source: Internet
Author: User
Tags getstream
I. Basics of Excel

Ii. Overview of hssf

3. Reading files through usermodel

4. Writing files through usermodel

5. Reading files through eventusermodel

Vi. hssf workbook Structure

7. Reading document attributes through hpsf

VIII. Document Summary

IX. Appendix

There are already too many other

Body:

There are already too many other

In the previous article, we introduced the basic concepts of the poi project, learned how to use poi to read and write the structure of the OLE 2 composite document, and provided two simple examples: use poi to read and write the workbook stream of an Excel file. This article continues with the previous topic and explains how to use poi to read/write a complete Excel file.

Convention: the 2.0 version of POI project is nearing the official release stage, with rapid development progress. New functions are constantly integrated into the original system, and modifications to the original system are also made. To ensure the timeliness of this article, this article will follow the instructions of the latest 1.9 development version. Although the source code of the latest release can run normally, the current Code may differ from the 2.0 release.

I. Basics of Excel

Microsoft Excel 97 file format is also known as biff8. in the latest version of Excel, only a few changes are made to this format. In addition to increasing project complexity, the added support for the new format may only have to upgrade the code for each user, without any practical benefits. Therefore, in the description below, all the points that mention the Excel 97 format actually refer to the EXCEL from 97 to XP format.

Ii. Overview of hssf

The Excel 97 file format implemented by the poi project is hssf -- as you may have guessed, hssf is the abbreviation of horrible spreadsheet format, that is, the "annoying spreadsheet format" (Microsoft has made some simple things overly complicated and handled some simple things that originally needed flexibility, which is admirable !) Perhaps the hssf name is a bit funny. In essence, it is a very serious and formal API. With hssf, you can use pure Java code to read, write, and modify Excel files.

The previous article mentioned poifs. What is the relationship between hssf and poifs? Like other poi APIs, hssf is built on poifs. Therefore, some codes in hssf are similar to some previous codes. However, when writing code based on hssf APIs, we generally do not need to understand the details of poifs APIs.

Hssf provides two types of APIS for read operations: usermodel and eventusermodel, namely, "user model" and "Event-user model ". The former is easy to understand, while the latter is abstract, but the operation efficiency is much higher. Usermodel mainly includes Org. apache. poi. hssf. usermodel and org. apache. poi. hssf. eventusermodel package implementation (in earlier versions of hssf, org. apache. poi. hssf. eventusermodel belongs to the eventmodel package ).

The usermodel package maps Excel files into familiar structures such as workbook, sheet, row, and cell. It stores the entire structure in the memory as a group of objects. Eventusermodel requires that you be familiar with the underlying structure of the file format. Its operation style is similar to the sax API of XML and the event model of AWT (this is the origin of eventusermodel name. In addition, the eventusermodel API only provides the File Reading function, that is, you cannot use this API to modify files.

3. Reading files through usermodel

It is easy to read files using the usermodel of hssf. First, create an inputstream and then create an hssfworkbook:

Inputstream myxls = new fileinputstream ("workbook.xls "));
Hssfworkbook WB = new hssfworkbook (myxls );

With the hssfworkbook instance, you can extract the rows and columns of the worksheet and worksheet, for example:

Hssfsheet sheet = WB. getsheetat (0); // The first Worksheet
Hssfrow ROW = sheet. getrow (2); // the third row
Hssfcell cell = row. getcell (short) 3); // The fourth cell.

The code above extracts the fourth cell of the third row of the first worksheet. The cell object can be used to obtain its value. Pay attention to its type when extracting the cell value:

If (cell. getcelltype () = hssfcell. cell_type_string ){
("Cell is a string and the value is:" + cell. getstringcellvalue ());
} Else if (cell. getcelltype () = hssfcell. cell_type_numeric ){
("Cell is a number and the value is:" + cell. getcellvalue ());
} Else (){
("The cell value is not a string or a value. ");
}

If the data type is incorrect, the program encounters an exception. In particular, be careful when using hssf to process date data. In Excel, date data is saved as numerical values. The only way to distinguish date data is through the cell format (if you have set the date format in Excel, you should understand what this means ).

Therefore, cell. getcelltype () returns hssfcell. cell_type_numeric for cells that contain date data. However, the tool function hssfdateutil. iscelldateformatted (cell) can be used to determine whether the cell value is a date. The iscelldateformatted function draws a conclusion by comparing the date of a cell and the built-in Date Format of Excel-as you can imagine, according to this judgment method, the iscelldateformatted function returns a negative conclusion in many cases, and there is a possibility of misjudgment.

The appendix of this article contains an instance that uses hssf to create and return Excel workbooks In the servlet environment.

4. Writing files through usermodel

Writing an xls file is easier than reading an xls file. Create an hssfworkbook instance, and then create an outputstream that writes files to the disk when appropriate. However, you can create an outputstream when processing ends:

Hssfworkbook WB = new hssfworkbook ();
Fileoutputstream fileout
= New fileoutputstream ("workbook.xls ");
WB. Write (fileout );
Fileout. Close ();

To create a worksheet and its content, you must start with the corresponding parent object, for example:

Hssfsheet sheet = WB. createsheet ();
Hssfrow ROW = sheet. createrow (short) 0 );
Hssfcell cell = row. createcell (short) 0 );
Cell. setcellvalue (1 );
Row. createcell (short) 1). setcellvalue (1.2 );
Row. createcell (short) 2). setcellvalue ("A string ");
Row. createcell (short) 3). setcellvalue (true );

To set a cell style, you must first create a style object and then specify it to a cell-or assign it to multiple cells with the same style, for example, if an Excel table contains a summary row, the data in the summary row must be in bold or italic. You can create a summaryrowstyle style object and specify this style to all cells in the summary row.

Note that cellformat and cellstyle objects are members of workbook objects, and cell objects only reference them.

...
Hssfcellstyle style = Workbook. createcellstyle ();
Style. setdataformat
(Hssfdataformat. getbuiltinformat ("($ #, #0 _); [Red] ($ #, #0 )"));
Style. setfillbackgroundcolor (hssfcolor. Aqua. Index );
Style. setfillpattern (hssfcellstyle. big_spots );
...
Somecell. setcellstyle (style );
Someothercell. setcellstyle (style );

A newer version of hssf allows a limited number of Excel formulas. This function is still "beta-level quality" and must be carefully tested before it is officially used. The method class that specifies the formula, for example, somecell. setcellformula (sum (A1: A2 :);.

Currently, all built-in functions or operators can be called in formulas, except logical operators and functions (such as if functions). These functions are still under development.

5. Reading files through eventusermodel

Using eventusermodel to read files is much more complex than using usermodel, but the efficiency is much higher, because it requires the application to read data while processing data. Eventusermodel actually simulates the method of processing XML documents by using sax in the DOM environment. The application first needs to register the data to be processed. eventusermodel calls back the method of application registration when a matching data structure is encountered. The biggest difficulty with using eventusermodel is that you must be familiar with the internal structure of the Excel Workbook.

In hssf, a low-level binary structure is called a record ). Records have different types. Each type is described by a Java class in the org. Apache. Poi. hssf. Record package. For example, a bofrecord record indicates the beginning of a workbook or sheet region, and a rowrecord indicates that a row exists and its style information is saved. All records with the cellvaluerecordinterface interface represent Excel cells, including numericrecord, labelsstrecord, and formularecord (there are others, some of which have been discarded and some are used for optimization, but in general, hssf can convert them ).

The following is an example of registering an event processing handle:

Private eventrecordfactory factory = new eventrecordfactory ();
Factory. registerlistener (New erflistener (){
Public Boolean processrecord (record REC ){
(Got BOF record );
Return true;
}
}, New short [] {bofrecord. Sid });
Factory. processrecords (someinputstream );

Vi. hssf workbook Structure

As mentioned above, hssf is based on poifs. Specifically, the Excel 97 + file is an ole 2 compound document. The underlying Ole 2 composite document stores a workbook that is always named as Workbook (except for Excel 95, hssf does not support Excel 95) streams. However, macros and images are not stored in the workbook stream. They have their own independent streams, and sometimes they are stored in another directory in the OLE 2 CDF file. In ideal cases, macros should also be retained, but currently there is no proper API in the poi project to process macros.

Each stream contains a set of records. A record is actually a byte array, which can be divided into a record header and a record body. The record header specifies the record type (that is, the ID) and the length of the subsequent data. The record body is divided into multiple fields, which contain numerical data (including references to other records), character data or mark.

The top-level structure of an Excel Workbook is described as follows:

Bla.xls {
Ole2cdf Headers
"Workbook" stream {
Workbook {
Static string table record ..
Sheet names... And pointers
}
Sheet {
Row
Row
...
Number record (cell)
Labelsst record (cell)
...
}
Sheet
}
}
... Images, macros, etc.
Document Summary
Summary

7. Reading document attributes through hpsf

In Microsoft Word, Excel, PowerPoint, and other software, you can add additional information to the document through the "file" → "properties" menu, including the title, topic, summary, category, and Keywords of the document. The application itself also includes the last accessed user, last accessed and modified/printed date and time.

The attributes and body of a document are saved separately. As mentioned above, the OLE 2 CDF file is like a container, which contains many directories and file structures, and poifs is a tool used to access the files. These files are also called streams, and their attributes are stored in a dedicated stream in the poifs file system. Take a Word document as an example: although you only see a document named myfile.doc in the resource manager, it actually contains a worddocument, A summaryinformation and a documentsummaryinformation document. There are usually other documents.

Can you guess what these documents (streams) contain? Yes, worddocument contains the text you edited in word. The attributes of the document are stored in the summaryinformation and documentsummaryinformation streams. It may seem too easy to save all attributes in a single document, so Microsoft is determined to use two streams to make things more complex, the names of the two streams are also preceded by the octal/005 characters-this is an unprintable character, so the former is omitted.

The standard attributes defined by Microsoft do not care about the type of the main document, whether it is a Word document, an Excel Workbook, or a PowerPoint. As long as you know how to read the attributes of an Excel document, you will know how to read the attributes of other documents.

Reading document attributes is not complex, because Java programs can use the hpsf package of the poi project. Hpsf is the abbreviation of horrible property set format. translating it into Chinese is "annoying property set format ". The hpsf package is a read attribute tool implemented by the poi project. Currently, attribute writing is not supported.

The APIS provided by hpsf can be used to read standard properties defined by Microsoft. However, to read any property set, a more general API is required, it is much more complicated than reading standard attributes. This article only describes simple APIs for reading standard attributes, because this is sufficient for most applications.

The following is a Java program that reads the title attribute of the OLE 2 CDF document:

Import java. Io .*;
Import org. Apache. Poi. hpsf .*;
Import org. Apache. Poi. poifs. eventfilesystem .*;

/**
* Example program for reading the title of the OLE 2 document,
* Specify the file name in the command line parameters.
*/

Public class readtitle
{
Public static void main (string [] ARGs) throws ioexception
{
Final string filename = ARGs [0];
Poifsreader r = new poifsreader ();
R. registerlistener (New mypoifsreaderlistener (),
"/005 summaryinformation ");
R. Read (New fileinputstream (filename ));
}

Static class mypoifsreaderlistener
Implements poifsreaderlistener
{
Public void processpoifsreaderevent (poifsreaderevent event)
{
Summaryinformation Si = NULL;
Try
{
SI = (summaryinformation)
Propertysetfactory. Create (event. getstream ());
}
Catch (exception ex)
{
Throw new runtimeexception
("Attribute set stream/" "+ event. getpath () +
Event. getname () + "/": "+ ex );
}

Final String title = Si. gettitle ();

If (title! = NULL)
System. Out. println ("title:/" "+ title + "/"");
Else
System. Out. println ("This document has no title .");
}
}
}

The main () method uses the poifs Event System to read the/005summaryinformation stream from the OLE 2 file specified by the command line. When poifsreader encounters this stream, it passes the control to the processpoifsreaderevent () method of mypoifsreaderlistener.
What is the use of processpoifsreaderevent? It obtains an input stream through parameters. The input stream contains attributes such as the document title. To access the document attributes, we create a propertyset instance from the input stream, as shown below:

SI = (summaryinformation) propertysetfactory. Create (event. getstream ());

This statement actually contains three steps:

◆ Event. getstream () gets the input stream from the poifsreaderevent passed in by poifsreader.

◆ Call the static method create () of propertysetfactory with the input stream as the parameter (). As its name implies, propertysetfactory is a factory class, which has a "machine" that can convert an input stream into a propertyset instance. This machine is the CREATE () method.

◆ Convert the propertyset stereotypes (CAST) returned by the CREATE () method into summaryinformation. Propertyset provides various mechanisms for reading attribute sets according to the general method. summaryinformation is a subclass of propertyset, that is, the summaryinformation class adds a convenient way to operate Microsoft standard Attributes Based on the propertyset class.

In this process, there are many factors that may cause errors, so we put this part into a try block-however, this sample program only handles exceptions in the simplest way, in practical applications, it is best to handle different possible exception types separately. In addition to general I/O exceptions, there may also be hpsf-specific exceptions. For example, if the input stream does not contain an attribute set or the attribute set is invalid, A nopropertysetstreamexception will be thrown.

One error is uncommon, but it is not impossible --/005summaryinformation contains a valid property set, but not a summary information property set. In this case, the summaryinformation operation fails, causing a classcastexception.

After obtaining the summaryinformation instance, the rest is simple. You only need to call the gettitle () method and then output the result.

In addition to gettitle (), summaryinformation also contains other convenient methods, such as getapplicationname (), getauthor (), getcharcount (), and getcreatedatetime. The javadoc document of hpsf details all these methods.

VIII. Document Summary

Unfortunately, not all attributes are saved in the abstract information attribute set. Many (but not all) Ole 2 files have another property set called "document summary information", and the corresponding stream is/005 documentsummaryinformation. The attributes saved in this property set include the document category, the number of multimedia clips of PowerPoint slides, and so on.

To access the document summary information attribute set, the process of the program is similar to the previous example, but the registration target should be changed to the/005documentsummaryinformation stream-sometimes, you may want to register both the summary information and the document summary information. The remaining processing methods are similar to the previous example. You should pass the spread containing the document summary information to propertysetfactory. create (), but this time the factory method will return a documentsummaryinformation object (instead of the summaryinformation object in the previous example ). If two streams are registered at the same time, check the specific type of returned values, use the Java instanceof operator, or use the dedicated issummaryinformation () and isdocumentsummaryinformation () methods. Remember, the CREATE () method always returns a propertyset object, so you can always call the issummaryinformation () and isdocumentsummaryinformation () methods for the CREATE () returned object, the propertyset class provides these two methods because the property set may be custom.

If you want to process custom property sets or read user-defined properties from standard Property sets, you must use a more general API, as mentioned earlier, this API is much more complex and will not be discussed in this article. Please refer to the hpsf's how-to documentation and poi documentation.

Conclusion: This article discusses the application of hssf and how to output it to an Excel file. It also covers hpsf and how to read the abstract information of the attribute set document. Poi is a very powerful project, and many topics have not been covered in this article. For example, how to use hssf serializer to convert XML documents into EXCEL formats remains to be studied by yourself.

Refer:

Jakarta poi project Homepage

Source code of Jakarta poi

IX. Appendix

Instance: Creates and returns a workbook using Servlet.

Package org. Apache. Poi. hssf. usermodel. examples;

Import java. Io .*;
Import java.net .*;
Import javax. servlet .*;
Import javax. servlet. http .*;
Import org. Apache. Poi. hssf. usermodel .*;

Public class hssfcreate extends httpservlet {
Public void Init (servletconfig config)
Throws servletexception {
Super. INIT (config );
}

Public void destroy (){
}

/** Handle http get and post requests
* @ Param request: Request
* @ Param response: Response
*/
Protected void processrequest (httpservletrequest request,
Httpservletresponse response)
Throws servletexception, ioexception {

Response. setcontenttype ("application/vnd. MS-excel ");
Hssfworkbook WB = new hssfworkbook ();
Hssfsheet sheet = WB. createsheet ("new sheet ");

// Create a new row and add several cells.
// The row number starts from 0.
Hssfrow ROW = sheet. createrow (short) 0 );
// Create a cell and set the cell value
Hssfcell cell = row. createcell (short) 0 );
Cell. setcellvalue (1 );

Row. createcell (short) 1). setcellvalue (1.2 );
Row. createcell (short) 2). setcellvalue ("A string value ");
Row. createcell (short) 3). setcellvalue (true );
// Write the output result
Outputstream out = response. getoutputstream ();
WB. Write (out );
Out. Close ();
}

/** Handle http get requests
* @ Param request: Request
* @ Param response: Response
*/
Protected void doget (httpservletrequest request,
Httpservletresponse response)
Throws servletexception, ioexception {
Processrequest (request, response );
}

/** Handle http post requests
* @ Param request: Request
* @ Param response: Response
*/
Protected void dopost (httpservletrequest request,
Httpservletresponse response)
Throws servletexception, ioexception {
Processrequest (request, response );
}

/** Return a brief description of Servlet
*/
Public String getservletinfo (){
Return "Example: Using hssf to create an Excel Workbook in servlet ";
}
}

 

 

 

 

 

Poi hssf operation MS Excel brief (how to find Excel Import and Export implementation in the design phase of omis Phase II)
Poi hssf MS Excel

Poi hssf is a Java package dedicated to Excel operations. You can use pure Java to operate XLS files.
The poi hssf class files are all stored in the org. Apache. Poi. hssf package. The classes in this package can be used to operate Excel files in Java.
The following describes how to use poi hssf to operate an Excel file:

1. Create an Excel worksheet
Hssfworkbook WB = new hssfworkbook ();

2. Create an Excel worksheet. Each worksheet corresponds to a sheet 1, sheet 2...
Hssfsheet sheet1 = WB. createsheet ("new sheet ");

3. Create a cell in the worksheet

// First, create a row object, and pass the row number as the parameter to the createrow method. The first row starts from 0.
Hssfrow ROW = sheet. createrow (short) 0 );

// Create a cell
Hssfcell cell = row. createcell (short) 0 );

// Assign values to cells
Cell. setcellvalue (1 );

// You can also create cells and assign values in the same row.
Row. createcell (short) 1). setcellvalue (1.2 );
Row. createcell (short) 2). setcellvalue ("this is a string ");
Row. createcell (short) 3). setcellvalue (true );

// The data format can be set by default when a cell value is created, as shown in the preceding figure.
// You can also create a cell and call setcelltype to specify
Cell. setcelltype (cell_type_numeric );

4. Insert a date value to a cell
Hssfworkbook WB = new hssfworkbook ();
Hssfsheet sheet = WB. createsheet ("new sheet ");

// You can use sheet. setsheetname (sheetindex, "sheetname", encoding) to set the worksheet name.

// Create a new row and add cells to it. The row number starts from 0.
Hssfrow ROW = sheet. createrow (short) 0 );

// Create a cell and enter a date value to it, but the first cell is not in the date format.
Hssfcell cell = row. createcell (short) 0 );
Cell. setcellvalue (new date ());

// Change the second cell to the date format. You need to create a new cell format from the workbook, which may affect only the currently created cell.
Hssfcellstyle cellstyle = WB. createcellstyle ();
Cellstyle. setdataformat (hssfdataformat. getbuiltinformat ("m/D/yy H: mm "));
Cell = row. createcell (short) 1 );
Cell. setcellvalue (new date ());
Cell. setcellstyle (cellstyle );

5. Various cell styles
Hssfcellstyle cellstyle = WB. createcellstyle ();
// Align
Cellstyle. setalignment (hssfcellstyle. align_center );

// With border
Cellstyle. setborderbottom (hssfcellstyle. border_thin );

// Color and fill Style
Cellstyle. setfillbackgroundcolor (hssfcolor. Aqua. Index );
Cellstyle. setfillpattern (hssfcellstyle. big_spots );
Cellstyle. setfillforegroundcolor (hssfcolor. Orange. Index );
Cellstyle. setfillpattern (hssfcellstyle. solid_foreground );

6. Row Height and column width.
Hssfworkbook WB = new hssfworkbook ();
Hssfsheet sheet = WB. createsheet ("new sheet ");
Hssfrow ROW = sheet. createrow (short) 0 );

// 2 is the row high value
Row. setrowheight (2 );

// 3 is the column number and 4 is the column width value
Sheet. setcolumnwidth (3, 4 );

7. Routine

First, call a method to retrieve data from the Oracle database and put it in the list instance. Here I call listproductquery () of the productdata class in the srrd project to obtain a list instance. Objects in the list are a series of instances named productquery object classes. Read the list and put the data in the productquery instance into the hssfcell cell. Finally, output the data in hssfworkbook to the output stream to complete data export.

// Create a workbook
Hssfworkbook WB = new hssfworkbook ();
// Create a worksheet named example
Hssfsheet sheet = WB. createsheet ("example ");
// Define the column width for the first eight columns in the worksheet
Sheet. setcolumnwidth (short) 0, (short) 2500 );
Sheet. setcolumnwidth (short) 1, (short) 6000 );
Sheet. setcolumnwidth (short) 2, (short) 3500 );
Sheet. setcolumnwidth (short) 3, (short) 9000 );
Sheet. setcolumnwidth (short) 4, (short) 8000 );
Sheet. setcolumnwidth (short) 5, (short) 8000 );
Sheet. setcolumnwidth (short) 6, (short) 20000 );
Sheet. setcolumnwidth (short) 7, (short) 8000 );
// In the table
Hssfrow ROW = sheet. createrow (0 );
// Create a cell
Hssfcell cell [] = new hssfcell [8];
For (short I = 0; I <8; I ++ ){
Cell= Row. createcell (I );
// Define the cell as utf_16 encoding so that the output data is not garbled
Cell. setencoding (hssfcell. encoding_utf_16 );
}
// Write the cell title
Cell [0]. setcellvalue ("registration ID ");
Cell [1]. setcellvalue ("Register number ");
Cell [2]. setcellvalue ("city id ");
Cell [3]. setcellvalue ("Chinese product name ");
Cell [4]. setcellvalue ("English name of the product ");
Cell [5]. setcellvalue ("product service object ");
Cell [6]. setcellvalue ("product function description ");
Cell [7]. setcellvalue ("product category ");
// Query the database and obtain the list instance of the Data list
List list = new arraylist ();
Productdatamanager Mgr = new productdatamanager ();
Try {
List = Mgr. listproductquery ("", ","-2-1 ","-2-1 ");
} Catch (srrdexception e ){
E. printstacktrace ();
}
// Extract data from the list and put it into the worksheet
If (list! = NULL & list. Size ()> 0 ){
For (INT I = 0; I <list. Size ()-1; I ++ ){
Productquery query = (productquery) list. Get (I );
Hssfrow datarow = sheet. createrow (I + 1 );
Hssfcell data [] = new hssfcell [8];
For (short J = 0; j <8; j ++ ){
Data [J] = datarow. createcell (j );
// Define the cell as utf_16 encoding so that the output data is not garbled
Data [J]. setencoding (hssfcell. encoding_utf_16 );
}
Data [0]. setcellvalue (query. getcertid ());
Data [1]. setcellvalue (query. getcertnum ());
Data [2]. setcellvalue (query. getcitycode ());
Data [3]. setcellvalue (query. getsoftwarecname ());
Data [4]. setcellvalue (query. getsoftwareename ());
Data [5]. setcellvalue (query. getsoftwarefor ());
Data [6]. setcellvalue (query. getsoftwarefuncdesc ());
Data [7]. setcellvalue (query. getsoftwaretype ());
}
}
// Output the workbook to the output stream
Servletoutputstream SOS = response. getoutputstream ();
WB. Write (SOS );
SOS. Close ();

// It can also be output as an xls file
File file = new file ("workbook.xls ");
Try {
Fileoutputstream fileout = new fileoutputstream (File );
WB. Write (fileout );
Fileout. Close ();
} Catch (ioexception e ){
E. printstacktrace ();
}

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.