Poisaxreader
Fix POI read xlsx file memory consumption too much
POI handles Excel to provide a more user-friendly mode and compare the underlying event pattern. Among them, the user mode provides a good package, while compatible with 2003 and more than the format, the use is quite convenient. However, the cost is to spend huge amount of memory. As long as more than 6w, basically is the memory overflow.
Fortunately, the POI team also provides a lower level of flow processing mode EventMode, for large data of the xlsx file write, POI 3.8 provides SXSSF, in the cache way to write as a file. For file reads, the XML file corresponding to each sheet is read directly by Sax.
POI Sheetcontentshandler Interface
In the POI has already done the basic encapsulation to sax at that time reads the corresponding sheet XML file, so we just need to implement the interface Sheetcontentshandler, can complete the way sax reads. This interface needs to be implemented in three ways
- public void StartRow (int rowNum) reads a line start
- public void Endrow (int rowNum) reads the end of a line
- public void cell (string cellreference, string formattedvalue,xssfcomment comment) reads a cell in a row
- public void HeaderFooter (string text, Boolean IsHeader, String tagName) is temporarily unclear
POI Sheetcontentshandler Implementation
Here I mainly refer to poi Xlsx2csv.java
Implementation method, you need to provide the corresponding xlsx file maximum number of columns. Secondly, I have expanded on this basis, and in Endrow
provides an event, the currently processed row data, to make this parsing function more independent.
Implement the idea, construct a list object in the StartRow
method, add each cell content in the cell
function, and at Endrow
function to determine whether the current column equals the maximum number of columns, if the loop is not equal, and the row event is added
sheetsaxhandler verbose code
Protected class Sheetsaxhandler implements Sheetcontentshandler {private int currentrow =-1; private int currentcol =-1; private int mincolumns; public void setmincolumns (int mincolumns) {this.mincolumns = Mincolumns; } public Sheetsaxhandler (int mincolumns) {super (); This.mincolumns = Mincolumns; } public Sheetsaxhandler () {} private list<sheetrowlistener> listeners = new Arraylist<shee Trowlistener> (); Private list<string> lrows = new arraylist<string> (); Processing a line of information public void rowadded (Sheetrowlistener add) {listeners.add (add); } private void postrowadded (list<string> row, int rowNum) throws SQLException {for (Sheetrowlistener hl:listeners) hl.addrow (row, rowNum); } @Override public void StartRow (int rowNum) {currentrow = RowNum; Currentcol =-1; Lrows.clear (); } @Override public void Endrow (int rowNum) {//Add data for (int i = Currentcol; i < m Incolumns; i++) {Lrows.add (""); } try {postrowadded (lrows, RowNum); } catch (SQLException e) {e.printstacktrace (); }} @Override public void cell (string cellreference, String formattedvalue, Xssfcommen T comment) {if (cellreference = = null) {cellreference = new celladdress (CurrentRow, Currentcol ). Formatasstring (); } int thiscol = (new Cellreference (cellreference)). Getcol (); int missedcols = thiscol-currentcol-1;//processing data exists in the middle of a blank for (int i = 0; i < Missedcols; i++) { This.lRows.add (""); } Currentcol = Thiscol; TODO data type processing try { Double.parsedouble (FormattedValue); This.lRows.add (FormattedValue); } catch (NumberFormatException e) {this.lRows.add (formattedvalue); }} @Override public void HeaderFooter (string text, Boolean IsHeader, String tagName) {Sy Stem.out.println (text + "= =" + IsHeader + "= =" + TagName); } }
Event interface
interface SheetRowListener { void addRow(List<String> row, int rowNum); }
Invocation mode
- Open File
- Find the corresponding sheet XML file
- Use the above method to process each of the sheet in turn
Working with files
@Override public int Savetooracle (string filePath, String pcId) throws FileNotFoundException, E Ncrypteddocumentexception, Invalidformatexception, IOException, ClassNotFoundException, SQLException , Openxml4jexception, saxexception, parserconfigurationexception {file F = new file (FilePath); Opcpackage p = null; int num = 0; Connection conn = null; if (f.exists ()) {try {Jsonarray Sheetcfgs = This.cfgJson.getJSONArray ("sheets"); Databuferrows = This.cfgJson.getInteger ("Databuferrows"); Databuferrows = Databuferrows = = null? 1000:databuferrows; conn = Ca.getconnection (Ca.getsqlcfg (Serverpath)); String importtime = new SimpleDateFormat ("Yyyy-mm-dd HH:mm:ss. SSS "). Format (new Date ()); p = Opcpackage.open (f, packageaccess.read); Readonlysharedstringstable Strings = new readonlysharedstringstable (p); Xssfreader Xssfreader = new Xssfreader (p); Stylestable styles = xssfreader.getstylestable (); Xssfreader.sheetiterator iter = (xssfreader.sheetiterator) xssfreader. Getsheetsdata (); Hashmap<integer, jsonobject> hsheetcfg = new Hashmap<integer, jsonobject> (); for (int i = 0; i < sheetcfgs.size (); i++) {Jsonobject sheetcfg = Sheetcfgs.getjsonobject (i); Hsheetcfg.put (Sheetcfg.getinteger ("Sheetindex"), sheetcfg); } int index = 1; while (Iter.hasnext ()) {InputStream Sheetstream = Iter.next (); if (Hsheetcfg.containskey (index)) {Processsheet (Styles, strings, new Sheetsaxhandler (), Sheetstream, Hsheetcfg.get (index), Conn, PcId, This.filename, Importtime); } index++; } p.close (); f = null; Conn.close (); } catch (SQLException e) {conn.close (); conn = null; Throw e; }} return num; }
Handling sheet
public void Processsheet (stylestable styles, readonlysharedstringstable strings, Sheetsaxhandler Sheethandler, InputStream Sheetinputstream, Final Jsonobject sheetcfg, final Connection conn, string PcID, String FileName, String importtime) throws IOException, Parserconfigurationexception, Saxexception, Sqlexce ption {Final PreparedStatement PS = conn.preparestatement (Ca.buildinsertsql (sheetcfg, PcID, Filena Me, importtime)); Final int datastartnum = Sheetcfg.getintvalue ("Datastartnum"); Sheethandler.setmincolumns (Sheetcfg.getjsonarray ("FieldReference"). Size ()); sheethandler.rowadded (New Sheetrowlistener () {@Override public void AddRow (list<string> row, int rowNum) {if (RowNum < dataStartNum-1) return; try {ca.setparamter (PS, sheetcfg, Row, rownum-datastartnum); if (rowNum% databuferrows = = 0) {ps.executebatch (); Ps.clearbatch (); }} catch (SQLException e) {try {ps.close (); Conn.close (); Throw e; } catch (SQLException E1) {e1.printstacktrace (); } e.printstacktrace (); } } }); XMLReader Sheetparser = Saxhelper.newxmlreader (); Dataformatter formatter = new Dataformatter (); InputSource Sheetsource = new InputSource (sheetinputstream); ContentHandler handler = new Xssfsheetxmlhandler (styles, NULL, strings, Sheethandler, formatter, false); Sheetparser.setcontenthandler (handler); Sheetparser.parse (Sheetsource); Processing the remaining data ps.executebatch (); Ps.clearbatch (); Close the current PS PS. Close (); }`
Summarize
In the initial use of the POI user mode, quickly completed an Excel file parsing, very convenient. With the gradual deepening of the project, the processing of the Excel file is getting larger, user mode is not competent. Then began to look for information, on the official website to see the example of CSV.
The main function of this code is to import the data in the Excel file into the corresponding table of the Oracle database, and in the implementation of the function, I have mainly encountered the following problems
- Resolves an Excel file Resolution memory leak (2007 after the file is basically resolved with sax mode)
- Corresponding to a large number of data storage, the speed has been very slow, although I have adopted the method of batch submission (at present, I still do not find a good solution, if there are peers to see, but also hope that more advice )
Use (POI) Sax to work with Excel files to prevent memory overflow