Use (POI) Sax to work with Excel files to prevent memory overflow

Source: Internet
Author: User
Tags builtin

Poisaxreader

Fix POI read xlsx file memory consumption too much

POI handles Excel to provide a more user-friendly mode and compare the underlying event pattern. Among them, the user mode provides a good package, while compatible with 2003 and more than the format, the use is quite convenient. However, the cost is to spend huge amount of memory. As long as more than 6w, basically is the memory overflow.
Fortunately, the POI team also provides a lower level of flow processing mode EventMode, for large data of the xlsx file write, POI 3.8 provides SXSSF, in the cache way to write as a file. For file reads, the XML file corresponding to each sheet is read directly by Sax.

POI Sheetcontentshandler Interface

In the POI has already done the basic encapsulation to sax at that time reads the corresponding sheet XML file, so we just need to implement the interface Sheetcontentshandler, can complete the way sax reads. This interface needs to be implemented in three ways

    • public void StartRow (int rowNum) reads a line start
    • public void Endrow (int rowNum) reads the end of a line
    • public void cell (string cellreference, string formattedvalue,xssfcomment comment) reads a cell in a row
    • public void HeaderFooter (string text, Boolean IsHeader, String tagName) is temporarily unclear
POI Sheetcontentshandler Implementation

Here I mainly refer to poi Xlsx2csv.java Implementation method, you need to provide the corresponding xlsx file maximum number of columns. Secondly, I have expanded on this basis, and in Endrow provides an event, the currently processed row data, to make this parsing function more independent.
Implement the idea, construct a list object in the StartRow method, add each cell content in the cell function, and at Endrow function to determine whether the current column equals the maximum number of columns, if the loop is not equal, and the row event is added
sheetsaxhandler verbose code

Protected class Sheetsaxhandler implements Sheetcontentshandler {private int currentrow =-1;        private int currentcol =-1;        private int mincolumns;        public void setmincolumns (int mincolumns) {this.mincolumns = Mincolumns;            } public Sheetsaxhandler (int mincolumns) {super ();        This.mincolumns = Mincolumns; } public Sheetsaxhandler () {} private list<sheetrowlistener> listeners = new Arraylist<shee        Trowlistener> (); Private list<string> lrows = new arraylist<string> ();        Processing a line of information public void rowadded (Sheetrowlistener add) {listeners.add (add);  } private void postrowadded (list<string> row, int rowNum) throws SQLException {for        (Sheetrowlistener hl:listeners) hl.addrow (row, rowNum);    } @Override public void StartRow (int rowNum) {currentrow = RowNum;        Currentcol =-1;        Lrows.clear (); } @Override public void Endrow (int rowNum) {//Add data for (int i = Currentcol; i < m Incolumns;            i++) {Lrows.add ("");            } try {postrowadded (lrows, RowNum);            } catch (SQLException e) {e.printstacktrace (); }} @Override public void cell (string cellreference, String formattedvalue, Xssfcommen T comment) {if (cellreference = = null) {cellreference = new celladdress (CurrentRow, Currentcol            ). Formatasstring ();            } int thiscol = (new Cellreference (cellreference)). Getcol ();                int missedcols = thiscol-currentcol-1;//processing data exists in the middle of a blank for (int i = 0; i < Missedcols; i++) {            This.lRows.add ("");            } Currentcol = Thiscol; TODO data type processing try {                Double.parsedouble (FormattedValue);            This.lRows.add (FormattedValue);            } catch (NumberFormatException e) {this.lRows.add (formattedvalue); }} @Override public void HeaderFooter (string text, Boolean IsHeader, String tagName) {Sy        Stem.out.println (text + "= =" + IsHeader + "= =" + TagName); }    }

Event interface

interface SheetRowListener {        void addRow(List<String> row, int rowNum);    }
Invocation mode
    1. Open File
    2. Find the corresponding sheet XML file
    3. Use the above method to process each of the sheet in turn
Working with files
             @Override public int Savetooracle (string filePath, String pcId) throws FileNotFoundException, E Ncrypteddocumentexception, Invalidformatexception, IOException, ClassNotFoundException, SQLException        , Openxml4jexception, saxexception, parserconfigurationexception {file F = new file (FilePath);        Opcpackage p = null;        int num = 0;        Connection conn = null;                if (f.exists ()) {try {Jsonarray Sheetcfgs = This.cfgJson.getJSONArray ("sheets");                Databuferrows = This.cfgJson.getInteger ("Databuferrows"); Databuferrows = Databuferrows = = null?                1000:databuferrows;                conn = Ca.getconnection (Ca.getsqlcfg (Serverpath)); String importtime = new SimpleDateFormat ("Yyyy-mm-dd HH:mm:ss.                SSS "). Format (new Date ());                p = Opcpackage.open (f, packageaccess.read); Readonlysharedstringstable Strings = new readonlysharedstringstable (p);                Xssfreader Xssfreader = new Xssfreader (p);                Stylestable styles = xssfreader.getstylestable ();                Xssfreader.sheetiterator iter = (xssfreader.sheetiterator) xssfreader. Getsheetsdata ();                Hashmap<integer, jsonobject> hsheetcfg = new Hashmap<integer, jsonobject> ();                    for (int i = 0; i < sheetcfgs.size (); i++) {Jsonobject sheetcfg = Sheetcfgs.getjsonobject (i);                Hsheetcfg.put (Sheetcfg.getinteger ("Sheetindex"), sheetcfg);                } int index = 1;                    while (Iter.hasnext ()) {InputStream Sheetstream = Iter.next ();                                if (Hsheetcfg.containskey (index)) {Processsheet (Styles, strings, new Sheetsaxhandler (),                 Sheetstream, Hsheetcfg.get (index), Conn, PcId,               This.filename, Importtime);                } index++;                } p.close ();                f = null;            Conn.close ();                } catch (SQLException e) {conn.close ();                conn = null;            Throw e;    }} return num; }
Handling sheet
public void Processsheet (stylestable styles, readonlysharedstringstable strings, Sheetsaxhandler Sheethandler, InputStream Sheetinputstream, Final Jsonobject sheetcfg, final Connection conn, string PcID, String FileName, String importtime) throws IOException, Parserconfigurationexception, Saxexception, Sqlexce ption {Final PreparedStatement PS = conn.preparestatement (Ca.buildinsertsql (sheetcfg, PcID, Filena        Me, importtime));        Final int datastartnum = Sheetcfg.getintvalue ("Datastartnum");        Sheethandler.setmincolumns (Sheetcfg.getjsonarray ("FieldReference"). Size ()); sheethandler.rowadded (New Sheetrowlistener () {@Override public void AddRow (list<string> row,                int rowNum) {if (RowNum < dataStartNum-1) return;         try {ca.setparamter (PS, sheetcfg, Row, rownum-datastartnum);           if (rowNum% databuferrows = = 0) {ps.executebatch ();                    Ps.clearbatch ();                        }} catch (SQLException e) {try {ps.close ();                        Conn.close ();                    Throw e;                    } catch (SQLException E1) {e1.printstacktrace ();                } e.printstacktrace ();        }            }        });        XMLReader Sheetparser = Saxhelper.newxmlreader ();        Dataformatter formatter = new Dataformatter ();        InputSource Sheetsource = new InputSource (sheetinputstream);        ContentHandler handler = new Xssfsheetxmlhandler (styles, NULL, strings, Sheethandler, formatter, false);        Sheetparser.setcontenthandler (handler);        Sheetparser.parse (Sheetsource);        Processing the remaining data ps.executebatch ();        Ps.clearbatch (); Close the current PS PS. Close (); }`
Summarize

In the initial use of the POI user mode, quickly completed an Excel file parsing, very convenient. With the gradual deepening of the project, the processing of the Excel file is getting larger, user mode is not competent. Then began to look for information, on the official website to see the example of CSV.
The main function of this code is to import the data in the Excel file into the corresponding table of the Oracle database, and in the implementation of the function, I have mainly encountered the following problems

  1. Resolves an Excel file Resolution memory leak (2007 after the file is basically resolved with sax mode)
  2. Corresponding to a large number of data storage, the speed has been very slow, although I have adopted the method of batch submission (at present, I still do not find a good solution, if there are peers to see, but also hope that more advice )

Use (POI) Sax to work with Excel files to prevent memory overflow

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.