Document Conversion Performance Test
Two PDF conversion components are used in the financial system.
The first is com.artofsolving, the component the system referenced first:
<!--https://mvnrepository.com/artifact/com.artofsolving/jodconverter-->
<dependency>
    <groupId>com.artofsolving</groupId>
    <artifactId>jodconverter</artifactId>
    <version>2.2.1</version>
</dependency>
The second is org.artofsolving, which was added later, when the system introduced the upload component:
<!--https://mvnrepository.com/artifact/org.artofsolving.jodconverter/jodconverter-core-->
<dependency>
    <groupId>org.artofsolving.jodconverter</groupId>
    <artifactId>jodconverter-core</artifactId>
    <version>3.0-beta-4</version>
</dependency>
The two components performed differently during project development and testing.
First, the OpenOffice version used is 4.1.2.
Recommended system configuration:
* Microsoft Windows XP, Vista, Windows 7, or Windows 8
* Pentium III or higher series processors
* 256 MB RAM (more is recommended)
* Up to 1.5 GB hard disk free space
* 1024x768 resolution (recommended to use higher resolution), at least 256 colors
Dropzone: supported configuration
Maximum size of a single uploaded file: 100 MB
No limit on the total number of uploaded files
Number of files uploaded at the same time: 3
The following mainly looks at PDF conversion efficiency for uploaded files.
Test files: test.doc, test.ppt, test.xls
By comparison, apart from supporting more formats, org has no speed advantage, and the text in its converted output is slightly less sharp than com's.
On Linux in particular, an uploaded file cannot be read when its name contains parentheses ().
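One way to work around the parentheses problem is to normalise file names before they reach the converter. The following is a minimal sketch, assuming it is acceptable to replace every character outside a conservative whitelist; the class name `FileNameSanitizer` and the whitelist itself are illustrative assumptions and are not part of either component.

```java
final class FileNameSanitizer {

    private FileNameSanitizer() {
    }

    /**
     * Replaces every character that is not an ASCII letter, digit, dot,
     * hyphen, or underscore with an underscore.
     */
    static String sanitize(String fileName) {
        return fileName.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("report(1).doc")); // prints report_1_.doc
    }
}
```

Note that this whitelist also replaces non-ASCII characters, so it would need widening if original (for example Chinese) file names must stay readable.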
The supported file types differ for the following reason. In the com.artofsolving source code, DocumentFormatRegistry is an interface with several implementations, and the list of DocumentFormat objects registered by the default registry does not include the MS 2007 formats:
public class DefaultDocumentFormatRegistry extends BasicDocumentFormatRegistry {

    public DefaultDocumentFormatRegistry() {
        final DocumentFormat pdf = new DocumentFormat("Portable Document Format", "application/pdf", "pdf");
        pdf.setExportFilter(DocumentFamily.DRAWING, "draw_pdf_Export");
        pdf.setExportFilter(DocumentFamily.PRESENTATION, "impress_pdf_Export");
        pdf.setExportFilter(DocumentFamily.SPREADSHEET, "calc_pdf_Export");
        pdf.setExportFilter(DocumentFamily.TEXT, "writer_pdf_Export");
        addDocumentFormat(pdf);

        final DocumentFormat swf = new DocumentFormat("Macromedia Flash", "application/x-shockwave-flash", "swf");
        swf.setExportFilter(DocumentFamily.DRAWING, "draw_flash_Export");
        swf.setExportFilter(DocumentFamily.PRESENTATION, "impress_flash_Export");
        addDocumentFormat(swf);

        final DocumentFormat xhtml = new DocumentFormat("XHTML", "application/xhtml+xml", "xhtml");
        xhtml.setExportFilter(DocumentFamily.PRESENTATION, "XHTML Impress File");
        xhtml.setExportFilter(DocumentFamily.SPREADSHEET, "XHTML Calc File");
        xhtml.setExportFilter(DocumentFamily.TEXT, "XHTML Writer File");
        addDocumentFormat(xhtml);

        // HTML is treated as Text when supplied as input, but as an output it is also
        // available for exporting Spreadsheet and Presentation formats
        final DocumentFormat html = new DocumentFormat("HTML", DocumentFamily.TEXT, "text/html", "html");
        html.setExportFilter(DocumentFamily.PRESENTATION, "impress_html_Export");
        html.setExportFilter(DocumentFamily.SPREADSHEET, "HTML (StarCalc)");
        html.setExportFilter(DocumentFamily.TEXT, "HTML (StarWriter)");
        addDocumentFormat(html);

        final DocumentFormat odt = new DocumentFormat("OpenDocument Text", DocumentFamily.TEXT, "application/vnd.oasis.opendocument.text", "odt");
        odt.setExportFilter(DocumentFamily.TEXT, "writer8");
        addDocumentFormat(odt);

        final DocumentFormat sxw = new DocumentFormat("OpenOffice.org 1.0 Text Document", DocumentFamily.TEXT, "application/vnd.sun.xml.writer", "sxw");
        sxw.setExportFilter(DocumentFamily.TEXT, "StarOffice XML (Writer)");
        addDocumentFormat(sxw);

        final DocumentFormat doc = new DocumentFormat("Microsoft Word", DocumentFamily.TEXT, "application/msword", "doc");
        doc.setExportFilter(DocumentFamily.TEXT, "MS Word 97");
        addDocumentFormat(doc);

        final DocumentFormat rtf = new DocumentFormat("Rich Text Format", DocumentFamily.TEXT, "text/rtf", "rtf");
        rtf.setExportFilter(DocumentFamily.TEXT, "Rich Text Format");
        addDocumentFormat(rtf);

        final DocumentFormat wpd = new DocumentFormat("WordPerfect", DocumentFamily.TEXT, "application/wordperfect", "wpd");
        addDocumentFormat(wpd);

        final DocumentFormat txt = new DocumentFormat("Plain Text", DocumentFamily.TEXT, "text/plain", "txt");
        // Set FilterName to "Text" to prevent OOo from trying to display the "ASCII Filter Options" dialog;
        // alternatively FilterName could be "Text (encoded)" and FilterOptions used to set encoding if needed
        txt.setImportOption("FilterName", "Text");
        txt.setExportFilter(DocumentFamily.TEXT, "Text");
        addDocumentFormat(txt);

        final DocumentFormat wikitext = new DocumentFormat("MediaWiki wikitext", "text/x-wiki", "wiki");
        wikitext.setExportFilter(DocumentFamily.TEXT, "MediaWiki");
        addDocumentFormat(wikitext);

        final DocumentFormat ods = new DocumentFormat("OpenDocument Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.oasis.opendocument.spreadsheet", "ods");
        ods.setExportFilter(DocumentFamily.SPREADSHEET, "calc8");
        addDocumentFormat(ods);

        final DocumentFormat sxc = new DocumentFormat("OpenOffice.org 1.0 Spreadsheet", DocumentFamily.SPREADSHEET, "application/vnd.sun.xml.calc", "sxc");
        sxc.setExportFilter(DocumentFamily.SPREADSHEET, "StarOffice XML (Calc)");
        addDocumentFormat(sxc);

        final DocumentFormat xls = new DocumentFormat("Microsoft Excel", DocumentFamily.SPREADSHEET, "application/vnd.ms-excel", "xls");
        xls.setExportFilter(DocumentFamily.SPREADSHEET, "MS Excel 97");
        addDocumentFormat(xls);

        final DocumentFormat csv = new DocumentFormat("CSV", DocumentFamily.SPREADSHEET, "text/csv", "csv");
        csv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
        csv.setImportOption("FilterOptions", "44,34,0"); // Field Separator: ','; Text Delimiter: '"'
        csv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
        csv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "44,34,0");
        addDocumentFormat(csv);

        final DocumentFormat tsv = new DocumentFormat("Tab-separated Values", DocumentFamily.SPREADSHEET, "text/tab-separated-values", "tsv");
        tsv.setImportOption("FilterName", "Text - txt - csv (StarCalc)");
        tsv.setImportOption("FilterOptions", "9,34,0"); // Field Separator: '\t'; Text Delimiter: '"'
        tsv.setExportFilter(DocumentFamily.SPREADSHEET, "Text - txt - csv (StarCalc)");
        tsv.setExportOption(DocumentFamily.SPREADSHEET, "FilterOptions", "9,34,0");
        addDocumentFormat(tsv);

        final DocumentFormat odp = new DocumentFormat("OpenDocument Presentation", DocumentFamily.PRESENTATION, "application/vnd.oasis.opendocument.presentation", "odp");
        odp.setExportFilter(DocumentFamily.PRESENTATION, "impress8");
        addDocumentFormat(odp);

        final DocumentFormat sxi = new DocumentFormat("OpenOffice.org 1.0 Presentation", DocumentFamily.PRESENTATION, "application/vnd.sun.xml.impress", "sxi");
        sxi.setExportFilter(DocumentFamily.PRESENTATION, "StarOffice XML (Impress)");
        addDocumentFormat(sxi);

        final DocumentFormat ppt = new DocumentFormat("Microsoft PowerPoint", DocumentFamily.PRESENTATION, "application/vnd.ms-powerpoint", "ppt");
        ppt.setExportFilter(DocumentFamily.PRESENTATION, "MS PowerPoint 97");
        addDocumentFormat(ppt);

        final DocumentFormat odg = new DocumentFormat("OpenDocument Drawing", DocumentFamily.DRAWING, "application/vnd.oasis.opendocument.graphics", "odg");
        odg.setExportFilter(DocumentFamily.DRAWING, "draw8");
        addDocumentFormat(odg);

        final DocumentFormat svg = new DocumentFormat("Scalable Vector Graphics", "image/svg+xml", "svg");
        svg.setExportFilter(DocumentFamily.DRAWING, "draw_svg_Export");
        addDocumentFormat(svg);
    }
}
And org has the following source code:
public class DefaultDocumentFormatRegistry extends SimpleDocumentFormatRegistry {

    public DefaultDocumentFormatRegistry() {
        DocumentFormat pdf = new DocumentFormat("Portable Document Format", "pdf", "application/pdf");
        pdf.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "writer_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "calc_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_pdf_Export"));
        pdf.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_pdf_Export"));
        this.addFormat(pdf);

        DocumentFormat swf = new DocumentFormat("Macromedia Flash", "swf", "application/x-shockwave-flash");
        swf.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_flash_Export"));
        swf.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_flash_Export"));
        this.addFormat(swf);

        DocumentFormat html = new DocumentFormat("HTML", "html", "text/html");
        html.setInputFamily(DocumentFamily.TEXT);
        html.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "HTML (StarWriter)"));
        html.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "HTML (StarCalc)"));
        html.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress_html_Export"));
        this.addFormat(html);

        DocumentFormat odt = new DocumentFormat("OpenDocument Text", "odt", "application/vnd.oasis.opendocument.text");
        odt.setInputFamily(DocumentFamily.TEXT);
        odt.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "writer8"));
        this.addFormat(odt);

        DocumentFormat sxw = new DocumentFormat("OpenOffice.org 1.0 Text Document", "sxw", "application/vnd.sun.xml.writer");
        sxw.setInputFamily(DocumentFamily.TEXT);
        sxw.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "StarOffice XML (Writer)"));
        this.addFormat(sxw);

        DocumentFormat doc = new DocumentFormat("Microsoft Word", "doc", "application/msword");
        doc.setInputFamily(DocumentFamily.TEXT);
        doc.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 97"));
        this.addFormat(doc);

        // MS 2007 format: registered as an input format only
        DocumentFormat docx = new DocumentFormat("Microsoft Word 2007 XML", "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        docx.setInputFamily(DocumentFamily.TEXT);
        this.addFormat(docx);

        DocumentFormat rtf = new DocumentFormat("Rich Text Format", "rtf", "text/rtf");
        rtf.setInputFamily(DocumentFamily.TEXT);
        rtf.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "Rich Text Format"));
        this.addFormat(rtf);

        DocumentFormat wpd = new DocumentFormat("WordPerfect", "wpd", "application/wordperfect");
        wpd.setInputFamily(DocumentFamily.TEXT);
        this.addFormat(wpd);

        DocumentFormat txt = new DocumentFormat("Plain Text", "txt", "text/plain");
        txt.setInputFamily(DocumentFamily.TEXT);
        Map<String, Object> txtLoadAndStoreProperties = new LinkedHashMap<String, Object>();
        txtLoadAndStoreProperties.put("FilterName", "Text (encoded)");
        txtLoadAndStoreProperties.put("FilterOptions", "utf8");
        txt.setLoadProperties(txtLoadAndStoreProperties);
        txt.setStoreProperties(DocumentFamily.TEXT, txtLoadAndStoreProperties);
        this.addFormat(txt);

        // Note: wikitext is configured but never added to the registry in this listing
        DocumentFormat wikitext = new DocumentFormat("MediaWiki wikitext", "wiki", "text/x-wiki");
        wikitext.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MediaWiki"));

        DocumentFormat ods = new DocumentFormat("OpenDocument Spreadsheet", "ods", "application/vnd.oasis.opendocument.spreadsheet");
        ods.setInputFamily(DocumentFamily.SPREADSHEET);
        ods.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "calc8"));
        this.addFormat(ods);

        DocumentFormat sxc = new DocumentFormat("OpenOffice.org 1.0 Spreadsheet", "sxc", "application/vnd.sun.xml.calc");
        sxc.setInputFamily(DocumentFamily.SPREADSHEET);
        sxc.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "StarOffice XML (Calc)"));
        this.addFormat(sxc);

        DocumentFormat xls = new DocumentFormat("Microsoft Excel", "xls", "application/vnd.ms-excel");
        xls.setInputFamily(DocumentFamily.SPREADSHEET);
        xls.setStoreProperties(DocumentFamily.SPREADSHEET, Collections.singletonMap("FilterName", "MS Excel 97"));
        this.addFormat(xls);

        // MS 2007 format: registered as an input format only
        DocumentFormat xlsx = new DocumentFormat("Microsoft Excel 2007 XML", "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        xlsx.setInputFamily(DocumentFamily.SPREADSHEET);
        this.addFormat(xlsx);

        DocumentFormat csv = new DocumentFormat("Comma Separated Values", "csv", "text/csv");
        csv.setInputFamily(DocumentFamily.SPREADSHEET);
        Map<String, Object> csvLoadAndStoreProperties = new LinkedHashMap<String, Object>();
        csvLoadAndStoreProperties.put("FilterName", "Text - txt - csv (StarCalc)");
        csvLoadAndStoreProperties.put("FilterOptions", "44,34,0");
        csv.setLoadProperties(csvLoadAndStoreProperties);
        csv.setStoreProperties(DocumentFamily.SPREADSHEET, csvLoadAndStoreProperties);
        this.addFormat(csv);

        DocumentFormat tsv = new DocumentFormat("Tab Separated Values", "tsv", "text/tab-separated-values");
        tsv.setInputFamily(DocumentFamily.SPREADSHEET);
        Map<String, Object> tsvLoadAndStoreProperties = new LinkedHashMap<String, Object>();
        tsvLoadAndStoreProperties.put("FilterName", "Text - txt - csv (StarCalc)");
        tsvLoadAndStoreProperties.put("FilterOptions", "9,34,0");
        tsv.setLoadProperties(tsvLoadAndStoreProperties);
        tsv.setStoreProperties(DocumentFamily.SPREADSHEET, tsvLoadAndStoreProperties);
        this.addFormat(tsv);

        DocumentFormat odp = new DocumentFormat("OpenDocument Presentation", "odp", "application/vnd.oasis.opendocument.presentation");
        odp.setInputFamily(DocumentFamily.PRESENTATION);
        odp.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "impress8"));
        this.addFormat(odp);

        DocumentFormat sxi = new DocumentFormat("OpenOffice.org 1.0 Presentation", "sxi", "application/vnd.sun.xml.impress");
        sxi.setInputFamily(DocumentFamily.PRESENTATION);
        sxi.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "StarOffice XML (Impress)"));
        this.addFormat(sxi);

        DocumentFormat ppt = new DocumentFormat("Microsoft PowerPoint", "ppt", "application/vnd.ms-powerpoint");
        ppt.setInputFamily(DocumentFamily.PRESENTATION);
        ppt.setStoreProperties(DocumentFamily.PRESENTATION, Collections.singletonMap("FilterName", "MS PowerPoint 97"));
        this.addFormat(ppt);

        // MS 2007 format: registered as an input format only
        DocumentFormat pptx = new DocumentFormat("Microsoft PowerPoint 2007 XML", "pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
        pptx.setInputFamily(DocumentFamily.PRESENTATION);
        this.addFormat(pptx);

        DocumentFormat odg = new DocumentFormat("OpenDocument Drawing", "odg", "application/vnd.oasis.opendocument.graphics");
        odg.setInputFamily(DocumentFamily.DRAWING);
        odg.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw8"));
        this.addFormat(odg);

        DocumentFormat svg = new DocumentFormat("Scalable Vector Graphics", "svg", "image/svg+xml");
        svg.setStoreProperties(DocumentFamily.DRAWING, Collections.singletonMap("FilterName", "draw_svg_Export"));
        this.addFormat(svg);
    }
}
The two implementations work on basically the same principle, so it may be possible to add support for more file types to the com component through a customized registry.
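To see which formats such a customization would have to add, the extensions registered by the two listings above can be compared. This is a minimal sketch: the class name `FormatGap` is hypothetical, and the extension sets are transcribed by hand from the listings (only formats actually added to each registry are included), not read from the libraries at run time.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class FormatGap {

    // Extensions registered by the com (2.2.1) listing
    static final Set<String> COM = new TreeSet<>(Arrays.asList(
            "pdf", "swf", "xhtml", "html", "odt", "sxw", "doc", "rtf", "wpd", "txt", "wiki",
            "ods", "sxc", "xls", "csv", "tsv", "odp", "sxi", "ppt", "odg", "svg"));

    // Extensions registered by the org (3.0-beta-4) listing
    static final Set<String> ORG = new TreeSet<>(Arrays.asList(
            "pdf", "swf", "html", "odt", "sxw", "doc", "docx", "rtf", "wpd", "txt",
            "ods", "sxc", "xls", "xlsx", "csv", "tsv", "odp", "sxi", "ppt", "pptx", "odg", "svg"));

    /** Returns the extensions that org registers and com does not, in sorted order. */
    static Set<String> missingInCom() {
        Set<String> missing = new TreeSet<>(ORG);
        missing.removeAll(COM);
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingInCom()); // prints [docx, pptx, xlsx]
    }
}
```

The difference is exactly the three MS 2007 formats, which matches the observation that com does not support them by default.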
Conversion time also varies with the number and size of the files. With many files, or with medium-sized and larger files, converting to PDF one file after another inevitably takes a long time; switching to parallel processing can speed this up.
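The switch from serial to parallel processing could look roughly like the sketch below, which submits one task per file to a fixed thread pool. `FileConverter` is a hypothetical stand-in for the real conversion call, and the pool size is an assumption; whether the OpenOffice service behind the converter copes with concurrent requests would need to be verified separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelConversion {

    /** Hypothetical stand-in for the real conversion call; returns the output path. */
    interface FileConverter {
        String convert(String inputPath) throws Exception;
    }

    /** Converts all inputs on a fixed-size pool and returns the results in input order. */
    static List<String> convertAll(List<String> inputs, FileConverter converter, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String input : inputs) {
                tasks.add(() -> converter.convert(input));
            }
            List<String> outputs = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps the task order
            for (Future<String> future : pool.invokeAll(tasks)) {
                outputs.add(future.get());
            }
            return outputs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while converting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("conversion failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake converter that only rewrites the extension
        FileConverter fake = path -> path.replaceAll("\\.[^.]+$", ".pdf");
        System.out.println(convertAll(Arrays.asList("a.doc", "b.ppt", "c.xls"), fake, 3));
        // prints [a.pdf, b.pdf, c.pdf]
    }
}
```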
In particular, org offers two ways to create a converter: one does not support MS 2007 formats and the other does.
Does not support MS 2007:
DocumentConverter converter = new StreamOpenOfficeDocumentConverter(connection);
However, reports online say that it can resolve the following exception, which is caused by a problem with resolving the file path on Linux:
com.artofsolving.jodconverter.openoffice.connection.OpenOfficeException: conversion failed: could not load input document
Supports MS 2007:
OfficeManager officeManager = getOfficeManager();
// Connect to OpenOffice
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
How com creates a converter object:
connection.connect();
DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
com can also create a converter object through StreamOpenOfficeDocumentConverter, but that approach is not used in this system.
In summary, if uploaded files average no more than 5 MB each and there are no more than 5 of them, the system can complete the conversion within 10 seconds.
In later tests, when files were larger than 10 MB and conversions were frequent, the conversions consumed so many system resources that they could not complete, and the component's task queue stopped accepting subsequent conversion tasks. There is also a performance problem: converting large files (around 20 MB) sometimes times out. The source code sets the execution time limit for a single PDF conversion task to 120 s; when a task exceeds it, the component reports an error, reconnects, and processes the next task.
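The effect of such a per-task time limit can be pictured with `Future.get` and a timeout. This is a sketch of the general pattern only, not the component's actual implementation; the class name `TimedTask` and the choice to return `null` on timeout are assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedTask {

    /** Runs the task and returns its result, or null if it exceeds the time limit. */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Give up on this task so that the caller can move on to the next one
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String fast = runWithTimeout(() -> "done", 1000);
        String slow = runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a conversion that takes too long
            return "done";
        }, 100);
        System.out.println(fast + " / " + slow); // prints done / null
    }
}
```

`cancel(true)` only interrupts the worker thread; a conversion that ignores interruption would keep consuming resources, which is consistent with the resource exhaustion described above.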
There are many pitfalls in developing attachment uploads:
* Unable to read the input file: caused by the port being occupied; requires a restart.
* Unable to parse special characters in file names: in this system this is related to Aliyun file upload.
* Port in use: conversion of other, smaller files cannot continue. The fix is to modify the code that connects to OpenOffice so that it first connects to an OpenOffice service that is already running and otherwise restarts the service and opens a new connection. (Code-level fix.)
* No support for docx and other later MS document formats. (Solved by adding the org component.)
* No support for concurrent processing or for converting large files.
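The connection fix described above, reusing a service that is already running before starting a new one, depends on being able to tell whether something is listening on the service port. A minimal sketch of that check using plain sockets follows; the class name `ServiceProbe` is hypothetical, and port 8100 is an assumption about how the OpenOffice service was started.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ServiceProbe {

    /** Returns true if something accepts TCP connections on host:port within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int port = 8100; // assumed port of the OpenOffice service
        if (isListening("127.0.0.1", port, 500)) {
            System.out.println("Reuse the service already running on port " + port);
        } else {
            System.out.println("Nothing on port " + port + ": start the service, then connect");
        }
    }
}
```

A successful TCP connection only shows that the port is occupied, not that the process behind it is a healthy OpenOffice instance, so a real fix would still need to handle a failed conversion attempt.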
Reading the source code showed that it cannot be optimized here: the components ship with essentially no source code, so what can be read is only decompiled output. OpenOffice also behaves differently on Windows and Linux, mainly in conversion time and in how file formats, file names, and file types are resolved.