I recently worked on a project that processes docx documents, so importing third-party packages was unavoidable. Below are the tools I came across during the project that can handle docx (the list is not exhaustive). For each one I give a brief introduction and my personal take, along with a link to its Maven repository:
1. Apache POI
POI was the first tool I found (so I may be biased), and it is the tool with the most information online about working with Office documents. Personally, though, I feel that POI leans toward Excel documents and its support for Word documents is weaker. POI online documentation: http://poi.apache.org/apidocs/index.html
2. Aspose Words
I looked into Aspose Words because, after writing a set of code to handle docx documents, I did not want to write it all again for doc documents, so I needed to convert doc to docx first. A search showed that Aspose Words can do this, but trying it revealed two drawbacks. First, at the time of writing, the Aspose Words toolkit I had was still a trial version, and doc-to-docx conversion seemed to have some kind of restriction (on word count or something similar); in any case, the content of the converted docx did not match the source document. Second, at the time of writing the toolkit is reportedly paid, so anyone planning commercial use should take that into account, or wait for it to become open source (a bit unrealistic).
3. Java COM Bridge
JACOB for short. It handled the doc-to-docx task perfectly, and to my surprise it supports many more format conversions as well, such as Excel to PDF, Word to PPT, and so on. Each target format is represented by a constant and the calls are very simple, so I strongly recommend it. Its shortcoming, however, is fatal: because it drives Microsoft Office through Windows COM, it cannot run in a Linux environment (T_T).
4. Apache POI XWPF Converter Core + Apache POI XWPF Converter XHTML
I used these two toolkits together to convert docx files to HTML. Despite the "Apache POI" in their names, they are not released by Apache but by the opensagres XDocReport project, and they build on POI. Plenty of code that uses core + xhtml to convert docx to HTML can be found online, but when I debugged it I got a NoSuchMethodError, and I have not solved that problem so far... For a concrete application, see: https://github.com/jeckeyLiu/word2Html/blob/master/src/main/java/com/abc/word2html/util/word2html.java
5. docx4j
This is truly the go-to tool for docx, because it goes straight to the nature of the format: an Office document (docx) is, underneath, a ZIP package of XML files, so docx4j essentially parses that XML. Its docx support is strong; highly recommended.
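To make the "docx is just XML" point concrete, here is a minimal sketch that uses only the JDK (java.util.zip plus the built-in DOM parser), not docx4j itself. It opens a .docx as an ordinary ZIP archive, reads the word/document.xml part, and joins the w:t text runs of each w:p paragraph. The class DocxXmlPeek and its method paragraphs are hypothetical names chosen for illustration; the sketch ignores headers, footnotes, text boxes and all formatting, which is exactly the kind of work docx4j takes off your hands.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class (not part of docx4j or POI): a JDK-only illustration
// that a .docx file is a ZIP package whose body lives in word/document.xml.
final class DocxXmlPeek {
    // Namespace of WordprocessingML elements such as <w:p> (paragraph) and <w:t> (text).
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxXmlPeek() {}

    /** Returns the plain text of each w:p paragraph in the main document part. */
    static List<String> paragraphs(InputStream docx) throws Exception {
        byte[] documentXml = null;
        // Step 1: treat the .docx as an ordinary ZIP archive and find the main part.
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    documentXml = zip.readAllBytes(); // reads only the current entry
                    break;
                }
            }
        }
        if (documentXml == null) {
            throw new IOException("word/document.xml not found; is this really a .docx?");
        }

        // Step 2: parse that part as plain XML with the JDK's namespace-aware DOM parser.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted files cannot trigger XXE attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(documentXml));

        // Step 3: join the w:t runs inside each w:p; all formatting is ignored.
        List<String> result = new ArrayList<>();
        NodeList paras = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element p = (Element) paras.item(i);
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

For example, DocxXmlPeek.paragraphs(new FileInputStream("report.docx")) (with a hypothetical file name) returns one string per paragraph. Note that the method closes the stream it is given. A real library such as docx4j instead maps every part of the package to Java objects, so you can edit and save the document rather than only read text.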
6. iText
I have not used it; I have only heard that it can also process these documents. It is listed here as backup information for the future.
That's all for now; I will update this post when there is something new.
Hands-On Experience 1: Third-Party Java Tools for Processing docx