Microsoft Office Format Analysis
1. Format Classification
Compound document (binary format, Office 2003/2000/97)
OOXML document (Office Open XML, Office 2007/2010 and later)
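The two families can be told apart from the first bytes of a file. A compound document starts with the OLE2 signature D0 CF 11 E0 A1 B1 1A E1. An OOXML file is a ZIP package, so it starts with the ZIP local-file-header signature 50 4B 03 04 ("PK\3\4"). Below is a minimal sketch that assumes only the JDK (Java 11+). The class OfficeFormatSniffer is hypothetical, and the ZIP test is deliberately loose: any ZIP archive, for example a .jar, would also be classified as OOXML.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class OfficeFormatSniffer {
    enum Kind { COMPOUND, OOXML, UNKNOWN }

    // OLE2 / Compound File Binary signature: .doc, .xls, .ppt (Office 97-2003)
    static final byte[] CFB_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4": .docx, .xlsx, .pptx (and any other ZIP)
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file from its first bytes (8 bytes are needed for the CFB check). */
    static Kind detect(byte[] head) {
        if (startsWith(head, CFB_MAGIC)) return Kind.COMPOUND;
        // Loose check: a stricter one would also look for [Content_Types].xml inside the ZIP
        if (startsWith(head, ZIP_MAGIC)) return Kind.OOXML;
        return Kind.UNKNOWN;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Path.of(arg))) {
                System.out.println(arg + ": " + detect(in.readNBytes(8)));
            }
        }
    }
}
```

Passing a .doc and a .docx on the command line should print COMPOUND and OOXML respectively.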
2. Open-source parsing libraries
POI
Apache POI is written in Java and can parse both formats.
Official Website: http://poi.apache.org/index.html
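In POI, the compound-document container shared by the binary formats is handled by its POIFS component. As a rough illustration of what that layer reads first, here is a minimal sketch that decodes a few fields of the fixed 512-byte header. The offsets follow Microsoft's MS-CFB specification: signature at 0, major version at 0x1A, byte-order mark at 0x1C, sector shift at 0x1E, and mini-sector shift at 0x20. The class CfbHeader is hypothetical and does not use POI. It only shows the header layout, not directory or FAT traversal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;

public class CfbHeader {
    // OLE2 signature D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long
    static final long SIGNATURE = 0xE11AB1A1E011CFD0L;

    final int majorVersion;   // 3 => 512-byte sectors, 4 => 4096-byte sectors
    final int sectorSize;     // 1 << sector shift
    final int miniSectorSize; // 1 << mini-sector shift (64 in practice)

    CfbHeader(int majorVersion, int sectorSize, int miniSectorSize) {
        this.majorVersion = majorVersion;
        this.sectorSize = sectorSize;
        this.miniSectorSize = miniSectorSize;
    }

    /** Decodes a few fixed-offset fields of the 512-byte compound file header. */
    static CfbHeader parse(byte[] header) {
        if (header.length < 512) {
            throw new IllegalArgumentException("header must be at least 512 bytes");
        }
        // All multi-byte header fields are stored little-endian
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getLong(0) != SIGNATURE) {
            throw new IllegalArgumentException("not a compound file");
        }
        if (Short.toUnsignedInt(buf.getShort(0x1C)) != 0xFFFE) {
            throw new IllegalArgumentException("unexpected byte-order mark");
        }
        int major = Short.toUnsignedInt(buf.getShort(0x1A));
        int sectorShift = Short.toUnsignedInt(buf.getShort(0x1E));
        int miniShift = Short.toUnsignedInt(buf.getShort(0x20));
        return new CfbHeader(major, 1 << sectorShift, 1 << miniShift);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            CfbHeader h = parse(in.readNBytes(512));
            System.out.println("major version " + h.majorVersion
                + ", sector size " + h.sectorSize
                + ", mini-sector size " + h.miniSectorSize);
        }
    }
}
```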
NPOI
NPOI is the .NET version of POI.
Official Website: http://npoi.codeplex.com/
docx4j
docx4j is a Java library for creating and manipulating Microsoft Office Open XML files (Word .docx, PowerPoint .pptx and Excel .xlsx).
Official Website: http://www.docx4java.org
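Because an OOXML file is an ordinary ZIP package, you can inspect its parts (such as [Content_Types].xml and word/document.xml) with the JDK alone before reaching for docx4j or POI. Below is a minimal sketch that assumes Java 11+. OoxmlParts and its methods are hypothetical helpers. The demo package it builds only mimics OOXML part names and is not a valid Word document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class OoxmlParts {
    /** Returns the part names (ZIP entry names) stored in an OOXML package. */
    static List<String> listParts(byte[] pkg) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Returns the raw XML of one part, e.g. "word/document.xml", or null if absent. */
    static String readPart(byte[] pkg, String partName) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(pkg))) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                if (e.getName().equals(partName)) {
                    // Reads only the current entry: the stream reports EOF at the entry's end
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return null;
    }

    /** Builds a tiny in-memory ZIP with OOXML-style part names (NOT a valid .docx). */
    static byte[] buildDemoPackage() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            String[][] parts = {
                {"[Content_Types].xml", "<Types/>"},
                {"word/document.xml", "<w:document/>"}
            };
            for (String[] p : parts) {
                zip.putNextEntry(new ZipEntry(p[0]));
                zip.write(p[1].getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // Closing the ZipOutputStream above wrote the central directory
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pkg = args.length > 0 ? Files.readAllBytes(Path.of(args[0])) : buildDemoPackage();
        for (String name : listParts(pkg)) {
            System.out.println(name);
        }
        System.out.println(readPart(pkg, "word/document.xml"));
    }
}
```

Running it with the path to a real .docx lists that file's parts. docx4j works on these same parts, mapping their XML to Java objects instead of returning raw strings.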
3. References:
http://blog.csdn.net/jkingcl/article/details/4544898
http://chenhailong.iteye.com/blog/1498528
http://www.iteye.com/topic/420319
1. Understanding the Office binary file formats: http://msdn.microsoft.com/zh-cn/library/gg615407(v=office.14).aspx
2. Understanding the Word (MS-DOC) binary file format: http://msdn.microsoft.com/zh-CN/library/gg615596
3. Understanding the PowerPoint (MS-PPT) binary file format: http://msdn.microsoft.com/zh-CN/library/gg615594
4. Understanding graphics in the Office binary file formats: http://msdn.microsoft.com/zh-CN/library/gg985447
5. Searching for graphics in binary PowerPoint (MS-PPT) files: http://msdn.microsoft.com/zh-CN/library/hh244173
6. http://www.cnblogs.com/mayswind/archive/2013/03/31/2991271.html