If you work with the Office file formats, the specifications below are worth studying. If you prefer open-source software, you can also consider a format conversion tool or class library. I recently found a SourceForge project that does this: http://b2xtranslator.sourceforge.net
Official site for the Office binary file formats (.doc, .xls, .ppt): www.microsoft.com/interop/docs/officebinaryformats.mspx
Microsoft Word
Word 97-2007 binary file format (.doc) specification PDF | XPS
Microsoft PowerPoint
PowerPoint 97-2007 binary file format (.ppt) specification PDF | XPS
Microsoft Excel
Excel 97-2007 binary file format (.xls) specification PDF | XPS
Excel 2007 binary file format (.xlsb) specification PDF | XPS
Office Drawing
Office Drawing 97-2007 binary format specification PDF | XPS
Specifications for several related file formats have also been made public:
Windows compound binary file format specification PDF | XPS
Windows Metafile format (.wmf) specification PDF | XPS
Ink serialized format (ISF) specification PDF | XPS
In addition, there is a KB article devoted to extracting information from Office binary files, "How to extract information from Office files by using Office file formats and schemas": http://support.microsoft.com/kb/840817/en-us
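
As a first step before extracting anything, it can help to confirm that a file really is a compound binary file, the container used by .doc, .xls and .ppt. The sketch below is only an illustration and is not taken from the KB article or from b2xtranslator. It assumes the header layout described in the compound binary file format specification (an 8-byte signature, then version, byte-order and sector-shift fields starting at offset 24). The function name `read_cfb_header` and the returned keys are hypothetical names chosen for this example.

```python
import struct
from typing import Optional

# Signature found in the first 8 bytes of a compound binary file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data: bytes) -> Optional[dict]:
    """Return a few header fields, or None if data is not a compound file.

    Hypothetical helper for illustration; it only inspects the header
    and does not parse sectors, the FAT, or the directory.
    """
    if len(data) < 34 or data[:8] != CFB_SIGNATURE:
        return None
    # Offsets 24..33 hold five little-endian 16-bit fields:
    # minor version, major version, byte order, sector shift, mini sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }


def read_cfb_header_from_file(path: str) -> Optional[dict]:
    """Read the header fields from the file at path (only the first 512 bytes are needed)."""
    with open(path, "rb") as handle:
        return read_cfb_header(handle.read(512))
```

A non-None result only says that the container is a compound file. Telling a Word document from an Excel workbook or a PowerPoint presentation requires reading the directory and stream contents as laid out in the specifications listed above.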