Converting Word documents to HTM format in Java with Jacob

Source: Internet
Author: User

Transferred from: http://blog.csdn.net/chinapi_hzh/article/details/5798689

Microsoft does not publish the Word source code, so reading a Word file directly with a Java stream produces nothing but garbled text. You therefore have to go through a bridge such as Jacob. The file can also be read with POI.
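To see why a plain stream read produces garbage, it helps to look at the first bytes of the file. Legacy .doc files are stored in the binary OLE2 compound file format, which always opens with the same eight-byte signature, so the bytes are not text in any character encoding. The following is a minimal sketch; the class and method names are made up for illustration and do not come from the original post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DocHeaderCheck {
    // Signature that opens every OLE2 compound file, including legacy .doc files.
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the given bytes start with the OLE2 signature.
    static boolean isOle2Header(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first eight bytes of a file and checks them against the signature.
    static boolean isOle2File(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return isOle2Header(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocHeaderCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " is an OLE2 binary file: " + isOle2File(file));
    }
}
```

If the check returns true, decoding the bytes as characters cannot work, which is why a library or Word itself has to interpret the file.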
First, the POI approach. To read with POI, download tm-extractors-0.4.jar, which can be found by searching Baidu. The code is as follows:

    import java.io.FileInputStream;
    import org.textmining.text.extraction.WordExtractor;

    try {
        FileInputStream fileInputStream = new FileInputStream(filepath);
        WordExtractor extractor = new WordExtractor();
        // Extract the plain text of the document; formatting is discarded
        temp = extractor.extractText(fileInputStream);
        System.out.println(temp + "==temp");
        fileInputStream.close();
    } catch (Exception ex) {
        System.out.println("Error reading Word file: " + ex.getMessage());
    }


filepath is the path to the Word document, and the extracted text ends up in the string temp. Text read this way is not garbled, but the result is unsatisfactory because the Word formatting is lost.
Next, the Jacob approach. First download jacob.zip from the official site: http://sourceforge.net/project/showfiles.php?group_id=109543&package_id=118368 After downloading, unzip it and put jacob.jar under the project's WEB-INF/lib directory. Put jacob.dll under C:/Windows/system32/ and under Java/jdk*.*/jre/bin. That completes the configuration. Now the code:

    import com.jacob.activeX.ActiveXComponent;
    import com.jacob.com.Dispatch;
    import com.jacob.com.Variant;

    public boolean ChageFormat(String folderPath, String fileName) {
        String fileFormat = "";
        System.out.println(folderPath);
        fileFormat = fileName.substring(fileName.length() - 4, fileName.length());
        System.out.println(fileFormat);
        if (fileFormat.equalsIgnoreCase(".doc")) {
            String docFile = folderPath + "//" + fileName;
            System.out.println("Word file path: " + docFile);
            // Full path to the Word file
            String htmlFile = docFile.substring(0, docFile.length() - 4) + ".htm";
            System.out.println("HTM file path: " + htmlFile);
            // Full path to the HTM file
            ActiveXComponent app = new ActiveXComponent("Word.Application");
            // Start Word
            try {
                app.setProperty("Visible", new Variant(false));
                // Run Word without showing its window
                Dispatch docs = app.getProperty("Documents").toDispatch();
                Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                        new Object[]{docFile, new Variant(false), new Variant(true)},
                        new int[1]).toDispatch();
                // Open the Word file (no conversion prompt, read-only)
                Dispatch.invoke(doc, "SaveAs", Dispatch.Method,
                        new Object[]{htmlFile, new Variant(8)}, new int[1]);
                // Save the file in HTM format (8 is Word's wdFormatHTML)
                Dispatch.call(doc, "Close", new Variant(false));
                // Close the file without saving changes
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                app.invoke("Quit", new Variant[]{});
                // Quit Word
            }
            // Conversion complete (note: also reached if an exception was caught)
            return true;
        }
        return false;
    }
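The method above derives the output path by cutting off the last four characters of the file name, which assumes the name is at least four characters long and ends in ".doc". A slightly more defensive variant of that one step could look like the sketch below. It needs neither Jacob nor Word. The helper name toHtmPath is hypothetical, and accepting ".docx" as well is an assumption added here, not something the original method does.

```java
import java.util.Locale;

class HtmPathSketch {
    // Returns the .htm path for a .doc or .docx path, or null for anything else.
    static String toHtmPath(String docPath) {
        if (docPath == null) {
            return null;
        }
        int dot = docPath.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension at all
        }
        String ext = docPath.substring(dot).toLowerCase(Locale.ROOT);
        if (!ext.equals(".doc") && !ext.equals(".docx")) {
            return null; // not a Word document by extension
        }
        return docPath.substring(0, dot) + ".htm";
    }

    public static void main(String[] args) {
        System.out.println(toHtmPath("C:/docs/report.doc"));
        System.out.println(toHtmPath("C:/docs/notes.txt"));
    }
}
```

Returning null for non-Word names mirrors the original method's "return false" branch: the caller can skip the conversion without starting Word.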


folderPath is the directory that contains the Word file, and fileName is the name of the Word file. This method converts the Word file into an HTM file in the same directory. You can then read the HTM file with a stream: the content is not garbled, and the formatting is preserved.
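Reading the generated HTM file with a stream might look like the sketch below. The one thing to watch is the character set: the encoding Word uses for the HTM output is not stated in this article, so the sketch takes it as a parameter instead of guessing. Word typically records it in a meta tag near the top of the file, which is a reasonable place to check. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

class HtmReaderSketch {
    // Decodes the whole stream with the given charset and returns it as one string.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java HtmReaderSketch <file.htm> <charset>");
            return;
        }
        // The charset name (for example the one declared in the file's meta tag)
        // is supplied by the caller; it is not detected automatically here.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readAll(in, Charset.forName(args[1])));
        }
    }
}
```

If the text still comes out garbled, the charset passed in most likely does not match the one Word wrote.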
Note also that Jacob depends on both the JDK version and the Windows version, so the versions must match; otherwise you will get an error. Finding a working combination is a matter of trial and error. There is no shortcut.
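When hunting for a working combination, it can save time to print what the JVM actually reports and where it will look for native libraries. One common cause of load errors is a mismatch between a 32-bit DLL and a 64-bit JVM, or the reverse. The sketch below only inspects standard system properties and the file system. It assumes the DLL is named jacob.dll as in this article, and the class name is made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class JacobEnvCheck {
    // Builds the candidate locations of dllName for each directory on a library path string.
    static List<File> candidates(String libraryPath, String dllName) {
        List<File> result = new ArrayList<>();
        if (libraryPath == null) {
            return result;
        }
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty()) {
                result.add(new File(dir, dllName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Values to compare against the Jacob build you downloaded.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("os.name      = " + System.getProperty("os.name"));
        System.out.println("os.arch      = " + System.getProperty("os.arch"));

        // Directories the JVM searches when loading a native library by name.
        String libraryPath = System.getProperty("java.library.path");
        for (File candidate : candidates(libraryPath, "jacob.dll")) {
            System.out.println((candidate.isFile() ? "found:   " : "missing: ") + candidate);
        }
    }
}
```

Run it with the same JVM that runs the conversion code, since java.library.path and os.arch can differ between installed JDKs.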
