Using Jacob in Java to read Word documents and convert them to HTM format

Source: Internet
Author: User

Because the Word .doc format is a binary format rather than plain text, reading a Word file directly with a Java stream produces nothing but garbled text. One solution is to use Jacob as a bridge to Word itself. The text can also be read with POI.
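To see why a plain stream read is garbled, it helps to look at the first bytes of the file. The sketch below checks for the eight-byte signature that opens every OLE2 compound file, the binary container used by .doc files (and by other legacy Office formats). The class DocHeaderCheck and its method names are hypothetical, introduced only for illustration, and the file-reading helper assumes Java 11 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Jacob or POI.
final class DocHeaderCheck {

    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given bytes begin with the OLE2 signature. */
    static boolean looksLikeOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads the first eight bytes of a file and checks them (Java 11+). */
    static boolean isOle2File(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeOle2(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println(file + " is an OLE2 container: " + isOle2File(file));
    }
}
```

A file that passes this check holds binary structures rather than text, so it needs a parser or Word itself to read it.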
Let's start with POI. To read with POI, first download tm-extractors-0.4.jar, which can be found with a Baidu search. The code is as follows:
import java.io.FileInputStream;
import org.textmining.text.extraction.WordExtractor;

try {
    // filePath is the path of the Word document
    FileInputStream fileInputStream = new FileInputStream(filePath);
    WordExtractor extractor = new WordExtractor();
    // Extract the plain text of the document
    String temp = extractor.extractText(fileInputStream);
    System.out.println(temp + "= Temp");
    fileInputStream.close();
} catch (Exception ex) {
    System.out.println("FileNotFoundException error" +
            ex.getMessage());
}
filePath is the path of the Word document, and the extracted text is returned in the string temp. The text is not garbled, but the result is unsatisfactory because all of the Word formatting is lost.
Now for Jacob. Download jacob.zip from the official site: http://sourceforge.net/project/showfiles.php?group_id=109543&package_id=118368. After downloading, unzip it and put jacob.jar under project/WEB-INF/lib. Put jacob.dll under C:/WINDOWS/system32/ and under Java/jdk*.*/jre/bin. The configuration is then complete. The code:
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public boolean chageFormat(String folderPath, String fileName) {
    String fileFormat = "";
    System.out.println(folderPath);
    // The last four characters of the file name, i.e. the extension
    fileFormat = fileName.substring(fileName.length() - 4, fileName.length());
    System.out.println(fileFormat);
    if (fileFormat.equalsIgnoreCase(".doc")) {
        // Full path of the Word file
        String docFile = folderPath + "//" + fileName;
        System.out.println("Word file path: " + docFile);
        // Full path of the HTM file
        String htmlFile = docFile.substring(0, docFile.length() - 4) + ".htm";
        System.out.println("HTM file path: " + htmlFile);
        // Start Word
        ActiveXComponent app = new ActiveXComponent("Word.Application");
        try {
            // Run Word invisibly
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            // Open the Word file (ConfirmConversions = false, ReadOnly = true)
            Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method,
                    new Object[] { docFile, new Variant(false), new Variant(true) },
                    new int[1]).toDispatch();
            // Save the file in HTM format; 8 is Word's wdFormatHTML constant
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method,
                    new Object[] { htmlFile, new Variant(8) }, new int[1]);
            // Close the document without saving changes
            Dispatch.call(doc, "Close", new Variant(false));
        } catch (Exception e) {
            e.printStackTrace();
            // Report the failure instead of returning true
            return false;
        } finally {
            // Quit Word
            app.invoke("Quit", new Variant[] {});
        }
        // The conversion is complete
        return true;
    }
    return false;
}
folderPath is the directory where the Word file is stored, and fileName is the name of the Word file. The method converts the Word file into an HTM file. You can then read the HTM file with a stream: the text is not garbled, and the formatting is preserved.
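Reading the HTM file back only stays free of garbling if the bytes are decoded with the encoding the file was saved in. The sketch below assumes the file declares its encoding in a charset= meta tag near the top, as Word-generated HTML usually does, and falls back to a caller-supplied charset otherwise. HtmReader, detectCharset and decode are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jacob.
final class HtmReader {

    // Matches charset=xxx inside a meta tag, with or without quotes.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Looks for a charset declaration in the first 4 KB of the file. */
    static Charset detectCharset(byte[] html, Charset fallback) {
        int length = Math.min(html.length, 4096);
        // ISO-8859-1 maps every byte to a character, so the ASCII tag text survives.
        String head = new String(html, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or malformed charset name: use the fallback.
                return fallback;
            }
        }
        return fallback;
    }

    /** Decodes the whole file with the detected (or fallback) charset. */
    static String decode(byte[] html, Charset fallback) {
        return new String(html, detectCharset(html, fallback));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(decode(bytes, StandardCharsets.UTF_8));
    }
}
```

Passing the path of the generated .htm file as the first argument prints its markup. The UTF-8 fallback in main is an arbitrary choice for this sketch.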
It should also be emphasized that Jacob's components depend on the JDK and Windows versions, so the versions must match; otherwise an error is reported. You have to try the versions one by one. There are no shortcuts.
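When hunting for a matching version, it can help to print what the running JVM reports about itself and which Jacob DLLs it can see. The sketch below is one possible diagnostic, not part of Jacob: JacobEnvCheck is a hypothetical class, the rule that a Jacob DLL is any file named jacob*.dll is an assumption, and sun.arch.data.model is a vendor-specific property that may be absent on some JVMs. A 32-bit JVM can only load a 32-bit DLL and a 64-bit JVM a 64-bit one, so bitness is a likely first thing to compare.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical diagnostic for illustration; not part of Jacob.
final class JacobEnvCheck {

    /** Assumption: Jacob DLLs are named jacob*.dll, e.g. jacob.dll. */
    static boolean isJacobDll(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.startsWith("jacob") && lower.endsWith(".dll");
    }

    /** Lists Jacob DLLs found in the directories of a library path string. */
    static List<File> findJacobDlls(String libraryPath) {
        List<File> found = new ArrayList<>();
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File[] files = new File(dir).listFiles();
            if (files == null) {
                continue; // not a directory, or not readable
            }
            for (File f : files) {
                if (f.isFile() && isJacobDll(f.getName())) {
                    found.add(f);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("java.version        = " + System.getProperty("java.version"));
        System.out.println("os.name             = " + System.getProperty("os.name"));
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        // Vendor-specific; prints null where the JVM does not define it.
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        List<File> dlls = findJacobDlls(System.getProperty("java.library.path", ""));
        if (dlls.isEmpty()) {
            System.out.println("No Jacob DLL found on java.library.path");
        }
        for (File dll : dlls) {
            System.out.println("Found " + dll.getAbsolutePath());
        }
    }
}
```

Running it with the same JDK that runs the conversion code shows whether that JVM is 32-bit or 64-bit and which DLL copies it would pick up.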
