Reading Word files in Java by converting them to HTM with Jacob

Source: Internet
Author: User

Because Word is closed source and its .doc files are a binary format, reading a Word file directly with a Java stream produces nothing but garbled text. So we go through Jacob, which acts as a bridge between Java and Word. You can also read the file with POI.
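A minimal sketch of why a raw stream read fails, assuming the file is a binary .doc (Word 97-2003). Such a file is an OLE2 compound file rather than text, and it begins with a fixed eight-byte signature. The class and method names here (DocSniffer, startsWithOle2Magic, looksLikeBinaryDoc) are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class DocSniffer {
    // Signature at the start of every OLE2 compound file,
    // the container format used by binary .doc files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes begin with the OLE2 signature.
    static boolean startsWithOle2Magic(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (header[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first eight bytes of a file and compares them with the signature.
    static boolean looksLikeBinaryDoc(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
        }
        return read == header.length && startsWithOle2Magic(header);
    }
}
```

If looksLikeBinaryDoc returns true, decoding the bytes as characters will not give readable text, which is the garbling described above.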
First, the POI approach. To read with POI, first download tm-extractors-0.4.jar, which can be found through Baidu. The code is as follows:
import java.io.FileInputStream;
import org.textmining.text.extraction.WordExtractor;

// filepath is the path to the Word document
String temp = "";
try (FileInputStream fileInputStream = new FileInputStream(filepath)) {
    WordExtractor extractor = new WordExtractor();
    // Extract the plain text of the document; formatting is not preserved
    temp = extractor.extractText(fileInputStream);
    System.out.println(temp + "==temp");
} catch (Exception ex) {
    System.out.println("Error reading Word file: " + ex.getMessage());
}
filepath is the path to the Word document, and the extracted text ends up in the string temp. Text read this way is not garbled, but the result is unsatisfactory because all of the Word formatting is discarded.
Next, the Jacob approach. First download jacob.zip from the official site: http://sourceforge.net/project/showfiles.php?group_id=109543&package_id=118368. After unpacking it, put jacob.jar under the project's WEB-INF/lib directory, and put jacob.dll under c:/windows/system32/ and under java/jdk*.*/jre/bin. That completes the configuration. Now the code:
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public boolean ChageFormat(String FolderPath, String FileName) {
    String FileFormat = "";
    System.out.println(FolderPath);
    // Take the last four characters of the name as the extension
    FileFormat = FileName.substring(FileName.length() - 4, FileName.length());
    System.out.println(FileFormat);
    if (FileFormat.equalsIgnoreCase(".doc"))
    {
        // Full path to the Word file
        String docfile = FolderPath + "//" + FileName;
        System.out.println("Word file path: " + docfile);
        // Full path to the HTM file
        String htmlfile = docfile.substring(0, (docfile.length() - 4)) + ".htm";
        System.out.println("HTM file path: " + htmlfile);
        // Start Word
        ActiveXComponent app = new ActiveXComponent("Word.Application");
        try
        {
            // Run Word invisibly
            app.setProperty("Visible", new Variant(false));
            Dispatch docs = app.getProperty("Documents").toDispatch();
            // Open the Word file (no conversion prompt, read-only)
            Dispatch doc = Dispatch.invoke(docs, "Open", Dispatch.Method, new Object[]{docfile, new Variant(false), new Variant(true)}, new int[1]).toDispatch();
            // Save the file in HTM format (8 is Word's wdFormatHTML)
            Dispatch.invoke(doc, "SaveAs", Dispatch.Method, new Object[]{htmlfile, new Variant(8)}, new int[1]);
            // Close the file without saving changes
            Dispatch.call(doc, "Close", new Variant(false));
        }
        catch (Exception e)
        {
            e.printStackTrace();
            // Report the failure instead of falling through to "return true"
            return false;
        }
        finally
        {
            // Quit Word
            app.invoke("Quit", new Variant[] {});
        }
        // Conversion completed
        return true;
    }
    return false;
}
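The method above takes the last four characters of the name with substring, which throws an exception for names shorter than four characters. The following is a minimal sketch of a more defensive way to derive the HTM path, assuming the same rule as the article (the extension must be .doc, ignoring case). The names DocPaths and toHtmPath are hypothetical.

```java
final class DocPaths {
    private static final String DOC_EXT = ".doc";

    // Returns the matching .htm path for a .doc path,
    // or null if the path does not end in ".doc" (case-insensitive).
    static String toHtmPath(String docFile) {
        if (docFile == null || docFile.length() < DOC_EXT.length()) {
            return null;
        }
        int extStart = docFile.length() - DOC_EXT.length();
        // Compare only the last four characters, ignoring case
        if (!docFile.regionMatches(true, extStart, DOC_EXT, 0, DOC_EXT.length())) {
            return null;
        }
        return docFile.substring(0, extStart) + ".htm";
    }
}
```

Under these assumptions, toHtmPath("C:/docs/report.DOC") gives "C:/docs/report.htm", while a name such as "a" or "report.docx" gives null rather than an exception.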
FolderPath is the directory that holds the Word file, and FileName is the name of the Word file. The ChageFormat method converts the Word file into an HTM file. After that, the HTM file can be read with an ordinary stream: the text is not garbled, and the formatting comes along with it.
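Reading the converted file with a stream could look like the sketch below. It assumes the caller knows which charset the HTM file was written in; this depends on the system and the document, so it is worth checking the charset declared in the file's meta tag. The names HtmReader and readHtm are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class HtmReader {
    // Reads the whole HTM file into one string, decoding it with the given charset.
    // Line endings are normalized to '\n'.
    static String readHtm(Path htmFile, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(htmFile, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A call might look like HtmReader.readHtm(Paths.get(htmlfile), Charset.forName("GBK")), where GBK is only an example value. Passing the wrong charset brings the garbled text back, or makes the reader throw on bytes it cannot decode.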
One more point to stress: the Jacob components depend on both the JDK version and the Windows version, so the versions must match, otherwise errors occur. Solving a version problem means trying the versions one by one. There is no shortcut.
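Some of that trial and error can be narrowed down from code. A native DLL has to match the bitness of the JVM that loads it, and System.loadLibrary searches the directories listed in java.library.path. The sketch below reports both. It is only a diagnostic aid under those assumptions, not part of Jacob, and the names JacobSetupCheck, jvmBitness and findOnLibraryPath are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class JacobSetupCheck {
    // Returns "64", "32", or "unknown" for the running JVM.
    static String jvmBitness() {
        // sun.arch.data.model is set by HotSpot-based JVMs; it is not a standard property
        String model = System.getProperty("sun.arch.data.model");
        if ("64".equals(model) || "32".equals(model)) {
            return model;
        }
        // Fall back to the standard os.arch property
        String arch = System.getProperty("os.arch", "");
        if (arch.contains("64")) {
            return "64";
        }
        if (arch.equals("x86") || arch.matches("i[3-6]86")) {
            return "32";
        }
        return "unknown";
    }

    // Lists every copy of fileName found in the directories of libraryPath,
    // where libraryPath uses the platform's path separator.
    static List<File> findOnLibraryPath(String fileName, String libraryPath) {
        List<File> hits = new ArrayList<>();
        if (libraryPath == null) {
            return hits;
        }
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

For example, JacobSetupCheck.findOnLibraryPath("jacob.dll", System.getProperty("java.library.path")) shows which copies of the DLL the JVM can see. An empty list suggests the DLL was not placed where the JVM looks, and more than one hit suggests that an older copy may be picked up first.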
