Reading data from Word documents with Java is simple.

Step 1: Download tm-extractors-0.4.jar from http://drc-dev.ohiolink.edu/browser/elated-core/trunk/lib/tm-extractors-0.4.jar?rev=46 and put it on your classpath.

Step 2: Write a simple program. (Before running WordReader.java, make sure you have a strust.doc file on your E: drive.)

import java.io.File;
import java.io.FileInputStream;

import org.textmining.text.extraction.WordExtractor;

public class WordReader {

    public static String readDoc(String doc) throws Exception {
        // Create an input stream to read the DOC file; try-with-resources closes it afterwards
        try (FileInputStream in = new FileInputStream(new File(doc))) {
            // Create a WordExtractor
            WordExtractor extractor = new WordExtractor();
            // Extract the text of the DOC file
            return extractor.extractText(in);
        }
    }

    /**
     * @param args command-line arguments (not used)
     */
    public static void main(String[] args) {
        try {
            String text = WordReader.readDoc("E:/strust.doc");
            System.out.println(text);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
In this way, you can easily read the text of a Word document. Once you have the text as a string, you can also save it to a TXT file.
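The post mentions generating a TXT file but does not show how. A minimal sketch of that last step follows, assuming Java 7 or later and UTF-8 output. The class name TextFileWriter and the method writeText are hypothetical helpers made up for this illustration. They are not part of tm-extractors and use only the standard java.nio.file API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of tm-extractors): saves already-extracted text as a UTF-8 TXT file.
final class TextFileWriter {

    private TextFileWriter() {
        // Utility class, not meant to be instantiated
    }

    public static Path writeText(String text, String outputPath) throws IOException {
        Path target = Paths.get(outputPath);
        // Files.write creates the file, or truncates it if it already exists
        return Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

To use it, pass the string returned by the reader method above together with an output path of your choice, for example E:/strust.txt (an assumed path, not taken from the original post).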