When using Java POI to extract table information from a Word (.doc) document, keep in mind that POI supports several Microsoft document types and that each type has its own classes and operations. This article uses POI to extract table information from a Microsoft Word 2003 document. The code is as follows (the POI jar packages must be imported beforehand):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableIterator;
import org.apache.poi.hwpf.usermodel.TableRow;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public static void testWord2() {
    try {
        FileInputStream in = new FileInputStream("July 2005 1.doc"); // Load the document
        // FileInputStream in = new FileInputStream("2003.doc"); // Load the document
        POIFSFileSystem pfs = new POIFSFileSystem(in);
        HWPFDocument hwpf = new HWPFDocument(pfs);
        Range range = hwpf.getRange(); // Get the document's read range
        TableIterator it = new TableIterator(range);
        FileWriter fileWriter = new FileWriter(new File("Result.txt"));
        // Iterate over the tables in the document
        while (it.hasNext()) {
            Table tb = (Table) it.next();
            // Rows are indexed from 0; only the first row is read here
            if (tb.numRows() > 0) {
                TableRow tr = tb.getRow(0);
                // Cells are indexed from 0
                if (tr.numCells() == 2) {
                    TableCell td1 = tr.getCell(0); // Get the first cell
                    TableCell td2 = tr.getCell(1); // Get the second cell
                    // Get the cell contents
                    String str1 = td1.text().trim();
                    String str2 = td2.text().trim();
                    if (str2 != null && !"".equals(str2) && str2.contains("[21][11]")) {
                        System.out.println(str1);
                        fileWriter.write(str2 + "\n");
                    }
                } else if (tr.numCells() == 3) {
                    TableCell td2 = tr.getCell(1);
                    String str2 = td2.text().trim();
                    System.out.println("str2=" + str2);
                    fileWriter.write(str2 + "\n");
                }
            } // End if
        } // End while
        fileWriter.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The code above is a simple test of extracting table information from a Word document with POI. To run it, call the method directly.
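The rule that decides which cell text is written to the result file can be separated from POI and checked on plain strings, without a Word file. The following is only a minimal sketch of that idea: the class name CellSelector, the method select, and the constant MARKER are hypothetical names introduced for illustration and are not part of POI. It assumes the same rule as the method above: a two-cell row yields its second cell only when that cell contains "[21][11]", and a three-cell row always yields its second cell.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of POI): applies the article's row rule to plain strings.
final class CellSelector {
    // Marker that the article's code looks for in the second cell of a two-cell row.
    static final String MARKER = "[21][11]";

    // Returns the text that would be written to the result file for this row, if any.
    static Optional<String> select(List<String> cells) {
        if (cells.size() == 2) {
            String second = cells.get(1).trim();
            // A cell that contains the marker is necessarily non-empty.
            return second.contains(MARKER) ? Optional.of(second) : Optional.empty();
        }
        if (cells.size() == 3) {
            // Three-cell rows always yield their second cell, even when it is empty.
            return Optional.of(cells.get(1).trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(select(Arrays.asList("Name", " item [21][11] ")));
        System.out.println(select(Arrays.asList("a", "b", "c")));
        System.out.println(select(Arrays.asList("only one")));
    }
}
```

With POI, the list passed to select could be built by calling tr.getCell(i).text() for each index from 0 up to tr.numCells(), so the extraction loop only has to write whatever the helper returns.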
Using Java POI to selectively extract table information from a Word document