Converting DOC, PDF, and PPT files to TXT:
Converting a file means reading its content as text, not simply changing the file name extension, so each converter reads the source document and writes the extracted text to a TXT file.
Adding the Office references
When you program against Word and PowerPoint from .NET, make sure the programmability components for Word and PowerPoint were selected when Office was installed (you can check this in a custom installation), or install the Microsoft Office 2003 Primary Interop Assemblies.
After installation, add the references in your project:
Add Reference > COM > Microsoft PowerPoint 11.0 Object Library / Microsoft Word 11.0 Object Library;
You also have to import the Office component namespaces, along with those of PDFBox:
using System.IO;
using System.Text;
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.PowerPoint;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
public void Pdf2txt(FileInfo file, FileInfo txtfile)
{
    // Load the PDF and extract all of its text with PDFBox.
    PDDocument doc = PDDocument.load(file.FullName);
    PDFTextStripper pdfStripper = new PDFTextStripper();
    string text = pdfStripper.getText(doc);
    doc.close();
    // Write the text to the target file (overwriting it) in GB2312 encoding.
    StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
    swPdfChange.Write(text);
    swPdfChange.Close();
}
For a table in a DOC file, the grid lines are dropped and the cell contents are read row by row.
public void Word2text(FileInfo file, FileInfo txtfile)
{
    object readOnly = true;
    object missing = System.Reflection.Missing.Value;
    object fileName = file.FullName;
    Microsoft.Office.Interop.Word.ApplicationClass wordApp = new Microsoft.Office.Interop.Word.ApplicationClass();
    // Open the document read-only; every other optional argument is left as "missing".
    Document doc = wordApp.Documents.Open(ref fileName,
        ref missing, ref readOnly, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing);
    string text = doc.Content.Text;
    // Close the document and quit Word before writing the result.
    doc.Close(ref missing, ref missing, ref missing);
    wordApp.Quit(ref missing, ref missing, ref missing);
    StreamWriter swWordChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
    swWordChange.Write(text);
    swWordChange.Close();
}
public void Ppt2txt(FileInfo file, FileInfo txtfile)
{
    Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.ApplicationClass();
    // Open read-only, untitled = false, without showing a window.
    Microsoft.Office.Interop.PowerPoint.Presentation pp = pa.Presentations.Open(file.FullName,
        Microsoft.Office.Core.MsoTriState.msoTrue,
        Microsoft.Office.Core.MsoTriState.msoFalse,
        Microsoft.Office.Core.MsoTriState.msoFalse);
    string pps = "";
    StreamWriter swPptChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
    foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
    {
        foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
        {
            // Skip shapes that cannot hold text (pictures, for example).
            if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                pps += shape.TextFrame.TextRange.Text;
        }
    }
    // Release the presentation and PowerPoint itself.
    pp.Close();
    pa.Quit();
    swPptChange.Write(pps);
    swPptChange.Close();
}
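All three converters end with the same step: open a StreamWriter, write the text, close the writer. A minimal sketch of factoring that step out is shown here. The class TextOutput and its Write method are hypothetical names that do not appear in the original code, and the using block is a swapped-in technique that disposes the writer even if the write throws.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original code.
public static class TextOutput
{
    // Writes text to target, overwriting any existing file, in the given encoding.
    public static void Write(FileInfo target, string text, Encoding encoding)
    {
        // "false" means overwrite rather than append; the using block
        // disposes the writer even when Write throws.
        using (StreamWriter writer = new StreamWriter(target.FullName, false, encoding))
        {
            writer.Write(text);
        }
    }
}
```

With such a helper, the last three lines of each converter would reduce to a single call, for example TextOutput.Write(txtfile, text, Encoding.GetEncoding("gb2312")).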
Read different types of files
public StreamReader text2reader(FileInfo file)
{
    StreamReader st = null;
    // Choose a converter by file extension; unknown extensions return null.
    switch (file.Extension.ToLower())
    {
        case ".txt":
            st = new StreamReader(file.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".doc":
            FileInfo wordfile = new FileInfo(@"e:/my programs/200807program/filesearch/app_data/word2txt.txt"); // a relative path does not work here; to be improved
            Word2text(file, wordfile);
            st = new StreamReader(wordfile.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".pdf":
            FileInfo pdffile = new FileInfo(@"e:/my programs/200807program/filesearch/app_data/pdf2txt.txt");
            Pdf2txt(file, pdffile);
            st = new StreamReader(pdffile.FullName, Encoding.GetEncoding("gb2312"));
            break;
        case ".ppt":
            FileInfo pptfile = new FileInfo(@"e:/my programs/200807program/filesearch/app_data/ppt2txt.txt");
            Ppt2txt(file, pptfile);
            st = new StreamReader(pptfile.FullName, Encoding.GetEncoding("gb2312"));
            break;
    }
    return st;
}
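The comment in the .doc case notes that the hard-coded absolute path should be improved. One possible approach, sketched here under assumptions, is to build the path from the application's base directory. The class TempTextPaths and the method InAppData are hypothetical names, and the sketch assumes the app_data folder sits directly under the directory returned by AppDomain.CurrentDomain.BaseDirectory, which may not match the original project layout.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class TempTextPaths
{
    // Returns <application base directory>/app_data/<name>.
    // Assumption: app_data lives directly under the base directory.
    public static FileInfo InAppData(string name)
    {
        string baseDir = AppDomain.CurrentDomain.BaseDirectory;
        string folder = Path.Combine(baseDir, "app_data");
        return new FileInfo(Path.Combine(folder, name));
    }
}
```

Each case could then obtain its intermediate file with a call such as TempTextPaths.InAppData("word2txt.txt") in place of the e:/ path.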
C#: reading DOC, PDF, PPT, and TXT files