When working with documents, we often need to extract the text or pictures they contain. We have already seen how to extract text and images from Word and PDF files with C# code, and the same can be done for PowerPoint slides. This article describes how to use C# to extract the text and pictures from a PPT file. As before, first install the Spire.Presentation component and add a reference to its DLL in the project. The main code steps are as follows.
Original document:
1. Extract text
Step one: Create a Presentation instance and load the document
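using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using Spire.Presentation;

Presentation presentation = new Presentation(@"C:\Users\Administrator\Desktop\sample.pptx", FileFormat.Pptx2010);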
Step two: Create a StringBuilder object
StringBuilder sb = new StringBuilder();
Step three: Traverse the slides and the shapes on each slide to extract the text content
foreach (ISlide slide in presentation.Slides)
{
    foreach (IShape shape in slide.Shapes)
    {
        // Only auto shapes (text boxes, placeholders, etc.) are read here
        if (shape is IAutoShape)
        {
            foreach (TextParagraph tp in (shape as IAutoShape).TextFrame.Paragraphs)
            {
                sb.Append(tp.Text + Environment.NewLine);
            }
        }
    }
}
Step four: Write the text to a TXT document and open it
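File.WriteAllText("target.txt", sb.ToString());
// UseShellExecute = true opens the file with its default program; it is required on .NET Core / .NET 5+
Process.Start(new ProcessStartInfo("target.txt") { UseShellExecute = true });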
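Steps one to four can also be combined into a single console program. The following is a minimal sketch, assuming a .NET project that references Spire.Presentation; the class PptTextExtractor, the ExtractText method and the optional command-line argument are hypothetical names added for illustration, while the Spire.Presentation calls are the ones used in the steps above.

```csharp
using System.Diagnostics;
using System.IO;
using System.Text;
using Spire.Presentation;

// Hypothetical wrapper class; only the Spire.Presentation calls come from the steps above.
public static class PptTextExtractor
{
    // Collects the paragraph text of every auto shape on every slide.
    public static string ExtractText(string pptxPath)
    {
        Presentation presentation = new Presentation(pptxPath, FileFormat.Pptx2010);
        StringBuilder sb = new StringBuilder();

        foreach (ISlide slide in presentation.Slides)
        {
            foreach (IShape shape in slide.Shapes)
            {
                // Shapes that are not IAutoShape are skipped, as in step three.
                IAutoShape autoShape = shape as IAutoShape;
                if (autoShape == null)
                {
                    continue;
                }
                foreach (TextParagraph tp in autoShape.TextFrame.Paragraphs)
                {
                    sb.AppendLine(tp.Text);
                }
            }
        }
        return sb.ToString();
    }

    public static void Main(string[] args)
    {
        // Assumption: the input path may be passed on the command line;
        // otherwise the sample path from the article is used.
        string input = args.Length > 0 ? args[0] : @"C:\Users\Administrator\Desktop\sample.pptx";

        File.WriteAllText("target.txt", ExtractText(input));

        // Opens the result with the default text editor.
        Process.Start(new ProcessStartInfo("target.txt") { UseShellExecute = true });
    }
}
```

Returning the text as a string, rather than writing it inside the loop, keeps the extraction reusable, for example to search slide text without creating a file.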
2. Extract pictures
There are two cases here: extracting all the pictures in the entire document, and extracting only the pictures on a particular slide.
2.1 Extract all pictures
Step one: Initialize an instance of the Presentation class and load the document
Presentation ppt = new Presentation();
ppt.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");
Step two: Traverse the images in the document, then extract and save each one
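for (int i = 0; i < ppt.Images.Count; i++)
{
    // Each entry in ppt.Images is an image embedded in the document
    System.Drawing.Image image = ppt.Images[i].Image;
    // ImageFormat.Png makes the saved data real PNG, matching the .png extension
    image.Save(string.Format(@"..\..\images{0}.png", i), System.Drawing.Imaging.ImageFormat.Png);
}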
The extracted pictures are saved in the project folder.
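The two steps of section 2.1 can be wrapped in a reusable method. This sketch assumes the .NET Framework build of Spire.Presentation, in which ppt.Images[i].Image is a System.Drawing.Image as in the snippet above; the class PptImageExtractor, the SaveAllImages method and the output folder name "images" are hypothetical. It also creates the output folder first, because saving into a folder that does not exist fails with a GDI+ error.

```csharp
using System;
using System.IO;
using Spire.Presentation;

// Hypothetical helper; the Spire.Presentation calls are the ones used in section 2.1.
public static class PptImageExtractor
{
    // Saves every image in the presentation's image collection as PNG
    // and returns how many files were written.
    public static int SaveAllImages(string pptxPath, string outputDir)
    {
        // Image.Save does not create missing folders, so create the target first.
        Directory.CreateDirectory(outputDir);

        Presentation ppt = new Presentation();
        ppt.LoadFromFile(pptxPath);

        for (int i = 0; i < ppt.Images.Count; i++)
        {
            // Assumption: Image is a System.Drawing.Image (.NET Framework build).
            System.Drawing.Image image = ppt.Images[i].Image;
            string path = Path.Combine(outputDir, string.Format("images{0}.png", i));

            // Without the format argument, Save keeps the image's own encoding
            // (e.g. JPEG) even though the file name ends in .png.
            image.Save(path, System.Drawing.Imaging.ImageFormat.Png);
        }
        return ppt.Images.Count;
    }

    public static void Main()
    {
        int count = SaveAllImages(@"C:\Users\Administrator\Desktop\sample.pptx", "images");
        Console.WriteLine("Saved {0} image(s).", count);
    }
}
```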
2.2 Extract pictures from a specific slide
Step one: Create an instance of the Presentation class and load the document
Presentation ppt = new Presentation();
ppt.LoadFromFile(@"C:\Users\Administrator\Desktop\sample.pptx");
Step two: Get the third slide, then extract and save its pictures
int i = 0;
// Slides[2] is the third slide because the index is zero-based
foreach (IShape s in ppt.Slides[2].Shapes)
{
    if (s is SlidePicture)
    {
        SlidePicture ps = s as SlidePicture;
        ps.PictureFill.Picture.EmbedImage.Image.Save(string.Format("{0}.png", i), System.Drawing.Imaging.ImageFormat.Png);
        i++;
    }
    if (s is PictureShape)
    {
        PictureShape ps = s as PictureShape;
        ps.EmbedImage.Image.Save(string.Format("{0}.png", i), System.Drawing.Imaging.ImageFormat.Png);
        i++;
    }
}
The pictures extracted from the third slide are saved as 0.png, 1.png, and so on in the program's current working directory.
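Section 2.2 hard-codes the third slide. A slightly more general sketch takes the slide index as a parameter and checks it against the slide count, since Slides[2] fails on a presentation with fewer than three slides. The class SlideImageExtractor, the SaveSlideImages method and the folder name "slide3" are hypothetical; the picture-handling calls are the ones from the step above, and EmbedImage.Image is again assumed to be a System.Drawing.Image.

```csharp
using System;
using System.IO;
using Spire.Presentation;

// Hypothetical helper generalising section 2.2 to any slide index.
public static class SlideImageExtractor
{
    // slideIndex is zero-based, so 2 means the third slide.
    // Returns the number of pictures saved.
    public static int SaveSlideImages(string pptxPath, int slideIndex, string outputDir)
    {
        Presentation ppt = new Presentation();
        ppt.LoadFromFile(pptxPath);

        // Guard against presentations with fewer slides than expected.
        if (slideIndex < 0 || slideIndex >= ppt.Slides.Count)
        {
            throw new ArgumentOutOfRangeException(nameof(slideIndex));
        }

        Directory.CreateDirectory(outputDir);
        int i = 0;
        foreach (IShape s in ppt.Slides[slideIndex].Shapes)
        {
            // Pick the embedded image from whichever picture type the shape is.
            System.Drawing.Image image = null;
            if (s is SlidePicture)
            {
                image = ((SlidePicture)s).PictureFill.Picture.EmbedImage.Image;
            }
            else if (s is PictureShape)
            {
                image = ((PictureShape)s).EmbedImage.Image;
            }

            if (image != null)
            {
                string path = Path.Combine(outputDir, string.Format("{0}.png", i));
                image.Save(path, System.Drawing.Imaging.ImageFormat.Png);
                i++;
            }
        }
        return i;
    }

    public static void Main()
    {
        int count = SaveSlideImages(@"C:\Users\Administrator\Desktop\sample.pptx", 2, "slide3");
        Console.WriteLine("Saved {0} image(s) from slide 3.", count);
    }
}
```

For example, SaveSlideImages(path, 0, "slide1") would extract the pictures on the first slide instead.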
That is how to extract text and pictures from PPT slides with C#. The steps are simple and practical; I hope they help. Thanks for reading!
If you reprint this article, please credit the source.
C# implementation for extracting text and pictures from PPT