Task requirements:
- Extracting text from a PDF document
- Extracting pictures from a PDF document
Tools you need:
- Free Spire.PDF for .NET 4.3 (free edition)
Implementation code:
Example 1: Extract text
using Spire.Pdf;
using System;
using System.IO;
using System.Text;

namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF document
            PdfDocument document = new PdfDocument();
            document.LoadFromFile("Test document.pdf");

            // Create a StringBuilder and collect the text of the first page
            StringBuilder content = new StringBuilder();
            content.Append(document.Pages[0].ExtractText());

            // Save the extracted text to a .txt file
            String fileName = "TextFromPDF.txt";
            File.WriteAllText(fileName, content.ToString());

            // Open the resulting text file with its default application
            System.Diagnostics.Process.Start("TextFromPDF.txt");
        }
    }
}
Text extraction result:
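Example 1 reads only the first page (Pages[0]). The following is a minimal sketch of extending it to every page, assuming the same Free Spire.PDF calls that appear in the two examples (Pages.Count, the Pages[i] indexer, and ExtractText()). The output file name AllTextFromPDF.txt and the namespace ExtractAllText are assumptions chosen for illustration.

```csharp
using Spire.Pdf;
using System.IO;
using System.Text;

namespace ExtractAllText
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF document (same input file as in the examples)
            PdfDocument document = new PdfDocument();
            document.LoadFromFile("Test document.pdf");

            // Append the text of each page in order, one page per block
            StringBuilder content = new StringBuilder();
            for (int i = 0; i < document.Pages.Count; i++)
            {
                PdfPageBase page = document.Pages[i];
                content.AppendLine(page.ExtractText());
            }

            // Write the combined text to a single file (name is an assumption)
            File.WriteAllText("AllTextFromPDF.txt", content.ToString());
        }
    }
}
```

Looping by index mirrors the page loop used for image extraction, so both tasks walk the document the same way.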
Example 2: Extract pictures
using System;
using System.Collections.Generic;
using System.Text;
using System.Drawing;
using Spire.Pdf;

namespace ExtractImagesFromPDF
{
    class Program
    {
        static void Main(string[] args)
        {
            // Instantiate the PdfDocument class and load the test document
            PdfDocument doc = new PdfDocument();
            doc.LoadFromFile("Test document.pdf");

            // Instantiate a list to collect the extracted images
            List<Image> listImage = new List<Image>();

            for (int i = 0; i < doc.Pages.Count; i++)
            {
                // Get the Spire.Pdf.PdfPageBase object for this page
                PdfPageBase page = doc.Pages[i];

                // Extract the images on the page
                Image[] images = page.ExtractImages();
                if (images != null && images.Length > 0)
                {
                    listImage.AddRange(images);
                }
            }

            if (listImage.Count > 0)
            {
                // Save each image as image1.png, image2.png, ...
                for (int i = 0; i < listImage.Count; i++)
                {
                    Image image = listImage[i];
                    image.Save("image" + (i + 1).ToString() + ".png", System.Drawing.Imaging.ImageFormat.Png);
                }

                // Open the first saved image with its default application
                System.Diagnostics.Process.Start("image1.png");
            }
        }
    }
}
Image extraction result:
C#: extract PDF text and pictures