Delphi Extract PDF Text

Source: Internet
Author: User

There are plenty of Delphi components for generating PDFs, but few for parsing them. PDF Toolkit can parse PDFs, but the first complex PDF I tested produced errors and garbled characters; perhaps the version or the way I used it was wrong.

I remembered that the Apache PDFBox library had worked very well when I called it from Java, so I downloaded PDFBox and invoked it from Delphi to extract the PDF text.

Requirement: Java Runtime Environment

PDFBox app package: pdfbox-app-2.0.6.jar

The approach is to run PDFBox from the DOS command line and then read back the result.

First, a function that executes a DOS command and returns its console output:

procedure CheckResult(b: Boolean);
begin
  if not b then
    raise Exception.Create(SysErrorMessage(GetLastError));
end;

function RunDOS(const CommandLine: string): string;
var
  HRead, HWrite: THandle;
  StartInfo: TStartupInfo;
  ProceInfo: TProcessInformation;
  b: Boolean;
  sa: TSecurityAttributes;
  inS: THandleStream;
  sRet: TStrings;
  Cmd: string;
begin
  Result := '';
  FillChar(sa, SizeOf(sa), 0);
  // Allow handle inheritance; otherwise the output cannot be captured on NT and 2000
  sa.nLength := SizeOf(sa);
  sa.bInheritHandle := True;
  sa.lpSecurityDescriptor := nil;
  b := CreatePipe(HRead, HWrite, @sa, 0);
  CheckResult(b);

  FillChar(StartInfo, SizeOf(StartInfo), 0);
  StartInfo.cb := SizeOf(StartInfo);
  StartInfo.wShowWindow := SW_HIDE;
  // Use the given handles as standard input/output and the given show mode
  StartInfo.dwFlags := STARTF_USESTDHANDLES or STARTF_USESHOWWINDOW;
  StartInfo.hStdError := HWrite;
  StartInfo.hStdInput := GetStdHandle(STD_INPUT_HANDLE); // HRead;
  StartInfo.hStdOutput := HWrite;

  // The Unicode CreateProcess may modify the command-line buffer,
  // so pass a private, writable copy
  Cmd := CommandLine;
  UniqueString(Cmd);
  b := CreateProcess(nil, // lpApplicationName: PChar
    PChar(Cmd),           // lpCommandLine: PChar
    nil,                  // lpProcessAttributes: PSecurityAttributes
    nil,                  // lpThreadAttributes: PSecurityAttributes
    True,                 // bInheritHandles: BOOL
    CREATE_NEW_CONSOLE, nil, nil, StartInfo, ProceInfo);
  CheckResult(b);

  // Caution: waiting before reading can block if the child fills the pipe buffer
  WaitForSingleObject(ProceInfo.hProcess, INFINITE);
  CloseHandle(ProceInfo.hThread);
  CloseHandle(ProceInfo.hProcess);

  inS := THandleStream.Create(HRead);
  if inS.Size > 0 then
  begin
    sRet := TStringList.Create;
    sRet.LoadFromStream(inS);
    Result := sRet.Text;
    sRet.Free;
  end;
  inS.Free;
  CloseHandle(HRead);
  CloseHandle(HWrite);
end;
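RunDOS waits for the child process to exit before it reads the pipe, which can stall if the child writes more than the pipe buffer holds. The following is a minimal sketch of an alternative that closes the parent's write handle and drains the pipe with ReadFile before waiting. The names ReadPipeToEnd and RunAndCapture are hypothetical and not part of the original article; the unit names assume Delphi XE2 or later (older versions would use Windows, SysUtils, Classes), and decoding the output with TEncoding.Default is an assumption about the console code page.

```delphi
program PipeReadSketch;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes;

// Hypothetical helper: reads from the pipe until every write handle is closed.
function ReadPipeToEnd(hRead: THandle): TBytes;
var
  Buffer: array[0..4095] of Byte;
  BytesRead: DWORD;
  Stream: TBytesStream;
begin
  Stream := TBytesStream.Create;
  try
    // ReadFile fails once all writers have closed and the pipe is empty
    while ReadFile(hRead, Buffer, SizeOf(Buffer), BytesRead, nil) and (BytesRead > 0) do
      Stream.WriteBuffer(Buffer, BytesRead);
    Result := Copy(Stream.Bytes, 0, Integer(Stream.Size));
  finally
    Stream.Free;
  end;
end;

// Hypothetical variant of RunDOS that reads the output before waiting.
function RunAndCapture(const CommandLine: string): string;
var
  hRead, hWrite: THandle;
  sa: TSecurityAttributes;
  StartInfo: TStartupInfo;
  ProcInfo: TProcessInformation;
  Cmd: string;
  Output: TBytes;
begin
  Result := '';
  sa.nLength := SizeOf(sa);
  sa.bInheritHandle := True;
  sa.lpSecurityDescriptor := nil;
  if not CreatePipe(hRead, hWrite, @sa, 0) then
    RaiseLastOSError;
  try
    // The child only needs the write end; keep the read end private
    SetHandleInformation(hRead, HANDLE_FLAG_INHERIT, 0);
    FillChar(StartInfo, SizeOf(StartInfo), 0);
    StartInfo.cb := SizeOf(StartInfo);
    StartInfo.dwFlags := STARTF_USESTDHANDLES or STARTF_USESHOWWINDOW;
    StartInfo.wShowWindow := SW_HIDE;
    StartInfo.hStdInput := GetStdHandle(STD_INPUT_HANDLE);
    StartInfo.hStdOutput := hWrite;
    StartInfo.hStdError := hWrite;
    Cmd := CommandLine;
    UniqueString(Cmd); // CreateProcess may modify the buffer
    if not CreateProcess(nil, PChar(Cmd), nil, nil, True,
      CREATE_NEW_CONSOLE, nil, nil, StartInfo, ProcInfo) then
      RaiseLastOSError;
    try
      // Close our copy of the write end so ReadFile can see end-of-pipe
      CloseHandle(hWrite);
      hWrite := 0;
      Output := ReadPipeToEnd(hRead);
      if Length(Output) > 0 then
        Result := TEncoding.Default.GetString(Output);
      WaitForSingleObject(ProcInfo.hProcess, INFINITE);
    finally
      CloseHandle(ProcInfo.hThread);
      CloseHandle(ProcInfo.hProcess);
    end;
  finally
    if hWrite <> 0 then
      CloseHandle(hWrite);
    CloseHandle(hRead);
  end;
end;

begin
  // Example call; any console command would do here
  Writeln(RunAndCapture('cmd /c echo hello'));
end.
```

Reading first and waiting afterwards means the child can never block on a full pipe, at the cost of a slightly longer function.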

Then call it and display the result:

function TfrmPdfTool.GetPdfText(sFile: string): string;
var
  cmd: string;
  pdfFilePath, pdfFileName, txtFileName: string;
begin
  // java -jar pdfbox-app-2.0.6.jar ExtractText -encoding UTF-8 e:\temp\test.pdf e:\temp\testiii.txt
  pdfFilePath := ExtractFilePath(sFile);
  pdfFileName := ExtractFileName(sFile);
  txtFileName := FAppPath + 'temp\' + pdfFileName + '.txt';
  // Quote every path so that directories containing spaces still work
  cmd := 'java -jar "' + FAppPath + 'pdfbox\pdfbox-app-2.0.6.jar" ExtractText'
    + ' -encoding UTF-8 "' + sFile + '" "' + txtFileName + '"';
  AddLog(cmd);
  Result := RunDOS(cmd); // console output of PDFBox, not the extracted text
  AddLog(Result);
  // TEncoding.UTF8 is a shared instance, so nothing is leaked here
  memTxtFile.Lines.LoadFromFile(txtFileName, TEncoding.UTF8);
  FPdfText := memTxtFile.Text;
  AddLog(FPdfText);
end;
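If the extraction fails, the output file does not exist and LoadFromFile raises an exception. A small guard along these lines could help; it is only a sketch, the function name LoadUtf8Text is hypothetical, and the unit names again assume Delphi XE2 or later.

```delphi
program LoadTextSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: returns the file's content decoded as UTF-8,
// or an empty string when the file is missing.
function LoadUtf8Text(const FileName: string): string;
var
  Lines: TStringList;
begin
  Result := '';
  if not FileExists(FileName) then
    Exit;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    Result := Lines.Text;
  finally
    Lines.Free;
  end;
end;

begin
  // Pass the path of the extracted text file as the first argument
  Writeln(Length(LoadUtf8Text(ParamStr(1))));
end.
```

Keeping the loading step separate from the memo control also makes it usable outside a form.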

OK, you are done!
