There are many components for generating PDFs, but not many for parsing them. PDF Toolkit can parse, but the first complex PDF I tested produced errors and garbled characters; perhaps the version or the way I used it was wrong.
I recalled that the Apache PDFBox library, which I had called from Java before, worked very well, so I downloaded PDFBox and used Delphi to invoke it to extract the text of the PDF.
Environment requirements: Java Runtime Environment
PDFBox app package: pdfbox-app-2.0.6.jar
This approach runs the extraction from the DOS command line and then reads back the result.
First, the function that executes a DOS command and returns its output:
procedure CheckResult(B: Boolean);
begin
  if not B then
    raise Exception.Create(SysErrorMessage(GetLastError));
end;

function RunDos(const CommandLine: string): string;
var
  HRead, HWrite: THandle;
  StartInfo: TStartupInfo;
  ProceInfo: TProcessInformation;
  B: Boolean;
  Sa: TSecurityAttributes;
  InS: THandleStream;
  sRet: TStrings;
begin
  Result := '';
  FillChar(Sa, SizeOf(Sa), 0);
  // Allow handle inheritance; otherwise the output cannot be obtained under NT and 2000
  Sa.nLength := SizeOf(Sa);
  Sa.bInheritHandle := True;
  Sa.lpSecurityDescriptor := nil;
  B := CreatePipe(HRead, HWrite, @Sa, 0);
  CheckResult(B);
  FillChar(StartInfo, SizeOf(StartInfo), 0);
  StartInfo.cb := SizeOf(StartInfo);
  StartInfo.wShowWindow := SW_HIDE;
  // Use the specified handles for standard input/output and the specified show mode
  StartInfo.dwFlags := STARTF_USESTDHANDLES or STARTF_USESHOWWINDOW;
  StartInfo.hStdError := HWrite;
  StartInfo.hStdInput := GetStdHandle(STD_INPUT_HANDLE); // HRead;
  StartInfo.hStdOutput := HWrite;
  B := CreateProcess(
    nil,                 // lpApplicationName: PChar
    PChar(CommandLine),  // lpCommandLine: PChar
    nil,                 // lpProcessAttributes: PSecurityAttributes
    nil,                 // lpThreadAttributes: PSecurityAttributes
    True,                // bInheritHandles: BOOL
    CREATE_NEW_CONSOLE,
    nil,
    nil,
    StartInfo,
    ProceInfo);
  CheckResult(B);
  WaitForSingleObject(ProceInfo.hProcess, INFINITE);
  // Release the process and thread handles returned by CreateProcess
  CloseHandle(ProceInfo.hThread);
  CloseHandle(ProceInfo.hProcess);
  InS := THandleStream.Create(HRead);
  if InS.Size > 0 then
  begin
    sRet := TStringList.Create;
    sRet.LoadFromStream(InS);
    Result := sRet.Text;
    sRet.Free;
  end;
  InS.Free;
  CloseHandle(HRead);
  CloseHandle(HWrite);
end;
Then call it and display the result:
function TfrmPdfTool.GetPdfText(sFile: string): string;
var
  cmd: string;
  pdfFilePath, pdfFileName, txtFileName: string;
begin
  // java -jar pdfbox-app-2.0.6.jar ExtractText -encoding UTF-8 e:\\temp\\test.pdf e:\\temp\\testiii.txt
  pdfFilePath := ExtractFilePath(sFile);
  pdfFileName := ExtractFileName(sFile);
  txtFileName := FAppPath + 'temp\' + pdfFileName + '.txt';
  cmd := 'java -jar ' + FAppPath + 'pdfbox\pdfbox-app-2.0.6.jar ExtractText'
    + ' -encoding UTF-8 ' + sFile + ' ' + txtFileName;
  AddLog(cmd);
  // Console output of the java process (warnings, errors)
  Result := RunDos(cmd);
  AddLog(Result);
  // The extracted text is written to txtFileName as UTF-8
  MemTxtFile.Lines.LoadFromFile(txtFileName, TEncoding.UTF8);
  FPdfText := MemTxtFile.Text;
  AddLog(FPdfText);
end;
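One thing to watch for: the command above is built by plain concatenation, so a PDF path that contains spaces would be split into several arguments. The following is only a minimal sketch of one way to guard against that, not part of the original code. The helper names QuotePath and BuildExtractTextCommand and the example paths are hypothetical; only the ExtractText command and the -encoding option come from the call shown above.

```pascal
program BuildCmdDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: wrap a path in double quotes so that
// spaces in it survive command-line parsing.
function QuotePath(const Path: string): string;
begin
  Result := '"' + Path + '"';
end;

// Hypothetical helper: build the PDFBox ExtractText command line
// with every path argument quoted.
function BuildExtractTextCommand(const JarPath, PdfPath, TxtPath: string): string;
begin
  Result := 'java -jar ' + QuotePath(JarPath)
    + ' ExtractText -encoding UTF-8 '
    + QuotePath(PdfPath) + ' ' + QuotePath(TxtPath);
end;

begin
  // Example paths are made up for illustration.
  Writeln(BuildExtractTextCommand(
    'C:\app\pdfbox\pdfbox-app-2.0.6.jar',
    'e:\temp\my test.pdf',
    'e:\temp\my test.pdf.txt'));
  // prints: java -jar "C:\app\pdfbox\pdfbox-app-2.0.6.jar" ExtractText -encoding UTF-8 "e:\temp\my test.pdf" "e:\temp\my test.pdf.txt"
end.
```

The resulting string could be passed to RunDos in place of the concatenated cmd; the rest of the function would stay the same.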
OK, done!
Delphi Extract PDF Text