Java CAPTCHA recognition: training samples with jTessBoxEditorFX and Tesseract-OCR
Tool Preparation:
jTessBoxEditorFX download: https://github.com/nguyenq/jTessBoxEditorFX
Tesseract-OCR download: https://sourceforge.net/projects/tesseract-ocr/
Main steps:
- Download jTessBoxEditorFX and Tesseract-OCR (configure the environment variables), and prepare the jar packages (via Maven; see the pom file below)
- Download the verification codes (CAPTCHA images) to a local folder (code)
- Convert the CAPTCHA image format
- De-noise the converted verification codes and crop their edges (code)
- Use jTessBoxEditorFX to proofread the .box file (correcting misrecognized verification codes): https://www.cnblogs.com/zhongtang/p/5555950.html
- Use the tesseract command line to generate the .traineddata file, and then call it from Java: https://www.cnblogs.com/zhongtang/p/5555950.html
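For the de-noising and edge-cutting step in the list above, it can help to see what binarization and cropping do to the pixels. The sketch below is only an illustration in plain java.awt, not the OpenCV calls the article uses; the class and method names are hypothetical. The threshold of 165 and the margins (4 left, 2 top, 3 right, 2 bottom) are taken from the values in the article's code. Note that OpenCV's THRESH_TRUNC clips bright pixels to the threshold value rather than producing a pure black-and-white image, whereas this sketch does a true two-level binarization.

```java
import java.awt.image.BufferedImage;

final class CaptchaPreprocess {

    /** Returns a black/white copy: pixels brighter than the threshold become white, the rest black. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int gray = (r + g + b) / 3; // simple average as the brightness
                out.setRGB(x, y, gray > threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }

    /** Cuts the given number of pixels off each edge (the result shares pixel data with src). */
    static BufferedImage cropBorder(BufferedImage src, int left, int top, int right, int bottom) {
        int width = src.getWidth() - left - right;
        int height = src.getHeight() - top - bottom;
        return src.getSubimage(left, top, width, height);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(60, 20, BufferedImage.TYPE_INT_RGB);
        BufferedImage cleaned = cropBorder(binarize(img, 165), 4, 2, 3, 2);
        System.out.println(cleaned.getWidth() + " x " + cleaned.getHeight());
    }
}
```

Whether a hard two-level threshold or a truncating one gives better recognition depends on the CAPTCHA style, so both are worth trying on a few samples.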
The code is as follows:
package yanzhengmatest.pikachu;

import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import javax.imageio.ImageIO;
import javax.net.ssl.HttpsURLConnection;

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Rect;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Test {

    // Loads the OpenCV native library; this must run before any OpenCV call
    static {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
    }

    public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {
        // Folder where the verification codes are saved
        File imgFile = new File("C:\\users\\pc\\desktop\\formpic\\unformpic");
        // Save location for each verification code
        String downAddress = "c:\\users\\pc\\desktop\\formpic\\unformpic\\";
        // Verification code URL
        String downUrl = "https://www.qichamao.com/usercenter/varifyimage?t=0.6488481170232967";
        if (imgFile.listFiles().length < 400) {
            for (int i = 1; i <= 400; i++) {
                downloadPic(downUrl, downAddress + i + ".gif");
                Thread.sleep(10 + (i % 100));
            }
        }
        // Convert the saved verification codes to TIF (Tesseract does not support GIF images)
        File imgFile0 = new File("C:\\users\\pc\\desktop\\formpic\\unformpic");
        for (File image : imgFile0.listFiles()) {
            changePicFormat("TIF", image, "c:\\users\\pc\\desktop\\formpic\\formedpic\\");
        }
        System.out.println("Picture format conversion succeeded");
        // Process the TIF images (de-noising, thresholding) to improve recognition
        int picNum = 1;
        File imageFile1 = new File("C:\\users\\pc\\desktop\\formpic\\formedpic");
        for (File image : imageFile1.listFiles()) {
            filterPic(image.getName(), picNum + ".tif");
            picNum++;
        }
        // Recognize the processed images
        File resultImgs = new File("C:\\users\\pc\\desktop\\result_cut");
        for (File link : resultImgs.listFiles()) {
            String result = getResult(link);
            System.out.println(link.getName() + " Recognition Result: " + result);
        }
    }

    // Image processing; the output of each stage is saved to its own folder
    public static void filterPic(String imgName, String fileName) throws FileNotFoundException, IOException {
        // De-noising
        Mat src = Imgcodecs.imread("c:\\users\\pc\\desktop\\formpic\\formedpic\\" + imgName, Imgcodecs.IMREAD_UNCHANGED);
        if (src.empty()) {
            System.out.println("No Pictures");
            return; // nothing to process
        }
        Mat dst = new Mat(src.rows(), src.cols(), CvType.CV_8UC1); // Mat takes (rows, cols, type)
        Imgproc.boxFilter(src, dst, src.depth(), new Size(3.2, 3.2));
        Imgcodecs.imwrite("C:\\users\\pc\\desktop\\filter\\" + fileName, dst);
        // Thresholding
        Mat src1 = Imgcodecs.imread("c:\\users\\pc\\desktop\\filter\\" + fileName, Imgcodecs.IMREAD_UNCHANGED);
        Mat dst1 = new Mat(src1.rows(), src1.cols(), CvType.CV_8UC1);
        Imgproc.threshold(src1, dst1, 165, 200, Imgproc.THRESH_TRUNC);
        Imgcodecs.imwrite("C:\\users\\pc\\desktop\\process\\" + fileName, dst1);
        // Cropping
        Mat src2 = Imgcodecs.imread("c:\\users\\pc\\desktop\\process\\" + fileName, Imgcodecs.IMREAD_UNCHANGED);
        Rect roi = new Rect(4, 2, src2.cols() - 7, src2.rows() - 4); // parameters: x, y, width, height
        Mat dst2 = new Mat(src2, roi);
        Imgcodecs.imwrite("C:\\users\\pc\\desktop\\result_cut\\" + fileName, dst2);
        System.out.println("Image processing Success");
    }

    // Recognize one verification code
    public static String getResult(File imageFile) {
        if (!imageFile.exists()) {
            System.out.println("Picture does not exist");
            return null;
        }
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("F:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
        tesseract.setLanguage("Fontyp"); // use the self-trained library instead of the default one
        try {
            return tesseract.doOCR(imageFile);
        } catch (TesseractException e) {
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Image format conversion.
     * Source: http://www.open-open.com/code/view/1453300186683
     *
     * @param outputFormat target format
     * @param image        image to convert
     * @param downAddress  folder where the converted image is saved
     */
    public static void changePicFormat(String outputFormat, File image, String downAddress) {
        try {
            BufferedImage bim = ImageIO.read(image);
            File output = new File(downAddress
                    + image.getName().substring(0, image.getName().lastIndexOf(".") + 1) + outputFormat);
            ImageIO.write(bim, outputFormat, output);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Download a verification code.
     *
     * @param picUrl     URL of the verification code
     * @param imgAddress where to save the picture
     */
    public static void downloadPic(String picUrl, String imgAddress) {
        try {
            URL url = new URL(picUrl);
            HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
            // Set a browser User-Agent; otherwise the request is treated as a bot and no captcha is returned
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36");
            conn.connect();
            byte[] buf = new byte[1024];
            try (BufferedInputStream bis = new BufferedInputStream(conn.getInputStream());
                 FileOutputStream fos = new FileOutputStream(imgAddress)) {
                int result;
                while ((result = bis.read(buf)) != -1) {
                    fos.write(buf, 0, result); // write only the bytes actually read
                }
                fos.flush();
            }
            System.out.println("Image Download Successful");
        } catch (MalformedURLException e) {
            System.out.println("Picture Read failed");
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
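The download helper in the listing above copies the HTTP response stream into a file. A copy loop should write only the number of bytes that each read call returns; writing the whole buffer every time pads the file with stale bytes and can corrupt the image. A minimal sketch of an alternative, assuming Java 7 or later, lets Files.copy handle the loop. The class and method names here are hypothetical, and the in-memory stream stands in for conn.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamSaver {

    /** Copies the whole stream to target, replacing an existing file, and returns the byte count. */
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) { // closes the stream even if the copy fails
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeImage = new byte[1500]; // deliberately not a multiple of 1024
        Path target = Files.createTempFile("captcha", ".gif");
        long written = save(new ByteArrayInputStream(fakeImage), target);
        System.out.println(written + " bytes written, file size " + Files.size(target));
        Files.delete(target);
    }
}
```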
Pom file:
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
    <exclusions>
        <exclusion>
            <groupId>com.sun.jna</groupId>
            <artifactId>jna</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.openpnp</groupId>
    <artifactId>opencv</artifactId>
    <version>3.2.0-0</version>
</dependency>
Reference articles:
- Use of OpenCV: https://blog.csdn.net/u012706811/article/details/52779271
- OpenCV tutorial: https://www.w3cschool.cn/opencv/opencv-me9i28vh.html
- OpenCV binarization: https://blog.csdn.net/liyuqian199695/article/details/53925046
- OpenCV Maven address: https://mvnrepository.com/artifact/org.openpnp/opencv/3.4.2-0
- OpenCV image filtering: https://blog.csdn.net/u012393192/article/details/78528550
- OpenCV image cropping: https://blog.csdn.net/sileixinhua/article/details/72811093
- OpenCV example with the tesseract command: https://www.cnblogs.com/zhongtang/p/5555950.html
- Attached text: https://blog.csdn.net/lmj623565791/article/details/23960391
Exception handling:
1. Library loading exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: no opencv_java320 in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
    at java.lang.Runtime.loadLibrary0(Runtime.java:870)
    at java.lang.System.loadLibrary(System.java:1122)
    at yanzhengmatest.pikachu.Test.<clinit>(Test.java:38)
Solution:
Set the native library location to: G:\Program Files (x86)\APACHE-MAVEN\REPO\ORG\OPENPNP\OPENCV\3.2.0-0\OPENCV-3.2.0-0\NU\PATTERN\OPENCV\WINDOWS\X86_64 (adjust this to where your Maven OpenCV package is stored).
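Before changing the configuration, it can be useful to check whether the native library file is visible on java.library.path at all. The sketch below is a hypothetical diagnostic helper, not part of OpenCV or of the article's code: it maps a library name to the platform file name with System.mapLibraryName and looks for that file in every directory of a search path. The library name opencv_java320 comes from the error message above.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

final class NativeLibraryLocator {

    /** Returns the first file in searchPath whose name matches the platform name of libName. */
    static Optional<Path> find(String libName, String searchPath) {
        String fileName = System.mapLibraryName(libName); // e.g. opencv_java320.dll on Windows
        for (String dir : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        Optional<Path> hit = find("opencv_java320", searchPath);
        System.out.println(hit.map(p -> "Found: " + p).orElse("Not found in: " + searchPath));
    }
}
```

If the file is not found, the directory can be passed to the JVM with the -Djava.library.path option or set as the native library location in the IDE's build path settings.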
2. JDK and OpenCV versions do not match (Exception in thread "main" java.lang.UnsatisfiedLinkError: no jniopencv_highgui in java.library.path)
Workaround: switch to a different OpenCV version.
3. An exception occurred when generating the .tr file from the command line:
Page 4061 dpi. Using 269
Error during processing.
Resolution: The image was probably damaged during format conversion or download; replacing the picture resolves it.
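Since a broken download or conversion is the likely cause, one way to find the offending file is to check that every training image can actually be decoded. The sketch below is an assumption about how such a check might look, with hypothetical class and method names; it relies on ImageIO.read returning null when no registered reader can decode the file.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageCheck {

    /** Returns true only if the file decodes to an image with a non-zero size. */
    static boolean isReadableImage(File file) {
        try {
            BufferedImage img = ImageIO.read(file); // null when no reader understands the data
            return img != null && img.getWidth() > 0 && img.getHeight() > 0;
        } catch (IOException e) {
            return false; // missing, unreadable, or corrupt file
        }
    }

    public static void main(String[] args) {
        // Pass the image folder as the first argument; defaults to the current directory
        File folder = new File(args.length > 0 ? args[0] : ".");
        File[] files = folder.listFiles();
        if (files == null) {
            System.out.println("Not a folder: " + folder);
            return;
        }
        for (File f : files) {
            if (f.isFile() && !isReadableImage(f)) {
                System.out.println("Unreadable image: " + f.getName());
            }
        }
    }
}
```

Files reported as unreadable can be downloaded again or removed before the training command is rerun.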