The reason for using tess4j: the address of the verification code (CAPTCHA) image is generated dynamically by JavaScript from the link's click state plus a timestamp. As a result, the image a conventional crawler downloads from that address is not the same verification code that the web page displays. The workaround is to take a screenshot, cut the verification code image out of it, and recognize it with tess4j. This achieves automatic recognition, although the recognition rate is somewhat low.
Versions: Java 1.8, tess4j 3.4, selenium-java 3.6.0
1. Create a new Java project in Eclipse. Download and unzip tess4j, copy the jar files from its dist directory into its lib directory, then copy the lib, src, and tessdata directories into the new Java project. Also copy all jar files from the downloaded selenium-java package into the new project.
2. Add all jar files to the build path. On Windows, import the win32-x86 (32-bit system) or win32-x86-64 (64-bit system) directory from the unzipped tess4j directory into the project. On Linux, the tesseract-ocr .so files have to be copied to the appropriate directory; the steps are roughly as follows:
Install Leptonica
wget http://www.leptonica.org/source/leptonica-1.71.tar.gz
tar -zxvf ...
./configure
make
make install
Install tesseract-ocr
wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
./autogen.sh
./configure
make
make install
ldconfig
Reference: http://www.jianshu.com/p/51537319a467
Once tesseract-ocr has been built and installed, run one of the following commands. Which one depends on where your Linux distribution keeps its .so files:
cp /usr/local/lib/*.so.* /usr/lib64/    # CentOS
cp /usr/local/lib/*.so.* /usr/lib/      # Ubuntu
If that does not work, download all the .so files and copy them into the corresponding directory.
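To check that the copy step really put the libraries where you expect, you can list them from Java. This is only a minimal sketch and is not part of tess4j: NativeLibLocator, its find method, and the directories you pass in are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of tess4j): finds shared libraries by file-name prefix. */
final class NativeLibLocator {

    /** Returns every file in the given directories that matches "<prefix>*.so*". */
    static List<Path> find(String prefix, List<Path> dirs) {
        List<Path> hits = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip directories that do not exist on this distribution
            }
            // The glob matches names such as libtesseract.so and libtesseract.so.3.0.2
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*.so*")) {
                for (Path p : stream) {
                    hits.add(p);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return hits;
    }
}
```

For example, NativeLibLocator.find("libtesseract", Arrays.asList(Paths.get("/usr/lib64"), Paths.get("/usr/lib"), Paths.get("/usr/local/lib"))) returns an empty list when none of those directories contains the library, which suggests the cp step above still needs to be run.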
3. With the project environment configured, the key code is as follows:
Function that recognizes a simple verification code. The parameter is the image path; the verification code consists of letters and digits:
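import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.ImageHelper;

public String identifyCode(String path) {
    File imageFile = new File(path);
    ITesseract instance = new Tesseract();
    // Binarize the image to improve the recognition rate
    BufferedImage grayImage = null;
    try {
        grayImage = ImageHelper.convertImageToBinary(ImageIO.read(imageFile));
    } catch (IOException e2) {
        e2.printStackTrace();
    }
    try {
        // The img directory under the working directory must already exist
        ImageIO.write(grayImage, "png", new File(System.getProperty("user.dir") + "/img", "vc1.png"));
    } catch (IOException e1) {
        e1.printStackTrace();
    }
    String path1 = System.getProperty("user.dir") + "/img/vc1.png";
    File imageFile1 = new File(path1);
    String result = null;
    try {
        result = instance.doOCR(imageFile1);
    } catch (TesseractException e1) {
        e1.printStackTrace();
    }
    if (result == null) {
        return ""; // OCR failed; avoid a NullPointerException below
    }
    // Keep only letters and digits
    result = result.replaceAll("[^a-zA-Z0-9]", "");
    return result;
}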
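To make the two preprocessing ideas in this function concrete (binarizing the image, then stripping everything except letters and digits), here is a minimal JDK-only sketch. It is not how ImageHelper.convertImageToBinary works internally; CaptchaPrep, the fixed luminance threshold, and the method names are assumptions made for illustration.

```java
import java.awt.image.BufferedImage;

/** Illustrative stand-in for the binarize-then-clean steps; not tess4j's implementation. */
final class CaptchaPrep {

    /** Maps each pixel to pure black or white by comparing its luminance with threshold (0-255). */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luminance weights
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                out.setRGB(x, y, luminance >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    /** Drops everything except ASCII letters and digits; returns "" when OCR produced nothing. */
    static String keepAlphanumerics(String ocrText) {
        return ocrText == null ? "" : ocrText.replaceAll("[^a-zA-Z0-9]", "");
    }
}
```

A threshold of 128 is a reasonable starting point, but verification codes with light text or noisy backgrounds may need a different value, so it is worth saving the binarized image and looking at it before tuning.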
Selenium region screenshot functions:
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.openqa.selenium.*; // Dimension and Point are the Selenium classes, not java.awt
import org.openqa.selenium.remote.Augmenter;

public byte[] takeScreenshot(WebDriver driver) throws IOException {
    WebDriver augmentedDriver = new Augmenter().augment(driver);
    return ((TakesScreenshot) augmentedDriver).getScreenshotAs(OutputType.BYTES);
}

// x and y are the crop origin; width and height are added to the element's own size
public BufferedImage createElementImage(WebDriver driver, WebElement webElement, int x, int y, int width, int height) throws IOException {
    Dimension size = webElement.getSize();
    BufferedImage originalImage = ImageIO.read(new ByteArrayInputStream(takeScreenshot(driver)));
    Point location = webElement.getLocation();
    // On Linux, uncomment the next line and comment out the Windows line below to get a correct screenshot
    // BufferedImage croppedImage = originalImage.getSubimage(location.getX(), location.getY(), size.getWidth(), size.getHeight());
    // On Windows the browser is maximized for the screenshot and the element's x and y cannot be used directly, presumably because of the screen resolution
    BufferedImage croppedImage = originalImage.getSubimage(x, y, size.getWidth() + width, size.getHeight() + height);
    return croppedImage;
}
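BufferedImage.getSubimage throws a RasterFormatException when the requested region extends past the image, which is easy to trigger once x, y, width, and height are tuned by hand as in the Windows variant. One possible guard is to clip the region to the screenshot first. The sketch below uses only the JDK; SafeCrop and its crop method are hypothetical names, not part of Selenium or tess4j.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: crops a region after clipping it to the image bounds. */
final class SafeCrop {

    static BufferedImage crop(BufferedImage image, int x, int y, int width, int height) {
        Rectangle bounds = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        Rectangle wanted = new Rectangle(x, y, width, height);
        // Keep only the part of the requested region that lies inside the screenshot
        Rectangle clipped = bounds.intersection(wanted);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Crop region lies outside the screenshot: " + wanted);
        }
        return image.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

In createElementImage, the getSubimage call could then be swapped for SafeCrop.crop(originalImage, x, y, size.getWidth() + width, size.getHeight() + height). A clipped result that is smaller than expected is a hint that the offsets need adjusting.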
4. Browsers and drivers can be downloaded from http://www.seleniumhq.org/download/.
Driver downloads for the main browsers:
Firefox driver: https://github.com/mozilla/geckodriver/releases/
IE driver: http://selenium-release.storage.googleapis.com/index.html
Chrome driver: http://chromedriver.storage.googleapis.com/index.html
http://blog.csdn.net/huilan_same/article/details/51896672 (corresponding versions)
Place the driver in the browser installation directory, or specify its path in code.
For example:
System.setProperty("webdriver.gecko.driver", driverPath); // Firefox driver
System.setProperty("webdriver.chrome.driver", driverPath); // Chrome driver
Note that the driver version must match the browser version.
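If the same code has to run against more than one browser, the property name can be chosen from the browser name, and the driver path checked before any browser is started. This is only a sketch: DriverConfig and its methods are hypothetical, and the only Selenium-specific parts are the three system property names.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper: sets the driver system property that matches a browser name. */
final class DriverConfig {

    /** Maps a browser name to the system property that Selenium reads for its driver. */
    static String propertyFor(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "webdriver.gecko.driver";
            case "chrome":
                return "webdriver.chrome.driver";
            case "ie":
                return "webdriver.ie.driver";
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    /** Fails early when the driver file is missing, otherwise sets the property. */
    static void configure(String browser, String driverPath) {
        if (!new File(driverPath).isFile()) {
            throw new IllegalArgumentException("Driver executable not found: " + driverPath);
        }
        System.setProperty(propertyFor(browser), driverPath);
    }
}
```

Call DriverConfig.configure("firefox", driverPath) before creating the driver object. The check only confirms that the file exists; it cannot tell whether the driver version matches the installed browser.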