Java + Selenium + Tess4J: Recognizing CAPTCHA Images to Simulate Login

Source: Internet
Author: User

I chose Tess4J because the CAPTCHA image URL is generated dynamically by JavaScript when a link is clicked and includes a timestamp. As a result, the usual crawler approach of reading the image URL and downloading it fetches a different CAPTCHA from the one displayed on the page. Instead, I take a screenshot, crop out the CAPTCHA image, and recognize it with Tess4J to automate the process. The recognition rate is somewhat low.

Versions: Java 1.8, Tess4J 3.4, Selenium Java 3.6.0

1. In Eclipse, create a new Java project. Unzip the downloaded Tess4J package, copy the jar files from its dist directory into its lib directory, and then copy the lib, src, and tessdata directories into the new Java project. Also copy all jar files from the downloaded Selenium Java package into the new project.
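Tesseract cannot recognize anything without the language data in the tessdata directory copied above, so a quick check before running OCR can save some debugging. The following is a minimal sketch, assuming the English data file uses the standard Tesseract name eng.traineddata and that the program runs with the project root as its working directory (Eclipse's default). TessdataCheck and hasLanguageData are hypothetical names, not part of Tess4J.

```java
import java.io.File;

final class TessdataCheck {

    /**
     * Returns true if projectDir/tessdata/<lang>.traineddata exists as a regular file.
     * Assumption: the tessdata directory sits directly under the project root.
     */
    static boolean hasLanguageData(File projectDir, String lang) {
        File tessdata = new File(projectDir, "tessdata");
        File data = new File(tessdata, lang + ".traineddata");
        return data.isFile();
    }
}
```

For example, TessdataCheck.hasLanguageData(new File(System.getProperty("user.dir")), "eng") should return true once the project is set up.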

2. Add all the jar files to the build path. On Windows, also import the win32-x86 (32-bit system) or win32-x86-64 (64-bit system) directory from the unzipped Tess4J directory into the project. On Linux, you need to install Tesseract OCR and copy its .so files into the corresponding directory; the steps are roughly as follows:

Install Leptonica:
wget http://www.leptonica.org/source/leptonica-1.71.tar.gz
tar -zxvf ...
./configure
make
make install
Install Tesseract OCR:
wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
./autogen.sh
./configure
make
make install
ldconfig
Reference: http://www.jianshu.com/p/51537319a467
After Tesseract OCR is installed, copy the shared libraries with one of the following commands; which one applies depends on where your Linux distribution keeps its .so files:
cp /usr/local/lib/*.so.* /usr/lib64/    (CentOS)
cp /usr/local/lib/*.so.* /usr/lib/      (Ubuntu)
If the libraries are still not found, download all the .so files and copy them into the corresponding directory.
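To confirm that the libraries ended up in one of those directories, you can scan them for the Tesseract and Leptonica shared objects. This is a minimal sketch, assuming the file names start with libtesseract and liblept (the usual names for builds like the ones above) and using the directories from the cp commands; NativeLibCheck and its methods are hypothetical helpers, not Tess4J API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class NativeLibCheck {

    /** Returns names of files in dir that start with prefix and contain ".so". */
    static List<String> findSharedLibs(File dir, String prefix) {
        List<String> found = new ArrayList<>();
        String[] names = dir.list();
        if (names == null) {
            return found; // directory missing or unreadable
        }
        for (String name : names) {
            if (name.startsWith(prefix) && name.contains(".so")) {
                found.add(name);
            }
        }
        return found;
    }

    /** True if both libtesseract* and liblept* appear somewhere in the given directories. */
    static boolean tesseractInstalled(List<File> dirs) {
        boolean tess = false;
        boolean lept = false;
        for (File dir : dirs) {
            tess |= !findSharedLibs(dir, "libtesseract").isEmpty();
            lept |= !findSharedLibs(dir, "liblept").isEmpty();
        }
        return tess && lept;
    }
}
```

For example, call NativeLibCheck.tesseractInstalled(Arrays.asList(new File("/usr/lib64"), new File("/usr/lib"), new File("/usr/local/lib"))). A true result only shows the files exist; whether the loader picks them up still depends on ldconfig and the library search path.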

3. With the project environment configured, the key code is as follows:

A function that recognizes a simple CAPTCHA. Its parameter is the image path, and the CAPTCHA consists of letters and digits:

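import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.ImageHelper;

public String identifyCode(String path) {
	File imageFile = new File(path);
	ITesseract instance = new Tesseract();
	// Binarize the image to increase the recognition rate
	BufferedImage grayImage = null;
	try {
		grayImage = ImageHelper.convertImageToBinary(ImageIO.read(imageFile));
	} catch (IOException e2) {
		e2.printStackTrace();
	}
	// Save the binarized image; the img directory under the working directory must already exist
	try {
		ImageIO.write(grayImage, "PNG", new File(System.getProperty("user.dir") + "/img", "vc1.png"));
	} catch (IOException e1) {
		e1.printStackTrace();
	}
	String path1 = System.getProperty("user.dir") + "/img/vc1.png";

	File imageFile1 = new File(path1);
	String result = null;
	try {
		result = instance.doOCR(imageFile1);
	} catch (TesseractException e1) {
		e1.printStackTrace();
	}
	if (result == null) {
		return ""; // OCR failed
	}
	// Keep only letters and digits
	result = result.replaceAll("[^a-zA-Z0-9]", "");
	return result;
}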
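If the built-in binarization does not separate the characters from the background well, you can binarize with a threshold you control and tune it per site. The following is a minimal JDK-only sketch: it converts each pixel to luminance with the Rec. 601 weights and maps it to pure black or white. The threshold (128 in the usage below) is an assumption to tune, and CaptchaPreprocess, binarize, and clean are hypothetical names, not Tess4J API. The clean method applies the same letter-and-digit filter as the recognition function but also tolerates a null OCR result.

```java
import java.awt.image.BufferedImage;

final class CaptchaPreprocess {

    /** Returns a new 1-bit image: pixels with luminance below threshold become black. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Rec. 601 luma weights in integer arithmetic (range 0..255)
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    /** Keeps only ASCII letters and digits; returns "" when OCR produced nothing. */
    static String clean(String ocrText) {
        return ocrText == null ? "" : ocrText.replaceAll("[^a-zA-Z0-9]", "");
    }
}
```

For example, ImageIO.write(CaptchaPreprocess.binarize(ImageIO.read(imageFile), 128), "PNG", outputFile) could take the place of the ImageHelper call before the file is passed to doOCR. A lower threshold keeps only darker strokes, which can help when the CAPTCHA has light background noise.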

Selenium functions for taking a screenshot of a page region:

import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.Augmenter;

	public byte[] takeScreenshot(WebDriver driver) throws IOException {
		WebDriver augmentedDriver = new Augmenter().augment(driver);
		return ((TakesScreenshot) augmentedDriver).getScreenshotAs(OutputType.BYTES);
	}

	public BufferedImage createElementImage(WebDriver driver, WebElement webElement, int x, int y, int width, int height) throws IOException {
		Dimension size = webElement.getSize();
		BufferedImage originalImage = ImageIO.read(new ByteArrayInputStream(takeScreenshot(driver)));
		Point location = webElement.getLocation();
		// Linux: uncomment the next line and comment out the Windows line below to get a correct crop
		// BufferedImage croppedImage = originalImage.getSubimage(location.getX(), location.getY(), size.getWidth(), size.getHeight());
		// Windows (browser maximized): the element's x and y cannot be used directly, presumably because of the screen resolution, so pass x, y and extra width/height by hand
		BufferedImage croppedImage = originalImage.getSubimage(x, y, size.getWidth() + width, size.getHeight() + height);
		return croppedImage;
	}
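One common cause of the Windows offset is display scaling: element coordinates are reported in page (CSS) pixels, while the screenshot may be in device pixels. That cause is an assumption here, not something measured in the original setup. The following minimal sketch scales the element rectangle by a factor and clamps it to the screenshot bounds, so getSubimage never receives an area outside the image (which would throw a RasterFormatException). ScaledCrop, crop, and the scale parameter are hypothetical.

```java
import java.awt.image.BufferedImage;

final class ScaledCrop {

    /**
     * Crops (x, y, width, height), given in page pixels, from a screenshot in device pixels.
     * scale = device pixels per page pixel, e.g. 1.25 at 125% Windows display scaling (assumed).
     */
    static BufferedImage crop(BufferedImage screenshot, int x, int y, int width, int height, double scale) {
        int left = (int) Math.round(x * scale);
        int top = (int) Math.round(y * scale);
        int right = (int) Math.round((x + width) * scale);
        int bottom = (int) Math.round((y + height) * scale);
        // Clamp to the screenshot so the crop area is never empty or out of bounds
        left = Math.max(0, Math.min(left, screenshot.getWidth() - 1));
        top = Math.max(0, Math.min(top, screenshot.getHeight() - 1));
        right = Math.max(left + 1, Math.min(right, screenshot.getWidth()));
        bottom = Math.max(top + 1, Math.min(bottom, screenshot.getHeight()));
        return screenshot.getSubimage(left, top, right - left, bottom - top);
    }
}
```

Assuming the screenshot covers only the visible viewport, the scale factor could be estimated by dividing the screenshot width by the page's window.innerWidth, read through Selenium's JavascriptExecutor; verify this on your own machine before relying on it.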
4. Browsers and drivers can be downloaded from http://www.seleniumhq.org/download/.

Download addresses for the main browser drivers:
Firefox driver: https://github.com/mozilla/geckodriver/releases/
IE driver: http://selenium-release.storage.googleapis.com/index.html
Chrome driver: http://chromedriver.storage.googleapis.com/index.html
Chrome and ChromeDriver version mapping: http://blog.csdn.net/huilan_same/article/details/51896672


Place the driver in the browser installation directory and make sure that directory is on the PATH, or specify the driver path in code, for example:

System.setProperty("webdriver.gecko.driver", driverPath); // Firefox driver

System.setProperty("webdriver.chrome.driver", driverPath); // Chrome driver
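Choosing the property by browser can be wrapped in a small helper that also fails fast when the driver file is missing, instead of failing later when the driver is started. This is a minimal sketch: webdriver.gecko.driver and webdriver.chrome.driver come from the text above, webdriver.ie.driver is the corresponding Selenium 3 property for the IE driver, and DriverSetup with its methods is hypothetical.

```java
import java.io.File;
import java.util.Locale;

final class DriverSetup {

    /** Maps a browser name to the Selenium system property that holds its driver path. */
    static String driverProperty(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "webdriver.gecko.driver";
            case "chrome":
                return "webdriver.chrome.driver";
            case "ie":
                return "webdriver.ie.driver";
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    /** Sets the driver property after checking that the driver executable exists. */
    static void configure(String browser, String driverPath) {
        if (!new File(driverPath).isFile()) {
            throw new IllegalStateException("Driver not found: " + driverPath);
        }
        System.setProperty(driverProperty(browser), driverPath);
    }
}
```

Call DriverSetup.configure("chrome", driverPath) before creating the ChromeDriver instance.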

Note that the driver version must match the browser version.
