Java, Selenium, and tess4j: recognizing a verification code to simulate login

Last Update:2018-07-19 Source: Internet

Author: User

Tags win32

tess4j is used here because the address of the verification code image is generated dynamically by JavaScript when the link is clicked, and it carries a timestamp. As a result, the usual crawler approach of fetching the image address and downloading the image yields a verification code that differs from the one shown on the web page. The workaround is to take a screenshot, crop out the verification code image, and recognize it with tess4j. This achieves automatic recognition, although the recognition rate is somewhat low.
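Because the recognition rate is low, the recognize step usually has to be repeated with a fresh image. The following is a minimal sketch of such a retry loop and is not part of the original article: the class name `CaptchaRetry`, the expected code length, and the `Supplier` that stands in for "capture a new screenshot and run OCR" are all assumptions made for illustration.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: retries OCR until the result looks like a plausible code. */
final class CaptchaRetry {

    private CaptchaRetry() {}

    /** Assumption: a valid code is exactly expectedLength ASCII letters or digits. */
    static boolean looksValid(String text, int expectedLength) {
        return text != null && text.matches("[a-zA-Z0-9]{" + expectedLength + "}");
    }

    /**
     * Calls the recognizer up to maxAttempts times and returns the first
     * plausible result. The recognizer is expected to capture a fresh image
     * and run OCR on every call.
     */
    static Optional<String> recognizeWithRetry(Supplier<String> recognizer,
                                               int expectedLength,
                                               int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = recognizer.get();
            if (looksValid(candidate, expectedLength)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The supplier would typically wrap the screenshot and OCR steps described in this article. Passing a plain `Supplier` keeps the loop independent of Selenium and tess4j.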

Versions: Java 1.8, tess4j 3.4, selenium-java 3.6.0

1. In Eclipse, create a new Java project. Download and unzip tess4j, copy the jar files from its dist directory into its lib directory, and then copy the lib, src, and tessdata directories into the new Java project. Also copy all jar files from the downloaded selenium-java package into the new project.

2. Add all jar files to the build path. On Windows, import the win32-x86 directory (32-bit system) or the win32-x86-64 directory (64-bit system) from the unzipped tess4j directory into the project. On Linux, the Tesseract OCR .so files need to be copied to the corresponding directory; the steps are roughly as follows:

Install Leptonica:
wget http://www.leptonica.org/source/leptonica-1.71.tar.gz
tar -zxvf ...
./configure
make
make install

Install Tesseract OCR:
wget http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz
./autogen.sh
./configure
make
make install
ldconfig

Reference: http://www.jianshu.com/p/51537319a467

Once Tesseract OCR has been built and installed, run one of the following commands. Which one applies depends on where the Linux distribution keeps its .so files:
cp /usr/local/lib/*.so.* /usr/lib64/    # CentOS
cp /usr/local/lib/*.so.* /usr/lib/      # Ubuntu
If that does not work, download all the .so files and copy them into the corresponding directory.
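To check where the shared libraries actually ended up, the candidate directories can be scanned from Java. The sketch below is illustrative only: the class `NativeLibFinder` is hypothetical, and the file name prefixes `libtesseract` and `liblept` are an assumption about how the two builds above name their libraries.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: lists Tesseract/Leptonica shared libraries in given directories. */
final class NativeLibFinder {

    private NativeLibFinder() {}

    /** The directories mentioned in the installation steps. */
    static List<Path> defaultDirectories() {
        return Arrays.asList(Paths.get("/usr/local/lib"),
                             Paths.get("/usr/lib64"),
                             Paths.get("/usr/lib"));
    }

    /**
     * Returns every entry whose name matches lib{tesseract,lept}*.so*
     * (an assumed naming pattern) in the directories that exist.
     */
    static List<Path> find(List<Path> directories) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path dir : directories) {
            if (!Files.isDirectory(dir)) {
                continue; // this directory does not exist on the current system
            }
            try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "lib{tesseract,lept}*.so*")) {
                for (Path entry : stream) {
                    found.add(entry);
                }
            }
        }
        return found;
    }
}
```

Calling `NativeLibFinder.find(NativeLibFinder.defaultDirectories())` and printing the result shows which directory holds the libraries, and therefore whether a copy step is needed at all.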

3. With the project environment configured, the key code is as follows:

Function that recognizes a simple verification code. The parameter is the image path, and the verification code consists of letters and digits:

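import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.ImageHelper;

public String identifyCode(String path) {
	File imageFile = new File(path);
	ITesseract instance = new Tesseract();
	// Binarize the image to improve the recognition rate
	BufferedImage grayImage = null;
	try {
		grayImage = ImageHelper.convertImageToBinary(ImageIO.read(imageFile));
	} catch (IOException e2) {
		e2.printStackTrace();
	}
	// Save the binarized image to <working directory>/img/vc1.png
	try {
		ImageIO.write(grayImage, "png", new File(System.getProperty("user.dir") + "/img", "vc1.png"));
	} catch (IOException e1) {
		e1.printStackTrace();
	}
	String path1 = System.getProperty("user.dir") + "/img/vc1.png";

	File imageFile1 = new File(path1);
	// Start from an empty string so a failed OCR call does not cause a NullPointerException below
	String result = "";
	try {
		result = instance.doOCR(imageFile1);
	} catch (TesseractException e1) {
		e1.printStackTrace();
	}
	// Keep only letters and digits
	result = result.replaceAll("[^a-zA-Z0-9]", "");
	return result;
}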
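To make the two processing steps in the function above concrete, here is a small sketch that uses only the JDK. It is not tess4j's `ImageHelper` implementation: the class `CaptchaPreprocessing`, the fixed luminance threshold, and the method names are assumptions chosen for illustration. It shows what binarizing an image means and how the OCR output is filtered.

```java
import java.awt.image.BufferedImage;

/** Hypothetical JDK-only illustration of binarization and OCR result cleanup. */
final class CaptchaPreprocessing {

    private CaptchaPreprocessing() {}

    /**
     * Maps every pixel to black or white: pixels whose luminance is below
     * the threshold (0-255) become black, all others become white.
     */
    static BufferedImage binarize(BufferedImage source, int threshold) {
        BufferedImage result = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < source.getHeight(); y++) {
            for (int x = 0; x < source.getWidth(); x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luminance weights
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                result.setRGB(x, y, luminance < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return result;
    }

    /** Removes everything except ASCII letters and digits; null becomes "". */
    static String keepLettersAndDigits(String ocrText) {
        return ocrText == null ? "" : ocrText.replaceAll("[^a-zA-Z0-9]", "");
    }
}
```

Note that inside a character class a `^` is only a negation when it comes first. A pattern such as `[^a-z^A-Z^0-9]` therefore lets literal `^` characters through, which is why the single negated class is used here.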

Selenium function for capturing a screenshot of a region:

	import java.awt.image.BufferedImage;
	import java.io.ByteArrayInputStream;
	import java.io.IOException;
	import javax.imageio.ImageIO;
	import org.openqa.selenium.Dimension;
	import org.openqa.selenium.OutputType;
	import org.openqa.selenium.Point;
	import org.openqa.selenium.TakesScreenshot;
	import org.openqa.selenium.WebDriver;
	import org.openqa.selenium.WebElement;
	import org.openqa.selenium.remote.Augmenter;

	// Takes a screenshot of the whole page and returns it as raw image bytes
	public byte[] takeScreenshot(WebDriver driver) throws IOException {
		WebDriver augmentedDriver = new Augmenter().augment(driver);
		return ((TakesScreenshot) augmentedDriver).getScreenshotAs(OutputType.BYTES);
	}

	public BufferedImage createElementImage(WebDriver driver, WebElement webElement, int x, int y, int width, int height) throws IOException {
		Dimension size = webElement.getSize();
		BufferedImage originalImage = ImageIO.read(new ByteArrayInputStream(takeScreenshot(driver)));
		// On Linux, uncomment the next two lines and comment out the Windows line below to get the correct screenshot
		// Point location = webElement.getLocation();
		// BufferedImage croppedImage = originalImage.getSubimage(location.getX(), location.getY(), size.getWidth(), size.getHeight());
		// On Windows the screenshot is taken with the browser maximized, and the element's x and y coordinates cannot be used directly, presumably because of the screen resolution
		BufferedImage croppedImage = originalImage.getSubimage(x, y, size.getWidth() + width, size.getHeight() + height);
		return croppedImage;
	}
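`BufferedImage.getSubimage` throws a `RasterFormatException` when the requested rectangle extends beyond the image, which is easy to trigger when the offsets are tuned by hand as in the Windows case. A minimal sketch of a guarded crop follows; the class `SafeCrop` is hypothetical and not part of the original code, and clipping to the image bounds is just one possible policy.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: crops a region after clipping it to the image bounds. */
final class SafeCrop {

    private SafeCrop() {}

    /**
     * Returns the part of the requested rectangle that lies inside the image.
     * Throws IllegalArgumentException if the rectangle misses the image entirely.
     */
    static BufferedImage crop(BufferedImage image, int x, int y, int width, int height) {
        Rectangle bounds = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        Rectangle clipped = bounds.intersection(new Rectangle(x, y, width, height));
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Crop region lies outside the screenshot");
        }
        return image.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

With a helper like this, a slightly wrong offset produces a smaller image rather than an exception, which makes it easier to save the crop and inspect what was actually captured.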

4. Browser drivers can be downloaded from http://www.seleniumhq.org/download/.

Download locations for the main browser drivers:
Firefox driver: https://github.com/mozilla/geckodriver/releases/
IE driver: http://selenium-release.storage.googleapis.com/index.html
Chrome driver: http://chromedriver.storage.googleapis.com/index.html
Chrome and driver version correspondence: http://blog.csdn.net/huilan_same/article/details/51896672

Place the driver in the browser's installation directory, or specify its path in code, for example:

System.setProperty("webdriver.gecko.driver", driverPath); // Firefox driver

System.setProperty("webdriver.chrome.driver", driverPath); // Chrome driver

Note that the driver version must match the browser version.
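A wrong driver path normally only surfaces when the browser is launched. As a hedged sketch, the property can be set through a small helper that first checks that the file exists. The class `DriverConfig` and the browser names it accepts are hypothetical. The property keys are the ones Selenium reads for geckodriver, chromedriver, and IEDriverServer.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper: sets the Selenium driver system property after a basic check. */
final class DriverConfig {

    private DriverConfig() {}

    /** Maps a browser name (an assumed naming scheme) to its system property key. */
    static String propertyFor(String browser) {
        switch (browser.toLowerCase(Locale.ROOT)) {
            case "firefox":
                return "webdriver.gecko.driver";
            case "chrome":
                return "webdriver.chrome.driver";
            case "ie":
                return "webdriver.ie.driver";
            default:
                throw new IllegalArgumentException("Unsupported browser: " + browser);
        }
    }

    /** Fails fast if the driver executable is missing, otherwise sets the property. */
    static void configure(String browser, String driverPath) {
        File driver = new File(driverPath);
        if (!driver.isFile()) {
            throw new IllegalStateException("Driver executable not found: " + driverPath);
        }
        System.setProperty(propertyFor(browser), driver.getAbsolutePath());
    }
}
```

This check only covers the path. Whether the driver version matches the installed browser still has to be verified separately.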

This article is an English version of an article originally written in Chinese on aliyun.com.
