Recently I have been studying QR code recognition with the open-source ZXing library, but text encoded in GBK always came out garbled. After two days of hard work I finally solved the problem today, so I am writing it down here.
I developed on top of ZXing 1.6, because ZXing 1.6 has better support for portrait (vertical screen) orientation.
First, we need to set up the environment for compiling core.jar, which I will not cover here. If you have not done this before, you can refer to
http://yajin167.info/2011/07/01/integrated-zxing-scan-barcode.html
The main part is to change the source code.
Modify the file core\src\com\google\zxing\common\StringUtils.java
1. Add the following line below private static final String ISO88591 = "ISO8859_1";
private static final String GBK = "GB2312";
2. Add the following line below boolean canBeUTF8 = true;
boolean canBeGBK = true;
3. Add the following block below int value = bytes[i] & 0xFF;
// GBK stuff
if (value > 0x7F) { // a value greater than 127 may be GB2312, so check this byte and the next one
  // the first byte is in the GB2312 range, so check the second byte
  // (i < length - 1 guards the read of bytes[i + 1])
  if (value > 0xB0 && value <= 0xF7 && i < length - 1) {
    int value2 = bytes[i + 1] & 0xFF;
    if (value2 > 0xA0 && value2 <= 0xF7) {
      canBeGBK = true;
    }
  }
}
4. Below the block
if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
  return SHIFT_JIS;
}
add
if (canBeGBK) { return GBK; }
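To make the byte-range test in the steps above easier to try on its own, here is a minimal standalone sketch. The class name Gb2312Heuristic and the method looksLikeGb2312 are hypothetical and not part of ZXing. Unlike the snippet in the steps, which starts canBeGBK at true and never clears it, this sketch starts from "no match" and only reports true when a plausible lead/trail byte pair is found.

```java
/** Hypothetical helper for experimenting with the byte ranges; not part of ZXing. */
final class Gb2312Heuristic {

  private Gb2312Heuristic() {}

  /**
   * Returns true if the bytes contain at least one pair that falls in the
   * ranges used in the steps above: first byte in (0xB0, 0xF7], second byte
   * in (0xA0, 0xF7].
   */
  static boolean looksLikeGb2312(byte[] bytes) {
    int length = bytes.length;
    // Stop one short of the end so that bytes[i + 1] is always in bounds.
    for (int i = 0; i < length - 1; i++) {
      int value = bytes[i] & 0xFF;
      if (value > 0xB0 && value <= 0xF7) {
        int value2 = bytes[i + 1] & 0xFF;
        if (value2 > 0xA0 && value2 <= 0xF7) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // 0xD6 0xD0 0xCE 0xC4 are the GB2312 bytes of two common Chinese characters.
    byte[] chinese = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
    byte[] ascii = {'Q', 'R'};
    System.out.println(looksLikeGb2312(chinese));
    System.out.println(looksLikeGb2312(ascii));
  }
}
```

Note that these ranges overlap with bytes that are also valid in ISO-8859-1 and Shift_JIS, so a check like this is only a guess, just like the other heuristics in StringUtils.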
Then rebuild core.jar with ant and replace the old one. The change basically just imitates the checks that are already in the file.
I also uploaded a copy of my compiled core.jar to CSDN, which can be used directly. Address: http://download.csdn.net/detail/abowu/4547532
Finally, here is the complete StringUtils.java code.
/*
 * Copyright (C) 2010 ZXing authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.zxing.common;

import java.util.Hashtable;

import com.google.zxing.DecodeHintType;

/**
 * Common string-related functions.
 *
 * @author Sean Owen
 */
public final class StringUtils {

  private static final String PLATFORM_DEFAULT_ENCODING =
      System.getProperty("file.encoding");
  public static final String SHIFT_JIS = "SJIS";
  private static final String EUC_JP = "EUC_JP";
  private static final String UTF8 = "UTF8";
  private static final String ISO88591 = "ISO8859_1";
  private static final String GBK = "GB2312";
  private static final boolean ASSUME_SHIFT_JIS =
      SHIFT_JIS.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING) ||
      EUC_JP.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING);

  private StringUtils() {}

  /**
   * @param bytes bytes encoding a string, whose encoding should be guessed
   * @param hints decode hints if applicable
   * @return name of guessed encoding; at the moment will only guess one of:
   *  {@link #SHIFT_JIS}, {@link #UTF8}, {@link #ISO88591}, or the platform
   *  default encoding if none of these can possibly be correct
   */
  public static String guessEncoding(byte[] bytes, Hashtable hints) {
    if (hints != null) {
      String characterSet = (String) hints.get(DecodeHintType.CHARACTER_SET);
      if (characterSet != null) {
        return characterSet;
      }
    }
    // Does it start with the UTF-8 byte order mark? Then guess it's UTF-8
    if (bytes.length > 3 &&
        bytes[0] == (byte) 0xEF &&
        bytes[1] == (byte) 0xBB &&
        bytes[2] == (byte) 0xBF) {
      return UTF8;
    }
    // For now, merely tries to distinguish ISO-8859-1, UTF-8 and Shift_JIS,
    // which should be by far the most common encodings. ISO-8859-1
    // should not have bytes in the 0x80 - 0x9F range, while Shift_JIS
    // uses this as a first byte of a two-byte character. If we see this
    // followed by a valid second byte in Shift_JIS, assume it is Shift_JIS.
    // If we see something else in that second byte, we'll make the risky guess
    // that it's UTF-8.
    int length = bytes.length;
    boolean canBeISO88591 = true;
    boolean canBeShiftJIS = true;
    boolean canBeUTF8 = true;
    boolean canBeGBK = true;
    int utf8BytesLeft = 0;
    int maybeDoubleByteCount = 0;
    int maybeSingleByteKatakanaCount = 0;
    boolean sawLatin1Supplement = false;
    boolean sawUTF8Start = false;
    boolean lastWasPossibleDoubleByteStart = false;

    for (int i = 0;
         i < length && (canBeISO88591 || canBeShiftJIS || canBeUTF8 || canBeGBK);
         i++) {

      int value = bytes[i] & 0xFF;

      // GBK stuff
      if (value > 0x7F) { // a value greater than 127 may be GB2312, so check this byte and the next one
        // The first byte is in the GB2312 range, so check the second byte
        // (i < length - 1 guards the read of bytes[i + 1])
        if (value > 0xB0 && value <= 0xF7 && i < length - 1) {
          int value2 = bytes[i + 1] & 0xFF;
          if (value2 > 0xA0 && value2 <= 0xF7) {
            canBeGBK = true;
          }
        }
      }

      // UTF-8 stuff
      if (value >= 0x80 && value <= 0xBF) {
        if (utf8BytesLeft > 0) {
          utf8BytesLeft--;
        }
      } else {
        if (utf8BytesLeft > 0) {
          canBeUTF8 = false;
        }
        if (value >= 0xC0 && value <= 0xFD) {
          sawUTF8Start = true;
          int valueCopy = value;
          while ((valueCopy & 0x40) != 0) {
            utf8BytesLeft++;
            valueCopy <<= 1;
          }
        }
      }

      // ISO-8859-1 stuff
      if ((value == 0xC2 || value == 0xC3) && i < length - 1) {
        // This is really a poor hack. The slightly more exotic characters people might want to put in
        // a QR Code, by which I mean the Latin-1 supplement characters (e.g. u-umlaut) have encodings
        // that start with 0xC2 followed by [0xA0,0xBF], or start with 0xC3 followed by [0x80,0xBF].
        int nextValue = bytes[i + 1] & 0xFF;
        if (nextValue <= 0xBF &&
            ((value == 0xC2 && nextValue >= 0xA0) || (value == 0xC3 && nextValue >= 0x80))) {
          sawLatin1Supplement = true;
        }
      }
      if (value >= 0x7F && value <= 0x9F) {
        canBeISO88591 = false;
      }

      // Shift_JIS stuff
      if (value >= 0xA1 && value <= 0xDF) {
        // count the number of characters that might be a Shift_JIS single-byte Katakana character
        if (!lastWasPossibleDoubleByteStart) {
          maybeSingleByteKatakanaCount++;
        }
      }
      if (!lastWasPossibleDoubleByteStart &&
          ((value >= 0xF0 && value <= 0xFF) || value == 0x80 || value == 0xA0)) {
        canBeShiftJIS = false;
      }
      if ((value >= 0x81 && value <= 0x9F) || (value >= 0xE0 && value <= 0xEF)) {
        // These start double-byte characters in Shift_JIS. Let's see if it's followed by a valid
        // second byte.
        if (lastWasPossibleDoubleByteStart) {
          // If we just checked this and the last byte for being a valid double-byte
          // char, don't check starting on this byte. If this and the last byte
          // formed a valid pair, then this shouldn't be checked to see if it starts
          // a double byte pair of course.
          lastWasPossibleDoubleByteStart = false;
        } else {
          // ... otherwise do check to see if this plus the next byte form a valid
          // double byte pair encoding a character.
          lastWasPossibleDoubleByteStart = true;
          if (i >= bytes.length - 1) {
            canBeShiftJIS = false;
          } else {
            int nextValue = bytes[i + 1] & 0xFF;
            if (nextValue < 0x40 || nextValue > 0xFC) {
              canBeShiftJIS = false;
            } else {
              maybeDoubleByteCount++;
            }
            // There is some conflicting information out there about which bytes can follow which in
            // double-byte Shift_JIS characters. The rule above seems to be the one that matches practice.
          }
        }
      } else {
        lastWasPossibleDoubleByteStart = false;
      }
    }
    if (utf8BytesLeft > 0) {
      canBeUTF8 = false;
    }

    // Easy -- if assuming Shift_JIS and no evidence it can't be, done
    if (canBeShiftJIS && ASSUME_SHIFT_JIS) {
      return SHIFT_JIS;
    }
    if (canBeUTF8 && sawUTF8Start) {
      return UTF8;
    }
    // Distinguishing Shift_JIS and ISO-8859-1 can be a little tough. The crude heuristic is:
    // - If we saw
    //   - at least 3 bytes that starts a double-byte value (bytes that are rare in ISO-8859-1), or
    //   - over 5% of bytes could be single-byte Katakana (also rare in ISO-8859-1),
    // - and, saw no sequences that are invalid in Shift_JIS, then we conclude Shift_JIS
    if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
      return SHIFT_JIS;
    }
    // canBeGBK is never set to false above, so this branch is always taken once it is reached
    if (canBeGBK) {
      return GBK;
    }
    // Otherwise, we default to ISO-8859-1 unless we know it can't be
    if (!sawLatin1Supplement && canBeISO88591) {
      return ISO88591; // return GB2312;
    }
    // Otherwise, we take a wild guess with platform encoding
    return PLATFORM_DEFAULT_ENCODING;
  }

}
Next, let's compare the results.
[Figure: the QR code image being recognized]
[Figure: comparison before and after the fix; before the fix the text is garbled]
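The garbling itself can be reproduced without a scanner. The following is a small sketch, assuming the running JDK provides the GB2312 charset (it is not one of the charsets every Java platform is required to support). The class name GarbledDemo and the method decode are made up for this illustration. It decodes the same four bytes once as ISO-8859-1, which is what the unmodified guesser falls back to, and once as GB2312.

```java
import java.nio.charset.Charset;

/** Hypothetical demo class; not part of ZXing. */
final class GarbledDemo {

  private GarbledDemo() {}

  /** Decodes the bytes with the named charset. */
  static String decode(byte[] bytes, String charsetName) {
    return new String(bytes, Charset.forName(charsetName));
  }

  public static void main(String[] args) {
    // GB2312 bytes of a two-character Chinese string.
    byte[] bytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Wrong guess: every byte becomes one Latin-1 character (garbled text).
    System.out.println(decode(bytes, "ISO-8859-1"));

    // Right guess: each byte pair becomes one Chinese character.
    System.out.println(decode(bytes, "GB2312"));
  }
}
```

The bytes are identical in both calls; only the charset name passed to the decoder differs, which is why returning the right name from guessEncoding is enough to fix the display.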