Recently I have been studying QR code recognition using the open-source ZXing code, but GBK-encoded text always came out garbled. After two days of hard work, I finally solved the problem today, so I am writing it down.
I built on ZXing 1.6, because ZXing 1.6 has better support for portrait (vertical) screen orientation.
First, you need to set up the environment for compiling core.jar, which I will not go into here. If you have not done this before, you can refer to ...
The main part is changing the source code.
Modify the file core\src\com\google\zxing\common\StringUtils.java as follows.
1. Below the line private static final String ISO88591 = "ISO8859_1"; add:
    private static final String GBK = "GB2312";
2. Below the line boolean canBeUTF8 = true; add the following. The flag starts as false so that GBK is only chosen once a matching byte pair has actually been seen. canBeGBK is also added to the loop condition, as in the full listing.
    boolean canBeGBK = false;
3. Below the line int value = bytes[i] & 0xFF; add the following. The i + 1 < length test keeps the read of bytes[i + 1] inside the array.
    // GBK stuff
    if (value > 0x7F && i + 1 < length) { // greater than 127, so it may be GB2312; check this byte and the next one
      if (value > 0xB0 && value <= 0xF7) { // the first byte is in range, so check the second byte
        int value2 = bytes[i + 1] & 0xFF;
        if (value2 > 0xA0 && value2 <= 0xF7) {
          canBeGBK = true;
        }
      }
    }
4. Below the block
    if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
      return SHIFT_JIS;
    }
add:
    if (canBeGBK) {
      return GBK;
    }
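The byte-pair check added to the loop can also be tried outside ZXing. The following is only a minimal sketch: the class Gb2312PairCheck and the method containsGb2312Pair are hypothetical names made up for illustration and are not part of ZXing. It uses the same byte ranges as the patch and, as an assumption on my part, guards the read of the next byte so that a lead byte at the very end of the data cannot run past the array.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of ZXing): the GB2312 pair check in isolation. */
final class Gb2312PairCheck {

  private Gb2312PairCheck() {}

  /**
   * Returns true if the data contains at least one byte pair in the ranges
   * tested by the patch: lead byte in (0xB0, 0xF7], trail byte in (0xA0, 0xF7].
   */
  static boolean containsGb2312Pair(byte[] bytes) {
    // i + 1 < bytes.length keeps bytes[i + 1] inside the array
    for (int i = 0; i + 1 < bytes.length; i++) {
      int lead = bytes[i] & 0xFF;
      if (lead > 0xB0 && lead <= 0xF7) {
        int trail = bytes[i + 1] & 0xFF;
        if (trail > 0xA0 && trail <= 0xF7) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // U+4E2D U+6587 encoded in GB2312
    byte[] chinese = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
    byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
    System.out.println(containsGb2312Pair(chinese));
    System.out.println(containsGb2312Pair(ascii));
  }
}
```

Note that this is a loose heuristic: two accented Latin-1 letters in a row (for example the bytes 0xE9 0xE9) also fall inside these ranges, so such text would be reported as GB2312 as well.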
Then rebuild core.jar with ant and replace the old one. The change basically just follows the pattern of the detection code that is already there.
Finally, here is the complete StringUtils.java code.
/*
 * Copyright (C) 2010 ZXing authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.google.zxing.common;

import java.util.Hashtable;

import com.google.zxing.DecodeHintType;

/**
 * Common string-related functions.
 *
 * @author Sean Owen
 */
public final class StringUtils {

  private static final String PLATFORM_DEFAULT_ENCODING =
      System.getProperty("file.encoding");
  public static final String SHIFT_JIS = "SJIS";
  private static final String EUC_JP = "EUC_JP";
  private static final String UTF8 = "UTF8";
  private static final String ISO88591 = "ISO8859_1";
  private static final String GBK = "GB2312";
  private static final boolean ASSUME_SHIFT_JIS =
      SHIFT_JIS.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING) ||
      EUC_JP.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING);

  private StringUtils() {}

  /**
   * @param bytes bytes encoding a string, whose encoding should be guessed
   * @param hints decode hints if applicable
   * @return name of guessed encoding; at the moment will only guess one of:
   *  {@link #SHIFT_JIS}, {@link #UTF8}, {@link #ISO88591}, or the platform
   *  default encoding if none of these can possibly be correct
   */
  public static String guessEncoding(byte[] bytes, Hashtable hints) {
    if (hints != null) {
      String characterSet = (String) hints.get(DecodeHintType.CHARACTER_SET);
      if (characterSet != null) {
        return characterSet;
      }
    }
    // Does it start with the UTF-8 byte order mark? Then guess it's UTF-8
    if (bytes.length > 3 &&
        bytes[0] == (byte) 0xEF &&
        bytes[1] == (byte) 0xBB &&
        bytes[2] == (byte) 0xBF) {
      return UTF8;
    }
    // For now, merely tries to distinguish ISO-8859-1, UTF-8 and Shift_JIS,
    // which should be by far the most common encodings. ISO-8859-1
    // should not have bytes in the 0x80 - 0x9F range, while Shift_JIS
    // uses this as a first byte of a two-byte character. If we see this
    // followed by a valid second byte in Shift_JIS, assume it is Shift_JIS.
    // If we see something else in that second byte, we'll make the risky guess
    // that it's UTF-8.
    int length = bytes.length;
    boolean canBeISO88591 = true;
    boolean canBeShiftJIS = true;
    boolean canBeUTF8 = true;
    // Starts as false: GBK is only considered once a matching byte pair has been seen
    boolean canBeGBK = false;
    int utf8BytesLeft = 0;
    int maybeDoubleByteCount = 0;
    int maybeSingleByteKatakanaCount = 0;
    boolean sawLatin1Supplement = false;
    boolean sawUTF8Start = false;
    boolean lastWasPossibleDoubleByteStart = false;

    for (int i = 0;
         i < length && (canBeISO88591 || canBeShiftJIS || canBeUTF8 || canBeGBK);
         i++) {

      int value = bytes[i] & 0xFF;

      // GBK stuff
      // Greater than 127, so it may be GB2312; check this byte and the next one.
      // The i + 1 < length test keeps bytes[i + 1] inside the array.
      if (value > 0x7F && i + 1 < length) {
        if (value > 0xB0 && value <= 0xF7) { // the first byte is in range, so check the second byte
          int value2 = bytes[i + 1] & 0xFF;
          if (value2 > 0xA0 && value2 <= 0xF7) {
            canBeGBK = true;
          }
        }
      }

      // UTF-8 stuff
      if (value >= 0x80 && value <= 0xBF) {
        if (utf8BytesLeft > 0) {
          utf8BytesLeft--;
        }
      } else {
        if (utf8BytesLeft > 0) {
          canBeUTF8 = false;
        }
        if (value >= 0xC0 && value <= 0xFD) {
          sawUTF8Start = true;
          int valueCopy = value;
          while ((valueCopy & 0x40) != 0) {
            utf8BytesLeft++;
            valueCopy <<= 1;
          }
        }
      }

      // ISO-8859-1 stuff
      if ((value == 0xC2 || value == 0xC3) && i < length - 1) {
        // This is really a poor hack. The slightly more exotic characters people might want to put in
        // a QR Code, by which I mean the Latin-1 supplement characters (e.g. u-umlaut) have encodings
        // that start with 0xC2 followed by [0xA0,0xBF], or start with 0xC3 followed by [0x80,0xBF].
        int nextValue = bytes[i + 1] & 0xFF;
        if (nextValue <= 0xBF &&
            ((value == 0xC2 && nextValue >= 0xA0) || (value == 0xC3 && nextValue >= 0x80))) {
          sawLatin1Supplement = true;
        }
      }
      if (value >= 0x7F && value <= 0x9F) {
        canBeISO88591 = false;
      }

      // Shift_JIS stuff
      if (value >= 0xA1 && value <= 0xDF) {
        // Count the number of characters that might be a Shift_JIS single-byte Katakana character
        if (!lastWasPossibleDoubleByteStart) {
          maybeSingleByteKatakanaCount++;
        }
      }
      if (!lastWasPossibleDoubleByteStart &&
          ((value >= 0xF0 && value <= 0xFF) || value == 0x80 || value == 0xA0)) {
        canBeShiftJIS = false;
      }
      if ((value >= 0x81 && value <= 0x9F) || (value >= 0xE0 && value <= 0xEF)) {
        // These start double-byte characters in Shift_JIS. Let's see if it's followed by a valid
        // second byte.
        if (lastWasPossibleDoubleByteStart) {
          // If we just checked this and the last byte for being a valid double-byte
          // char, don't check starting on this byte. If this and the last byte
          // formed a valid pair, then this shouldn't be checked to see if it starts
          // a double byte pair of course.
          lastWasPossibleDoubleByteStart = false;
        } else {
          // ... otherwise do check to see if this plus the next byte form a valid
          // double byte pair encoding a character.
          lastWasPossibleDoubleByteStart = true;
          if (i >= bytes.length - 1) {
            canBeShiftJIS = false;
          } else {
            int nextValue = bytes[i + 1] & 0xFF;
            if (nextValue < 0x40 || nextValue > 0xFC) {
              canBeShiftJIS = false;
            } else {
              maybeDoubleByteCount++;
            }
            // There is some conflicting information out there about which bytes can follow which in
            // double-byte Shift_JIS characters. The rule above seems to be the one that matches practice.
          }
        }
      } else {
        lastWasPossibleDoubleByteStart = false;
      }
    }
    if (utf8BytesLeft > 0) {
      canBeUTF8 = false;
    }

    // Easy -- if assuming Shift_JIS and no evidence it can't be, done
    if (canBeShiftJIS && ASSUME_SHIFT_JIS) {
      return SHIFT_JIS;
    }
    if (canBeUTF8 && sawUTF8Start) {
      return UTF8;
    }
    // Distinguishing Shift_JIS and ISO-8859-1 can be a little tough. The crude heuristic is:
    // - If we saw
    //   - at least 3 bytes that starts a double-byte value (bytes that are rare in ISO-8859-1), or
    //   - over 5% of bytes could be single-byte Katakana (also rare in ISO-8859-1),
    // - and, saw no sequences that are invalid in Shift_JIS, then we conclude Shift_JIS
    if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
      return SHIFT_JIS;
    }
    if (canBeGBK) {
      return GBK;
    }
    // Otherwise, we default to ISO-8859-1 unless we know it can't be
    if (!sawLatin1Supplement && canBeISO88591) {
      return ISO88591;
      // return GB2312;
    }
    // Otherwise, we take a wild guess with platform encoding
    return PLATFORM_DEFAULT_ENCODING;
  }

}
Next, let's compare the results.
The QR code image that was recognized: [image]
Comparison before and after the fix: [images]
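To see where the garbled text comes from without a QR code at hand, the same bytes can be decoded with two different charsets. This is only a small sketch under my own assumptions: the class DecodeComparison and its decode method are hypothetical names, the four bytes are the GB2312 encoding of 中文, and it relies on the JDK providing the GB2312 charset (standard JDK builds normally do).

```java
import java.nio.charset.Charset;

/** Hypothetical demo class (not part of ZXing): one payload, two charsets. */
final class DecodeComparison {

  private DecodeComparison() {}

  /** Decodes the raw bytes with the named charset. */
  static String decode(byte[] bytes, String charsetName) {
    return new String(bytes, Charset.forName(charsetName));
  }

  public static void main(String[] args) {
    // U+4E2D U+6587 encoded in GB2312
    byte[] payload = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Wrong guess: each byte becomes one Latin-1 character, so the text is garbled
    System.out.println(decode(payload, "ISO-8859-1"));

    // Right guess: each byte pair becomes one Chinese character
    System.out.println(decode(payload, "GB2312"));
  }
}
```

Decoded as ISO-8859-1, the four bytes turn into four accented Latin letters, which is the kind of garbling seen before the fix. Decoded as GB2312, they turn into the two intended characters.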