Android: fixing garbled text when ZXing decodes GBK-encoded QR codes (with before/after comparison)

Source: Internet
Author: User

Recently I have been working on QR code recognition using the open source ZXing library, but QR codes whose content is GBK-encoded always came out garbled. After two days of hard work I finally solved the problem today, so I am writing it down here.

I developed this on top of ZXing 1.6, because ZXing 1.6 has better support for portrait orientation.
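To see why GBK content comes out garbled, here is a minimal sketch that decodes the same bytes with two different charsets. The class name GarbledDemo and the sample bytes (the two characters "中文" in GB2312) are chosen for illustration and are not part of ZXing. It assumes the Java runtime ships the GB2312 charset, which full JDK builds normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GarbledDemo {
    // The two characters "中文" encoded in GB2312: D6 D0 CE C4.
    static final byte[] GB2312_BYTES = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};

    // Decodes the bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Wrong charset: every byte becomes one accented Latin letter.
        System.out.println(decode(GB2312_BYTES, StandardCharsets.ISO_8859_1));
        // Right charset: the two Chinese characters come back.
        System.out.println(decode(GB2312_BYTES, Charset.forName("GB2312")));
    }
}
```

When the encoding guess falls through to ISO-8859-1, the decoder treats each byte as its own character, which is the kind of garbled output described above.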

First, set up the environment for building core.jar, which I will not cover here. If you have not done this before, you can refer

 

The main part is changing the source code.

Modify the file core\src\com\google\zxing\common\StringUtils.java.

1. Add the following line below private static final String ISO88591 = "ISO8859_1";

[Html]
private static final String GBK = "GB2312";

2. Add the following line below boolean canBeUTF8 = true;

[Html]
boolean canBeGBK = true;

3. Add the following below int value = bytes[i] & 0xFF;

[Html]
// GBK stuff
if (value > 0x7F) { // a byte greater than 127 may belong to GB2312, so check it together with the next byte
  if (value > 0xB0 && value <= 0xF7 && i + 1 < length) { // first byte is in range and a next byte exists
    int value2 = bytes[i + 1] & 0xFF;
    if (value2 > 0xA0 && value2 <= 0xF7) {
      // canBeGBK already starts out as true, so this only confirms it
      canBeGBK = true;
    }
  }
}

4. Below the block

if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
  return SHIFT_JIS;
}

add:

[Html]
if (canBeGBK) {
  return GBK;
}
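The byte-range check from the GBK step can also be tried on its own. The following is a minimal sketch, with a hypothetical class and method name (Gb2312Check.hasGb2312Pair) that do not exist in ZXing. It uses the same bounds as the snippet above and stops one byte before the end so that it never reads past the array. Note that GB2312's Chinese-character area is commonly described as lead bytes 0xB0 to 0xF7 and trail bytes 0xA1 to 0xFE, so the bounds used here are slightly narrower than that.

```java
final class Gb2312Check {
    /**
     * Returns true if the array contains at least one byte pair that falls
     * inside the ranges used by the modified StringUtils check.
     */
    static boolean hasGb2312Pair(byte[] bytes) {
        // i + 1 < bytes.length guarantees that bytes[i + 1] exists.
        for (int i = 0; i + 1 < bytes.length; i++) {
            int lead = bytes[i] & 0xFF;
            int trail = bytes[i + 1] & 0xFF;
            if (lead > 0xB0 && lead <= 0xF7 && trail > 0xA0 && trail <= 0xF7) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] chinese = {(byte) 0xD6, (byte) 0xD0}; // "中" in GB2312
        byte[] ascii = {'a', 'b', 'c'};
        System.out.println(hasGb2312Pair(chinese));
        System.out.println(hasGb2312Pair(ascii));
    }
}
```

A helper like this makes the heuristic easy to test against sample payloads before rebuilding core.jar.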

Then rebuild core.jar with ant and replace the old jar in your project. The rest is basically a matter of following the same pattern.

Finally, here is the complete StringUtils.java code.

[Html]
/*
 * Copyright (C) 2010 ZXing authors
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.zxing.common;

import java.util.Hashtable;

import com.google.zxing.DecodeHintType;

/**
 * Common string-related functions.
 *
 * @author Sean Owen
 */
public final class StringUtils {

  private static final String PLATFORM_DEFAULT_ENCODING =
      System.getProperty("file.encoding");
  public static final String SHIFT_JIS = "SJIS";
  private static final String EUC_JP = "EUC_JP";
  private static final String UTF8 = "UTF8";
  private static final String ISO88591 = "ISO8859_1";
  private static final String GBK = "GB2312";
  private static final boolean ASSUME_SHIFT_JIS =
      SHIFT_JIS.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING) ||
      EUC_JP.equalsIgnoreCase(PLATFORM_DEFAULT_ENCODING);

  private StringUtils() {}

  /**
   * @param bytes bytes encoding a string, whose encoding should be guessed
   * @param hints decode hints if applicable
   * @return name of guessed encoding; at the moment will only guess one of:
   *  {@link #SHIFT_JIS}, {@link #UTF8}, {@link #ISO88591}, or the platform
   *  default encoding if none of these can possibly be correct
   */
  public static String guessEncoding(byte[] bytes, Hashtable hints) {
    if (hints != null) {
      String characterSet = (String) hints.get(DecodeHintType.CHARACTER_SET);
      if (characterSet != null) {
        return characterSet;
      }
    }

    // Does it start with the UTF-8 byte order mark? Then guess it's UTF-8
    if (bytes.length > 3 &&
        bytes[0] == (byte) 0xEF &&
        bytes[1] == (byte) 0xBB &&
        bytes[2] == (byte) 0xBF) {
      return UTF8;
    }
    // For now, merely tries to distinguish ISO-8859-1, UTF-8 and Shift_JIS,
    // which should be by far the most common encodings. ISO-8859-1
    // should not have bytes in the 0x80 - 0x9F range, while Shift_JIS
    // uses this as a first byte of a two-byte character. If we see this
    // followed by a valid second byte in Shift_JIS, assume it is Shift_JIS.
    // If we see something else in that second byte, we'll make the risky guess
    // that it's UTF-8.
    int length = bytes.length;
    boolean canBeISO88591 = true;
    boolean canBeShiftJIS = true;
    boolean canBeUTF8 = true;
    // Starts out as true and is never cleared, so GB2312 is the guess that is
    // tried after UTF-8 and Shift_JIS and before ISO-8859-1.
    boolean canBeGBK = true;
    int utf8BytesLeft = 0;
    int maybeDoubleByteCount = 0;
    int maybeSingleByteKatakanaCount = 0;
    boolean sawLatin1Supplement = false;
    boolean sawUTF8Start = false;
    boolean lastWasPossibleDoubleByteStart = false;

    for (int i = 0;
         i < length && (canBeISO88591 || canBeShiftJIS || canBeUTF8 || canBeGBK);
         i++) {

      int value = bytes[i] & 0xFF;

      // GBK stuff
      if (value > 0x7F) { // a byte greater than 127 may belong to GB2312, so check it together with the next byte
        if (value > 0xB0 && value <= 0xF7 && i + 1 < length) { // first byte is in range and a next byte exists
          int value2 = bytes[i + 1] & 0xFF;
          if (value2 > 0xA0 && value2 <= 0xF7) {
            canBeGBK = true;
          }
        }
      }

      // UTF-8 stuff
      if (value >= 0x80 && value <= 0xBF) {
        if (utf8BytesLeft > 0) {
          utf8BytesLeft--;
        }
      } else {
        if (utf8BytesLeft > 0) {
          canBeUTF8 = false;
        }
        if (value >= 0xC0 && value <= 0xFD) {
          sawUTF8Start = true;
          int valueCopy = value;
          while ((valueCopy & 0x40) != 0) {
            utf8BytesLeft++;
            valueCopy <<= 1;
          }
        }
      }

      // ISO-8859-1 stuff

      if ((value == 0xC2 || value == 0xC3) && i < length - 1) {
        // This is really a poor hack. The slightly more exotic characters people might want to put in
        // a QR Code, by which I mean the Latin-1 supplement characters (e.g. u-umlaut) have encodings
        // that start with 0xC2 followed by [0xA0,0xBF], or start with 0xC3 followed by [0x80,0xBF].
        int nextValue = bytes[i + 1] & 0xFF;
        if (nextValue <= 0xBF &&
            ((value == 0xC2 && nextValue >= 0xA0) || (value == 0xC3 && nextValue >= 0x80))) {
          sawLatin1Supplement = true;
        }
      }
      if (value >= 0x7F && value <= 0x9F) {
        canBeISO88591 = false;
      }

      // Shift_JIS stuff

      if (value >= 0xA1 && value <= 0xDF) {
        // Count the number of characters that might be a Shift_JIS single-byte Katakana character
        if (!lastWasPossibleDoubleByteStart) {
          maybeSingleByteKatakanaCount++;
        }
      }
      if (!lastWasPossibleDoubleByteStart &&
          ((value >= 0xF0 && value <= 0xFF) || value == 0x80 || value == 0xA0)) {
        canBeShiftJIS = false;
      }
      if ((value >= 0x81 && value <= 0x9F) || (value >= 0xE0 && value <= 0xEF)) {
        // These start double-byte characters in Shift_JIS. Let's see if it's followed by a valid
        // second byte.
        if (lastWasPossibleDoubleByteStart) {
          // If we just checked this and the last byte for being a valid double-byte
          // char, don't check starting on this byte. If this and the last byte
          // formed a valid pair, then this shouldn't be checked to see if it starts
          // a double byte pair of course.
          lastWasPossibleDoubleByteStart = false;
        } else {
          // ... otherwise do check to see if this plus the next byte form a valid
          // double byte pair encoding a character.
          lastWasPossibleDoubleByteStart = true;
          if (i >= bytes.length - 1) {
            canBeShiftJIS = false;
          } else {
            int nextValue = bytes[i + 1] & 0xFF;
            if (nextValue < 0x40 || nextValue > 0xFC) {
              canBeShiftJIS = false;
            } else {
              maybeDoubleByteCount++;
            }
            // There is some conflicting information out there about which bytes can follow which in
            // double-byte Shift_JIS characters. The rule above seems to be the one that matches practice.
          }
        }
      } else {
        lastWasPossibleDoubleByteStart = false;
      }
    }
    if (utf8BytesLeft > 0) {
      canBeUTF8 = false;
    }

    // Easy -- if assuming Shift_JIS and no evidence it can't be, done
    if (canBeShiftJIS && ASSUME_SHIFT_JIS) {
      return SHIFT_JIS;
    }
    if (canBeUTF8 && sawUTF8Start) {
      return UTF8;
    }
    // Distinguishing Shift_JIS and ISO-8859-1 can be a little tough. The crude heuristic is:
    // - If we saw
    //   - at least 3 bytes that starts a double-byte value (bytes that are rare in ISO-8859-1), or
    //   - over 5% of bytes could be single-byte Katakana (also rare in ISO-8859-1),
    // - and, saw no sequences that are invalid in Shift_JIS, then we conclude Shift_JIS
    if (canBeShiftJIS && (maybeDoubleByteCount >= 3 || 20 * maybeSingleByteKatakanaCount > length)) {
      return SHIFT_JIS;
    }
    if (canBeGBK) {
      return GBK;
    }
    // Otherwise, we default to ISO-8859-1 unless we know it can't be
    if (!sawLatin1Supplement && canBeISO88591) {
      return ISO88591;
    }
    // Otherwise, we take a wild guess with platform encoding
    return PLATFORM_DEFAULT_ENCODING;
  }

}
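Because canBeGBK is never set to false in the listing above, GB2312 is returned for any input that is not recognized as UTF-8 or Shift_JIS. As a point of comparison, here is a minimal sketch of a different technique, strict trial decoding, which is not what the modified StringUtils does. The class DecodeWithGuess and its methods are hypothetical names for illustration, and the sketch assumes the runtime provides the GB2312 charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeWithGuess {
    /** Strictly decodes the bytes; returns null if they are not valid in the charset. */
    static String tryDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Tries UTF-8, then GB2312, then falls back to ISO-8859-1, which accepts any byte. */
    static String decode(byte[] bytes) {
        String text = tryDecode(bytes, StandardCharsets.UTF_8);
        if (text == null) {
            text = tryDecode(bytes, Charset.forName("GB2312"));
        }
        return text != null ? text : new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] gb2312 = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4}; // "中文" in GB2312
        System.out.println(decode(gb2312));
    }
}
```

The trade-off is that this decodes the payload up to three times, whereas the byte-range heuristic in StringUtils makes a single pass and stays inside ZXing's existing guessing logic.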

Next, let's compare the results.

[Figure: the QR code image used for recognition]

[Figure: comparison of the recognition result before and after the fix]
