Using Java to recover the correct text from garbled text

Source: Internet
Author: User

This program recovers the original text from garbled text. It relies on the observation that decoding with the wrong encoding usually pads the text with extra characters, so the correct text should be the candidate with the fewest characters.
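
To make the idea concrete, here is a minimal sketch of how such garbling arises and why it can be undone. The class name `GarbleDemo` and the methods `garble` and `repair` are hypothetical names chosen for illustration, and the sketch assumes the JDK in use ships the GBK charset. A Chinese character takes three bytes in UTF-8, while GBK consumes at most two bytes per character, so mis-decoding UTF-8 bytes as GBK tends to produce more characters than the original had.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GarbleDemo {

    // Decode UTF-8 bytes with the wrong charset (GBK), as a misconfigured reader would.
    static String garble(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), Charset.forName("GBK"));
    }

    // Undo the mistake: re-encode with the wrong charset, then decode with the right one.
    static String repair(String garbled) {
        return new String(garbled.getBytes(Charset.forName("GBK")), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters ("Baidu"), written as Unicode escapes to avoid
        // depending on the source file encoding.
        String original = "\u767e\u5ea6";
        String garbled = garble(original);
        System.out.println(original.length() + " chars before, "
                + garbled.length() + " chars after mis-decoding");
        System.out.println("round trip restores the text: "
                + repair(garbled).equals(original));
    }
}
```

The round trip only works when every byte pair happens to be a valid GBK character; if the wrong decoder had to substitute replacement characters, the original bytes are gone.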


The code is as follows:

package com.hongyuan.test;

import java.io.UnsupportedEncodingException;

/*
 * This program recovers the original text from garbled text. It relies on the observation
 * that decoding with the wrong encoding usually pads the text with extra characters,
 * so the correct text should be the candidate with the fewest characters.
 *
 * If the program fails to recover the correct text in your tests, the likely causes are:
 * 1. It can only undo a single wrong decoding; text that has been garbled several times over cannot be recovered.
 * 2. A wrong decoding can make some characters invisible, so you may not have copied the whole garbled text and bytes are missing. In that case the text cannot be recovered.
 * 3. The original text uses a large character set and the wrong encoding a smaller one; characters outside the smaller set are lost, and the correct text cannot be reconstructed from what remains.
 * 4. Bad luck: the characters come out the same under every encoding, or the wrong decoding adds no extra characters, and then nothing can be done. (This is really rare.)
 *
 * Note: the garbled text in this program was obtained by viewing the Baidu home page (UTF-8) as GBK, which visibly garbles it. Feel free to test with other garbled text. Questions are welcome in the replies.
 */
public class CharsetTest {

    public static final String[] CHARSET_NAMES = new String[]{"ISO-8859-1", "GBK", "UTF-8"};

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Garbled string
        String str = "Atlas  蒋鐧惧 harm 鍏 帹 Windows XP 鑱 斿 悎 闃 fork 姢 nowshera e 喅 file";

        int strLength = Integer.MAX_VALUE; // Character length of the shortest candidate so far
        String newStr = "";                // String recovered from the garbled string
        String srcCharset = "";            // Encoding the garbled string is currently in
        String targetCharset = "";         // Correct encoding of the garbled string

        // Try every combination of encodings and keep the one that yields the shortest string
        for (int i = 0; i < CHARSET_NAMES.length; i++) {
            for (int j = 0; j < CHARSET_NAMES.length; j++) {
                // Re-encode with charset i, then decode the bytes with charset j
                String temp = new String(str.getBytes(CHARSET_NAMES[i]), CHARSET_NAMES[j]);
                System.out.println(temp);
                if (temp.length() <= strLength) {
                    strLength = temp.length();
                    newStr = temp;
                    srcCharset = CHARSET_NAMES[i];
                    targetCharset = CHARSET_NAMES[j];
                }
            }
        }
        // Print the encoding pair that was found and the recovered text
        System.out.println(srcCharset + "-->" + targetCharset + ":" + newStr);
    }

}
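
The search above can also be wrapped in a reusable method. The following is a sketch under stated assumptions rather than the author's code: the names `MojibakeRepair`, `Result`, and `recover` are hypothetical, and it uses `java.nio.charset.Charset` objects instead of charset-name strings so that no checked `UnsupportedEncodingException` has to be declared. It tries the same three candidate encodings and keeps the same `<=` comparison, so ties go to the later pair just as in the program above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {

    /** Outcome of the search: the chosen charset pair and the re-decoded text. */
    static final class Result {
        final Charset wrongCharset;  // charset the text was wrongly decoded with
        final Charset actualCharset; // charset the bytes were really written in
        final String text;           // shortest candidate found

        Result(Charset wrongCharset, Charset actualCharset, String text) {
            this.wrongCharset = wrongCharset;
            this.actualCharset = actualCharset;
            this.text = text;
        }
    }

    // Same candidates as the program above; assumes the JDK provides GBK.
    static final Charset[] CANDIDATES = {
        StandardCharsets.ISO_8859_1, Charset.forName("GBK"), StandardCharsets.UTF_8
    };

    /** Tries every (wrong, actual) charset pair and returns the shortest result. */
    static Result recover(String garbled) {
        Result best = null;
        for (Charset wrong : CANDIDATES) {
            byte[] bytes = garbled.getBytes(wrong); // bytes as the wrong reader saw them
            for (Charset actual : CANDIDATES) {
                String candidate = new String(bytes, actual);
                if (best == null || candidate.length() <= best.text.length()) {
                    best = new Result(wrong, actual, candidate);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Build a garbled sample: UTF-8 bytes of two Chinese characters read as GBK.
        String original = "\u767e\u5ea6";
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8),
                Charset.forName("GBK"));
        Result r = recover(garbled);
        System.out.println(r.wrongCharset + " --> " + r.actualCharset + ": " + r.text);
    }
}
```

Like the original program, this heuristic only looks at length, so it inherits the limitations listed in the comment above: it cannot undo repeated mis-decoding or restore bytes that were already lost.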
