Converting UTF-8 text to Shift_JIS and detecting characters that are not in Shift_JIS (such as ~, ①, etc.)

MOFA flower in Nanjing City, /11/27

Requirement description:
After reading a UTF-8 encoded text document, convert its content to Shift_JIS. If the text contains characters such as ~ or ① that do not belong to Shift_JIS, replace them with a special symbol. If they are left unhandled, the conversion returns an error.
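Before converting, it can help to list which characters fall outside Shift_JIS. The following is a minimal sketch of such a check, assuming the JDK in use ships the Shift_JIS charset. The class and method names (UnmappableFinder, findUnmappable) are hypothetical and do not come from the original post. The Hangul syllable U+D55C is used as sample input only because it is certainly outside Shift_JIS.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class UnmappableFinder {
    /** Returns each code point of text that the named charset cannot encode. */
    static List<String> findUnmappable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<String> unmappable = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Work per code point so surrogate pairs are tested as one character
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(ch)) {
                unmappable.add(ch);
            }
            i += Character.charCount(codePoint);
        }
        return unmappable;
    }

    public static void main(String[] args) {
        // Hypothetical sample: three ASCII letters followed by a Hangul syllable
        System.out.println(findUnmappable("ABC\uD55C", "Shift_JIS"));
    }
}
```

Running the same check with "windows-31j" (MS932) instead of "Shift_JIS" may give a different list, since the two charsets do not cover exactly the same characters.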

Code:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public static String checkIsSjis(String inString) {
    // Create the encoder and decoder for the character encoding
    Charset charset = Charset.forName("Shift_JIS");
    CharsetDecoder decoder = charset.newDecoder();
    CharsetEncoder encoder = charset.newEncoder();
    // Other methods that control the encoder's behavior
    // Malformed input:
    // the input is not a legal sequence of 16-bit Unicode characters.
    // onMalformedInput sets the action to take (one of three).
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    // ......
    // Unmappable characters: valid input that the target charset cannot represent.
    // onUnmappableCharacter also sets one of three actions:
    // (1) IGNORE: the unmappable character is dropped and nothing is written for it.
    // encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
    // (2) REPORT: encoding stops and a CharacterCodingException is thrown.
    // encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    // (3) REPLACE: the unmappable character is replaced by the byte sequence
    // set with replaceWith(byte[] newReplacement).
    // If none is set, the default replacement is "?" (= {(byte) '?'}).
    // The current replacement can be read with replacement().
    encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
    // The replacement must be legal in the target charset, so encode it
    // in Shift_JIS rather than in the platform default charset.
    encoder.replaceWith("☆".getBytes(charset));
    String result = inString;
    try {
        // Convert the string to Shift_JIS bytes in a ByteBuffer
        ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(inString));
        // Convert the bytes in the ByteBuffer back to a CharBuffer and then
        // to a string
        CharBuffer cbuf = decoder.decode(bbuf);
        result = cbuf.toString();
    } catch (CharacterCodingException cce) {
        String errorMessage = "Exception during character encoding/decoding: "
                + cce.getMessage();
        System.out.println(errorMessage);
        cce.printStackTrace();
    }
    return result;
}
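The three actions named in the method's comments (ignore, report, replace) can be compared side by side. The sketch below is illustrative only: the class ActionDemo, the method encodeWith, and the sample input are hypothetical, and the Hangul syllable U+D55C stands in for a character outside Shift_JIS. It assumes the JDK in use provides the Shift_JIS charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class ActionDemo {
    static final Charset SJIS = Charset.forName("Shift_JIS");

    /** Encodes text to Shift_JIS with the given action, then decodes it back for display. */
    static String encodeWith(String text, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetEncoder encoder = SJIS.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(action);
        if (action == CodingErrorAction.REPLACE) {
            // U+2606 WHITE STAR, encoded in Shift_JIS so it is a legal replacement
            encoder.replaceWith("\u2606".getBytes(SJIS));
        }
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
        return SJIS.decode(bytes).toString();
    }

    public static void main(String[] args) {
        String input = "A\uD55CB"; // hypothetical sample input
        try {
            System.out.println("IGNORE:  " + encodeWith(input, CodingErrorAction.IGNORE));
            System.out.println("REPLACE: " + encodeWith(input, CodingErrorAction.REPLACE));
            System.out.println("REPORT:  " + encodeWith(input, CodingErrorAction.REPORT));
        } catch (CharacterCodingException e) {
            // With REPORT, the unmappable character stops the encoding here
            System.out.println("REPORT raised " + e);
        }
    }
}
```

With IGNORE the unmappable character simply disappears from the result, so REPLACE is the safer choice when the reader needs to see that something was lost.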

This code seems to have a bug: when inString contains only a single one-byte character, such as inString = "A", the returned result seems to be null instead of the expected "A". I don't know whether this is a JDK bug.
I will add more information here if I find anything further.
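One way to check the single-character case on a particular JDK is to round-trip a one-character string through a fresh encoder and decoder and print what comes back. This is only a sketch for reproducing the report: the names SingleCharCheck and roundTrip are hypothetical, and the code neither confirms nor explains the behavior described in the post.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class SingleCharCheck {
    /** Encodes text to Shift_JIS and decodes it back, reporting any coding error. */
    static String roundTrip(String text) throws CharacterCodingException {
        Charset sjis = Charset.forName("Shift_JIS");
        ByteBuffer bytes = sjis.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        return sjis.newDecoder().decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String out = roundTrip("A");
        // Print length and value so an empty result is easy to spot
        System.out.println("length=" + out.length() + " value=[" + out + "]");
    }
}
```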

There is also another case: a TXT file that was originally in MS932 and was then saved as UTF-8. The content of the file turns into garbled characters, and even if you delete all of the content, the file size is 1 KB rather than 0 KB. It seems that something hidden remains in the file.
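One possible explanation, which the original post does not confirm, is a UTF-8 byte order mark: some editors write the three bytes EF BB BF at the start of a file when saving as UTF-8, and those bytes remain after the visible text is deleted. File managers also tend to round any non-empty file up to 1 KB. The sketch below checks for that marker; the class name BomCheck and the method hasUtf8Bom are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomCheck {
    /** Returns true if data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect (hypothetical usage)
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("size=" + data.length + " bytes, BOM=" + hasUtf8Bom(data));
    }
}
```

If the check reports a marker, skipping the first three bytes before decoding the file as UTF-8 keeps it out of the converted text.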
