MOFA flower in Nanjing City, /11/27
Requirement Description:
After reading a UTF-8 encoded text document, convert its content to Shift_JIS. Characters that do not
exist in Shift_JIS, such as ~ and ①, should be replaced with a special symbol; alternatively they can be ignored, or an error can be returned.
Code:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public static String checkIsSjis(String inString) {
    // Create the encoder and decoder for the character encoding
    Charset charset = Charset.forName("Shift_JIS");
    CharsetDecoder decoder = charset.newDecoder();
    CharsetEncoder encoder = charset.newEncoder();
    // Other methods that control the conversion behavior:
    // Malformed input: what to do when the input is not a valid sequence of
    // 16-bit Unicode characters is set with onMalformedInput (three options).
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    // ......
    // Unmappable characters: what to do when a character cannot be mapped
    // is set with onUnmappableCharacter (three options).
    // (1) CodingErrorAction.IGNORE: the unmappable character is dropped.
    // encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
    // (2) CodingErrorAction.REPORT: the conversion stops and an exception is thrown.
    // encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
    // (3) CodingErrorAction.REPLACE: the unmappable character is replaced with a
    // replacement byte sequence set with replaceWith(byte[] newReplacement).
    // If none is set, the default is usually "?" (= {(byte) '?'}).
    // The current replacement can be read with replacement().
    encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
    // Encode the replacement in Shift_JIS itself, not in the platform default charset
    encoder.replaceWith("☆".getBytes(charset));
    String result = inString;
    try {
        // Convert the string to Shift_JIS bytes in a ByteBuffer
        ByteBuffer bbuf = encoder.encode(CharBuffer.wrap(inString));
        // Convert the bytes in the ByteBuffer back to a CharBuffer and then
        // to a string.
        CharBuffer cbuf = decoder.decode(bbuf);
        result = cbuf.toString();
    } catch (CharacterCodingException cce) {
        String errorMessage = "Exception during character encoding/decoding: "
                + cce.getMessage();
        System.out.println(errorMessage);
        cce.printStackTrace();
    }
    return result;
}
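The three onUnmappableCharacter options described in the comments above can be compared side by side. The following is only a minimal sketch, assuming the JDK in use ships the Shift_JIS charset; the class name UnmappableActionDemo, the helper roundTrip, and the Hangul sample character are hypothetical and chosen purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

class UnmappableActionDemo {
    static final Charset SJIS = Charset.forName("Shift_JIS");

    // Hypothetical helper: encodes the input to Shift_JIS using the given action
    // for unmappable characters, then decodes the bytes back to a string.
    // Returns an empty Optional when the encoder reports an error.
    static Optional<String> roundTrip(String input, CodingErrorAction action) {
        CharsetEncoder encoder = SJIS.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(action)
                // U+2606 (white star), encoded in Shift_JIS itself;
                // only used when the action is REPLACE
                .replaceWith("\u2606".getBytes(SJIS));
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
            return Optional.of(SJIS.decode(bytes).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // U+D55C is a Hangul syllable, which has no Shift_JIS mapping
        String sample = "A\uD55CB";
        System.out.println("IGNORE:  " + roundTrip(sample, CodingErrorAction.IGNORE));
        System.out.println("REPORT:  " + roundTrip(sample, CodingErrorAction.REPORT));
        System.out.println("REPLACE: " + roundTrip(sample, CodingErrorAction.REPLACE));
    }
}
```

With IGNORE the unmappable character simply disappears, with REPORT the helper returns an empty Optional because an exception was thrown, and with REPLACE the star takes the place of the unmappable character.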
This code seems to have a bug: when inString contains only a single one-byte character, for example inString = "A", the returned result appears to be null instead of the expected "A". I don't know whether this is a JDK bug.
I will add more here if I find anything further.
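The single-character report above can be checked on a given JDK with a small round-trip test. This is only a sketch under assumptions: the class name ShortInputCheck and the helper roundTrip are hypothetical, and the encoder is configured the same way as in the method above but without a custom replacement.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class ShortInputCheck {
    // Hypothetical helper: encodes to Shift_JIS and decodes back again.
    // A coding failure is rethrown unchecked so callers need no try/catch.
    static String roundTrip(String input) {
        Charset sjis = Charset.forName("Shift_JIS");
        CharsetEncoder encoder = sjis.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
            return sjis.newDecoder().decode(bytes).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Compare very short inputs against their round-tripped form
        for (String s : new String[] {"", "A", "AB"}) {
            String out = roundTrip(s);
            System.out.println("length " + s.length() + " unchanged: " + s.equals(out));
        }
    }
}
```

If the line for length 1 prints false on a particular JDK, that would support the suspicion that the problem lies in the runtime rather than in the method itself.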
There is also the case of a TXT file that was created as MS932 and then saved as UTF-8. The content of the file turns into garbled characters, and even if you delete all of the content, the file size is not 0 KB
but 1 KB. It seems that something hidden remains in the file.
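One plausible candidate for the hidden content, offered here as an assumption rather than something confirmed for this file, is a UTF-8 byte order mark: some editors write the three bytes EF BB BF at the start of a file saved as UTF-8, and those bytes stay in place when the visible text is deleted. The sketch below checks for them; the class name BomCheck and the helper startsWithUtf8Bom are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomCheck {
    // Returns true when the data begins with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java BomCheck <file>");
            return;
        }
        // Read the raw bytes so that no charset decoding hides the mark
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("size in bytes: " + data.length);
        System.out.println("starts with UTF-8 BOM: " + startsWithUtf8Bom(data));
    }
}
```

A file that reports a size of 3 bytes and a BOM would match the observation, since a file manager typically rounds any small non-empty file up to 1 KB.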