Text files often identify their encoding with a byte order mark (BOM) in their first few bytes: two bytes for UTF-16, three bytes for UTF-8, and four bytes for UTF-32. The following are some common encoding identifiers:
Encoding method | First few bytes
ANSI | No identifier
Unicode | FF FE
Unicode big endian | FE FF
UTF-8 | EF BB BF
UTF-16/UCS-2, little endian | FF FE
UTF-16/UCS-2, big endian | FE FF
UTF-32/UCS-4, little endian | FF FE 00 00
UTF-32/UCS-4, big endian | 00 00 FE FF
So when we write code, we only need to read the first two to four bytes of a file to determine its encoding. In .NET, however, there is a simpler way to find the encoding of a text file, using the following code:
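// Requires: using System.IO; using System.Text;
public Encoding GetEncoding(string file)
{
    // true tells the program to detect the file encoding from the byte order mark
    using (var r = new StreamReader(file, true))
    {
        r.Peek(); // detection only runs on the first read, so trigger one
        return r.CurrentEncoding; // UTF-8 is reported when no BOM is found
    }
}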
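For comparison, the manual approach of reading the first two to four bytes could be sketched roughly as follows. This is only an illustration, not part of .NET: the class BomSniffer and its methods DetectFromBom and DetectFromFile are hypothetical names, and the sketch assumes the standard BOM values (EF BB BF for UTF-8, FF FE and FE FF for UTF-16, FF FE 00 00 and 00 00 FE FF for UTF-32). It returns null when no BOM is present, since a file without a BOM may be ANSI, BOM-less UTF-8, or something else.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: maps a leading BOM to an Encoding, or returns null
    // when the first `count` bytes of `b` do not start with a known BOM.
    public static Encoding DetectFromBom(byte[] b, int count)
    {
        // The 4-byte UTF-32 marks are tested first, because FF FE 00 00
        // also begins with the UTF-16 little-endian mark FF FE.
        if (count >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
            return new UTF32Encoding(false, true);  // UTF-32, little endian
        if (count >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
            return new UTF32Encoding(true, true);   // UTF-32, big endian
        if (count >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
            return new UTF8Encoding(true);          // UTF-8 with BOM
        if (count >= 2 && b[0] == 0xFF && b[1] == 0xFE)
            return new UnicodeEncoding(false, true); // UTF-16, little endian
        if (count >= 2 && b[0] == 0xFE && b[1] == 0xFF)
            return new UnicodeEncoding(true, true);  // UTF-16, big endian
        return null; // no BOM found
    }

    // Hypothetical helper: reads at most four bytes from the file and
    // passes them to DetectFromBom.
    public static Encoding DetectFromFile(string path)
    {
        var head = new byte[4];
        using (var fs = File.OpenRead(path))
        {
            int n = 0, read;
            // Read may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (n < head.Length && (read = fs.Read(head, n, head.Length - n)) > 0)
                n += read;
            return DetectFromBom(head, n);
        }
    }
}
```

A call such as BomSniffer.DetectFromFile(path) could then be checked against null before falling back to a default encoding. Note that a UTF-16 little-endian file whose first character is U+0000 would be misread as UTF-32 by this sketch, which is one reason the StreamReader approach is usually the more convenient choice.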