Text files on Windows use one of two kinds of character set: ANSI or Unicode.
For Unicode, Windows supports three encodings: little-endian UTF-16 (Unicode), big-endian UTF-16 (BigEndianUnicode), and UTF-8.
We can identify a file's encoding from its header, the byte order mark (BOM). If the first two bytes are FF FE, the file is little-endian UTF-16. If they are FE FF, it is big-endian UTF-16. If they are EF BB, it is UTF-8 (the full UTF-8 BOM is the three bytes EF BB BF). If none of these match, the file is treated as ANSI.
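To cross-check the byte sequences listed above, .NET's `Encoding.GetPreamble()` returns the BOM that each encoding writes at the start of a file. The sketch below is a standalone console module; the names `PreambleDemo` and `ToHex` are hypothetical, chosen only for illustration. It prints each preamble in hex.

```vb
Imports System
Imports System.Text

Module PreambleDemo
    ' Format a byte array as space-separated hex, e.g. "FF FE".
    Function ToHex(bytes As Byte()) As String
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        ' Each Encoding object exposes the BOM it emits via GetPreamble().
        Console.WriteLine("Unicode (UTF-16 LE):          " & ToHex(Encoding.Unicode.GetPreamble()))
        Console.WriteLine("BigEndianUnicode (UTF-16 BE): " & ToHex(Encoding.BigEndianUnicode.GetPreamble()))
        Console.WriteLine("UTF8:                         " & ToHex(Encoding.UTF8.GetPreamble()))
    End Sub
End Module
```

The UTF-8 line shows all three BOM bytes, which is why checking only EF BB is a shortcut rather than a full match.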
As mentioned above, we can determine a file's encoding by reading the first two bytes of its header. In the VB.NET code below, System.Text.Encoding.Default represents the encoding of the operating system's current ANSI code page. (This holds on .NET Framework; on .NET Core and .NET 5 and later, Encoding.Default always returns UTF-8.)
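    ' Requires Imports System.IO and Imports System.Text at the top of the file,
    ' and must be declared inside a Class.
    Public Shared Function GetFileEncodeType(ByVal stream As Stream) As Encoding
        Dim buffer() As Byte
        ' Read the first two bytes of the file header.
        ' Note: disposing the BinaryReader also closes the underlying stream.
        Using br As New BinaryReader(stream)
            buffer = br.ReadBytes(2)
        End Using

        ' A file shorter than two bytes cannot carry a BOM.
        If buffer.Length < 2 Then
            Return Encoding.Default
        End If

        If buffer(0) = &HEF AndAlso buffer(1) = &HBB Then
            Return Encoding.UTF8              ' UTF-8 BOM: EF BB (BF)
        ElseIf buffer(0) = &HFE AndAlso buffer(1) = &HFF Then
            Return Encoding.BigEndianUnicode  ' UTF-16 big-endian BOM
        ElseIf buffer(0) = &HFF AndAlso buffer(1) = &HFE Then
            Return Encoding.Unicode           ' UTF-16 little-endian BOM
        Else
            Return Encoding.Default           ' No BOM: assume the ANSI code page
        End If
    End Function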
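If the goal is simply to read the file with the right encoding, .NET's StreamReader can perform BOM detection itself when constructed with its detectEncodingFromByteOrderMarks argument set to True. This is a swapped-in alternative, not the method described above. The sketch below uses hypothetical names (`StreamReaderDetectDemo`, `DetectWithStreamReader`): it writes a temporary big-endian UTF-16 file and lets StreamReader report the encoding it found.

```vb
Imports System
Imports System.IO
Imports System.Text

Module StreamReaderDetectDemo
    ' Let StreamReader inspect the BOM. Encoding.Default is only the
    ' fallback used when the file has no recognizable BOM.
    Function DetectWithStreamReader(filePath As String) As Encoding
        Using reader As New StreamReader(filePath, Encoding.Default, True)
            reader.ReadToEnd()              ' detection happens on the first read
            Return reader.CurrentEncoding
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        Try
            ' File.WriteAllText writes the encoding's BOM (here FE FF) before the text.
            File.WriteAllText(tempFile, "hello", Encoding.BigEndianUnicode)
            Dim detected As Encoding = DetectWithStreamReader(tempFile)
            Console.WriteLine(detected.EncodingName)
        Finally
            File.Delete(tempFile)
        End Try
    End Sub
End Module
```

Unlike the two-byte check, StreamReader also matches the full three-byte UTF-8 BOM and the UTF-32 BOMs, so a UTF-32 little-endian file (FF FE 00 00) is not mistaken for UTF-16. Like the two-byte check, it cannot identify a file that has no BOM and simply falls back to the encoding passed in.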