On Windows, saving a text file as UTF-8 with the system's Notepad can cause surprising problems. The Notepad developers at Microsoft chose a rather odd way of saving UTF-8 files: Notepad adds the three bytes 0xEF 0xBB 0xBF (the byte order mark, BOM) at the beginning of each file it saves as UTF-8. This leads to all sorts of baffling problems: the first line of a web page may show a "?", a perfectly correct program may fail to compile with syntax errors, and so on.
The following test program shows that a text file edited with Notepad has three garbled characters at the very beginning of the file.
#include <stdio.h>
#include <string.h>   /* memset, strlen */
#define MAX_LEN 100
int main(void)
{
    char strLine[MAX_LEN];
    size_t i = 0;
    FILE *fp;
    memset(strLine, 0x0, MAX_LEN);
    fp = fopen("eng_query.txt", "r");
    if (NULL == fp)
    {
        printf("open file fail.\n");
        return -1;
    }
    while (fgets(strLine, MAX_LEN, fp))
    {
        printf("str = %s", strLine); // print each line as a string
        for (i = 0; i < strlen(strLine); i++) // print each character of the line in hexadecimal
        {
            // char is signed here, so bytes >= 0x80 print sign-extended (e.g. ffffffef)
            printf("%x ", strLine[i]);
        }
        printf("\n\n");
    }
    fclose(fp);
    return 0;
}
Input file:
tsinghua press
mp18
evaluating method for the double image
jiaoyuxvshi
balancing mechanism
hthr
amplification
bionic optimization algorithm
a r l
tcb
Output file:
str = singhua press
ffffffef ffffffbb ffffffbf 74 73 69 6e 67 68 75 61 20 70 72 65 73
73 a
str = mp18
6d 70 31 38 a
str = evaluating method for the double image
65 76 61 6c 75 61 74 69 6e 67 20 6d 65 74 68 6f 64 20 66 6f
72 20 74 68 65 20 64 6f 75 62 6c 65 20 69 6d 61 67 65 a
str = jiaoyuxvshi
6a 69 61 6f 79 75 78 76 73 68 69 a
str = balancing mechanism
62 61 6c 61 6e 63 69 6e 67 20 6d 65 63 68 61 6e 69 73 6d a
str = hthr
68 74 68 72 a
str = amplification
61 6d 70 6c 69 66 69 63 61 74 69 6f 6e a
str = bionic optimization algorithm
62 69 6f 6e 69 63 20 6f 70 74 69 6d 69 7a 61 74 69 6f 6e 20
61 6c 67 6f 72 69 74 68 6d a
str = a r l
61 20 72 20 6c a
str = tcb
74 63 62 a
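We can see that the first three bytes of the file (EF BB BF) are displayed as garbled characters. Because each Chinese character occupies two bytes when the output is displayed, the fourth byte, which is really the letter 't', is paired with the third BOM byte and can only be shown as garbage as well. That is why the first line appears as "singhua press" although the hex dump still contains 74 ('t').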
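The ffffffef ffffffbb ffffffbf values in the dump are the BOM bytes EF BB BF: on platforms where char is signed, a byte of 0x80 or above is negative and is sign-extended when printf promotes it to int. The following is a minimal sketch of printing each byte as exactly two hex digits; byte_to_hex is a hypothetical helper name and is not part of the original program.

```c
#include <stdio.h>

/* Hypothetical helper (an assumption, not from the original program):
 * format one byte of a char buffer as two lowercase hex digits.
 * `out` must have room for 3 chars (two digits plus the terminator). */
void byte_to_hex(char c, char out[3])
{
    /* Converting to unsigned char first keeps the value in 0..255,
     * so a byte such as 0xEF is not sign-extended to ffffffef. */
    snprintf(out, 3, "%02x", (unsigned)(unsigned char)c);
}
```

Using the same cast inside the printing loop of the test program would be expected to show the BOM as ef bb bf rather than the sign-extended form.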
Therefore, you can use Notepad++ instead of Notepad, with the default encoding of Notepad++ set to UTF-8 without BOM.
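If switching editors is not an option, the reading program can also be made to tolerate the BOM. The following is a minimal sketch, assuming the stream is seekable and positioned at the start of the file; skip_utf8_bom is a hypothetical helper name, not part of the original program. It reads the first three bytes and rewinds if they are not EF BB BF.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not from the original program).
 * If the stream starts with the UTF-8 BOM bytes EF BB BF, leave the read
 * position just after them; otherwise rewind to the beginning.
 * Returns 1 if a BOM was skipped, 0 otherwise. */
int skip_utf8_bom(FILE *fp)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    unsigned char head[3];
    size_t n = fread(head, 1, sizeof head, fp);

    if (n == sizeof head && memcmp(head, bom, sizeof bom) == 0)
        return 1;   /* BOM consumed; the next read starts at the real text */

    rewind(fp);     /* no BOM (or file shorter than 3 bytes): start over */
    return 0;
}
```

In the test program above, calling skip_utf8_bom(fp) right after the NULL check on fp should make the first line come out as "tsinghua press", with the hex dump starting at 74.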