This article describes how to use NodeJS, together with related Node libraries, to handle the encoding of front-end code files. When writing front-end tools in NodeJS, the most common operations involve text files, so file encoding inevitably comes into play. The most commonly used text encodings are UTF8 and GBK, and UTF8 files may also begin with a BOM. When reading text files in different encodings, the file content must first be decoded correctly into a JS string before it can be processed normally.
Remove BOM
A BOM is used to mark a text file as Unicode-encoded. It is a single Unicode character ("\uFEFF") placed at the very beginning of the file. In different Unicode encodings, the BOM character corresponds to the following bytes:
Bytes       Encoding
----------------------------
FE FF       UTF16BE
FF FE       UTF16LE
EF BB BF    UTF8
Therefore, we can determine whether a file contains a BOM, and which Unicode encoding it uses, from the first few bytes of the file. However, although the BOM marks the file's encoding, it is not part of the file content. If the BOM is not removed when a text file is read, problems can occur in some scenarios. For example, when we merge several JS files into one, any BOM characters in the source files end up in the middle of the merged file, which can lead to a JS syntax error in the browser. Therefore, the BOM should be removed when reading text files with NodeJS. For example, the following code identifies and removes a UTF8 BOM.
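var fs = require('fs');

// Read a text file as UTF-8, dropping a leading UTF-8 BOM (EF BB BF) if present.
function readText(pathname) {
    var bin = fs.readFileSync(pathname);

    if (bin[0] === 0xEF && bin[1] === 0xBB && bin[2] === 0xBF) {
        bin = bin.slice(3);
    }

    return bin.toString('utf-8');
}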
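The function above only handles the UTF8 BOM. As a minimal sketch, assuming that files without a BOM should be treated as UTF8, the BOM byte table can be turned into a decoder that strips any of the three BOMs. The names detectBOM, decodeWithBOM and readTextAnyBOM are hypothetical. Because NodeJS has no built-in UTF16BE decoder, the sketch copies the bytes, swaps each 16-bit pair with Buffer's swap16(), and decodes the result as 'utf16le'.

```javascript
const fs = require('fs');

// Map a BOM at the start of a Buffer to a NodeJS encoding name and the
// BOM's length in bytes. Returns null when no BOM from the table is present.
function detectBOM(buf) {
    if (buf.length >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
        return { encoding: 'utf8', length: 3 };
    }
    if (buf.length >= 2 && buf[0] === 0xFF && buf[1] === 0xFE) {
        return { encoding: 'utf16le', length: 2 };
    }
    if (buf.length >= 2 && buf[0] === 0xFE && buf[1] === 0xFF) {
        return { encoding: 'utf16be', length: 2 };
    }
    return null;
}

// Decode a Buffer into a JS string with any BOM removed.
// Assumption: a Buffer without a BOM is UTF-8.
function decodeWithBOM(buf) {
    const bom = detectBOM(buf);
    if (!bom) {
        return buf.toString('utf8');
    }
    const body = buf.subarray(bom.length);
    if (bom.encoding === 'utf16be') {
        // Copy an even number of bytes (swap16 requires it), convert the
        // big-endian pairs to little-endian, then decode as 'utf16le'.
        const even = Buffer.from(body.subarray(0, body.length - (body.length % 2)));
        return even.swap16().toString('utf16le');
    }
    return body.toString(bom.encoding);
}

// Hypothetical file-level wrapper around decodeWithBOM.
function readTextAnyBOM(pathname) {
    return decodeWithBOM(fs.readFileSync(pathname));
}
```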
GBK to UTF8
NodeJS lets you specify a text encoding when reading a text file or when converting a Buffer to a string. Unfortunately, NodeJS does not support GBK natively, so we usually convert the encoding with the third-party iconv-lite package. After installing the package with NPM, we can write a function that reads GBK text files as follows.
var fs = require('fs');
var iconv = require('iconv-lite');

// Read a GBK-encoded text file and decode it into a JS string.
function readGBKText(pathname) {
    var bin = fs.readFileSync(pathname);

    return iconv.decode(bin, 'gbk');
}
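If adding a dependency is undesirable, NodeJS builds that include full ICU data (the default for official releases since v13) also provide the WHATWG TextDecoder, which accepts the 'gbk' label. This is a swapped-in alternative to iconv-lite, not the approach described above. The names decodeGBK and readGBKTextBuiltin are hypothetical, and on builds without full ICU the TextDecoder constructor throws instead of decoding.

```javascript
const fs = require('fs');

// Decode GBK bytes with the built-in WHATWG TextDecoder instead of iconv-lite.
// Assumption: NodeJS was built with full ICU; otherwise the constructor
// throws because the 'gbk' label is not supported.
function decodeGBK(bin) {
    const decoder = new TextDecoder('gbk');
    return decoder.decode(bin);
}

// Hypothetical file-level wrapper, mirroring readGBKText.
function readGBKTextBuiltin(pathname) {
    return decodeGBK(fs.readFileSync(pathname));
}
```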
Single-byte encoding
Sometimes we cannot predict which encoding a file uses, so we cannot specify the correct encoding. For example, some of the CSS files we want to process may be GBK-encoded while others are UTF8-encoded. Although we can guess the encoding from a file's byte content to some extent, here we introduce a technique that has some limitations but is much simpler.
First, we know that if a text file contains only English characters, such as Hello World, it can be read correctly with either GBK or UTF8, because in both encodings all characters in the ASCII range 0 to 127 share the same single-byte representation.
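A quick way to check whether a file falls into this safe case is to test that every byte is below 0x80. The helper below (a hypothetical name, not part of NodeJS) does that; for such buffers, decoding with 'utf8' or with a single-byte encoding such as 'latin1' yields the same string.

```javascript
// Return true if every byte is in the ASCII range 0x00-0x7F.
function isAsciiOnly(buf) {
    for (const byte of buf) {
        if (byte > 0x7F) {
            return false;
        }
    }
    return true;
}

const ascii = Buffer.from('Hello World');
console.log(isAsciiOnly(ascii)); // true
console.log(ascii.toString('utf8') === ascii.toString('latin1')); // true
console.log(isAsciiOnly(Buffer.from([0xD6, 0xD0, 0xCE, 0xC4]))); // false
```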
Conversely, even if a text file contains Chinese characters, as long as the characters we need to process all lie within the ASCII range 0 to 127 (for example, JS code outside of comments and string literals), we can read the file with a single-byte encoding without worrying whether the actual encoding is GBK or UTF8. The following example illustrates this method.
1. Source file content (GBK-encoded; the string literal is the Chinese word 中文, "Chinese"):
var foo = '中文';
2. Corresponding bytes:
76 61 72 20 66 6F 6F 20 3D 20 27 D6 D0 CE C4 27 3B
3. Content after reading with a single-byte encoding:
var foo = '{garbled text}';
4. Content after the replacement:
var bar = '{garbled text}';
5. Bytes after saving with the same single-byte encoding:
76 61 72 20 62 61 72 20 3D 20 27 D6 D0 CE C4 27 3B
6. Content after reading with GBK encoding:
var bar = '中文';
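The steps above can be reproduced on an in-memory Buffer. This sketch uses NodeJS's 'latin1' single-byte encoding and a hypothetical helper, replaceAsciiSingleByte; it checks that the saved bytes match step 5 exactly.

```javascript
// Step 2: the GBK-encoded bytes of `var foo = '中文';`.
const original = Buffer.from([
    0x76, 0x61, 0x72, 0x20, 0x66, 0x6F, 0x6F, 0x20, 0x3D, 0x20,
    0x27, 0xD6, 0xD0, 0xCE, 0xC4, 0x27, 0x3B,
]);

// Steps 3-5: decode with a single-byte encoding, replace ASCII text,
// then encode again with the same single-byte encoding.
function replaceAsciiSingleByte(bin, from, to) {
    const text = bin.toString('latin1'); // one character per byte
    return Buffer.from(text.replace(from, to), 'latin1'); // one byte per character
}

const edited = replaceAsciiSingleByte(original, 'foo', 'bar');

// Step 5: the expected bytes after saving.
const expected = Buffer.from([
    0x76, 0x61, 0x72, 0x20, 0x62, 0x61, 0x72, 0x20, 0x3D, 0x20,
    0x27, 0xD6, 0xD0, 0xCE, 0xC4, 0x27, 0x3B,
]);

console.log(original.toString('latin1')); // var foo = 'ÖÐÎÄ';
console.log(edited.equals(expected)); // true
```

Under 'latin1', the four GBK bytes of 中文 show up as the garbled text ÖÐÎÄ, but they are written back byte for byte.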
The trick is that, no matter which garbled characters the bytes greater than 0x7F are decoded into under a single-byte encoding, saving those characters with the same single-byte encoding produces exactly the original bytes.
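This property can be checked exhaustively for all 256 byte values. The sketch uses 'latin1' as the single-byte encoding; for contrast, it also shows that a UTF8 round trip does not preserve arbitrary bytes, because invalid byte sequences are replaced with U+FFFD when decoding.

```javascript
// A Buffer holding every possible byte value 0x00-0xFF.
const allBytes = Buffer.alloc(256);
for (let i = 0; i < 256; i++) {
    allBytes[i] = i;
}

// Decode and re-encode with the single-byte 'latin1' encoding.
const viaLatin1 = Buffer.from(allBytes.toString('latin1'), 'latin1');

// Decode and re-encode with UTF-8: invalid sequences become U+FFFD.
const viaUtf8 = Buffer.from(allBytes.toString('utf8'), 'utf8');

console.log(viaLatin1.equals(allBytes)); // true
console.log(viaUtf8.equals(allBytes)); // false
```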
NodeJS has a built-in binary encoding (an alias of latin1) that can be used to implement this technique. The next example uses it to write the code for the previous example.
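var fs = require('fs');

// Replace ASCII text in a file without knowing whether it is GBK or UTF8.
// 'binary' maps each byte to one character and back, so non-ASCII bytes
// are written out unchanged.
function replace(pathname) {
    var str = fs.readFileSync(pathname, 'binary');

    // A string pattern replaces only the first occurrence.
    str = str.replace('foo', 'bar');

    fs.writeFileSync(pathname, str, 'binary');
}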