Use Node.js to handle front-end code file encoding


Last Update:2017-05-11 Source: Internet

Author: User


This article describes how to use Node.js to handle the encoding of front-end code files. The related NodeJS libraries make encoding problems easy to deal with, and readers who need this can use the article as a reference.

When NodeJS is used to write front-end tools, the most common operations are on text files, so file encoding inevitably comes up. The commonly used text encodings are UTF8 and GBK, and a UTF8 file may also carry a BOM. When reading text files in different encodings, the file content must first be decoded into a JS string before it can be processed normally. (A JS string is a sequence of UTF-16 code units, independent of the encoding the file used on disk.)

Remove BOM
BOM is used to mark a text file as Unicode-encoded. It is itself a Unicode character ("\uFEFF") located at the very start of the text file. In the different Unicode encodings, the BOM character corresponds to the following bytes:

  Bytes     Encoding
  ----------------------------
  FE FF     UTF16BE
  FF FE     UTF16LE
  EF BB BF  UTF8

Therefore, we can tell whether a file contains a BOM, and which Unicode encoding it uses, from the first few bytes of the text file. However, although the BOM marks the file encoding, it is not part of the file content. If the BOM is not removed when a text file is read, problems can occur in some scenarios. For example, when several JS files are merged into one file and a file in the middle carries a BOM, the BOM character ends up inside the merged code and can lead to a JS syntax error in the browser. Therefore, the BOM should be removed when NodeJS is used to read text files. For example, the following code identifies and removes the UTF8 BOM.

var fs = require('fs');

// Read a UTF8 text file, dropping a leading BOM (EF BB BF) if present.
function readText(pathname) {
  var bin = fs.readFileSync(pathname);
  if (bin[0] === 0xEF && bin[1] === 0xBB && bin[2] === 0xBF) {
    bin = bin.slice(3);
  }
  return bin.toString('utf-8');
}
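The byte table above also allows the two UTF16 BOMs to be recognized. The following is a minimal sketch of that idea, not part of the original article: `detectBom` and `stripBom` are hypothetical helper names, and the sketch only covers the three BOMs listed in the table.

```javascript
// Hypothetical helper: report which of the three BOMs from the table
// (if any) starts the buffer, together with its length in bytes.
function detectBom(buf) {
  if (buf.length >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
    return { encoding: 'utf8', length: 3 };
  }
  if (buf.length >= 2 && buf[0] === 0xFE && buf[1] === 0xFF) {
    return { encoding: 'utf16be', length: 2 };
  }
  if (buf.length >= 2 && buf[0] === 0xFF && buf[1] === 0xFE) {
    return { encoding: 'utf16le', length: 2 };
  }
  return null; // no BOM from the table found
}

// Hypothetical helper: return the buffer without its leading BOM.
function stripBom(buf) {
  var bom = detectBom(buf);
  return bom ? buf.slice(bom.length) : buf;
}
```

A caller could pass the result of `fs.readFileSync(pathname)` to `stripBom` and then decode the remaining bytes with the encoding reported by `detectBom`. Note that NodeJS can decode 'utf16le' directly with `toString`, but has no built-in 'utf16be' encoding, so that case would need a byte swap or a conversion library.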

GBK to UTF8
NodeJS supports specifying a text encoding when reading a text file or when converting a Buffer to a string. Unfortunately, GBK is not among the encodings that these NodeJS APIs support. Therefore, we generally use the third-party iconv-lite package to convert the encoding. After installing this package with NPM, we can write a function that reads GBK text files as follows.

var fs = require('fs');
var iconv = require('iconv-lite');

// Read a GBK-encoded text file and return its content as a JS string.
function readGBKText(pathname) {
  var bin = fs.readFileSync(pathname);
  return iconv.decode(bin, 'gbk');
}
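As an alternative sketch that avoids a third-party package, the WHATWG `TextDecoder` exposed by NodeJS's `util` module accepts the 'gbk' label. This is not the method the article uses, and it rests on an assumption: the NodeJS build must include full ICU data, otherwise constructing the decoder throws. `decodeGBK` and `convertGBKFileToUTF8` are hypothetical names.

```javascript
var fs = require('fs');
var util = require('util');

// Assumption: this NodeJS build ships full ICU data, so the 'gbk'
// label is accepted by TextDecoder. On small-icu builds this throws.
function decodeGBK(bin) {
  return new util.TextDecoder('gbk').decode(bin);
}

// Hypothetical helper: read a GBK file and save its text as UTF8.
function convertGBKFileToUTF8(srcPath, destPath) {
  var text = decodeGBK(fs.readFileSync(srcPath));
  fs.writeFileSync(destPath, text, 'utf8');
}
```

`TextDecoder` only decodes, so writing a file back out as GBK would still need a package such as iconv-lite.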

Single-byte encoding
Sometimes we cannot know in advance which encoding a file uses, so we cannot specify the correct one. For example, some of the CSS files we want to process may use GBK encoding and others UTF8. Although the text encoding can be guessed to a certain extent from the file's byte content, here we introduce a technique that has some limitations but is much simpler.

First, we know that if a text file contains only English characters, such as Hello World, it can be read correctly with either GBK or UTF8 encoding. This is because, under both encodings, every character in the ASCII range 0 to 127 uses the same single-byte encoding.
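The claim above can be checked with a few lines. This is a small sketch with a hypothetical helper name (`isPureAscii`); it compares UTF8 with Latin-1, a single-byte encoding that NodeJS supports natively.

```javascript
// "Hello World" as raw bytes; every value is below 0x80.
var bytes = Buffer.from([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x57, 0x6F, 0x72, 0x6C, 0x64]);

// Hypothetical helper: true when every byte is in the ASCII range 0 to 127.
function isPureAscii(buf) {
  for (var i = 0; i < buf.length; i++) {
    if (buf[i] > 0x7F) return false;
  }
  return true;
}

// For pure ASCII input, both decodings produce the same string.
var asUtf8 = bytes.toString('utf8');
var asLatin1 = bytes.toString('latin1');
console.log(isPureAscii(bytes), asUtf8 === asLatin1);
```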

Conversely, even if a text file contains Chinese characters, as long as the characters we need to process are all in the ASCII range 0 to 127 (for example, JS code outside comments and strings), we can read the file with a single-byte encoding without caring whether the actual file encoding is GBK or UTF8. The following example illustrates this method.

1. Content of the GBK-encoded source file (中文 means "Chinese"):

var foo = '中文';

2. Corresponding bytes:

  76 61 72 20 66 6F 6F 20 3D 20 27 D6 D0 CE C4 27 3B

3. Content obtained after reading with a single-byte encoding (under Latin-1 the garbled part reads ÖÐÎÄ):

var foo = '{garbled characters}';

4. Content after replacement:

var bar = '{garbled characters}';

5. Corresponding bytes after saving with the same single-byte encoding:

  76 61 72 20 62 61 72 20 3D 20 27 D6 D0 CE C4 27 3B

6. Content obtained after reading with GBK encoding:

var bar = '中文';

The trick here is that no matter what garbled character a single byte greater than 0x7F is parsed into under a single-byte encoding, saving those garbled characters with the same single-byte encoding produces exactly the same bytes again.

NodeJS comes with a built-in binary encoding ('binary', an alias of 'latin1') that can be used to implement this method. The next example uses this encoding to implement the replacement shown in the previous example.

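var fs = require('fs');

// Rename foo to bar in place; 'binary' maps every byte to one character and back.
function replace(pathname) {
  var str = fs.readFileSync(pathname, 'binary');
  str = str.replace('foo', 'bar'); // a string pattern replaces only the first match
  fs.writeFileSync(pathname, str, 'binary');
}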
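The byte-preserving property can also be checked in memory, without touching a file. The following is a minimal sketch using the bytes from the worked example; `replaceAsciiToken` is a hypothetical helper name, and the sketch assumes both the search text and the replacement are plain ASCII.

```javascript
// Hypothetical helper: edit ASCII text inside a buffer without knowing
// the buffer's real encoding. Assumes `from` and `to` are plain ASCII.
function replaceAsciiToken(buf, from, to) {
  var text = buf.toString('latin1');    // one byte -> one character
  var edited = text.replace(from, to);  // first match only
  return Buffer.from(edited, 'latin1'); // one character -> the same byte
}

// GBK bytes of the source line from the worked example.
var gbkInput = Buffer.from([
  0x76, 0x61, 0x72, 0x20, 0x66, 0x6F, 0x6F, 0x20, 0x3D, 0x20,
  0x27, 0xD6, 0xD0, 0xCE, 0xC4, 0x27, 0x3B
]);
// The same source line encoded as UTF8 instead.
var utf8Input = Buffer.from("var foo = '\u4e2d\u6587';", 'utf8');

var gbkOutput = replaceAsciiToken(gbkInput, 'foo', 'bar');
var utf8Output = replaceAsciiToken(utf8Input, 'foo', 'bar');

console.log(gbkOutput.toString('hex'));
console.log(utf8Output.toString('utf8'));
```

In both cases only the bytes of the identifier change; the bytes of the Chinese string literal pass through untouched, which is why the same function works for GBK and UTF8 input alike. The limitation is that the text being searched for and inserted must itself stay within the ASCII range.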

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
