A Deep Analysis of the Node.js Buffer Module

Last update: 2017-01-13. Source: Internet

Author: User



Objective

JavaScript was designed for the browser: it handles Unicode strings well, but has no good way to handle binary or non-Unicode data. Node.js inherits JavaScript's language features and extends them with a Buffer class for processing binary data, which lets Node.js handle all kinds of data the way other programming languages do.

There are many articles online about Buffer, but most of them explain how it works internally, and almost none explain how to use it. This article focuses on using Buffer.

Contents

1. Introduction to Buffer

2. Basic use of Buffer

3. Buffer performance tests

1. Introduction to Buffer

In Node.js, the Buffer class is a core library that ships with the Node kernel. It gives Node.js a way to store raw data, which makes it possible to process binary data; you can use it wherever data has to be moved around in I/O operations. The raw data is stored in instances of the Buffer class. A Buffer is similar to an array of integers, but it corresponds to a block of raw memory outside the V8 heap.
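A Buffer behaves much like an array of bytes. The short sketch below, written for a current Node version (it uses Buffer.alloc() rather than the older new Buffer() constructor used elsewhere in this article), shows that each index holds an integer from 0 to 255 and that out-of-range values wrap modulo 256.

```javascript
// Each index of a Buffer holds one byte (an integer from 0 to 255).
const buf = Buffer.alloc(4); // 4 zero-filled bytes

buf[0] = 72;  // 'H'
buf[1] = 105; // 'i'
buf[2] = 256; // out of range: stored modulo 256, i.e. 0
buf[3] = -1;  // negative: wraps around to 255

console.log(buf);             // <Buffer 48 69 00 ff>
console.log(buf.length);      // 4 (length is measured in bytes)
console.log(Array.from(buf)); // [ 72, 105, 0, 255 ]
```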

Converting between a Buffer and a JavaScript string requires you to specify the character encoding explicitly. The supported string encodings include:

'ascii' - for 7-bit ASCII data only. This encoding is very fast and strips the high bit of each byte.
'utf8' - multibyte-encoded Unicode characters. Many web pages and other document formats use UTF-8.
'ucs2' - two bytes per character, Unicode encoded in little-endian byte order. It can only encode characters in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF).
'base64' - Base64 string encoding.
'binary' - a way of encoding raw binary data into a string by using only the first 8 bits of each character. This encoding is deprecated and should be avoided in favor of Buffer objects wherever possible; it may be removed in a future version of Node.
Official Buffer documentation: http://nodejs.org/api/buffer.html
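To see how the choice of encoding changes the bytes, the sketch below encodes the same string with several of the encodings listed above. It assumes a current Node version (Buffer.from() instead of new Buffer()); the sample string "héllo" is arbitrary and is written with a \u escape so the source file's own encoding does not matter.

```javascript
const text = "h\u00e9llo"; // "héllo"

const utf8 = Buffer.from(text, "utf8");
console.log(utf8.toString("hex")); // 68c3a96c6c6f ('é' takes 2 bytes)

const ucs2 = Buffer.from(text, "ucs2");
console.log(ucs2.length);          // 10 (2 bytes per character)
console.log(ucs2[0], ucs2[1]);     // 104 0 (little-endian: low byte first)

const b64 = utf8.toString("base64");
console.log(b64);                  // aMOpbGxv

// Decoding with the matching encoding restores the original string
console.log(Buffer.from(b64, "base64").toString("utf8") === text); // true
```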

2. Basic use of Buffer

Using Buffer mostly means calling the APIs it provides, which fall into three groups: creating a Buffer, writing to a Buffer, and reading from a Buffer. Because the official documentation describes these basic operations in detail, I only list them briefly here.

System environment

Win7 64-bit
Node.js: v0.10.31
npm: 1.4.23
Create a project

~ cd d:\workspace\javascript
~ mkdir nodejs-buffer && cd nodejs-buffer
2.1 Creating a Buffer

Buffer instances are created with new Buffer(). Create a new file, buffer_new.js.

~ vi buffer_new.js

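// A Buffer instance with a length of 0
var a = new Buffer(0);
console.log(a);
// > <Buffer >

// A second length-0 Buffer prints the same way (a and a2 are still two separate objects)
var a2 = new Buffer(0);
console.log(a2);
// > <Buffer >

// A Buffer instance with a length of 10. In Node v0.10, new Buffer(size) does not
// initialize its memory, so byte values other than 00 may appear.
var a10 = new Buffer(10);
console.log(a10);
// > <Buffer 00 00 00 00 00 00 00 00 00 00>

// From an array: each element becomes one byte; non-numeric strings become 0
var b = new Buffer(['a', 'b', 12]);
console.log(b);
// > <Buffer 00 00 0c>

// From a string with a character encoding ('你好' means "hello")
var b2 = new Buffer('你好', 'utf-8');
console.log(b2);
// > <Buffer e4 bd a0 e5 a5 bd>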
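The new Buffer() constructor used in this article matches the Node v0.10 environment listed above. Current Node versions deprecate it in favor of factory methods, so a rough equivalent of the examples above, assuming a recent Node release, might look like this:

```javascript
// Rough current-API equivalents of the new Buffer(...) calls above.
const a = Buffer.alloc(0);                      // empty Buffer
const a10 = Buffer.alloc(10);                   // 10 zero-filled bytes
const raw = Buffer.allocUnsafe(10);             // 10 uninitialized bytes (faster, contents arbitrary)
const b = Buffer.from(["a", "b", 12]);          // non-numeric entries become 0
const b2 = Buffer.from("\u4f60\u597d", "utf8"); // "你好"

console.log(a.length);   // 0
console.log(a10);        // <Buffer 00 00 00 00 00 00 00 00 00 00>
console.log(raw.length); // 10
console.log(b);          // <Buffer 00 00 0c>
console.log(b2);         // <Buffer e4 bd a0 e5 a5 bd>
```

Buffer.alloc() always zero-fills, while Buffer.allocUnsafe() behaves like the old new Buffer(size) and may contain leftover data, so it is only appropriate when every byte will be overwritten.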
The Buffer class also has five class (static) methods that help when working with Buffers.

1. Encoding check. As mentioned above, converting between a Buffer and a JavaScript string requires an explicit encoding, so we need to know which encodings Buffer supports. For Chinese text only UTF-8 can be used; GBK and GB2312, which were common a few years ago, cannot be decoded.

// Supported encodings
console.log(Buffer.isEncoding('utf-8'));
console.log(Buffer.isEncoding('binary'));
console.log(Buffer.isEncoding('ascii'));
console.log(Buffer.isEncoding('ucs2'));
console.log(Buffer.isEncoding('base64'));
console.log(Buffer.isEncoding('hex')); // hexadecimal
// > true (for each call)

// Unsupported encodings
console.log(Buffer.isEncoding('gbk'));
console.log(Buffer.isEncoding('gb2312'));
// > false (for each call)
2. Buffer check. We often need to check the type of a piece of data before deciding how to process it.

// A Buffer
console.log(Buffer.isBuffer(new Buffer('a')));
// > true

// Not a Buffer
console.log(Buffer.isBuffer('adfd'));
console.log(Buffer.isBuffer('\u00bd\u00bd'));
// > false (for each call)
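Buffer.isBuffer() checks for an actual Buffer, not just any byte array. The sketch below assumes a current Node version, where Buffer is a subclass of Uint8Array; it shows that a plain Uint8Array fails the check and how its memory can be wrapped in a Buffer without copying.

```javascript
const bytes = new Uint8Array([1, 2, 3]);
console.log(Buffer.isBuffer(bytes));              // false: a plain typed array
console.log(Buffer.isBuffer(Buffer.from(bytes))); // true (but this copies the bytes)

// Wrap the same memory as a Buffer without copying
const view = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
console.log(Buffer.isBuffer(view)); // true
view[0] = 9;
console.log(bytes[0]);              // 9: both share one ArrayBuffer
```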
3. Byte length of a string. Because strings can be stored in different encodings, a string's length in characters and its length in bytes are not always the same. For example, one Chinese character takes 3 bytes in UTF-8, so the 4-character Chinese string below occupies 12 bytes.

var str2 = '粉丝日志'; // "Fan Diary", 4 Chinese characters
console.log(str2 + ": " + str2.length + " characters, " + Buffer.byteLength(str2, 'utf8') + " bytes");
// > 粉丝日志: 4 characters, 12 bytes
console.log(str2 + ": " + str2.length + " characters, " + Buffer.byteLength(str2, 'ascii') + " bytes");
// > 粉丝日志: 4 characters, 4 bytes
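The difference between character length and byte length is easy to check for a few strings. In this sketch (current Node assumed) the sample strings are arbitrary; note that JavaScript's length counts UTF-16 code units, so a character outside the BMP, such as an emoji, already has length 2.

```javascript
// length counts UTF-16 code units; Buffer.byteLength counts encoded bytes.
const samples = ["abc", "\u4f60\u597d", "\u{1F600}"]; // "abc", "你好", a smiley emoji

for (const s of samples) {
  console.log(s.length, Buffer.byteLength(s, "utf8"), Buffer.byteLength(s, "ucs2"));
}
// prints:
// 3 3 6
// 2 6 4
// 2 4 4
```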
4. Concatenating Buffers. Buffer.concat() joins an array of Buffers into a single Buffer. You can set the total length of the result manually; if that length is too small, the data is truncated.

var b1 = new Buffer("abcd");
var b2 = new Buffer("1234");
var b3 = Buffer.concat([b1, b2], 8);
console.log(b3.toString());
// > abcd1234

// totalLength 32 exceeds the data; in Node v0.10 the extra bytes are
// uninitialized memory, which shows up as garbage
var b4 = Buffer.concat([b1, b2], 32);
console.log(b4.toString());
console.log(b4.toString('hex')); // hexadecimal output
// > abcd1234 followed by garbled characters...
// > 616263643132333404000000000000000000000000000000082a330200000000

var b5 = Buffer.concat([b1, b2], 4);
console.log(b5.toString());
// > abcd

(Figure: screenshot of the program run)
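In current Node versions the totalLength argument of Buffer.concat() is optional, and any extra bytes are zero-filled rather than left as leftover memory, so the garbled output above is specific to older versions. A small sketch under that assumption:

```javascript
const chunks = [Buffer.from("abcd"), Buffer.from("1234")];

// Without totalLength, the lengths of the chunks are summed automatically
const all = Buffer.concat(chunks);
console.log(all.toString(), all.length); // abcd1234 8

// A smaller totalLength truncates the result
const cut = Buffer.concat(chunks, 4);
console.log(cut.toString()); // abcd

// A larger totalLength: the extra bytes are zero-filled in current Node
const padded = Buffer.concat(chunks, 12);
console.log(padded.toString("hex")); // 616263643132333400000000
```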

5. Comparing Buffers. Buffer.compare() compares the contents of two Buffers byte by byte, which sorts them in string order.

var a1 = new Buffer('10');
var a2 = new Buffer('50');
var a3 = new Buffer('123');

// a1 is less than a2
console.log(Buffer.compare(a1, a2));
// > -1

// a2 is greater than a3 (the first byte '5' is greater than '1')
console.log(Buffer.compare(a2, a3));
// > 1

// Sort a1, a2, a3
console.log([a1, a2, a3].sort(Buffer.compare));
// > [ <Buffer 31 30>, <Buffer 31 32 33>, <Buffer 35 30> ]

// Sort a1, a2, a3 and print them as UTF-8 strings
console.log([a1, a2, a3].sort(Buffer.compare).toString());
// > 10,123,50
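Besides the static Buffer.compare(), Buffers have the instance methods buf.compare() and buf.equals(). Because comparison is byte by byte, '123' sorts before '50'; if numeric order is wanted, the values have to be converted first. A sketch assuming a current Node version:

```javascript
const x = Buffer.from("10");
const y = Buffer.from("50");

console.log(x.compare(y));                // -1, same result as Buffer.compare(x, y)
console.log(x.equals(Buffer.from("10"))); // true: same bytes
console.log(x === Buffer.from("10"));     // false: different objects

// Byte order is not numeric order: convert first if numbers are wanted
const nums = ["10", "50", "123"].map((s) => Buffer.from(s));
nums.sort((p, q) => Number(p.toString()) - Number(q.toString()));
console.log(nums.map(String)); // [ '10', '50', '123' ]
```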
2.2 Writing to a Buffer

Next, we write data into a Buffer. Create a new file, buffer_write.js.

~ vi buffer_write.js

//////////////////////////////
// Buffer write
//////////////////////////////

// Create a Buffer with 64 bytes of space
var buf = new Buffer(64);

// Write from the start of the Buffer (offset 0)
var len1 = buf.write('Write from start');

// Print the number of bytes written and the data from position 0 to len1
console.log(len1 + " bytes: " + buf.toString('utf8', 0, len1));

// Write again at offset 0; this overwrites the earlier contents
len1 = buf.write('re-writing');
console.log(len1 + " bytes: " + buf.toString('utf8', 0, len1));

// Continue writing at offset len1, this time a Unicode string
var len2 = buf.write('\u00bd + \u00bc = \u00be', len1);
console.log(len2 + " bytes: " + buf.toString('utf8', 0, len1 + len2));

// Continue writing at offset 30
var len3 = buf.write('written from 30th position', 30);
console.log(len3 + " bytes: " + buf.toString('utf8', 0, 30 + len3));

// Total Buffer length and its data
console.log(buf.length + " bytes: " + buf.toString('utf8', 0, buf.length));

// Continue writing at offset 30 + len3
var len4 = buf.write('The length of the written data exceeds the total buffer length!', 30 + len3);

// Data that does not fit in the Buffer's space is not written
console.log(buf.length + " bytes: " + buf.toString('utf8', 0, buf.length));
(Figure: output of buffer_write.js)
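buf.write() returns the number of bytes actually written, which matters when space runs out. The sketch below (current Node assumed, sample strings arbitrary) shows that a multi-byte character that does not fit completely is not written at all, while single-byte characters are written up to the end of the Buffer.

```javascript
const small = Buffer.alloc(4); // room for 4 bytes only

// "ab" takes 2 bytes; '数' (\u6570) needs 3 more, which do not fit
const written = small.write("ab\u6570");
console.log(written); // 2: the partial character is skipped entirely
console.log(small);   // <Buffer 61 62 00 00>

// Writing at offset 2: only "xy" fits in the remaining 2 bytes
const written2 = small.write("xyz", 2);
console.log(written2);         // 2
console.log(small.toString()); // abxy
```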

Node.js Buffers support integers of different widths, depending on the range you need to read or write: 8-, 16- and 32-bit integers, variable-width integers of up to 6 bytes, single-precision floats and double-precision floats can all be accessed through the corresponding writeXXX() functions, which are used in much the same way as buf.write().

buf.write(string[, offset][, length][, encoding])
buf.writeUIntLE(value, offset, byteLength[, noAssert])
buf.writeUIntBE(value, offset, byteLength[, noAssert])
buf.writeIntLE(value, offset, byteLength[, noAssert])
buf.writeIntBE(value, offset, byteLength[, noAssert])
buf.writeUInt8(value, offset[, noAssert])
buf.writeUInt16LE(value, offset[, noAssert])
buf.writeUInt16BE(value, offset[, noAssert])
buf.writeUInt32LE(value, offset[, noAssert])
buf.writeUInt32BE(value, offset[, noAssert])
buf.writeInt8(value, offset[, noAssert])
buf.writeInt16LE(value, offset[, noAssert])
buf.writeInt16BE(value, offset[, noAssert])
buf.writeInt32LE(value, offset[, noAssert])
buf.writeInt32BE(value, offset[, noAssert])
buf.writeFloatLE(value, offset[, noAssert])
buf.writeFloatBE(value, offset[, noAssert])
buf.writeDoubleLE(value, offset[, noAssert])
buf.writeDoubleBE(value, offset[, noAssert])
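As a small illustration of the widths and byte orders listed above, the sketch below (current Node assumed; the values and offsets are arbitrary) writes a few numbers into one Buffer and reads some of them back. LE and BE differ only in the order of the bytes.

```javascript
const buf = Buffer.alloc(16);

buf.writeUInt16LE(0x1234, 0); // bytes 0-1: 34 12 (little-endian: low byte first)
buf.writeUInt16BE(0x1234, 2); // bytes 2-3: 12 34 (big-endian: high byte first)
buf.writeInt32LE(-2, 4);      // bytes 4-7: fe ff ff ff (two's complement)
buf.writeDoubleLE(1.5, 8);    // bytes 8-15: an IEEE 754 double

console.log(buf.toString("hex", 0, 8)); // 34121234feffffff
console.log(buf.readInt32LE(4));        // -2
console.log(buf.readDoubleLE(8));       // 1.5
```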
In addition, several Buffer prototype methods can also be used to write into a Buffer.

Copying: buf.copy(targetBuffer[, targetStart][, sourceStart][, sourceEnd]).

// Create two new Buffer instances
var buf1 = new Buffer(26);
var buf2 = new Buffer(26);

// Write data into each of them
for (var i = 0; i < 26; i++) {
  buf1[i] = i + 97; // 97 is ASCII 'a'
  buf2[i] = 50;     // 50 is ASCII '2'
}

// Copy bytes 0-9 of buf1 (sourceEnd 10 is exclusive) into buf2, starting at byte 5 of buf2
buf1.copy(buf2, 5, 0, 10);
console.log(buf2.toString('ascii', 0, 25)); // print bytes 0-24 of buf2
// > 22222abcdefghij2222222222
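buf.copy() returns the number of bytes copied, and the source and target may even be the same Buffer with overlapping regions. A sketch assuming a current Node version (the sample data is arbitrary):

```javascript
const buf = Buffer.from("abcdefghij");

// Copy bytes 0-4 ("abcde") onto positions 2-6 of the same Buffer.
// The regions overlap; copy() handles that and returns the byte count.
const n = buf.copy(buf, 2, 0, 5);
console.log(n);              // 5
console.log(buf.toString()); // ababcdehij
```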
Filling: buf.fill(value[, offset][, end]).

// New Buffer instance, length 20
var buf = new Buffer(20);

// Fill the Buffer with data
buf.fill("h");
console.log(buf);
// > <Buffer 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68 68>
console.log("buf: " + buf.toString());
// > buf: hhhhhhhhhhhhhhhhhhhh

// Clear the data in the Buffer by filling it with zero bytes
buf.fill(0);
console.log("buf: " + buf.toString());
// > buf:  (followed by 20 NUL characters, which are not visible)
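buf.fill() also accepts a multi-character string and a range, and returns the Buffer itself. The sketch below assumes a current Node version, where Buffer.alloc(size, fill) can also allocate and fill in one call.

```javascript
const buf = Buffer.alloc(10); // 10 zero bytes

// Fill positions 2-7 (end index 8 is exclusive); "ab" repeats to fill the range
buf.fill("ab", 2, 8);
console.log(buf.toString("hex")); // 00006162616261620000

// fill() returns the Buffer, so it can be chained
const z = Buffer.alloc(3).fill(0x7a); // 0x7a is 'z'
console.log(z.toString()); // zzz

// Allocate and fill in one step
const x = Buffer.alloc(5, "x");
console.log(x.toString()); // xxxxx
```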
Slicing: buf.slice([start][, end]) returns a new Buffer that references the same memory as the original Buffer, but covers only the bytes from index start up to (not including) end.

var buf1 = new Buffer(26);
for (var i = 0; i < 26; i++) {
  buf1[i] = i + 97;
}

// buf2 is a slice of buf1 covering bytes 0-2 (end index 3 is exclusive)
var buf2 = buf1.slice(0, 3);
console.log(buf2.toString('ascii', 0, buf2.length));
// > abc

// Modifying buf1 also changes buf2
buf1[0] = 33; // 33 is ASCII '!'
console.log(buf2.toString('ascii', 0, buf2.length));
// > !bc
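Because a slice shares memory with its source, an independent copy has to be made explicitly. The sketch below assumes a current Node version and uses buf.subarray(), which behaves like buf.slice() for Buffers; Buffer.from(buffer) copies the bytes.

```javascript
const src = Buffer.from("abcdef");

const view = src.subarray(0, 3);              // shares memory with src
const copy = Buffer.from(src.subarray(0, 3)); // Buffer.from(buffer) copies the bytes

src[0] = 0x21; // 0x21 is '!'
console.log(view.toString()); // !bc (sees the change)
console.log(copy.toString()); // abc (independent copy)
```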

2.3 Reading from a Buffer

Once data has been written into a Buffer, we also need to read it back. Create a new file, buffer_read.js. We can use the readXXX() functions to get the numeric value stored at each index and then convert it back to the original value, but handling characters this way is cumbersome. In practice, the most common way to read a Buffer is toString().

~ vi buffer_read.js

//////////////////////////////
// Buffer read
//////////////////////////////

var buf = new Buffer(10);
for (var i = 0; i < 10; i++) {
  buf[i] = i + 97;
}
console.log(buf.length + " bytes: " + buf.toString('utf-8'));
// > 10 bytes: abcdefghij

// Read the data byte by byte
for (var ii = 0; ii < buf.length; ii++) {
  var ch = buf.readUInt8(ii); // get the ASCII code
  console.log(ch + ":" + String.fromCharCode(ch));
}
// > 97:a
// > 98:b
// > 99:c
// > 100:d
// > 101:e
// > 102:f
// > 103:g
// > 104:h
// > 105:i
// > 106:j

Now write Chinese data and read it back with readXXX(); each Chinese character takes 3 bytes.

var buf = new Buffer(10);
buf.write('abcd');
buf.write('数据', 4); // '数据' means "data"
for (var i = 0; i < buf.length; i++) {
  console.log(buf.readUInt8(i));
}
// > 97
// > 98
// > 99
// > 100
// > 230  (230, 149, 176 encode '数')
// > 149
// > 176
// > 230  (230, 141, 174 encode '据')
// > 141
// > 174

To output the Chinese text correctly, decode it with toString('utf-8').

Console.log ("Buffer:" +buf); function with ToString () called by default
> BUFFER:ABCD Data
Console.log ("Utf-8:" +buf.tostring (' Utf-8 '));
> utf-8: ABCD Data
Console.log ("ASCII:" +buf.tostring (' ASCII '));/There are garbled, Chinese can not be correctly parsed
> ascii:abcdf0f
.
Console.log ("Hex:" +buf.tostring (' hex ')); 16 in-system
> Hex:61626364e695b0e68dae
To output a Buffer, the operation we use most is toString(), which decodes the data according to the encoding it was stored in. Besides toString(), you can also use toJSON() to convert a Buffer directly into a JSON object.

var buf = new Buffer('test');
console.log(buf.toJSON());
// > { type: 'Buffer', data: [ 116, 101, 115, 116 ] }
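JSON.stringify() calls toJSON() automatically, but JSON.parse() returns a plain object, not a Buffer. One way to restore Buffers is a reviver function; the sketch below is an illustrative approach (current Node assumed), not a built-in feature.

```javascript
const original = Buffer.from("test");

// JSON.stringify() calls buf.toJSON() automatically
const json = JSON.stringify(original);
console.log(json); // {"type":"Buffer","data":[116,101,115,116]}

// Illustrative reviver: turn {type: "Buffer", data: [...]} back into a Buffer
const revived = JSON.parse(json, (key, value) =>
  value && value.type === "Buffer" && Array.isArray(value.data)
    ? Buffer.from(value.data)
    : value
);
console.log(Buffer.isBuffer(revived)); // true
console.log(revived.equals(original)); // true
```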
3. Buffer performance tests

Now that we have covered the basic use of Buffer, let's run some performance tests on it.

3.1 Creation test around the 8 KB pool size

Every time a new Buffer instance is created, Node checks whether the current Buffer memory pool still has room. New Buffer instances share the current memory pool, which is 8 KB in size.

If the new Buffer is larger than 8 KB, it is stored in its own SlowBuffer instance. If it is 8 KB or smaller and fits in the space remaining in the current pool, it is stored in the current pool; if it does not fit, a new pool is allocated first. If the Buffer has length 0, the shared default zeroBuffer instance is used.
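The allocation rules above can be modelled in a few lines to see how often a new pool is needed. This is a hypothetical, simplified model (the function countPools and its bookkeeping are illustrative, not Node's real code, and memory alignment is ignored); it assumes the 8 KB pool size described above.

```javascript
// Hypothetical, simplified model of the pooled allocation described above.
// Not Node's actual implementation; alignment and other details are ignored.
const POOL_SIZE = 8 * 1024; // 8 KB pool, as described in the text

function countPools(size, count) {
  let pools = 0;        // number of 8 KB pools allocated
  let slow = 0;         // Buffers too large for the pool (own SlowBuffer)
  let used = POOL_SIZE; // start "full" so the first small Buffer allocates a pool

  for (let i = 0; i < count; i++) {
    if (size === 0) continue;                 // zero-length: shared empty buffer
    if (size > POOL_SIZE) { slow++; continue; }
    if (POOL_SIZE - used < size) {            // not enough room left: new pool
      pools++;
      used = 0;
    }
    used += size;                             // carve the Buffer out of the pool
  }
  return { pools, slow };
}

console.log(countPools(4096, 100000));  // { pools: 50000, slow: 0 }
console.log(countPools(4097, 100000));  // { pools: 100000, slow: 0 }
console.log(countPools(10 * 1024, 10)); // { pools: 0, slow: 10 }
```

Under this model, two 4096-byte Buffers share each pool, while a 4097-byte Buffer leaves too little room for a second one, so every allocation needs a fresh pool.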

Below we create Buffers of two sizes in a loop, 100,000 times each: the first is 4 KB (4096 bytes) and the second is 4 KB + 1 byte (4097 bytes).

var num = 100 * 1000;
console.time("test1");
for (var i = 0; i < num; i++) {
  new Buffer(1024 * 4);
}
console.timeEnd("test1");
// > test1: 132ms

console.time("test2");
for (var j = 0; j < num; j++) {
  new Buffer(1024 * 4 + 1);
}
console.timeEnd("test2");
// > test2: 163ms
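The second test, with 4 KB + 1 byte per Buffer, takes about 23% longer. Two 4 KB Buffers fit exactly into one 8 KB pool, so the first test only needs a new pool every other iteration; a 4 KB + 1 byte Buffer leaves too little room for a second one, so the second test has to allocate a new pool on every iteration. This is something we need to watch out for.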

3.2 Many small Buffers or one large Buffer

When we need to cache data, is it better to create many small Buffer instances or one large Buffer instance? For example, suppose we need to store 10,000 strings whose lengths vary between 1 and 2048.

var max = 2048;       // maximum string length
var time = 10 * 1000; // loop 10,000 times

// Build a string of the given length
function getString(size) {
  var ret = "";
  for (var i = 0; i < size; i++) ret += "a";
  return ret;
}

// Generate an array of 10,000 strings with random lengths
var arr1 = [];
for (var i = 0; i < time; i++) {
  var size = Math.ceil(Math.random() * max);
  arr1.push(getString(size));
}
console.log(arr1);

// Test 3: create 10,000 small Buffer instances
console.time('test3');
var arr_3 = [];
for (var i = 0; i < time; i++) {
  arr_3.push(new Buffer(arr1[i]));
}
console.timeEnd('test3');
// > test3: 217ms

// Test 4: create one large Buffer plus an array of offsets for reading the data back
console.time('test4');
var buf = new Buffer(time * max);
var offset = 0;
var arr_4 = [];
for (var i = 0; i < time; i++) {
  arr_4[i] = offset;
  buf.write(arr1[i], offset, arr1[i].length); // the strings are ASCII, so characters == bytes
  offset = offset + arr1[i].length;
}
console.timeEnd('test4');
// > test4: 12ms

// Read back the entry at index 2

console.log("src:[2]=" + arr1[2]);
console.log("test3:[2]=" + arr_3[2].toString());
console.log("test4:[2]=" + buf.toString('utf-8', arr_4[2], arr_4[3]));

The results of the run are shown in the figure.

For this kind of workload, allocating one large Buffer up front is far more efficient than creating a small Buffer for every item; in this test it was more than an order of magnitude faster (12 ms vs. 217 ms). So understanding Buffer and using it well really matters!
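One practical detail in the large-Buffer test: buf.write() is given arr1[i].length, which is the number of characters, not bytes, and that only works because the test strings are pure ASCII. A hypothetical pair of helpers (packStrings and readString are illustrative names, not part of Node; current Node assumed) that sizes entries with Buffer.byteLength() and stores one extra end offset, so the last entry can be read too, might look like this:

```javascript
// Hypothetical helpers: store many strings in one preallocated Buffer.
function packStrings(strings) {
  // Size by bytes, not characters, so multi-byte text fits
  const total = strings.reduce((n, s) => n + Buffer.byteLength(s, "utf8"), 0);
  const buf = Buffer.allocUnsafe(total); // every byte is overwritten below
  const offsets = [];
  let offset = 0;
  for (const s of strings) {
    offsets.push(offset);
    offset += buf.write(s, offset, "utf8"); // write() returns the bytes written
  }
  offsets.push(offset); // end offset of the last string
  return { buf, offsets };
}

function readString(packed, i) {
  return packed.buf.toString("utf8", packed.offsets[i], packed.offsets[i + 1]);
}

const packed = packStrings(["abc", "\u6570\u636e", "hello"]); // "数据" is 6 bytes
console.log(packed.buf.length);     // 14
console.log(packed.offsets);        // [ 0, 3, 9, 14 ]
console.log(readString(packed, 2)); // hello
```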

3.3 String vs. Buffer

Now that we have Buffer, should we replace all string concatenation with Buffers? To find out, we test which is faster for joining strings: String or Buffer.

Below we concatenate a string 300,000 times in a loop.

// Test three: Buffer vs. String
var time = 300 * 1000;
var txt = "aaa";

var str = "";
console.time('test5');
for (var i = 0; i < time; i++) {
  str += txt;
}
console.timeEnd('test5');
// > test5: 24ms

console.time('test6');
var buf = new Buffer(time * txt.length);
var offset = 0;
for (var i = 0; i < time; i++) {
  var end = offset + txt.length;
  buf.write(txt, offset, txt.length); // the third argument is a length, not an end position
  offset = end;
}
console.timeEnd('test6');
// > test6: 85ms

The results clearly show that string concatenation is much faster than writing into a Buffer. So for storing strings we should keep using String, and use Buffer only when we need to store UTF-8 encoded bytes or binary data.

4. Program code

The source code for this article can be downloaded directly from GitHub, so you can follow along with the article while learning Buffer. Download address: https://github.com/bsspirit/nodejs-buffer

You can also clone it with git from the command line:

~ git clone git@github.com:bsspirit/nodejs-buffer.git # download the GitHub project
~ cd nodejs-buffer # enter the download directory
I have not worked much with the internals of Node.js and have not studied it in depth at the V8 (C++) level; this summary is written purely from a user's perspective. If you find errors or unclear descriptions in the article, corrections from experts are very welcome!

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
