Saturday, July 1, 2017

ThreadContinue - Reflective DLL Injection Using SetThreadContext() and NtContinue()

In an attempt to evade AV, attackers go to great lengths to avoid CreateRemoteThread(), the function most commonly used to start execution in reflective injection. Alternative techniques include native API (ntdll) thread creation and user APCs (necessary for SysWow64->x64).

This technique uses SetThreadContext() to change a selected thread's registers, and performs a restoration process with NtContinue(). This means the hijacked thread can keep doing whatever it was doing, which may be a critical function of the injected application.

You'll notice the PoC (x64 only, #lazy) is using the common VirtualAllocEx() and WriteProcessMemory() functions. But instead of creating a new remote thread, we piggyback on an existing one, and restore the original context when we're done with it. This can be done locally (current process) and remotely (target process).

Stage 0: Thread Hijack

Code can be found in hijack/hijack.c

  1. Select a target PID.
  2. Process is opened, and any thread is found.
  3. Thread is suspended, and thread context (CPU registers) copied.
  4. Memory allocated in remote process for reflective DLL.
  5. Memory allocated in remote process for thread context.
  6. Set the thread context stack pointer to a lower address.
  7. Change thread context with SetThreadContext().
  8. Resume the thread execution.

Stage 1: Reflective Restore

Code can be found in dll/ReflectiveDll.c

  1. Normal reflective DLL injection takes place.
  2. Optional: Spawn new thread locally for a primary payload.
  3. Optional: Thread is restored with NtContinue(), using the passed-in previous context.

You can go from x64->SysWow64 using Wow64SetThreadContext(), but not the other way around. Unfortunately, I did not find any sorcery that makes SysWow64->x64 possible.

One major hiccup to overcome in x64 mode is that the register RCX (function parameter 1) is volatile, even across a SetThreadContext() call. To work around this, I stored the pointer to the saved CONTEXT in a cave (in this case, the DOS header). Luckily, NtContinue() allows setting the volatile registers, so there are no issues in the restoration process; otherwise it would have needed a hacky code cave or something similar.

    // retrieve CONTEXT from DOS header cave
    lpParameter = (LPVOID)*((PULONG_PTR)((LPBYTE)uiLibraryAddress+2));
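
To make the cave idea concrete, here is a minimal sketch that works on a local byte buffer only; it is not the PoC's code. The helper names `cave_store` and `cave_load` and the `CAVE_OFFSET` constant are hypothetical. The only detail taken from the snippet above is the offset of 2, which skips the two-byte "MZ" magic. On x64 the eight bytes written there overlap the `e_cblp` through `e_cparhdr` fields of the DOS header, leaving `e_magic` and `e_lfanew` (offset 0x3C) untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset used by the snippet above: just past the 2-byte "MZ" magic. */
#define CAVE_OFFSET 2

/* Hypothetical helper: store a pointer-sized value in the cave of an
 * image held in a local buffer. memcpy avoids an unaligned store. */
static void cave_store(uint8_t *image, size_t image_size, uintptr_t value)
{
    assert(image_size >= CAVE_OFFSET + sizeof value);
    memcpy(image + CAVE_OFFSET, &value, sizeof value);
}

/* Hypothetical helper: read the value back, mirroring what the
 * one-line retrieval above does with a pointer cast. */
static uintptr_t cave_load(const uint8_t *image, size_t image_size)
{
    uintptr_t value;
    assert(image_size >= CAVE_OFFSET + sizeof value);
    memcpy(&value, image + CAVE_OFFSET, sizeof value);
    return value;
}
```

The sketch uses memcpy() rather than a cast because offset 2 is not pointer-aligned; x64 tolerates the unaligned read in the original one-liner, but memcpy() keeps the example portable.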

Another issue is that we could corrupt the original thread's stack. I subtracted 0x2000 from RSP to find a fresh region to spam up.
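
The stack adjustment is plain arithmetic on the saved register value. The sketch below shows it in isolation, under assumptions: the function name `relocate_rsp` and the `STACK_SKIP` constant are hypothetical, and the 16-byte align-down is my addition, not something the post says the PoC does. Whether the code being entered expects RSP at 16n or 16n+8 depends on how it is entered, which this sketch does not model.

```c
#include <stdint.h>

/* Distance below the original stack pointer, as described in the post. */
#define STACK_SKIP 0x2000u

/* Hypothetical helper: compute a scratch stack pointer below the
 * hijacked thread's live stack data so the original frames are left
 * untouched for later restoration. */
static uint64_t relocate_rsp(uint64_t original_rsp)
{
    uint64_t rsp = original_rsp - STACK_SKIP;

    /* Assumption (not from the post): align down to 16 bytes. */
    rsp &= ~(uint64_t)0xF;
    return rsp;
}
```

Because the stack grows downward, everything at or above the original RSP stays intact, which is what makes handing the saved context back to the thread viable afterwards.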

I've seen similar (but unsuccessful) techniques for code injection, though I found only a small amount of related information [1] [2]. These techniques did not attempt proper cleanup of the stolen thread, and skipping that cleanup is not practical in many circumstances. This is essentially the same process that RtlRemoteCall() follows. As such, there may be issues with threads in a wait state returning an incorrect status? None of these sources uses reflective restoration.

As the user-mode API is highly explored territory, this may not be an original technique. If so, take the example for what it is ([relatively] clean code with academic explanation) and chalk it up to multiple discovery. Leave flames, spam, and questions in the comments!

If you want to learn more about techniques like this, come to the Advanced Windows Post-Exploitation / Malware Forward Engineering DEF CON 25 workshop.

8 comments :

  1. Nice technique mate... thanks for share! I'll test it out :-)

  2. Taking over a remote thread is a traditional way.

    Replies
    1. Cleaning up, and using it for reflective injection, is the interesting part I could not find any information about. Most resources I saw (linked in post) do hijack threads this way, but run shellcode and then dispose of it entirely.

      The technique is fairly obvious so I don't doubt it's probably been done correctly by some malware sample at some point.

    2. Similar one : https://github.com/0x00ach/debug_inject/blob/master/main.cpp

    3. One thing that is unique, it passes the shellcode through debug events instead of using VirtualAlloc() and WriteProcessMemory(). I don't think it will work on x64 though, since it doesn't use NtContinue() and registers are volatile in SetThreadContext() (though it could be modified). Neat trick!

    4. I'd end it with an infinite spinloop

    5. Why can't you just use QueueUserAPC?

    6. You can. I mention APCs, they are one of the only ways to go from a 32-bit process into a 64-bit one. But the idea here is to find "new" APIs to abuse instead of relying on old tricks.
