Saturday, November 24, 2018

Dissecting a Bug in the EternalBlue Client for Windows XP (FuzzBunch)

See Also: Dissecting a Bug in the EternalRomance Client (FuzzBunch)

Background 

Pwning Windows 7 was no problem, but I would re-visit the EternalBlue exploit against Windows XP for a time and it never seemed to work. I tried all levels of patching and service packs, but the exploit would either always passively fail to work or blue-screen the machine. I moved on from it, because there was so much more of FuzzBunch that was unexplored.

Well, one day on a pentest a wild Windows XP appeared, and I figured I would give FuzzBunch a go. To my surprise, it worked! And on the first try.

Why did this exploit work in the wild but not against runs in my "lab"?

tl;dr: Differences in NT/HAL between single-core/multi-core/PAE CPU installs causes FuzzBunch's XP payload to abort prematurely on single-core installs.

Multiple Exploit Chains 

Keep in mind that there are several versions of EternalBlue. The Windows 7 kernel exploit has been well documented. There are also ports to Windows 10 which have been documented by myself and JennaMagius as well as sleepya_.

But FuzzBunch includes a completely different exploit chain for Windows XP, which cannot use the same basic primitives (i.e. SMB2 and SrvNet.sys do not exist yet!). I discussed this version in depth at DerbyCon 8.0 (slides / video).

tl;dw: The boot processor KPCR is static on Windows XP, and to gain shellcode execution the value of KPRCB.PROCESSOR_POWER_STATE.IdleFunction is overwritten.

Payload Methodology 

As it turns out, the exploit was working just fine in the lab. What was failing was FuzzBunch's payload.

The main stages of the ring 0 shellcode performs the following actions:

  1. Obtains &nt and &hal using the now-defunct KdVersionBlock trick
  2. Resolves some necessary function pointers, such as hal!HalInitializeProcessor
  3. Restores the boot processor KPCR/KPRCB which was corrupted during exploitation
  4. Runs DoublePulsar to backdoor the SMB service
  5. Gracefully resumes execution at a normal state (nt!PopProcessorIdle)

Single Core Branch Anomaly 

Setting a couple hardware breakpoints on the IdleFunction switch and +0x170 into the shellcode (after a couple initial XOR/Base64 shellcode decoder stages), it is observed that a multi-core machine install branches differently than the single-core machine.

kd> ba w 1 ffdffc50 "ba e 1 poi(ffdffc50)+0x170;g;"

The multi-core machine has acquired a function pointer to hal!HalInitializeProcessor.

Presumably, this function will be called to clean up the semi-corrupted KPRCB.

The single-core machine did not find hal!HalInitializeProcessor... sub_547 instead returned NULL. The payload cannot continue, and will now self destruct by zeroing as much of itself out as it can and set up a ROP chain to free some memory and resume execution.

Note: A successful shellcode execution will perform this action as well, just after installing DoublePulsar first.

Root Cause Analysis 

The shellcode function sub_547 does not properly find hal!HalInitializeProcessor on single core CPU installs, and thus the entire payload is forced to abruptly abort. We will need to reverse engineer the shellcode function to figure out exactly why the payload is failing.

There is an issue in the kernel shellcode that does not take into account all of the different types of the NT kernel executables are available for Windows XP. Specifically, the multi-core processor version of NT works fine (i.e. ntkrnlamp.exe), but a single core install (i.e. ntoskrnl.exe) will fail. Likewise, there is a similar difference in halmacpi.dll vs halacpi.dll.

The NT Red Herring 

The first operation that sub_547 performs is to obtain HAL function imports used by the NT executive. It finds HAL functions by first reading at offset 0x1040 into NT.

On multi-core installs of Windows XP, this offset works as intended, and the shellcode finds hal!HalQueryRealTimeClock:

However, on single-core installations this is not a HAL import table, but instead a string table:

At first I figured this was probably the root cause. But it is a red herring, as there is correction code. The shellcode will check if the value at 0x1040 is an address in the range within HAL. If not it will subtract 0xc40 and start searching in increments of 0x40 for an address within the HAL range, until it reaches 0x1040 again.

Eventually, the single-core version will find a HAL function, this time hal!HalCalibratePerformanceCounter:

This all checks out and is fine, and shows that Equation Group did a good job here for determining different types of XP NT.

HAL Variation Byte Table 

Now that a function within HAL has been found, the shellcode will attempt to locate hal!HalInitializeProcessor. It does so by carrying around a table (at shellcode offset 0x5e7) that contains a 1-byte length field followed by an expected sequence of bytes. The original discovered HAL function address is incremented in search of those bytes within the first 0x20 bytes of a new function.

The desired 5 bytes are easily found in the multi-core version of HAL:

However, the function on single-core HAL is much different.

There is a similar mov instruction, but it is not a movzx. The byte sequence being searched for is not present in this function, and consequently the function is not discovered.

Conclusion 

It is well known (from many flame wars on Windows kernel development mailing lists) that searching for byte sequences to identify functions is unreliable across different versions and service packs of Windows. We have learned from this bug that exploit developers must also be careful to account for differences in single/multi-core and PAE variations of NTOSKRNL and HAL. In this case, the compiler decided to change one movzx instruction to a mov instruction and broke the entire payload.

It is very curious that the KdVersionBlock trick and a byte sequence search is used to find functions in this payload. The Windows 7 payload finds NT and its exports in, as seen, a more reliable way, by searching backwards in memory from the KPCR IDT and then parsing PE headers.

This HAL function can be found through such other means (it appears readily exported by HAL). The corrupted KPCR can also be cleaned up in other ways. But those are both exercises for the reader.

There is circumstantial evidence that primary FuzzBunch development was started in late 2001. The payload seems maybe it was only written for and tested against multi-core processors? Perhaps this could be a indicator as to how recent the XP exploit was first written. Windows XP was broadly released on October 25, 2001. While this is the same year that IBM invented the first dual-core processor (POWER4), Intel and AMD would not have a similar offering until 2004 and 2005, respectively.

This is yet another example of the evolution of these ETERNAL exploits. The Equation Group could have re-used the same exploit and payload primitives, yet chose to develop them using many different methodologies, perhaps so if one methodology was burned they could continue to reap the benefits of their exploit diversification. There is much esoteric Windows kernel internals knowledge that can be learned from studying these exploits.

Saturday, June 16, 2018

Dissecting a Bug in the EternalRomance Client (FuzzBunch)

Note: This post does not explain the EternalRomance exploit chain, just a quirky bug in the Equation Group's client. For comprehensive exploit details, come see my presentation at DEF CON 26 (August 2018).


Background

In SMBv1, transactions are looked up via their User ID, Tree ID, Process ID, and Multiplex ID fields (UID, TID, PID, MID). This allows a client to have many transactions running at once, as needed. UID and TID are server-assigned, and PID is client-set but usually static. Generally, a client will only use the MID, set to a random value, to distinguish distinct transactions.

Fish in a Barrel

In EternalRomance, the MID must be set to a specific value (File ID). In order for the Equation Group to multiplex multiple transactions, the PID is used instead. The PID is what separates "dynamite sticks" in the Fish-In-A-Barrel heap feng shui.

                                               
Figure 1. Fish in a Barrel (Red: Dynamite - Blue: Fish)

Dynamite are transactions that can (ideally) cause overflow into another transaction. Sometimes a dynamite stick fails, simply because memory allocations can be volatile. In this case, EternalRomance should try the next stick.

Discovering the Bug

I had nop'd out the Srv.sys vulnerability being exploited using WinDbg so that I could observe the network traffic during failures and other various reasons.

I noticed that EternalRomance, during the grooming phase, sent dynamite sticks with PIDs 0, 1, and 2. However, it was only attempting to ignite one PID (dynamite stick) for every execution attempt. The PID 0.

This must be a mistake because igniting the same dynamite 3 times in a row does absolutely nothing but send superfluous network traffic with no change in result. A dynamite stick either works or it simply always will be a dud. And besides, why did it bother to send the other 2 dynamite in the first place?

In fact, igniting the same dynamite stick multiple times is dangerous, because it increments a pointer each time, and the offset for the overwrite (a neighboring MID) stays static. On a side note, I also noticed the first exploit attempt always tries to overwrite two bytes, and all secondary dynamite attempts only overwrite one byte. Because of the way they set up the exploit, only a one byte overwrite is necessary (though two bytes won't hurt if it hits the right place). Another peculiarity.

I messed around with the MaxExploitAttempt settings, which has a default value of 3. I set it to its maximum allowed of 16. Now the PID started at 3?

This time, PIDs 3 through 15 were observed, and the last 3 exploit attempts sent PID=0.

The Binary is Truth

Well some debugging later, I figured out that the InitializeParameters() function (there are no symbols in the binary, but a few functions have helpful debug strings when handling error conditions) was allocating two arrays for the dynamite stick PIDs.

unsigned int size = ExploitStruct->MaxExploitAttempts_0x4360;

if (size <= 16)
{
    ExploitStruct->PidTable_0x44a0 = (PWORD) TbMalloc(2 * size);
    ExploitStruct->PidTable_0x44a4 = (PWORD) TbMalloc(2 * size);
}
else
{
    // print error message: too many max exploit attempts
}

TbMalloc is Equation Group's library function (tibe-2.dll) that just calls malloc() and then memset() to 0 (essentially calloc() but with one argument).

I set a hardware breakpoint on the tables and noticed that in SmbRemoteApiTransactionGroom() (another unnamed function) there was the following logic. This function completes when the dynamite are initially sent (before any are ignited).

if (DynamiteNum >= 3)
{
    ExploitStruct->PidTable_0x44a4[DynamiteNum - 3] = DynamitePid;
}
else
{ 
    ExploitStruct->PidTable_0x44a0[DynamiteNum] = DynamitePid;
}

Later, in DoWriteAndXExploitTransactionForRemApi(), the table where DynamiteNum >= 3 is used to source PIDs to ignite the dynamite.

This means PidTable_0x44a4 is never given values when MaxExploitAttempts=3. Observe 3 shorts set to 0 at the address in the dump.

And we can see the cause for the quirky behavior of the network traffic starting at PID=3, when MaxExploitAttempts=16 (or any greater than 3). Observe several shorts incrementing from 3, followed by three 0.

As far as I can tell, the PidTable_0x44a0 table (the one that holds the first 3 PIDs) simply isn't used, at least when tested against several versions of Windows XP and Server 2003.

Conclusion

This bug was probably missed, by both analysts and the Equation Group, for a few reasons:

  • Fish in a Barrel is only used for older versions of Windows (it's fixed in 7+)
  • It almost always succeeds the first time, as it is a rarely used pre-allocated heap
  • TbMalloc initializes all PID to 0, and the first dynamite PID is 0
  • The bug is quite subtle, I missed it several times because of assumptions

The real mystery is why is there this logic for the second table that isn't used?