## Tuesday, May 5, 2015

### Big-O Complexity != Actual Performance

Any easy mistake to make is to assume that because one algorithm has a better Big-O complexity, it must be faster to execute. Similarly, two algorithms with the same Big-O, tackling the same problem, and all things similar may have vastly different performance.

Consider for example a "unique string" algorithm, where you want to check if a string contains all unique characters. A naive approach to the problem, which does not use extra data structures, is to go through the string, iterating each letter against the entire string searching for a match. This approach is O(n^2).

```int strunique_naive(const char *str)
{
char *iter, *tmp;

for (iter = (char *)str; *iter != '\0'; ++iter)
for (tmp = (char *)str; *tmp != '\0'; ++tmp)
if (*tmp == *iter)
return 0;

return 1;
}```

Now consider a better approach to the problem. We can optimize the algorithm by adding an offset for the iterating over the string part, since for example in a short string there is no reason to test the last character with the first character as the first character already performed that test during its loop-through.

```int strunique_better(const char *str)
{
size_t offset = 0;    /* offset at 0 */
char *iter, *tmp;

for (iter = (char *)str; *iter != '\0'; ++iter, ++offset)
for (tmp = (char *)str + offset; *tmp != '\0'; ++tmp)
if (*tmp == *iter)
return 0;

return 1;
}

int strunique_best(const char *str)
{
size_t offset = 1;   /* pre-emptively shift offset up */

/* repeat code above */
}
```

When we calculate the number of loops for the worst-case scenario (i.e. a string that is unique) for the naive algorithm, we get n^2 loops. The better algorithm is done in (n^2 + n) / 2, and the best is n * (n - 1) / 2 loops. The following table shows the differences:

### # of Loops on Worst Case

strunique_naive strunique_better strunique_best 1 1 0 4 3 1 9 6 3 25 15 10 100 55 45 10,000 5,050 4,950

So why can you not trust Big-O complexity for actual performance? All 3 algorithms are actually O(n^2)! The better and best algorithms just have an improved actual performance equation.

### Algorithm Benchmarks

strunique_naive strunique_better strunique_best O(n^2) O(n^2) O(n^2) n^2 (n^2 + n) / 2 n * (n - 1) / 2

## Friday, January 16, 2015

### Pwn Adventure 3 Speed Hack

The Ghost in the Shellcode CTF has a number of challenges within an Unreal engine MMO. The game, Choose Your Pwn Adventure 3, is designed to be hacked. The hackable game code is mostly in a DLL called GameLogic.dll, which is a C++ binary with a provided debug .pdb file.

You start out running pretty slowly, so getting around the huge island is a pain. It's simple to get some basic speed hacking going though.

After going through the DLL and looking at function names that seemed interesting, I ran across the following:

I followed the offset and found the following there:

.rdata:10078B14 __real@40400000 dd 3.0

The hex for the float value was 0x4040. I changed this modifier to be a little bit higher by patching the DLL with a hex editor.

And that's it.

## Thursday, January 15, 2015

### Practical Reverse Engineering p. 79 #10

Question number 10 on page 79 of Practical Reverse Engineering is as follows:

Figure 2-16 is a function from Windows RT. Read MSDN if needed. Ignore the security PUSH/POP cookie routines.

Here is the disassembly of the function:

 Figure 2-16. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state, but transfers out during some of the syscalls. The function queries different clock APIs depending on the size of the supplied struct.

```size_t QueryChrono(size_t *bytes_copied,
size_t max_size,
struct *clock_info)
{
/* MOVS R4, #0 */
bytes_copied = 0;

/* CMP R5, #0x10 */
if (max_size >= 16)
{
SYSTEMTIME sysTime;           /* SUB SP, SP, #0x10 */
GetSystemTime(&sysTime);      /* LDR R3, =__imp_GetSystemTime */

/* LDR R3, [SP,#0x1C+var_1C] */
/* LDR R3, [SP,#0x1C+var_18] */
/* LDR R3, [SP,#0x1C+var_14] */
/* LDR R3, [SP,#0x1C+var_10] */
/* STR R3, [R6,#0xC] */
memcpy(clock_info->sysTime0x0, &sysTime, sizeof(SYSTEMTIME));

bytes_copied = 16;              /* MOVS R4, #0x10 */
}

/* SUBS R3, R5, R4 */
/* CMP R3, #4 */
if ((max_size - bytes_copied) >= 4)
{
/* LDR R3, =__imp_GetCurrentProcessId */
/* STR R0, [R6,R4] */
*(clock_info + bytes_copied) = GetCurrentProcessId();

bytes_copied += 4;              /* ADDS R4, #4 */
}

/* SUBS R3, R5, R4 */
/* CMP R3, #4 */
if ((max_size - bytes_copied) >= 4)
{
/* LDR R3, =__imp_GetTickCount */
/* STR R0, [R6,R4] */
*(clock_info + bytes_copied) = GetTickCount();

bytes_copied += 4;              /* ADDS R4, #4 */
}

/* SUBS R3, R5, R4 */
/* CMP R3, #9 */
if ((max_size - bytes_copied) >= 8)
{
/* MOV R0, SP */
LARGE_INTEGER perfCount;

/* LDR R3, =__imp_QueryPerformanceCounter */
QueryPerformanceCounter(&perfCount);

/* STR R3, [R6,R4] */
/* STR R3, [R2,#4] */
memcpy((clock_info + bytes_copied), &perfCount,
sizeof(LARGE_INTEGER));

bytes_copied += 8;              /* ADDS R4, #8 */
}

return bytes_copied;                /* MOV R0, R4 */
}```

## Tuesday, January 13, 2015

### Practical Reverse Engineering p. 79 #9

Question number 9 on page 79 of Practical Reverse Engineering is as follows:

What does the function shown in Figure 2-15 do?

Here is the function's disassembly:

 Figure 2-15. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. This is essentially the same functionality as Figure 2-14, except the count variable is gone.

```int32_t comparison(char *str1, char *str2)
{
/* LDR R5, =byteArray */
static BYTE byteArray[] = {0, 1, ..., 0xff};

while(1)
{
/* CMP R4, #0 */
if (*str1 == '\0')
break;

/* LDRB R3, [R1] */
/* LDRB R4, [R3,R6] */
/* LDRB R3, [R5,R6] */
/* CMP R3, R4 */
if (byteArray[*str1] != byteArray[*str2])
break;

++str1;         /* ADDS R0, #1 */
++str2;         /* ADDS R1, #1 */
}

/* LDRB R2, [R3,R5] */
/* LDRB R3, [R3,R5] */
/* SUBS R0, R3, R2 */
return byteArray[*str1] - byteArray[*str2];
}```

### Practical Reverse Engineering p. 79 #8

Question number 8 on page 79 of Practical Reverse Engineering is as follows:

In Figure 2-14, byteArray is a 256-character array whose content is byteArray[] = {0, 1, ..., 0xff}.

Here is the disassembly of the function:

 Figure 2-14. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. We infer that this function takes two strings, as there is byte comparisons and the null byte ends the main loop. There is a 3rd argument which can terminate the main loop early as well. The function is a pretty straightforward translation to C.

```int32_t comparison(char *str1, char *str2, uint32_t count)
{
/* LDR R6, =byteArray */
static BYTE byteArray[] = {0, 1, ..., 0xff};

/* CMP R2, #0 */
while (count > 0)
{
--count;    /* SUBS R2, #1 */

/* LDRB R5, [R0] */
/* CBZ R5, loc_100E352 */
if (*str1 == '\0')
break;

/* LDRB R3, [R1] */
/* LDRB R4, [R3,R6] */
/* LDRB R3, [R5,R6] */
/* CMP R3, R4 */
if (byteArray[*str1] != byteArray[*str2])
break;

++str1;     /* ADDS R0, #1 */
++str2;     /* ADDS R1, #1 */
}

/* SUBS R2, #1 */
/* CMP R2, #0 */
if ((count - 1) >= 0)
{
/* LDRB R2, [R3,R6] */
/* LDRB R3, [R3,R6] */
/* SUBS R0, R3, R2 */
return byteArray[*str1] - byteArray[*str2];
}

return NULL;        /* MOVS R0, #0 */
}```

## Monday, January 12, 2015

### Practical Reverse Engineering p. 79 #7

Question number 7 on page 79 of Practical Reverse Engineering is as follows:

Figure 2-13 illustrates a common routine, but you may not have seen it implemented this way.

Here is the disassembly of the function:

 Figure 2-13. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. We immediately recognize this is a strlen() routine. There is a bit field clear at the end, whose purpose is unclear. Here is how the function is implemented:

```size_t strlen(const char *str)
{
/* CBNZ R0, loc_100E1D8 */
if (r0 == NULL)
return 0;   /* MOVS R0, #0 */

/* MOV R2, R0 */
char *index = str;

while (1)               /* loc_100E1E4 */
{
/* CMB R3, #0 */
if (*index == '\0')
break;

++index;            /* ADDS R2, #1 */
}

/* SUBS R0, R2, R0 */
return (index - str);
}```

### Practical Reverse Engineering p. 79 #6

Question number 6 on page 79 of Practical Reverse Engineering is as follows:

Figure 2-12 involves some twiddling.

Here is a disassembly of the function:

 Figure 2-12. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. The function takes a struct that has a size value and array in it. The array is enumerated looking for a search value, then returning a type of bitmask on its location.

```uint64_t search_mask(struct *r0, DWORD search)
{
/* loc_103B3A8 */
for (
DWORD i = 0;            /* MOVS R2, #0 */
i < r0->numElements;    /* CMP R2, R4 */
++i;                    /* ADDS R2, #1 */
)
{
/* LDR.W R3, [R0,#4]! */
/* CMP R3, R1 */
if (r0->elements[i] == search)
{
/* SUBS.W R3, R2, #0X20 */
/* LSLS R1, R3 */
search = 1 << (i - 0x20);

/* MOVS R3, #1 */
/* LSLS.W R0, R3, R2 */
return (uint64_t) 1 << i;
}

}

search = 0; /* MOVS R1, #0 */
return 0;   /* MOVS R0, #0 */
}```

Here is a struct definition:

```struct r0
{
DWORD numElements;  /* 0x0 */
DWORD elements[?];  /* 0x4 */
}```

### Practical Reverse Engineering p. 79 #5

Question number 5 on page 79 of Practical Reverse Engineering is as follows:

Figure 2-11 is simple as well. The actual string names have been removed so you cannot cheat by searching the Internet.

Here is the disassembly of the function:

 Figure 2-11. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. This function can be written as a switch statement. It essentially takes an enum and returns a string based on the value.

```const char *get_string(DWORD string_enum)
{
/* MOV R3, R0 */
switch (string_enum)
{
case 6:             /* CMP R3, #6 */
return "E";     /* LDR R0, =aE ; "E" */

case 7:             /* CMP R3, #7 */
return "D";     /* LDR R0, =aD ; "D" */

case 8:             /* CMP R3, #8 */
return "C";     /* LDR R0, =ac ; "C" */

case 9:             /* CMP R3, #9 */
return "B";     /* LDR R0, =aB ; "B" */

default:
return "A";     /* LDR R0, =aA ; "A" */
}
}```

### Practical Reverse Engineering p. 79 #4

Question number 4 on page 79 of Practical Reverse Engineering is as follows:

Figure 2-10 shows another easy function.

Here is the function's disassembly:

 Figure 2-10. Practical Reverse Engineering. © 2014 by Bruce Dang

The ARM processor is in Thumb state. This function takes a pointer of an unknown type. If the pointer is null, it returns 0. Otherwise it subtracts 8 bytes from the pointer and returns the value there.

```DWORD sub8_ptr(void *r0)
{
/* CBNZ R0, loc_100C3DA */
if (r0 == NULL)
return 0;       /* MOVS R0, #0 */

return *(r0 - 8);   /* LDR.W R0, [R0,#–8]  */
}```

### Practical Reverse Engineering p. 79 #3

Question number 3 on page 79 of Practical Reverse Engineering is as follows:

Here is a simple function:

01: mystery3
02: 83 68 LDR R3, [R0,#8]
03: 0B 60 STR R3, [R1]
04: C3 68 LDR R3, [R0,#0xC]
05: 00 20 MOVS R0, #0
06: 4B 60 STR R3, [R1,#4]
07: 70 47 BX LR
08: ; End of function mystery3

The ARM processor is in Thumb state. This function just copies some values from the first struct to the second struct. It always returns 0.

```int copy_struct_values(struct *r0, struct *r1)
{
/* LDR R3, [R0,#8] */
/* STR R3, [R1] */
r1->Unknown0x0 = r0->Unknown0x8;

/* LDR R3, [R0,#0xC] */
/* STR R3, [R1,#4] */
r1->Unknown0x4 = r0->Unknown0xc;

return 0;   /* MOVS R0, #0 */
}```

Here are the struct definitions:

```struct r0
{
BYTE Unknown0x0[0x8];   /* 0x0 */
DWORD Unknown0x8;       /* 0x8 */
DWORD Unknown0xc;       /* 0xc */
}

struct r1
{
DWORD Unknown0x0;   /* 0x0 */
DWORD Unknown0x4;   /* 0x4 */
}```

### Practical Reverse Engineering p. 78 #2

Question number 2 on page 78 of Practical Reverse Engineering is as follows:

Figure 2-9 shows a function that was found in the export table.

Here is the function's disassembly:

Figure 2-9. Practical Reverse Engineering. © 2014 by Bruce Dang

We can immediately tell the ARM processor should be in Thumb state. This function checks if the struct passed is NULL. If it is not, then one of the members is checked to see if the byte equals 0.

```BOOL check_struct_char(struct *r0)
{
/* CBZ R0, loc_C672 */
if (r0 != NULL)
{
/* LDRB.W R0, [R0,#0x63] */
/* SUBS R0, #0 */
if (r0->Unknown0x63 == 0)
return FALSE;
}

return TRUE;           /* MOVS R0, #1 */
}```

And here is a definition of the passed struct:

```struct r0
{
BYTE Unknown0x0[0x63];  /* 0x0 */
CHAR Unknown0x63;       /* 0x63 */
}```

### Practical Reverse Engineering p. 78 #1

Question number 1 on page 78 of Practical Reverse Engineering is as follows:

Figure 2-8 shows a function that takes two arguments. It may seem somewhat challenging at fi rst, but its functionality is very common. Have patience.

Indeed the disassembly does appear to be a reasonably complex function.

 Figure 2-8. Practical Reverse Engineering. © 2014 by Bruce Dang

This function is not in Thumb state as every instruction is 32-bits in width. It attempts to convert an ASCII string to a signed decimal number. It is very lenient as far as input (even accepting non-numerical characters in the string input), however it will not work on strings exceeding the MAX_INT constant of 2147483648 (0x80000000).

Here is a close 1 to 1 approximation from ARM to C:

```BOOL ascii_to_decimal(const char *str, int32_t *retNum)
{
BOOL negative;
int i = 0;              /* LDRB R3, [R0] */

/* CMP R3, #0x2D */
if  (str[i] == '-')
{
++i;                /* LDRB R3, [R0,#1]! */
negative = TRUE;    /* MOV R6, #1 */
}
else
{
negative = FALSE;   /* MOV R6, #0 */

/* CMP R3, #0x2B */
if (str[i] == '+')
++i;            /* LDREQB R3, [R0,#1]! */
}

/* CMP R3, #0x30 */
if (str[i] == '0')
{
/* CMP R2, #0x30 */
while (str[i] == '0')
++i;            /* LDRB R2, [R3],#1 */
}

const int base = 10;    /* MOV R8, #0xA */
int64_t result = 0;

while (TRUE)
{
result *= base;         /* UMULL R2, R3, R4, R8 */
result += str[i] - '0'; /* ADDS R4, R2, R7 */
++i;                    /* ADD R12, R12, #1 */

/* SUBS R7, R7, #0x30 */
if (str[i] < '0')
break;

/* CMP R7, #9 */
if (str[i] > '9')
break;
}

/* CMP R2, #0x80000000 */
if (abs((int32_t) result) > INT_MAX)
return FALSE;               /* MOV R0, #0 */

*retNum = (int32_t) result;     /* STR R4, [R1] */
return TRUE;                    /* MOV R0, #1 */
}
```

## Saturday, January 10, 2015

### Practical Reverse Engineering p. 38 #2

Question number 2 on page 38 of Practical Reverse Engineering is as follows:

Perform a virtual-to-physical address translation on x64. Were there any major differences compared to x86?

To convert between a virtual to a physical address you must obtain the page frame number of the directory base. You add this offset to the beginning of the page address.

You can do this with a kernel debugger with the following commands. Here is an example with virtual address 0xff1a0000:

lkd> !process 0 0
PROCESS fffffba002ec0330
SessionId: 1 Cid: 04fc Peb: 7fffffdf000 ParentCid: 07a4
DirBase: 1f79b000 ObjectTable: fffff8a001a7b410 HandleCount: 6.
Image: test64.exe

lkd> !vtop 1f79b000 00000000ff1a0000
Amd64VtoP: Virt 00000000`ff460000, pagedir 1f79b000
Amd64VtoP: PML4E 1f79b000
Amd64VtoP: PDPE 2`21f70018
Amd64VtoP: PDE fa007fa0
Amd64VtoP: PTE 1a40c200
Amd64VtoP: Mapped phys 3a021000

There are slightly different outputs between x64 and x86, but the general means of calculating a physical address from the virtual address is the same.

### Practical Reverse Engineering p. 38 #1

Question number 1 on page 38 of Practical Reverse Engineering is as follows:

Explain two methods to get the instruction pointer on x64. At least one of the methods must use RIP addressing.

RIP-relative addressing in x64 makes position independent code much easier to accomplish. Here is an example to get the current instruction pointer:

```_getrip:
lea rax, [rel _getrip + 0x7]
; rax now points to this line```

By adding the offset 0x7, the effective address will be the line after the load into RAX. If you are able to use null bytes, this can also be accomplished directly using the following method:

```    lea rax, [rel _getrip]
_getrip:
; rax now points to this line```

You can also do a basic stack manipulation:

```    call _getrip
_getrip:
pop rax
```

### Practical Reverse Engineering p. 36 #12

Question number 12 on page 36 of Practical Reverse Engineering is as follows:

Bruce's favorite x86/x64 disassembly library is BeaEngine by BeatriX (www.beaengine.org). Experiment with it by writing a program to disassemble a binary at its entry point.

This program is called the Burdensome Arbitrary Disassembler (BAD). You must supply the file name and address of the entry point as arguments. Parsing PE and ELF files is pretty trivial, but then it wouldn't be BAD.

```#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#define BEA_USE_STDCALL

void *read_file(void *dest, char *file_name, long *file_size)
{
FILE *fp;

fp = fopen(file_name, "rb");
fseek(fp, 0, SEEK_END);

*file_size = ftell(fp);
dest = malloc(*file_size);

rewind(fp);
fclose(fp);

return dest;
}

void disassemble(char *file_name, uint64_t offset)
{
DISASM dsBea;
uint16_t len = 0;
uint8_t *pe_file = 0;
long file_size;

memset(&dsBea, 0, sizeof(DISASM));
dsBea.Archi = 64;

file_size -= (long) offset;
dsBea.EIP = (uint64_t) &*&*(pe_file + offset);

while(file_size > 0)
{
len = Disasm(&dsBea);

if (len == UNKNOWN_OPCODE)
break;

dsBea.EIP += (uint64_t) len;
file_size -= (long) len;

puts(dsBea.CompleteInstr);
}

free(pe_file);
}

int main(int argc, char *argv[])
{
disassemble(argv[1], atoi(argv[2]));
return 0;
}```

Here is an example of disassembly:

```sub rsp, 28h
call 0047BCB0h
call 0047AB50h
ret
...```

This matches IDA's output (besides mapping address offsets):

### Practical Reverse Engineering p. 36 #11

Question number 11 on page 36 of Practical Reverse Engineering is as follows:

Read the Virtual Memory chapter in Intel Software Developer Manual, Volume 3 and AMD64 Architecture Programmer’s Manual, Volume 2: System Programming. Perform a few virtual address to physical address translations yourself and verify the result with a kernel debugger. Explain how data execution prevention (DEP) works.

### Converting Virtual to Physical Memory

To convert a virtual address to its physical address you must obtain the page frame number of the directory base. You add the offset to the beginning of the page address.

You can do this with a kernel debugger with the following commands. Here is an example starting with virtual address 0x00a03ffd:

kd> !process 0 0
DirBase: 08cb1000 ObjectTable: 5b74db90 TableSize: 8.
Image: test.exe

kd> !vtop 8cb1 a03ffd
Pdi 0 Pti a03
00a03ffd 07a2f000 pfn(07a2f)

Add the offset to the base, 0x07a2f000 + 0xffd = 0x07a2fffd. So virtual address 0x00a03ffd in this process points to physical address 0x07a2fffd.

### Data Execution Prevention

Hardware DEP works by enabling the NX/XD/XN-bit in the CPU (depending on architecture). This allows the operating system to mark pages of memory as non-executable. There is an "executable" flag in a memory segment's page table entry (page descriptor) that allows you to modify this setting.

If the instruction pointer moves to an area marked as DEP, the processor faults and execution is passed to a given handler. A DEP access violation in kernel mode on Windows will result in an error 0xFC: ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY.

## Wednesday, January 7, 2015

### Practical Reverse Engineering p. 36 #10

Question number 10 on page 36 of Practical Reverse Engineering is as follows:

If the current privilege level is encoded in CS, which is modifiable by user-mode code, why can’t user-mode code modify CS to change CPL?

The Code Segment (CS) contains a 2-bit value known as the Code Privilege Level (CPL), which is a cached value of the current Ring mode.

The reason user-mode cannot directly modify CS:CPL is the same reason it cannot directly access the instruction pointer (CS:EIP). It's true that CS:EIP is changed with CALL, RET, and JMP instructions; but it cannot be modified by directly loading a value to the register. The same is true for CS:CPL.

To transition to a higher privilege, an instruction like SYSCALL or INT should be used. These will trigger handlers running in kernel-mode.  To modify the CS:CPL to user-mode, an instruction of the IRET family should be used.

Ring 0 code has more privileges to modify the CS segment than Ring 3 code does. However, modifying CS in both x86 and x64 can cause system instability, and should only be performed in real mode.

### Practical Reverse Engineering p. 35 #9

Question number 9 on page 35 of Practical Reverse Engineering is as follows:

Sample L. Explain what function sub_1000CEA0 does and then decompile it back to C.

Here is the disassembly of the function:

```sub_1000CEA0:
push    ebp
mov     ebp, esp
push    edi
mov     edi, [ebp+8]
xor     eax, eax
or      ecx, 0FFFFFFFFh
repne scasb
neg     ecx
sub     edi, 1
mov     al, [ebp+0Ch]
std
repne scasb
cmp     [edi], al
jz      short loc_1000CEC7
xor     eax, eax
jmp     short loc_1000CEC9

loc_1000CEC7:
mov     eax, edi

loc_1000CEC9:
cld
pop     edi
leave
retn```

This function calculates the string length, then works backwards to return a pointer to the last instance of a character in the string.

```char *sub_1000CEA0(char *str, char ch)
{
/* mov edi, [ebp+8] */
/* xor eax, eax
/* or ecx, 0FFFFFFFFh */
/* repne scasb */
/* neg ecx */
size_t len = 1;
while (*str)
{
++str;
++len;
}

/* sub edi, 1 */
/* mov al, [ebp+0Ch] */
/* std */
/* repne scasb */
while (len)
{
/* cmp [edi], al */
if (*str == ch)
return str;
--len;
--str;
}

return 0;
}```

This is an implementation of the strrchr() function, which is defined as follows:

`char *strrchr(char *str, int character);`

### Practical Reverse Engineering p. 35 #8

Question number 8 on page 35 of Practical Reverse Engineering is as follows:

Sample H. Decompile sub_11732 and explain the most likely programming construct used in the original code.

The function's disassembly looks like the following:

```sub_1172E:
push    esi
mov     esi, [esp+8]
dec     esi
jz      short loc_1175F
dec     esi
jz      short loc_11755
dec     esi
jz      short loc_1174B
sub     esi, 9
jnz     short loc_1176B
mov     esi, [eax+8]
shr     esi, 1
jmp     short loc_11767

loc_1174B:
mov     esi, [eax+3Ch]
shr     esi, 1
jmp     short loc_11767

loc_11755:
mov     esi, [eax+3Ch]
shr     esi, 1
jmp     short loc_11767

loc_1175F:
mov     esi, [eax+3Ch]
shr     esi, 1

loc_11767:

mov     [ecx], esi
mov     [edx], eax

loc_1176B:
pop     esi
retn    4 ```

The function contains a switch. The calling convention is odd for x86 as there are three variables passed in a register, which means it was probably compiled as a static unit. There is a lot of repeated code, which means either the programmer or the compiler couldn't optimize the size in a logical way. Here is a more natural way to write the function:

```static struct *sub_1172E(struct *arg1, struct *arg2,
struct *arg3, int unknown_enum)
{
DWORD dwUnknown;

/* mov esi, [esp+8] */
switch (unknown_enum)
{
case 1:                 /* dec esi */
arg1 += 64;         /* add eax, 40h */
break;

case 2:                 /* dec esi */
arg1 += 68          /* add eax, 44h */
break;

case 3:                 /* dec esi */
arg1 += 94;         /* add eax, 5Eh */
break;

case 12:                /* sub esi, 9 */
arg1 += 12;         /* add eax, 0Ch */
break;

default:
return arg1;
}

dwUnknown = (unknown_enum == 12) ?
arg1->Unknown0x8 :  /* mov esi, [eax+8] */
arg1->Unknown0x3c;  /* mov esi, [eax+3Ch] */

dwUnknown /= 2;             /* shr esi, 1 */

*arg2 = dwUnknown;          /* mov [ecx], esi */
*arg3 = arg1;               /* mov [edx], eax */

return arg1;
}```

We can infer some of the struct in the first argument by where offsets were accessed:

```struct arg1 {
BYTE Unknown0x0[0x8];   /* 0x0 */
DWORD bigTwiceVal;      /* 0x8 */
BYTE Unknown0xc[0x30];  /* 0xc */
DWORD smallTwiceVal;    /* 0x3c */
};
```

## Tuesday, January 6, 2015

### Practical Reverse Engineering p. 35 #7

Question number 7 on page 35 of Practical Reverse Engineering is as follows:

Sample H. The function sub_10BB6 has a loop searching for something. First recover the function prototype and then infer the types based on the context. Hint: You should probably have a copy of the PE specification nearby.

Here is an image that details the PE file format, which is useful for referencing offsets.

The structs that make up the PE file format are:

```typedef struct _IMAGE_DOS_HEADER
{
WORD e_magic;              /* 0x0 */
WORD e_cblp;               /* 0x2 */
WORD e_cp;                 /* 0x4 */
WORD e_crlc;               /* 0x6 */
WORD e_cparhdr;            /* 0x8 */
WORD e_minalloc;           /* 0xa */
WORD e_maxalloc;           /* 0xc */
WORD e_ss;                 /* 0xe */
WORD e_sp;                 /* 0x10 */
WORD e_csum;               /* 0x12 */
WORD e_ip;                 /* 0x14 */
WORD e_cs;                 /* 0x16 */
WORD e_lfarlc;             /* 0x18 */
WORD e_ovno;               /* 0x1a */
WORD e_res[4];             /* 0x1c */
WORD e_oemid;              /* 0x24 */
WORD e_oeminfo;            /* 0x26 */
WORD e_res2[10];           /* 0x28 */
LONG e_lfanew;             /* 0x3c */

DWORD                 Signature;         /* 0x0 */
union
{
struct
{
WORD  Machine;                   /* 0x4 */
WORD  NumberOfSections;          /* 0x6 */
DWORD TimeDateStamp;             /* 0x8 */
DWORD PointerToSymbolTable;      /* 0xc */
DWORD NumberOfSymbols;           /* 0x10 */
WORD  Characteristics;           /* 0x16 */
}
}

Here is the disassembly of the function in Sample H:

```sub_10BB2:
mov     eax, [esp+4]
push    ebx
push    esi
mov     esi, [eax+3Ch]
movzx   eax, word ptr [esi+14h]
xor     ebx, ebx
cmp     [esi+6], bx
push    edi
lea     edi, [eax+esi+18h]
jbe     short loc_10BEB

loc_10BCE:
push    [esp+0Ch+8]     ; _DWORD
push    edi             ; _DWORD
call    ds:dword_169A4
test    eax, eax
pop     ecx
pop     ecx
jz      short loc_10BF3
movzx   eax, word ptr [esi+6]
inc     ebx
cmp     ebx, eax
jb      short loc_10BCE

loc_10BEB:
xor     eax, eax

loc_10BED:
pop     edi
pop     esi
pop     ebx
retn    8

loc_10BF3:
mov     eax, edi
jmp     short loc_10BED```

Making the assumption that our first argument is a pointer to the DOS header (from the hint in the question), we can use a little bit of math to calculate the offsets and see everything lines up. The search code could be written in a for loop but I used a while loop for more clarity.

```PIMAGE_SECTION_HEADER sub_10BB2(PVOID pPE, PVOID arg2)
{
WORD dwOptHdrSize;
DWORD dwCurrentSection;

/* mov eax, [esp+4] */

/* mov esi, [eax+3Ch] */
pNT = (pDOS + pDOS->e_lfanew);

/* movzx eax, word ptr [esi+14h] */

/* xor ebx, ebx */
dwCurrentSection = 0;

/* cmp [esi+6], bx */
if (pNT->NumberOfSections == 0)
return NULL;

/* lea edi, [eax+esi+18h] */
pSection = IMAGE_FIRST_SECTION(pNT);

while (1)
{
/* push [esp+0Ch+8] */
/* push edi  */
/* call ds:dword_169A4 */
/* test eax, eax */
if (*(BOOL)dword_169A4)(pSection, arg2))
return pSection;

++pSection; /* struct increment */

/* inc ebx */
++dwCurrentSection;

/* movzx eax, word ptr [esi+6] */
/* cmp ebx, eax */
if (dwCurrentSection > pNT->NumberOfSections)
return NULL;
}
}```

### Practical Reverse Engineering p. 35 #6

Question number 6 on page 35 of Practical Reverse Engineering is as follows:

Sample H. The function sub_13846 references several structures whose types are not entirely clear. Your task is to first recover the function prototype and then try to reconstruct the structure fields. After reading Chapter 3, return to this exercise to see if your understanding has changed. (Note: This sample is targeting Windows XP x86.)

The function disassembles to:

```sub_13842:
mov     eax, [ecx+60h]
push    esi
mov     esi, [edx+8]
dec     byte ptr [ecx+23h]
sub     eax, 24h
mov     [ecx+60h], eax
mov     [eax+14h], edx
movzx   eax, byte ptr [eax]
push    ecx
push    edx
call    dword ptr [esi+eax*4+38h]
pop     esi
retn```

It's a fastcall convention which has two struct pointer arguments, and the return type is not readily available. There are 3 locals which are stored in registers, two of them pointers within the struct and a char which is an index to a function call table.

```PVOID __fastcall sub_13841(
struct *arg1, struct *arg2)
{
ULONG_PTR v1, v2;
CHAR index;

v1 = arg1->Unkown0x60;    /* mov eax, [ecx+60h] */
v2 = arg2->Unknown0x8;    /* mov esi, [edx+8] */

--arg1->Unknown0x23;      /* dec byte ptr [ecx+23h] */

v1 -= 0x24;               /* sub eax, 24h */

arg1->Unknown0x60 = v1;   /* mov [ecx+60h], eax */
v1->Unknown0x14 = arg2;   /* mov [eax+14h], edx */

index = v1->index;        /* movzx eax, byte ptr [eax] */

/* call dword ptr [esi+eax*4+38h] */
return v2->Unknown0x38[index](arg2, arg1);
}```

Here is a basic idea so far on the struct meanings:

```struct arg1 {
BYTE Unknown0x0[0x23];   /* 0x0 */
CHAR decremented;        /* 0x23 */
BYTE Unknown0x24[0x18];  /* 0x24 */
union
{
struct *v1[9];           /* 0x3c - 0x60 */
struct
{
struct v1_1[0x3c];   /* 0x3c */
struct v1_2[0x60];   /* 0x60 */
};
};
};

struct arg2 {
BYTE Unknown0x0[0x8];    /* 0x0 */
struct *v2;              /* 0x8 */
};

/* size = 0x24 */
struct v1 {
CHAR index;              /* 0x0 */
BYTE Unknown0x1[0x13];   /* 0x1 */
struct *arg2;            /* 0x14 */
CHAR Unknown0x18[0xc];   /* 0x18 */
};

struct v2 {
BYTE Unknown0x0[0x38];   /* 0x0 */
ULONG_PTR function;      /* 0x38 */
};```

## Saturday, January 3, 2015

### Practical Reverse Engineering p. 35 #5

Question number 5 on page 35 of Practical Reverse Engineering is as follows:

Decompile the following kernel routines in Windows:
• KeInitializeDpc
• KeInitializeApc
• ObFastDereferenceObject (explain its calling convention)
• KeInitializeQueue
• KxWaitForLockChainValid
• KiInitializeTSS
• RtlValidateUnicodeString

### KeInitializeDpc

Inside ntoskrnl.exe, KeInitializeDpc has the following prototype:

```VOID NTAPI KeInitializeDpc(
PRKDPC Dpc,
PKDEFERRED_ROUTINE DeferredRoutine,
PVOID DeferredContext);```

This has a parameter for the KDPC struct, which contains a LIST_ENTRY. These are defined as:

```typedef struct _LIST_ENTRY {
struct _LIST_ENTRY  *Flink;   /* 0x0 */
struct _LIST_ENTRY  *Blink;   /* 0x8 */
} LIST_ENTRY, *PLIST_ENTRY;

typedef struct _KDPC
{
UCHAR Type;                /* 0x0 */
UCHAR Importance;          /* 0x1 */
WORD Number;               /* 0x2 */
BYTE Unknown[4];           /* 0x4 */
LIST_ENTRY DpcListEntry;   /* 0x8 */
PVOID DeferredRoutine;     /* 0x18 */
PVOID DeferredContext;     /* 0x20 */
PVOID SystemArgument1;     /* 0x28 */
PVOID SystemArgument2;     /* 0x30 */
PVOID DpcData;             /* 0x38 */
} KDPC, *PKDPC;```

Here is the disassembly:

```KeInitializeDpc:
xor     eax, eax
mov     dword ptr [rcx], 113h
mov     [rcx+18h], rdx
mov     [rcx+38h], rax
mov     [rcx+10h], rax
mov     [rcx+20h], r8
retn```

The first MOV is an optimization which sets the first 3 variables in the struct, as it sets a dword to 0x113 (0b100010011). Everything else lines up easily enough. Here is the fully decompiled function.

```VOID NTAPI KeInitializeDpc(
PRKDPC Dpc,
PKDEFERRED_ROUTINE DeferredRoutine,
PVOID DeferredContext)
{
Dpc->Type = 19;                       /* mov dword ptr [rcx],113h */
Dpc->Importance = 1;
Dpc->Number = 0;
Dpc->DeferredRoutine = DeferredRoutine; /* mov [rcx+18h], rdx */
Dpc->DpcData = 0;                       /* mov [rcx+38h], rax */
Dpc->DpcListEntry.Blink = 0;            /* mov [rcx+10h], rax */
Dpc->DeferredContext = DeferredContext; /* mov [rcx+20h], r8 */
}```

### KeInitializeApc

Inside ntoskrnl.exe, KeInitializeApc has the following prototype:

```VOID NTAPI KeInitializeApc(
_In_ PKAPC  Apc,
_In_ KAPC_ENVIRONMENT   TargetEnvironment,
_In_ PKKERNEL_ROUT_In_E     KernelRoutine,
_In_Opt_ PKRUNDOWN_ROUT_In_E RundownRoutine ,
_In_ PKNORMAL_ROUT_In_E     NormalRoutine,
_In_ KPROCESSOR_MODE    Mode,
_In_ PVOID  Context);   ```

Here is the KAPC struct with added offsets:

```typedef struct _KAPC
{
UCHAR Type;                /* 0x0 */
UCHAR SpareByte0;          /* 0x1 */
UCHAR Size;                /* 0x2 */
UCHAR SpareByte1;          /* 0x3 */
ULONG SpareLong0;          /* 0x4 */
LIST_ENTRY ApcListEntry;   /* 0x10 */
PVOID KernelRoutine;       /* 0x20 */
PVOID RundownRoutine;      /* 0x28 */
PVOID NormalRoutine;       /* 0x30 */
PVOID NormalContext;       /* 0x38 */
PVOID SystemArgument1;     /* 0x40 */
PVOID SystemArgument2;     /* 0x48 */
CHAR ApcStateIndex;        /* 0x50 */
CHAR ApcMode;              /* 0x51 */
UCHAR Inserted;            /* 0x52 */
} KAPC, *PKAPC;```

And here is the disassembly:

```KeInitializeApc:
mov     byte ptr [rcx], 12h
mov     byte ptr [rcx+2], 58h
cmp     r8d, 2
jz      short loc_1400BAAAF
mov     [rcx+50h], r8b

loc_1400BAA71:
mov     rax, [rsp+28h]
mov     [rcx+8], rdx
xor     edx, edx
mov     [rcx+28h], rax
mov     rax, [rsp+30h]
mov     [rcx+20h], r9
mov     [rcx+30h], rax
test    rax, rax
jnz     short loc_1400BAA9D
mov     [rcx+51h], dl
mov     [rcx+38h], rdx

loc_1400BAA99:
mov     [rcx+52h], dl
retn

loc_1400BAA9D:
mov     al, [rsp+38h]
mov     [rcx+51h], al
mov     rax, [rsp+40h]
mov     [rcx+38h], rax
jmp     short loc_1400BAA99

loc_1400BAAAF:
mov     al, [rdx+242h]
mov     [rcx+50h], al
jmp     short loc_1400BAA71```

This routine contains a couple if statements, but otherwise it's just writing the arguments and some constants to the struct.

```VOID NTAPI KeInitializeApc(
_In_ PKAPC  Apc,
_In_ KAPC_ENVIRONMENT   TargetEnvironment,
_In_ PKKERNEL_ROUT_In_E     KernelRoutine,
_In_Opt_ PKRUNDOWN_ROUT_In_E RundownRoutine ,
_In_ PKNORMAL_ROUT_In_E     NormalRoutine,
_In_ KPROCESSOR_MODE    Mode,
_In_ PVOID  Context)
{
Apc->Type = 0x12;        /* mov byte ptr [rcx], 12h */
Apc->Size = 0x58;        /* mov byte ptr [rcx+2], 58h */

/* cmp r8d, 2 */
if ((DWORD)TargetEnvironment == CurrentApcEnvironment)
Apc->ApcStateIndex = Thread->ApcStateIndex;/* mov [rcx+50h], al */
else
Apc->ApcStateIndex = TargetEnvironment;  /* mov [rcx+50h], r8b */

Apc->RundownRoutine = RundownRoutine;    /* mov [rcx+28h], rax */
Apc->KernelRoutine = KernelRoutine;      /* mov [rcx+20h], r9 */
Apc->NormalRoutine = NormalRoutine;      /* mov [rcx+30h], rax */

/* test rax, rax */
if (NormalRoutine != 0)
{
Apc->ApcMode = Mode;                 /* mov [rcx+51h], al */
Apc->NormalContext = Context;        /* mov [rcx+38h], rax */
}
else
{
Apc->ApcMode = 0;                    /* mov [rcx+51h], dl */
Apc->NormalContext = 0;              /* mov [rcx+38h], rdx */
}

Apc->Inserted = 0;                       /* mov [rcx+52h], dl */
}```

### ObFastDereferenceObject

Inside ntoskrnl.exe, ObFastDereferenceObject has the following prototype:

```void __fastcall ObFastDereferenceObject(
_In_ PEX_FAST_REF FastRef,
_In_ PVOID Object
)```

Here is the struct that is passed in the first argument:

```typedef struct _EX_FAST_REF
{
union
{
PVOID Object;
ULONG RefCnt: 4;
UINT64 RefCnt;
};
} EX_FAST_REF, *PEX_FAST_REF;```

Here is the disassembly, which shows that there are fastcall optimizations on the 1st parameter for certain processors:

```ObFastDereferenceObject:
mov     r9, rcx
prefetchw byte ptr [rcx]
mov     rax, [rcx]
mov     r8, rax
xor     r8, rdx
cmp     r8, 0Fh
jnb     short loc_140062C29

loc_140062C1D:
lea     r8, [rax+1]
lock cmpxchg [r9], r8
jnz     short loc_140062C31
retn

loc_140062C29:
mov     rcx, rdx
jmp     ObfDereferenceObject

loc_140062C31:
mov     rcx, rax
xor     rcx, rdx
cmp     rcx, 0Fh
jb      short loc_140062C1D
jmp     short loc_140062C29```

The function is one big loop that increments the FastRef->Object pointer. There is also a precondition test. If the loop fails, another function is called.

```void __fastcall ObFastDereferenceObject(
_In_ PEX_FAST_REF FastRef,
_In_ PVOID Object
)
{
for (   EX_FAST_REF  a = *FastRef,      /* mov rax, [rcx] */
b = *FastRef;      /* mov r8, rax */
*b->Object ^ Object             /* xor r8, rdx */
<= 0x0F;                        /* cmp rcx, 0Fh */
b->Object = *(a->Object) + 1    /* lea r8, [rax+1] */
)
{
/* lock cmpxchg [r9], r8 */
if (atomic_compare_exchange_strong(FastRef, &a, b));
return;
}
/* mov rcx, rdx */
ObfDereferenceObject(Object);       /* jmp ObfDereferenceObject */
}```

### KeInitializeQueue

Inside ntoskrnl.exe, KeInitializeQueue has the following prototype:

```VOID NTAPI KeInitializeQueue(
_Out_  PRKQUEUE Queue,
_In_   ULONG Count);```

Here are the relevant structs which make up our Queue parameter:

```typedef struct _DISPATCHER_HEADER
{
union
{
struct
{
UCHAR Type;
union
{
UCHAR Abandoned;
UCHAR Absolute;
UCHAR NpxIrql;
UCHAR Signalling;
};
union
{
UCHAR Size;
UCHAR Hand;
};
union
{
UCHAR Inserted;
UCHAR DebugActive;
UCHAR DpcActive;
};
};
LONG Lock;
};
LONG SignalState;

typedef struct _KQUEUE {
ULONG CurrentCount;               /* 0x28 */
ULONG MaximumCount;               /* 0x2c */
} KQUEUE, *PKQUEUE, *RESTRICTED_POINTER PRKQUEUE;```

The disassembly for the function is:

```KeInitializeQueue:
mov     word ptr [rcx], 4
mov     byte ptr [rcx+2], 10h
lea     rax, [rcx+8]
xor     r8d, r8d
mov     [rcx+4], r8d
mov     [rax+8], rax
mov     [rax], rax
lea     rax, [rcx+18h]
mov     [rax+8], rax
mov     [rax], rax
lea     rax, [rcx+30h]
mov     [rax+8], rax
mov     [rax], rax
mov     [rcx+28h], r8d
test    edx, edx
jz      short loc_1400DF8A9
mov     [rcx+2Ch], edx
retn

loc_1400DF8A9:
mov     eax, cs:KeNumberProcessors_0
mov     [rcx+2Ch], eax
retn```

This function is again just basically filling in a struct with some constants.

```VOID NTAPI KeInitializeQueue(
_Out_  PRKQUEUE Queue,
_In_   ULONG Count)
{
Queue->Header.Type = 4;              /* mov word ptr [rcx], 4 */
Queue->Header.Size = 0x10;           /* mov byte ptr [rcx+2], 10h */
Queue->Header.SignalState = 0;       /* mov [rcx+4], r8d */

/* lea rax, [rcx+8] */

/* lea rax, [rcx+18h] */

/* lea rax, [rcx+30h] */

Queue.CurrentCount = 0;

/* test edx, edx */
if (Count == 0)
Queue->MaximumCount = KeNumberProcessors; /* cs:_0 */
else
Queue->MaximumCount = Count;            /* mov [rcx+2Ch], edx */
}```

### KxWaitForLockChainValid

Inside ntoskrnl.exe, KxWaitForLockChainValid has the following prototype:

```PKSPIN_LOCK_QUEUE KxWaitForLockChainValid(
__inout PKSPIN_LOCK_QUEUE LockQueue);```

Here is the definition for the struct parameter:

```typedef struct _KSPIN_LOCK_QUEUE
{
struct _KSPIN_LOCK_QUEUE * volatile Next;
PKSPIN_LOCK volatile Lock;
} KSPIN_LOCK_QUEUE, *PKSPIN_LOCK_QUEUE;```

Here is the disassembly of the function:

```KxWaitForLockChainValid:
mov     [rsp+8], rbx
push    rdi
sub     rsp, 20h
mov     rdi, rcx
xor     ebx, ebx

loc_1400DA7F7:
inc     ebx
jz      loc_14019DCAC

loc_1400DA805:
pause

loc_1400DA807:
mov     rax, [rdi]
test    rax, rax
jz      short loc_1400DA7F7
mov     rbx, [rsp+28h+8]
pop     rdi
retn

loc_14019DCAC:
mov     eax, cs:HvlEnlightenments
test    al, 40h
jz      loc_1400DA805
mov     ecx, ebx
call    HvlNotifyLongSpinWait
nop
jmp     loc_1400DA807```

This is a spinlock implementation. It's interesting that the last label is in a distant memory area. This is usually an indication of an optimization by the compiler that the code is rarely used.

```PKSPIN_LOCK_QUEUE KxWaitForLockChainValid(
__inout PKSPIN_LOCK_QUEUE LockQueue)
{
UINT32 i = 0;  /* xor ebx, ebx */

do             /* loc_1400DA7F7 */
{
++i;       /* inc ebx */

/* test al, 40h */
if (i == HvlLongSpinCountMask && HvlEnlightenments != 0x40))
HvlNotifyLongSpinWait(i);       /* mov ecx, ebx */
else
_mm_pause();                    /* pause */

} while(LockQueue->Next != 0);  /* test rax, rax */
}```

`VOID NTAPI KeReadyThread(_In_ PKTHREAD Thread);`

Here is the disassembly:

```KeReadyThread:
push    rbx
sub     rsp, 20h
mov     rdx, [rcx+0B8h]
mov     rbx, rcx
mov     eax, [rdx+234h]
test    al, 7
jnz     short loc_1400F6684

loc_1400F6676:
mov     rcx, rbx

loc_1400F667E:
pop     rbx
retn

loc_1400F6684:
call    KiInSwapSingleProcess
test    al, al
jnz     short loc_1400F667E
jmp     short loc_1400F6676```

Until I calculate the offsets the struct values are unknown.

```VOID NTAPI KeReadyThread(_In_ PKTHREAD Thread)
{
/* mov rdx, [rcx+0B8h] */
/* mov eax, [rdx+234h] */
/* test al, 7 */
if (KiInSwapSingleProcess(Thread))  /* call KiInSwapSingle */
return;                         /* jnz loc_1400F667E */

}
```

### KiInitializeTSS

Inside ntoskrnl.exe, KiInitializeTSS has the following prototype:

`VOID NTAPI KiInitializeTSS(_In_ PKTSS Tss);`

This has a parameter for the PKTSS struct. It is defined as:

```typedef struct _KTSS
{
WORD Reserved0;
ULONG Esp0;
WORD Ss0;                  /* 0x8 */
WORD Reserved1;
ULONG NotUsed1[4];
ULONG CR3;
ULONG Eip;
ULONG EFlags;
ULONG Eax;
ULONG Ecx;
ULONG Edx;
ULONG Ebx;
ULONG Esp;
ULONG Ebp;
ULONG Esi;
ULONG Edi;
WORD Es;
WORD Reserved2;
WORD Cs;
WORD Reserved3;
WORD Ss;
WORD Reserved4;
WORD Ds;
WORD Reserved5;
WORD Fs;
WORD Reserved6;
WORD Gs;
WORD Reserved7;
WORD LDT;                  /* 0x60 */
WORD Reserved8;
WORD Flags;                /* 0x64 */
WORD IoMapBase;            /* 0x66 */
KiIoAccessMap IoMaps[1];
UCHAR IntDirectionMap[32]; /* 0x208c */
} KTSS, *PKTSS;```

Here is the disassembly:

```KiInitializeTSS:
mov     edi, edi
push    ebp
mov     ebp, esp
mov     eax, dword ptr [ebp+8]
and     word ptr [eax+64h], 0
and     word ptr [eax+60h], 0
mov     word ptr [eax+66h], 20ACh
mov     word ptr [eax+8], 10h
pop     ebp
ret     4```

This function fills in the structure with constants.

```VOID NTAPI KiInitializeTSS(_In_ PKTSS Tss)
{
Tss->Flags = 0;     /* and word ptr [eax+64h], 0 */
Tss->LDT = 0;       /* and word ptr [eax+60h], 0 */

/* mov word ptr [eax+66h], 20ACh */
Tss->IoMapBase = sizeof(KTSS);

Tss->Ss0 = 16;      /* mov word ptr [eax+8], 10h */
}```

### RtlValidateUnicodeString

Inside ntoskrnl.exe, RtlValidateUnicodeString has the following prototype:

```NTSTATUS NTAPI RtlValidateUnicodeString(
_In_ ULONG Flags,
_In_ PCUNICODE_STRING UnicodeString);```

The UNICODE_STRING struct in a 64-bit system context is defined as:

```typedef struct _UNICODE_STRING {
USHORT Length;            /* 0x0 */
USHORT MaximumLength;     /* 0x2 */
DWORD  Reserved;          /* 0x4 */
PWSTR  Buffer;            /* 0x8 */
} UNICODE_STRING, *PUNICODE_STRING;```

Here's the disassembly of the function:

```RtlValidateUnicodeString:
xor     eax, eax
test    ecx, ecx
jnz     short loc_1400D23BB
test    rdx, rdx
jz      short locret_1400D23BA
movzx   r8d, word ptr [rdx]
test    r8b, 1
jnz     short loc_1400D23BB
movzx   ecx, word ptr [rdx+2]
test    cl, 1
jnz     short loc_1400D23BB
cmp     r8w, cx
ja      short loc_1400D23BB
mov     r9d, 0FFFEh
cmp     cx, r9w
ja      short loc_1400D23BB
cmp     [rdx+8], rax
jz      loc_14019BAF4

locret_1400D23BA:
retn

loc_1400D23BB:
mov     eax, 0C000000Dh
retn

loc_14019BAF4:
test    r8w, r8w
jnz     loc_1400D23BB
test    cx, cx
jz      locret_1400D23BA
jmp     loc_1400D23BB```

The function, true to its name, follows the traditional validation pattern of executing tests and returning false (NTSTATUS: INVALID_PARAMETER) or true (NTSTATUS: SUCCESS) depending on if the conditions are met or not. Note that the last test case in the main body of the function can jump to a distant memory space for more tests, an optimization that likely means it is rarely branched to.

```/* test ecx, ecx */
if (Flags != 0)
return STATUS_INVALID_PARAMETER; /* mov eax, 0C000000Dh */

/* test rdx, rdx */
if (!UnicodeString)
return STATUS_SUCCESS;           /* xor eax, eax */

/* movzx r8d, word ptr [rdx] */
/* test r8b, 1 */
if (UnicodeString->Length & 1 != 0)
return STATUS_INVALID_PARAMETER;

/* movzx ecx, word ptr [rdx+2] */
/* test cl, 1 */
if (UnicodeString->MaximumLength & 1 != 0)
return STATUS_INVALID_PARAMETER;

/* cmp r8w, cx */
if (UnicodeString->Length > UnicodeString.MaximumLength)
return STATUS_INVALID_PARAMETER;

/* mov r9d, 0FFFEh */
/* cmp cx, r9w */
if (UnicodeString->MaximumLength > 65534)
return STATUS_INVALID_PARAMETER;

/* cmp [rdx+8], rax */
if (UnicodeString->Buffer == 0)
{
/* test r8w, r8w */
if (UnicodeString->Length != 0)
return STATUS_INVALID_PARAMETER;

/* test cx, cx */
if (UnicodeString->MaximumLength != 0)
return STATUS_INVALID_PARAMETER;
}

return STATUS_SUCCESS;```

## Friday, January 2, 2015

### Practical Reverse Engineering p. 35 #4

Question number 4 on page 35 of Practical Reverse Engineering is as follows:

Implement the following functions in x86 assembly:
• strlen
• strchr
• memcpy
• memset
• strcmp
• strset

Here is the C prototype for strlen:

`size_t strlen(const char *str);`

This function returns the number of characters found before the null-byte character in a C-string. We can set AL to null and loop over EDI using REPNE SCASB. We will set ECX to -1 to begin, and then NOT the bytes and subtract by 1 to get the positive number length result.

```_strlen:
push edi

mov edi, [esp + 0x8]  ; char *str
xor eax, eax          ; al = '\0'
xor ecx, ecx          ; ecx = -1
not ecx

cld
repne scasb

not ecx               ; correct ecx
lea eax, [ecx - 0x1]

pop edi
ret```

Here is the C prototype for strchr:

`char *strchr(char *str, int character);`

The method of searching in the strchr function seems like it would be very similar to strlen. The are two main differences. Instead of searching for the null byte, we are searching for a user passed argument. And instead of returning the length, we return a pointer to the first time the search character is found.

We could be naive and just change some code in strlen, but looping with REPNE here means we could overrun the string buffer into unknown memory space. We need logic to check for the null byte as well, making the function more complicated. We could implement a strlen lookup beforehand, but it is more efficient to revert to more generic looping and comparison constructs.

```_strchr:
mov eax, [esp + 0x4]   ; char *str
mov ecx, [esp + 0x8]   ; char c

chr_loop:
mov dl, [eax]          ; edx is caller-saved

cmp cl, dl             ; *eax == c
je chr_leave

inc eax

test dl, dl            ; check '\0'
jnz chr_loop

xor eax, eax           ; nullptr on fail

chr_leave:
ret```

Here is the C prototype for memcpy:

`void *memcpy (void *destination, const void *source, size_t num);`

This implementation is very convenient in x86. We use the REP MOVSB operation, which will copy ESI to EDI byte by byte until ECX reaches 0.

```_memcpy:
push edi                ; non-volatiles
push esi

mov edi, [esp + 0xc]    ; void *dest
mov esi, [esp + 0x10]   ; const void *src
mov ecx, [esp + 0x14]   ; size_t num

mov eax, edi            ; return dest

cld                     ; DF = 0 (++)
rep movsb               ; ends at ecx = 0

pop esi
pop edi
ret```

Here is the C prototype for memset:

`void *memset (void *ptr, int value, size_t num);`

This implementation is similar to memcpy. We change to the REP STOSB operation, which will copy AL into EDI until ECX reaches 0.

```_memset:
push edi

mov edi, [esp + 0x8]
mov eax, [esp + 0xc]    ; char in al
mov ecx, [esp + 0x10]

push edi                ; store dest

cld
rep stosb               ; ends at ecx = 0

pop eax                 ; return dest

pop edi
ret```

Here is the C prototype for strcmp:

`int strcmp(const char *str1, const char *str2);`

The strcmp function returns 0 if both strings are equal, > 0 if the first character that does not match has a greater value in str1 than in str2, and < 0 in the other case.

For this function we need to change our strategy. The REP CMPSB instruction looks like it would be great, but we would need to precompute both strings lengths and then use the minimum for our ECX value. We can achieve a better implementation using more generic looping and comparison constructs. Even though it seems to be more code, by skipping two strlen calls it ends up being less.

```_strcmp:
push edi
push esi

mov esi, [esp + 0xc]
mov edi, [esp + 0x10]

xor eax, eax    ; clear ret

cmp_loop:
mov al, [esi]
mov cl, [edi]
sub al, cl      ; al = *esi - *edi
jne cmp_leave

test cl, cl     ; check for '\0'
jz cmp_leave

inc esi         ; ++esi, ++edi
inc edi
jmp cmp_loop

cmp_leave:
pop esi
pop edi
ret```

Here is the C prototype for strset:

`char *strset(char *str, int ch);`

This is a function that isn't part of the standard library. We assume it works similar to memset, except that it will stop at the null terminator. There are two strategies we could take: precompute the strlen and then do a memset, or use more generic looping and comparison for a tighter implementation.

```_strset:
push edi

mov edi, [esp + 0x8]
mov ecx, [esp + 0xc]    ; char in ecx

push edi                ; store dest

set_loop:
mov al, [edi]
test al, al             ; check '\0'
jz set_leave

mov [edi], cl           ; *edi = ch
inc edi

jmp set_loop

set_leave:
pop eax                 ; return dest

pop edi
ret```

## Thursday, January 1, 2015

### Practical Reverse Engineering p. 35 #3

Question number 3 on page 35 of Practical Reverse Engineering is as follows:

In some of the assembly listings, the function name has a @ prefix followed by a number. Explain when and why this decoration exists.

This is a compiler ABI decoration used with stdcall. The function name is also prefixed with an underscore. The number after the @ sign tells us the amount of bytes in the parameters.

```BOOL WINAPI DllMain(
_In_  HINSTANCE hinstDLL,
_In_  DWORD fdwReason,
_In_  LPVOID lpvReserved
);```

The C function becomes the following symbol after compilation:

_DllMain@12

### Practical Reverse Engineering p. 35 #2

Question number 2 on page 35 of Practical Reverse Engineering is as follows:

In the example walk-through, we did a nearly one-to-one translation of the assembly code to C. As an exercise, re-decompile this whole function so that it looks more natural. What can you say about the developer’s skill level/experience? Explain your reasons. Can you do a better job?

The function itself is fairly straightforward outside of the intrinsics library, which is basically just anti-VM code. It's hard to judge the competency of the developer in terms of software engineering skills since the compiler can change a lot of structure around. It's obvious he or she at least has a decent understanding of the Windows API, and has an idea about where a virtual machine CPU might store the Interrupt Descriptor Table.

Here is the function reverse engineered to C:

```#include <Windows.h>
#include <tlhelp32.h>
#include <intrin.h>

typedef struct _IDTR {
DWORD base;
SHORT limit;
} IDTR, *PIDTR;

BOOL APIENTRY DllMain(HMODULE hMod, DWORD dwReason, LPVOID lpRes)
{
IDTR idtr;
PROCESSENTRY32 procentry;
HANDLE hToolSnap;
BOOL bProc32;

__sidt(&idtr);

if (idtr.base > 0x8003F400 && idtr.base < 0x80047400)
return FALSE;

memset(&procentry, 0, sizeof(PROCESSENTRY32));
procentry.dwSize = sizeof(procentry);

hToolSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
if (hToolSnap == INVALID_HANDLE_VALUE)
return FALSE;

for (   bProc32 = Process32First(hToolSnap, &procentry);
bProc32 != FALSE;
bProc32 = Process32Next(hToolSnap, &procentry))
{
if (wcscmp(procentry.szExeFile, _T("explorer.exe")) == 0)
break;
}

if (!bProc32)
return FALSE;

if (procentry.th32ParentProcessID == procentry.th32ProcessID)
return FALSE;

if (dwReason == DLL_PROCESS_ATTACH)

return TRUE;
}```

### Practical Reverse Engineering p. 35 #1

Question number 1 on page 35 of Practical Reverse Engineering is as follows:

Repeat the walk-through by yourself. Draw the stack layout, including parameters and local variables.

The function in question is DllMain, which is a WINAPI (stdcall) convention. The stack pointer with regards to function entry looks like the following:

 0x0 &return +0x4 hModule +0x8 dwReason +0xc lpReserved

The function's instructions immediately modify the stack:

```push ebp
mov ebp, esp
sub esp, 130h
push edi
sidt fword ptr [ebp-8]
mov eax, [ebp-6]```

So the stack for local variables becomes:

 -0x138 saved EDI -0x134 reserved -0xc struct IDT -0x4 saved EBP 0x0 &return

The reserved bytes are for a PROCESSENTRY32 struct, which is defined on MSDN as follows:

```typedef struct tagPROCESSENTRY32 {
DWORD     dwSize;
DWORD     cntUsage;
DWORD     th32ProcessID;
ULONG_PTR th32DefaultHeapID;
DWORD     th32ModuleID;