Preface: This post assumes a basic understanding of x64 assembly, the x64 ABI, and GNU as syntax.
You can download the full source of the example here
Writing code in assembly can be quite cumbersome compared to higher-level languages. It is pretty common to keep a general assortment of helper functions that make writing full-fledged applications in assembly easier. The fundamentals of x64 assembly require us to grasp how to move bytes around, how numbers and strings are represented, and common programming idioms like loops and recursion. Dealing with memory and the stack requires us to really delve into it. So let’s start with string comparisons in x64 assembly.
First, let’s set up the data section. We will need a couple of strings to compare and some output that tells us, outside of the debugger, whether the strings are equal:
.set STDOUT, 1
.set __NR_write, 1
# ---------- DATA ----------
.data
# Test strings, try changing them to different strings and re-running
# the program.
str1:
.asciz "hello"
str2:
.asciz "hello"
# Strings to output to STDOUT if equal or not equal
equal:
.asciz "equal"
not_equal:
.asciz "not equal"
In larger programs I tend to create more equates for system calls for readability, but for the sake of this example I have left them out. Here is our very simple “main” routine (our actual entry point is _start).
We simplify the call by loading the addresses of our strings into %rdi and %rsi and our count into %rcx. This way everything is already in the registers the string instructions expect.
# ---------- CODE ----------
.globl _start
.text
_start:
# Our strn_cmp function expects the addresses of the strings we want to
# compare in %rdi and %rsi.
# %rcx holds the number of bytes to compare; the reason for this will
# become clear when we get to the string comparison function.
lea str1, %rdi
lea str2, %rsi
mov $5, %rcx
call strn_cmp
test %rax, %rax # We return 0 if equal -1 if not equal
jnz .L_not_equal
lea equal, %rdi # Print strings are equal
call print_str
.L_exit:
mov $60, %rax
mov $0, %rdi
syscall
.L_not_equal:
lea not_equal, %rdi
call print_str
jmp .L_exit
Don’t worry about the print_str helper function; we will cover that last. The point of this post is to focus on the string comparison function and show off two really convenient instructions. There are multiple ways we could return a value that tells us whether the comparison succeeded, but I typically like to just return 0 on success and -1 on failure. We could use sub to find out how the strings differ and at which byte index, but in this example we don’t need that. We just want to know if the strings are equal.
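# String comparison utility function with variable length checking
# %rcx is the number of bytes to compare (should be non-zero: with a zero
#      count repe cmpsb runs no iterations and leaves the flags unchanged)
# %rdi string1
# %rsi string2
# %rax is the return value: 0 if equal, -1 if not equal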
strn_cmp:
cld
repe cmpsb
jne .L_strn_cmp_ne
xor %rax, %rax
ret
.L_strn_cmp_ne:
mov $-1, %rax
ret
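For readers more at home in C, the contract of strn_cmp can be sketched roughly as below. This is only a model, not the code from this post: the name `strn_cmp_model` is made up for illustration, and it leans on the standard `memcmp` where the assembly uses `repe cmpsb`. It also makes one consequence of the fixed count visible: only the first n bytes are examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of strn_cmp: returns 0 when the first n bytes of
 * a and b match, -1 otherwise. Both buffers must hold at least n bytes.
 * With n == 0 this model reports "equal"; the assembly version leaves the
 * flags untouched in that case, so its result would depend on stale flags. */
long strn_cmp_model(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0 ? 0 : -1;
}
```

With a count of 5, `strn_cmp_model("hello", "hello world", 5)` returns 0 because the shared prefix is all that gets compared. Passing the length plus one, so that the null terminator is included, makes strings of different lengths compare as not equal.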
Let’s watch how this works in the debugger by setting a breakpoint on strn_cmp, before any of the string comparison executes.
You can see that %rcx holds 5, which we moved in before the call, since we will repeat the byte comparison 5 times (the length of the string). You can also clearly see the addresses of our strings loaded in %rdi and %rsi. If we show the data that %rsi and %rdi point to, we can see below:
So it looks like everything is set up correctly. Going back to the disassembly, if we step through in the debugger we first execute cld, which clears the direction flag. Why do we do this?
When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI).
Our addresses are in %rdi and %rsi (the 64-bit forms of the index registers), so both will be incremented as the comparison proceeds. The repe prefix (repeat while equal) is applied to cmpsb (compare string bytes), and we can see the dereferences of (%rsi) and (%rdi). Let’s step over it a single time:
A few things just happened when executing that single line. On the right side, in the registers, %rcx was decremented by 1 and %rsi and %rdi were each incremented by 1 so that the next bytes can be compared. This process repeats until %rcx is decremented to 0 or two bytes are found that are not equal. Once we step through until %rcx is 0, let’s have a look and see what has changed:
Look to the left and we can see that zf (the zero flag) is set to 1: the last bytes compared were equal, and since %rcx reached 0 without a mismatch, the strings are equal. So the jump will not be taken, and we will clear %rax and return to our program to print equal.
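To make the register changes from the walkthrough concrete, here is a small C sketch of a single `repe cmpsb` iteration. The `struct regs` type and both function names are hypothetical, and the model only tracks the registers and flags discussed above. It also shows how the leftover count could be turned into the index of the first differing byte, assuming the direction flag is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register file: only the pieces repe cmpsb touches. */
struct regs {
    const unsigned char *rsi; /* first operand pointer */
    const unsigned char *rdi; /* second operand pointer */
    size_t rcx;               /* remaining repeat count */
    bool zf;                  /* zero flag */
    bool df;                  /* direction flag, false after cld */
};

/* One iteration of repe cmpsb. Returns true if the loop should continue. */
bool repe_cmpsb_step(struct regs *r)
{
    assert(r != NULL);
    if (r->rcx == 0)
        return false;             /* count exhausted: flags untouched */
    r->zf = (*r->rsi == *r->rdi); /* cmpsb sets ZF when the bytes match */
    r->rsi += r->df ? -1 : 1;     /* DF = 0 increments, DF = 1 decrements */
    r->rdi += r->df ? -1 : 1;
    r->rcx--;
    return r->zf && r->rcx != 0;  /* repe stops on mismatch or zero count */
}

/* Runs the whole instruction with DF clear and returns the index of the
 * first mismatching byte, or n when all n bytes matched. */
size_t repe_cmpsb_run(struct regs *r)
{
    size_t n = r->rcx;
    if (n == 0)
        return 0;                 /* nothing was compared */
    while (repe_cmpsb_step(r))
        ;
    return r->zf ? n : n - r->rcx - 1;
}
```

Note that on a mismatch %rcx has already been decremented for the failing byte, which is why the index is `n - rcx - 1` and not `n - rcx`.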
You can see we return and print equal, then exit the program. We don’t need to step through the print_str function, but I will copy it below so you can run the program yourself and test different outputs. Change the strings to something different and look at what happens when the bytes are not equal: what does the zero flag show?
print_str:
# Print a string to STDOUT = 1
# %rdi holds the address of the string
#
# We need to find the length of the string first and then print using
# syscall __NR_write (sys_write)
push %rcx
push %rax
push %rdx
xor %rcx, %rcx
.L_strlen:
movb (%rdi, %rcx), %al
test %al, %al
jz .L_write
inc %rcx
jmp .L_strlen
.L_write:
# At this point %rcx holds the length of the null terminated string
mov %rcx, %rdx
mov %rdi, %rsi
mov $STDOUT, %rdi
mov $__NR_write, %rax
syscall
pop %rdx
pop %rax
pop %rcx
ret
A note about the print_str function: you don’t have to worry about passing in the length, because the first thing we do is use gas’s (base, index) addressing to look for the null byte while keeping count of the bytes in %rcx.
AT&T Syntax: DISP(BASE, INDEX, SCALE), which addresses DISP + BASE + INDEX * SCALE
This is how it works: our base is the address of the string we are printing, and the index is %rcx, which starts at 0. So if %rcx is 2 and we are using the word “hello”, movb (move byte) indexes into “hello” at the first “l”, moves that byte into %al, and we compare it to 0. We repeat that process, incrementing %rcx through the whole string, until %al is 0, at which point %rcx holds the length of our string. Then we pass that length to __NR_write (in %rdx) to output the string.
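The same idea can be sketched in C, purely as an illustration. The function names below are hypothetical, the register names are reused as variable names to line up with the listing, and `write` from `<unistd.h>` stands in for the raw syscall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Effective address of DISP(BASE, INDEX, SCALE). */
uintptr_t effective_address(intptr_t disp, uintptr_t base,
                            uintptr_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)disp;
}

/* Mirrors the .L_strlen loop: rcx counts bytes up to the null terminator. */
size_t str_len_model(const char *rdi)
{
    size_t rcx = 0;
    while (rdi[rcx] != '\0') /* movb (%rdi, %rcx), %al ; test %al, %al */
        rcx++;               /* inc %rcx */
    return rcx;
}

/* Mirrors .L_write: write(STDOUT, string, length). */
ssize_t print_str_model(const char *s)
{
    return write(STDOUT_FILENO, s, str_len_model(s));
}
```

In print_str the displacement is 0 and the scale is 1, so `(%rdi, %rcx)` is simply the string address plus the current byte count.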