Strings in Assembly

CS 301 Lecture, Dr. Lawlor

Constant Strings

The bottom line is that a C string is just a region of memory with some ASCII characters in it. Each ASCII character is one byte, and a zero byte marks the end of the string. So any way you can create bytes with known values, you can also create strings. Here we're using the handy C function puts to print the string onto the screen.
extern puts

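sub rsp,8 ; keep the stack 16-byte aligned for the call, as the x86-64 calling convention requires
mov rdi,the_secret_message
call puts ; write our string to the screen
add rsp,8 ; undo the alignment adjustment

ret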

the_secret_message:
db 0x59 ; 'Y'
db 0x75 ; 'u'
db 0x70 ; 'p'
db 0 ;<- zero byte marks end of string

(Try this in NetRun now!)

That's a pretty atrocious way to write strings, so the assembler supports a bunch of other syntaxes.  These are all equivalent:
	db 0x4D, 0x6F, 0x6F, 0x73, 0x65, 0  ; hex ASCII codes
	db 'M', 'o', 'o', 's', 'e', 0  ; one character constant per byte
	db 'Moose', 0  ; the whole string at once
(Try this in NetRun now!)
In the assembler, single and double quotes are interchangeable. This is unlike C++, where a single-quoted 'M' is an integer value 0x4D, but a double-quoted "M" is a pointer to a zero-terminated string {0x4D,0}. Also unlike in C++, "\n" doesn't give a newline; it's just the two characters backslash and n, so it prints "\n"! To get an actual newline, use the byte 0xA (ASCII "LF", line feed). And don't forget the zero byte to end the string.
	db 'Moose',0xA ; 0xA is the newline byte
	db '... and squirrel.',0 ; zero byte ends the whole string

(Try this in NetRun now!)

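Keep in mind that puts adds a newline at the end of the string. To print without the newline, call printf instead. Remember that printf treats its string as a format, so any % characters in it are interpreted specially. Also, when calling printf from assembly, set al to 0 first: printf is variadic, and reads al as the number of vector (floating-point) register arguments.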

Static Bytes & Strings

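You can access an individual byte from memory with the syntax BYTE[address]. A register like eax is 32 bits (a DWORD) wide, so to load a single byte into it you need a byte-friendly instruction like "movzx" (move with zero-extend), which copies the byte into the low 8 bits and fills the rest of the register with zeros: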
	movzx reg,BYTE[address]
Accessing data as bytes is useful for string processing, or to understand what really shows up in memory.

For example, here I'm defining a short 3-byte string and reading one byte out. Byte 0 is 'w', so byte 2 is 'a', and this returns 0x61 (the ASCII code for 'a'):
movzx eax,BYTE[myString + 2] ; read this byte into eax
ret

section .data
myString:
db 'w','o','a'

(Try this in NetRun now!)

These are all equivalent ways to get the same 3-byte string. Each group below is one complete alternative:

db 0x77
db 0x6f
db 0x61

db 'w'
db 'o'
db 'a'

db 'w','o','a'

db 'woa'

db "woa"

There are several standard functions that take a "C string": a pointer to a bunch of ASCII bytes, followed by a zero byte.  "puts" is one such function, and it prints the string you pass it plus a newline. We can call puts to print out our string like this:

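sub rsp,8 ; keep the stack 16-byte aligned for the call
mov rdi,myString  ; points to the string in section .data below
extern puts
call puts
add rsp,8 ; restore the stack pointer
ret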

section .data
myString:
db 'woa',0 ; need the trailing zero to mark the end of the string...

(Try this in NetRun now!)

Here's an example where we load a byte from the middle of an integer. Note that this returns 0xa2: on our little-endian x86 machines, byte 0 in memory holds 0xa0, the least significant ("little") byte, so byte 2 holds 0xa2.

movzx eax,BYTE[myInt + 2] ; read this byte into eax
ret

section .data
myInt:
dd 0xa3a2a1a0 ; "data DWORD" containing this value

(Try this in NetRun now!)

Variable Strings

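There are these handy C functions gets and puts. gets reads a line of text from the keyboard (standard input), and puts writes a string to the screen (standard output). They both take just one argument, a pointer to the string data to read or write. Be careful with gets: it has no idea how big your buffer is, so a line longer than 99 characters will write past the end of a 100-byte buffer. (This is why gets was removed from the C standard in C11; the safer fgets takes the buffer size.) For example, I can store a modifiable string statically, in "section .data":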
extern gets
extern puts
sub rsp,8 ; keep the stack 16-byte aligned for the calls
mov rdi,mystring
call gets
mov rdi,mystring
call puts
add rsp,8 ; restore the stack pointer
ret

section .data
mystring:
times 100 db 'v'

(Try this in NetRun now!)

*Or* I can allocate space on the stack to store the string:
extern gets
extern puts

sub rsp,104 ; allocate 100 bytes for the string, plus 4 bytes of padding so rsp stays 16-byte aligned for the calls

mov rdi,rsp
call gets ; read into our string
mov rdi,rsp
call puts ; write our string to the screen

add rsp,104 ; give back stack space
ret

(Try this in NetRun now!)

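Or I can call "malloc" to allocate space for the string. malloc returns the pointer in rax, but rax (like rdi) is a scratch register that gets and puts are free to overwrite, so I copy the pointer into a preserved register, here r12, to hang onto it across the calls. Because r12 is preserved, I have to save main's copy first (push) and restore it at the end (pop). As a bonus, that single push also keeps the stack 16-byte aligned for the calls.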
extern gets
extern puts
extern malloc, free

push r12 ; preserve main's copy on the stack

mov rdi,100
call malloc
mov r12,rax ; <- malloc returns the pointer in rax

mov rdi,r12
call gets ; read into our string
mov rdi,r12
call puts ; write our string to the screen

mov rdi,r12
call free ; dispose of our copy of the string

pop r12 ; restore main's copy of this register
ret

(Try this in NetRun now!)