x86 Assembly Tutorial Part 1 (it’s big I promise...)
Wuru (41)

Hey.

x86 assembly time.

And yes, I know @JustAWalrus made a tutorial on this already, but whether or not you are a fan of him, I think we can all admit it wasn't great.

I know, I know. "Cycle squeezing" blah blah blah. I don't write here often but I wanted to get something out. I really don't care about the virtual points. For me the upvotes are about who gets to see the tutorial. If it so deeply offends you that I dare release my writing in multiple parts, just click off.

And without further ado, let us begin.

Getting set up

To start go here

Then delete the code that is already there.

Computers at a low level

A computer at a low level is just an ALU communicating with memory.

The ALU is a device that can perform arithmetic and logic operations.

The memory contains data and code.

This code is represented as binary (1s and 0s).

The code may look something like this.

0010         -        0101
 ^                    ^
add-1 instruction     memory address

The first 4 bits (a bit is a single 1 or 0) are the operation code, or opcode.

The opcode tells the computer what to do with the rest of the information.

The next 4 bits are the operand.

The operand is the data the opcode works on.

Assembly works in a very similar way. The mnemonic at the start of a line is the opcode and the data that follows it are the operands.

In assembly we also don't work with variables.

You either work with the stack, registers, pointers, or the heap.

In our journey you will get a newfound understanding of programming. For example, a while loop won't just look like this anymore:

#include <stdio.h>

int main() {
	while (true) { // erm while this statement is true repeat.
		printf("Hello this is a while loop");
	}
	return 0;
}

It will look like this:

#include <stdio.h>

int while_example(bool& condition) {
	if (condition) {
		printf("It's true!");
		return while_example(condition);
	} else {
		return 0;
	}
}

int main() {
	bool x = true;
	while_example(x);
	return 0;
}

This tutorial isn't geared at the language itself so much as the experience and knowledge gained from learning x86 Assembly.

Writing our first x86 Assembly code

Anyway let's start (for real this time).

In the IDE website let's type some code.

As I said earlier, x86 Assembly code doesn't have variables.

I did note, however, that there are registers you can use.

These registers reside in the CPU and they are very fast.

The registers we will be using for now are as follows.

  • eax
  • ebx
  • ecx
  • edx

These registers are general-purpose places to store and manipulate data. They usually hold temporary values; if you want to store something for longer, you can use the bss or data section, which we will talk about later.

Anyway, let's write some code!

mov eax, 8 

Write that in the IDE website.

Let us discuss the syntax of that code.

The mov instruction tells the computer we want to move (really, copy) some data.

This instruction takes two operands: a destination location and the data to put there.

The data can come from another location as well, such as another register.

In this case we are moving the number 8 into the register eax.

So the syntax is:

mov location, data

Comments

To make a comment we use ;.

So we do it like so:

; this is a comment!

Basic Syscall 0x80

Don't be frightened by the title, this is very easy. At the start it may seem complicated, but once you see the bigger picture this will be a piece of cake.

If we write this line of code

int 0x80

What happens?

Well, this line raises a software interrupt, and on 32-bit Linux interrupt 0x80 is how we ask the operating system to perform a task for us (a system call, or syscall).

This interrupt takes a parameter stored in eax.

So, if we wrote this

mov eax, 7
int 0x80

Then that would work right?

Wrong.

The system interrupt usually needs data from other registers too.

The value stored in eax tells this specific interrupt handler what to do.

In this case we want the interrupt to end the program.

The value we need is 1.

So:

mov eax, 1
int 0x80

Would kind of work.

But this syscall takes another argument.

This argument lives in the ebx register.

This argument is the exit code.

0 is the exit code for success.

So, you could think of this as a function definition and calling like this.

void syscall_exit(int ebx) {
  // the kernel ends the program and reports ebx as the exit code
}

syscall_exit(0);

So this code:

mov eax, 1
mov ebx, 0
int 0x80

Would work.

Labels

So, until now your code hasn't worked correctly.

This is because we don't have an entry point, the _start symbol.

The _start symbol marks where our code begins running.

So, we write this syntax:

<label_name>:

So for our purposes the label_name is _start.

So we write:

_start:
  <code>

How about we paste our old code in there?

_start:
  mov eax, 1
  mov ebx, 0
  int 0x80

It still won't work correctly!

We need to expose the label to the linker (a tool that runs as part of building our executable).

To do that we make our label global so the linker can see it.

So we use this syntax:

global <symbol>

In our case the symbol is _start.

And we do:

global _start

_start:
  mov eax, 1
  mov ebx, 0
  int 0x80

The text and data sections

So far all of our code would be referred to as text section code.

Let us outline the differences between the text and data sections:

Sections:

  • text
    • Stores code
    • Labels
    • Syscalls
    • Move statements
  • data
    • stores data
    • defining bytes
    • memory
    • pointers

Ok, so to define a section we use the syntax:

section <name>

In our first case the name is .text.

Let us add this to our old code.

section .text

global _start

_start:
  mov eax, 1
  mov ebx, 0
  int 0x80

Everything below the section definition is now part of that section.

Ok, let us test out the .data section.

section .text

global _start

_start:
  mov eax, 1
  mov ebx, 0
  int 0x80

section .data
  ; now what?

Let us define a byte.

To define a byte we use the syntax <name> db <data>.

So how about we define a byte called x with a value of 99 and try to exit our program with x.

Ok, let us define our byte.

section .text

global _start

_start:
  mov eax, 1
  mov ebx, 0
  int 0x80

section .data
  x db 99

Ok, now we exit our program with x.

section .text

global _start

_start:
  mov eax, 1
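  movzx ebx, byte [x]
  int 0x80

section .data
  x db 99

Ok, what is going on here?

Well, like we said, we defined x with x db 99.

We then did the standard exit, but instead of our 0 we put the value stored at the memory location x refers to into ebx with movzx ebx, byte [x]. On its own, x is just the address of that byte. The square brackets tell the assembler to read the memory at that address, byte says to read a single byte, and movzx fills the rest of ebx with zeros. Writing mov ebx, x instead would load the address, not 99.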

And there!

Hello, World!

Ok, let us use our knowledge we have gained thus far and do a "Hello, World!" program!

So, firstly we need to talk about int 0x80 again.

Like I said, this is a syscall.

Meaning the operating system handles it.

So, we need to figure out which one allows us to write to the screen.

Firstly, let me teach you about stdout.

stdout (standard output) is the output stream that is connected to the terminal by default.

So what we need to do is write to that stream.

The syscall we just did with the code of 1 was the sys_exit syscall.

The syscall we need is sys_write.

The code for this syscall is 4, and it uses all four of the registers you have learned so far for its arguments.

The functions of these registers in sys_write are as follows.

  • eax: the code (4)
  • ebx: file descriptor (in our case 1 for stdout)
  • ecx: address of the data
  • edx: data size in bytes

Ok let us go over that a bit more.

The file descriptor in ebx is just a number that identifies an open file or stream; we will use these again when we do file I/O. In our case, the operating system knows that 1 is stdout.

The data is the output we want, given as the address of the bytes in memory. Those bytes could be an ASCII string we define or any other bytes; sys_write does not convert numbers into text for us. In our case we will be using an ASCII string we define.

The data size is how big our data is in bytes. Since one ASCII character is 1 byte, we could just count the number of characters, but we will be using a different way.

Ok, let's start by defining our string and moving the first pieces of data.

section .text

global _start

_start:
  mov eax, 4
  mov ebx, 1
  mov ecx, message
  
section .data
  message db "Hello, World!"

Ok, you might think we are done with that part, but you'd be wrong.

You see we also want a newline on the end.

The problem is that an ordinary quoted string in NASM doesn't understand \n, and there is no std::endl like in some languages.

In assembly we have to reference the ASCII code for newline.

That would be 10, but we are going to write it in hexadecimal as 0x0a.

So we are just going to tack this on the end like this:

section .text

global _start

_start:
  mov eax, 4
  mov ebx, 1
  mov ecx, message
  
section .data
  message db "Hello, World!", 0x0a

And there we go!

We defined our string, we moved its address into ecx, and we now have a newline!

Now let's calculate the length.

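We are assembling with NASM behind the scenes, and NASM has a shorthand for doing so.

The $ symbol stands for the current position in the assembled output, so subtracting the address where our data starts gives the number of bytes in between. The expression is: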

$-
; after the - put the name of your data.

So in our case it is:

$-message

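Now we need to give this value a name.

So we use equ!

The syntax is:

<name> equ <data>

It looks very similar to db in use, but equ only defines a constant for the assembler and does not store any bytes in the program. The line must come right after the message so that $ still points to the end of it.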

So we can write:

section .text

global _start

_start:
  mov eax, 4
  mov ebx, 1
  mov ecx, message
  mov edx, message_len
  
section .data
  message db "Hello, World!", 0x0a
  message_len equ $-message

Ok, now we defined our length and put the length inside of edx.

Now, let us print using int 0x80

section .text

global _start

_start:
  mov eax, 4
  mov ebx, 1
  mov ecx, message
  mov edx, message_len
  int 0x80
  
section .data
  message db "Hello, World!", 0x0a
  message_len equ $-message

And, we are done!

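I'd recommend we also exit our code with sys_exit. Without it, the processor keeps running into whatever bytes come after our code, and the program will most likely crash after printing.

If you want to do that, just tack: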

mov eax, 1
mov ebx, 0
int 0x80

On the end of _start.

Conclusion

Ok, you have started your journey of x86 Assembly!

There will be more parts coming soon.

If you liked this and want more people to see it, share it with your friends and/or upvote it.

AphixDev (216)

Very well done! 👏👏👏