x86 Assembly Tutorial Part 1 (it’s big I promise...)
Hey.
x86 assembly time.
And yes, I know @JustAWalrus made a tutorial on this already but as much as you may or may not be a fan of him I think we can all admit it wasn't great.
I know, I know. "Cycle squeezing" blah blah blah. I don't write here often but I wanted to get something out. I really don't care about the virtual points. For me the upvotes are about who gets to see the tutorial. If it so deeply offends you that I dare release my writing in multiple parts, just click off.
And without further ado, let us begin.
Getting setup
To start go here
And delete the already made code.
Computers at a low level
A computer at a low level is just an ALU communicating with memory.
The ALU is a device that can perform arithmetic and logic operations.
The memory contains data and code.
This code is represented in binary (1s and 0s).
The code may look something like this.
0010 - 0101
^      ^
|      memory address
add-1 instruction
With the first 4 bits (a bit is a single 1 or 0) being the operation code, or opcode.
The opcode tells the computer what to do with the rest of the info.
The next 4 bits are the operand.
The operand is the data the opcode works on.
Assembly works very similarly. The mnemonic comes first and plays the role of the opcode, and the data that follows are the operands.
In assembly we also don't work with variables.
You either work with the stack, registers, pointers, or the heap.
In our journey you will get a newfound understanding of programming. For example, a while loop won't just look like this anymore:
#include <stdio.h>
int main() {
    while (true) { // while this condition is true, repeat
        printf("Hello this is a while loop");
    }
    return 0;
}
It will look like this:
#include <stdio.h>
int while_example(bool& condition) {
    if (condition) {
        printf("It's true!");
        return while_example(condition); // loop again by calling ourselves
    } else {
        return 0;
    }
}
int main() {
    bool x = true;
    while_example(x);
    return 0;
}
This tutorial isn't geared toward the language itself so much as the experience and knowledge gained from learning x86 Assembly.
Writing our first x86 Assembly code
Anyway let's start (for real this time).
In the IDE website let's type some code.
As I said earlier, x86 Assembly code doesn't have variables.
I did note, however, that there are registers you can use.
These registers reside in the CPU and they are very fast.
The registers we will be using for now are as follows.
eax
ebx
ecx
edx
These registers are just general-purpose places to store and manipulate data. They are usually temporary, and if you wanted to store something for longer you could use the bss or data section, which we will talk about later.
Anyway, let's write some code!
mov eax, 8
Write that in the IDE website.
Let us discuss the syntax of that code.
The mov statement is used to say to the computer we want to move some data.
This instruction takes two operands, a location and data.
The data can be a location as well.
In this case we are moving the number 8 into the register eax.
So the syntax is:
mov location, data
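As a rough sketch of both forms of mov (a number into a register, and one register into another), here is a small complete program. The section, global, _start and int 0x80 lines are only there so it can run; they are explained later in this tutorial. It assumes 32-bit Linux and NASM.

```nasm
; Sketch: two forms of mov, then exit with the result as the status code.
section .text
global _start

_start:
    mov eax, 8      ; number -> register: eax now holds 8
    mov ebx, eax    ; register -> register: copy eax into ebx (eax keeps its value)
    mov eax, 1      ; overwrite eax with 1, the code for sys_exit
    int 0x80        ; exit; the 8 sitting in ebx becomes the exit status
```

Note that mov copies rather than moves: after mov ebx, eax both registers hold 8 until eax is overwritten on the next line.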
Comments
To make a comment we use ;.
So we do it like so:
; this is a comment!
Basic Syscall 0x80
Don't be frightened by the title, this is very easy. At the start it may seem complicated but once you see the bigger picture this will be a piece of cake.
If we write this line of code
int 0x80
What happens?
Well, this line is known as an interrupt and it is how we perform system-centered tasks.
This interrupt takes a parameter stored in eax.
So, if we wrote this
mov eax, 7
int 0x80
Then that would work right?
Wrong.
The system interrupt usually needs data from other registers too.
The value stored in eax tells this specific interrupt handler what to do.
In this case we want the interrupt to end the program.
The value we need is 1.
So:
mov eax, 1
int 0x80
Would kind of work.
But this syscall takes another argument.
This argument lives in the ebx register.
This argument is the exit code.
0 is the exit code for success.
So, you could think of this as a function definition and call like this.
void syscall_exit(int ebx) {
    doSomething(); // end the program with exit code ebx
}
syscall_exit(0);
So this code:
mov eax, 1
mov ebx, 0
int 0x80
Would work.
Labels
So, until now your code hasn't worked correctly.
This is due to the fact that we don't have an entry point or an _start symbol.
The _start symbol is needed to start our code.
So, we write this syntax:
<label_name>:
So for our purposes the label_name is _start.
So we write:
_start:
<code>
How about we paste our old code in there?
_start:
mov eax, 1
mov ebx, 0
int 0x80
It still won't work correctly!
We need to expose the label to the linker (the tool that turns our assembled code into the final executable).
Making our label global lets the linker see it.
So we use this syntax:
global <symbol>
In our case the symbol is _start.
And we do:
global _start
_start:
mov eax, 1
mov ebx, 0
int 0x80
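If you would rather try this outside the IDE website, the sketch below shows one way to build and run it. It assumes a Linux machine with nasm and ld installed, and the file name exit0.asm is just a made-up example. The commands are written as comments so the whole thing stays valid assembly.

```nasm
; exit0.asm (hypothetical file name)
; Build and run, assuming nasm and ld are available:
;   nasm -f elf32 exit0.asm -o exit0.o
;   ld -m elf_i386 exit0.o -o exit0
;   ./exit0
;   echo $?
; The last command asks the shell for the exit status of the program.
global _start       ; make _start visible to the linker

_start:
    mov eax, 1      ; sys_exit
    mov ebx, 0      ; exit code 0 (success)
    int 0x80
```

Changing the 0 in ebx to another small number and rebuilding is a quick way to check that the exit code really comes from ebx.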
The text and data section.
So far all of our code would be referred to as text section code.
Let us outline the differences between the text and data sections:
Sections:
text
  Stores code
  Labels
  Syscalls
  Move statements
data
  Stores data
  Defining bytes
  Memory
  Pointers
Ok, so to define a section we use the syntax:
section <name>
In our first case the name is .text.
Let us add this to our old code.
section .text
global _start
_start:
mov eax, 1
mov ebx, 0
int 0x80
Everything below the section definition is now part of that section.
Ok, let us test out the .data section.
section .text
global _start
_start:
mov eax, 1
mov ebx, 0
int 0x80
section .data
; now what?
Let us define a byte.
To define a byte we use the syntax <name> db <data>.
So how about we define a byte called x with a value of 99 and try to exit our program with x.
Ok, let us define our byte.
section .text
global _start
_start:
mov eax, 1
mov ebx, 0
int 0x80
section .data
x db 99
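Ok, now we exit our program with x.
section .text
global _start
_start:
mov eax, 1
movzx ebx, byte [x]
int 0x80
section .data
x db 99
Ok, what is going on here?
Well, like we said, we defined x with x db 99.
We then did the standard exit, but instead of our 0 we put in the value stored at the memory location x refers to.
In NASM the name x on its own is the address of the byte, so we wrap it in square brackets to read what is stored there: movzx ebx, byte [x] loads that single byte and fills the rest of ebx with zeros.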
And there!
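As a related sketch, here is the same idea with the value stored as a 4-byte double word, so that it matches the size of ebx. It assumes the dd directive, which this tutorial has not used so far, and it uses square brackets to read the value stored at x rather than the address of x.

```nasm
; Sketch: exit with a value read from the data section.
section .text
global _start

_start:
    mov eax, 1      ; sys_exit
    mov ebx, [x]    ; brackets: load the 4 bytes stored at address x
    int 0x80        ; exit status is 99

section .data
x dd 99             ; dd defines a 4-byte (32-bit) value
```

Without the brackets, mov ebx, x would put the address of x into ebx, and the exit status would be derived from that address instead of from 99.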
Hello, World!
Ok, let us use our knowledge we have gained thus far and do a "Hello, World!" program!
So, firstly we need to talk about int 0x80 again.
Like I said, this is a syscall.
Meaning the operating system handles it.
So, we need to figure out which one allows us to write to the screen.
Firstly, let me teach you about stdout.
stdout (standard output) is the stream the system uses to handle output to the terminal.
So what we need to do is write to that stream.
The syscall we just did with the code of 1 was the sys_exit syscall.
The syscall we need is sys_write.
The code for this syscall is 4, and it takes arguments in all four of the registers you have learned so far.
The functions of these registers in sys_write are as follows.
eax: the code (4)
ebx: file descriptor (in our case 1 for stdout)
ecx: data (the address where it starts)
edx: data size (in bytes)
Ok let us go over that a bit more.
The file descriptor in ebx is just a number that identifies an open file; we will use this when we do file I/O. In our case, the operating system knows that 1 is stdout.
The data is the ASCII output we want, and ecx holds the address where it starts in memory. This could be an ASCII string we define or any other bytes, such as an integer or a hexadecimal value, although raw numbers are written as bytes rather than as readable digits. In our case we will be using an ASCII string we define.
The data size is how big our data is in bytes. Since one ASCII character is 1 byte, we can just count the number of characters if we want to; however, we will be using a different way.
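For comparison, the count-by-hand approach mentioned above could look roughly like the sketch below. It assumes 32-bit Linux and NASM, and it hard-codes 13 because "Hello, World!" is 13 characters long.

```nasm
; Sketch: sys_write with the length counted by hand.
section .text
global _start

_start:
    mov eax, 4          ; sys_write
    mov ebx, 1          ; file descriptor 1 = stdout
    mov ecx, message    ; address of the first byte to write
    mov edx, 13         ; number of bytes: "Hello, World!" has 13 characters
    int 0x80

    mov eax, 1          ; sys_exit
    mov ebx, 0          ; exit code 0
    int 0x80

section .data
message db "Hello, World!"
```

The weakness of this version is that the 13 has to be recounted every time the string changes, which is why the tutorial goes a different way.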
Ok, let's start by defining our string and moving the first pieces of data.
section .text
global _start
_start:
mov eax, 4
mov ebx, 1
mov ecx, message
section .data
message db "Hello, World!"
Ok, you might think we are done with that part, but you'd be wrong.
You see we also want a newline on the end.
The problem is assembly doesn't have a \n or a std::endl like some languages.
In assembly we have to reference the ASCII code for newline.
That would be 10, but we are going to reference it in hexadecimal with 0x0a.
So we are just going to tack this on the end like this:
message db "Hello, World!", 0x0a
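And there we go!
We defined our string, we moved it into ecx, and we now have a newline!
Now let's calculate the length.
We are assembling with NASM behind the scenes, and NASM has a shorthand for doing so.
It is the expression:
$ - <label>
Here $ means the current position, so on the line right after our string it gives the number of bytes since the label.
So in our case it is:
$ - message
Now we need to make a named constant equal to this.
So we use equ!
The syntax is:
<name> equ <value>
It's very similar to db in use.
So, picking the name len (any name works), we can write:
len equ $ - message
Once our length is defined, we put it inside of edx with mov edx, len.
Now, let us print using int 0x80.
And, we are done!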
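Putting the pieces together, the finished program might look like the sketch below. It assumes 32-bit Linux and NASM; the constant name len is an illustrative choice, and the sys_exit call from the Basic Syscall section is included at the end so the program stops cleanly.

```nasm
; Sketch: Hello, World! using sys_write, with the length computed by NASM.
section .text
global _start

_start:
    mov eax, 4          ; sys_write
    mov ebx, 1          ; file descriptor 1 = stdout
    mov ecx, message    ; address of the string
    mov edx, len        ; number of bytes to write
    int 0x80            ; print it

    mov eax, 1          ; sys_exit
    mov ebx, 0          ; exit code 0
    int 0x80

section .data
message db "Hello, World!", 0x0a    ; the text followed by a newline byte
len equ $ - message                 ; bytes between message and this point
```

The len line has to come directly after message, because $ - message measures everything between the label and the current position.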
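I'd recommend, just as good practice, that we exit our code with sys_exit. Without it, the CPU keeps executing whatever bytes come after our code and the program will most likely crash.
If you want to do that just tack:
mov eax, 1
mov ebx, 0
int 0x80
On the end of _start.
Conclusion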
Ok, you have started your journey of x86 Assembly!
There will be more parts coming soon.
If you liked this and want more people to see it, share it with your friends and/or upvote it.
Hello, great job on the tutorial take my upvote. Repl does have nasm installed in a lot of repls (only polygott (their main docker image) based ones)
Maybe I'll learn and program with some x86 assembly @Waku
Keep up the good work