The Basics of x86
sugarfi (267)

Introduction

x86 is one of the most common computer architectures in use today. Most personal computers and devices use either x86 or ARM CPUs. But what does that mean? What is the difference?

The Instruction Set

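As you might have heard before, code at the lowest level runs as machine code. Machine code is a sequence of instructions, each encoded as bytes. Every instruction has an opcode, which says what operation to perform, usually followed by its operands, which say what to perform it on. An instruction can take zero, one, two, or more operands, and on x86 a single instruction can be anywhere from 1 to 15 bytes long. In essence, an instruction is like a very small function call. The set of all the instructions a machine understands is called its instruction set. x86 is a CISC, or complex instruction set computer: its instruction set is very large, with well over a thousand instructions once all the extensions are counted. ARM is what is called a RISC, or reduced instruction set computer: its instructions are fewer, simpler, and more regular. Simpler instructions make it easier to build small, power-efficient chips, which is one reason ARM is so popular in phones and embedded devices.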
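To make the opcode/operand split concrete, here is a minimal sketch of a decoder for a handful of 32-bit x86 instructions. The function name `decode` and the small set of opcodes it handles are hypothetical choices for illustration; real x86 decoding also involves prefixes, ModR/M bytes, and far more opcodes. It shows that some instructions have no operands at all (`nop`, `ret`), some encode a register inside the opcode byte itself (`push`, `pop`), and some are followed by a 4-byte little-endian operand (`mov eax, imm32`).

```python
import struct

# Register numbers 0-7 as they are encoded inside 32-bit x86 instructions.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes) -> list[str]:
    """Decode a tiny subset of 32-bit x86 machine code into assembly text."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:
            # mov r32, imm32: one opcode byte (register in the low 3 bits)
            # followed by a 4-byte little-endian immediate operand.
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov {REGS32[op - 0xB8]}, {imm}")
            i += 5
        elif 0x50 <= op <= 0x57:
            # push r32: a single byte; the register is part of the opcode.
            out.append(f"push {REGS32[op - 0x50]}")
            i += 1
        elif 0x58 <= op <= 0x5F:
            # pop r32: also a single byte.
            out.append(f"pop {REGS32[op - 0x58]}")
            i += 1
        elif op == 0x90:
            out.append("nop")  # no operands at all
            i += 1
        elif op == 0xC3:
            out.append("ret")  # return from a function; no operands
            i += 1
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i}")
    return out


# mov eax, 42 / push eax / pop ebx / ret
print(decode(bytes.fromhex("b8 2a 00 00 00 50 5b c3")))
# ['mov eax, 42', 'push eax', 'pop ebx', 'ret']
```

Notice that the same eight bytes hold four instructions of three different lengths, which is why an x86 decoder has to read the opcode before it knows where the next instruction starts.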

Registers, Math, and the Stack

But what do all these instructions actually do? Some of the most common ones manipulate registers and the stack. A register is similar to a variable: it stores a value, and you can manipulate it, but it lives inside the CPU itself, so it is much faster to access than memory. There are many registers in x86. First, there are the general purpose registers: eax, ebx, ecx, edx, esi, and edi, plus, in 64-bit mode, eight more called r8 through r15. You can store whatever data you want in these registers, and no one will complain. You might wonder what the names mean. The letters come from each register's traditional job (a for accumulator, b for base, c for counter, d for data, si for source index, di for destination index), and the prefix or suffix tells you the size in bits. If a register is named <x>l or <x>h, it is 8 bits: the low or high byte of <x>x. If it is named <x>x, it is 16 bits. If it is named e<x>x ("extended"), it is 32 bits. Finally, if it is named r<x>x, it is 64 bits. These are not separate registers: al is the low byte of ax, ax is the low half of eax, and eax is the low half of rax, so writing to al also changes ax, eax, and rax.

x86 also has a stack. You have probably heard of a stack: you can push items onto it and pop items off it, but you can only pop the top item. On x86 the stack is just a region of memory, managed through two registers: esp, the stack pointer, which points to the top of the stack, and ebp, the base pointer, which by convention points to the base of the current function's stack frame. The stack on x86 grows downward: when you push a 32-bit value, esp is decreased by 4 and the value is written at the new address. C compilers typically keep a function's local variables on the stack: you might have one variable at ebp - 4, another at ebp - 8, and so on. The two basic stack instructions are push and pop. They are pretty self-explanatory: push pushes an item, and pop pops it.
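Here is a minimal Python model of two ideas from this section: al, ah, and ax are views into eax rather than separate registers, and a 32-bit push moves esp down by 4 before storing the value. `ToyCPU` is a hypothetical class for illustration only. It models just eax and esp, uses a `bytearray` to stand in for memory, and stores values little-endian as x86 does.

```python
import struct


class ToyCPU:
    """Hypothetical model of eax (with its sub-registers), esp, and the stack."""

    def __init__(self, memory_size: int = 0x1000):
        self.memory = bytearray(memory_size)  # stands in for RAM
        self.eax = 0
        self.esp = memory_size  # empty stack: esp starts at the end of memory

    # ax, ah and al have no storage of their own: they are slices of eax.
    @property
    def ax(self) -> int:
        return self.eax & 0xFFFF  # low 16 bits

    @property
    def ah(self) -> int:
        return (self.eax >> 8) & 0xFF  # bits 8-15

    @property
    def al(self) -> int:
        return self.eax & 0xFF  # bits 0-7

    @al.setter
    def al(self, value: int) -> None:
        # Writing al replaces only the low byte of eax.
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)

    def push(self, value: int) -> None:
        # The stack grows downward: move esp down 4 bytes, then store there.
        self.esp -= 4
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self) -> int:
        # Read the 4 bytes at esp, then move esp back up.
        (value,) = struct.unpack_from("<I", self.memory, self.esp)
        self.esp += 4
        return value


cpu = ToyCPU()
cpu.eax = 0x12345678
print(hex(cpu.ax), hex(cpu.ah), hex(cpu.al))  # 0x5678 0x56 0x78
cpu.al = 0xFF
print(hex(cpu.eax))                           # 0x123456ff
cpu.push(1)
cpu.push(2)
print(hex(cpu.esp))                           # 0xff8
print(cpu.pop(), cpu.pop())                   # 2 1
```

The last value pushed is the first one popped, and esp ends up back where it started.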

OSes and the Boot Process

Well, that's all well and good, but what actually happens on boot up? If I turn on my computer, what does it do? How does it load my OS? What even is my OS? All these questions are answered by something called modes and the boot process. We will answer the last question first: an OS is made of a couple of parts, the bootloader and the kernel. The bootloader is what it sounds like: it is booted, and it loads the rest of the OS. The kernel is the core of the OS, and it is in charge of pretty much everything: the file system, drivers, userspace, and more. On most Linux systems, the bootloader is GRUB or Syslinux. Bootloaders are a complex topic, and I won't go into them here. Now we will get to the actual boot process: how does the computer load the bootloader? When the computer turns on, it runs firmware called the BIOS, or Basic Input Output System, from a chip on the motherboard. (This is the traditional BIOS boot process; most modern PCs use UEFI firmware instead, though many can still boot this way.) The BIOS loads 1 sector, a 512-byte block, from the boot disk to the address 0x7c00 and then jumps there. It only treats the sector as bootable if its last two bytes are 0x55 and 0xAA. This is why bootloaders are necessary: the BIOS only reads 1 sector, so something has to load the rest of the OS. Most bootloaders do not fit into 512 bytes, though: the first sector loads a second stage bootloader, which in turn loads the kernel.
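A boot sector is simple enough to build by hand. The sketch below (the function names are hypothetical) pads raw machine code to 512 bytes and puts the signature bytes 0x55 0xAA in the last two positions, which is what a BIOS checks before copying the sector to 0x7c00 and jumping to it. The two-byte payload 0xEB 0xFE encodes `jmp $`, a short jump back to itself, so this "OS" just spins forever; a real first-stage bootloader would go in its place.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # must occupy bytes 510 and 511 of the sector


def make_boot_sector(code: bytes) -> bytes:
    """Pad raw machine code to one sector and append the boot signature."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)  # 510 bytes available for code
    if len(code) > room:
        raise ValueError(f"boot code is {len(code)} bytes; only {room} fit")
    return code + bytes(room - len(code)) + BOOT_SIGNATURE


def looks_bootable(sector: bytes) -> bool:
    """Roughly what the BIOS checks: one full sector ending in 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[-2:] == BOOT_SIGNATURE


# 0xEB 0xFE is "jmp $": jump back to this same instruction forever.
sector = make_boot_sector(b"\xeb\xfe")
print(len(sector), looks_bootable(sector))  # 512 True
```

Writing `sector` to a file with `open("boot.bin", "wb").write(sector)` gives a one-sector disk image that an emulator can boot.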

I/O and Interrupts

But how does the BIOS load the boot sector? It doesn't happen automatically: the BIOS has to talk to the disk controller, either through an I/O port or through MMIO. An I/O port is a bit like a network port, but on the computer's hardware: each port has a number and a particular function, and you can read from and write to it with the in and out instructions. There are ports for the keyboard and mouse controller (0x60 and 0x64), for the primary hard disk controller (0x1F0 through 0x1F7), and for the VGA graphics card, among others. MMIO, or Memory Mapped Input Output, is another way I/O is done. Instead of using separate port numbers, the device claims a range of memory addresses, and reading from or writing to those addresses talks to the device instead of RAM. For example, in text mode a VGA card displays whatever is written to the memory starting at 0xb8000. However, a boot sector cannot talk to every device directly the way the BIOS does: there are so many types of devices that drivers for all of them would never fit in 512 bytes! Instead, code running at boot uses what is called a BIOS interrupt. These are done with the int instruction. An interrupt tells the CPU to stop what it is doing and run the handler for that interrupt number. In Real mode, the CPU finds the handler in a table of addresses called the interrupt vector table, which the BIOS fills in with its own routines at startup. For example, the BIOS sets up interrupt 0x13 to perform disk services and interrupt 0x10 for video services. Thus, to read from the hard drive, I would put the details of the request into registers and call int 0x13, and the BIOS would do what I wanted for me.
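The address arithmetic behind two of these mechanisms is easy to sketch. In VGA text mode (80 columns by 25 rows), each character on screen takes two bytes starting at 0xb8000: the character itself, then an attribute byte whose low four bits are the foreground color and whose high bits are the background (0x07 is light grey on black). In Real mode, the interrupt vector table starts at address 0 and holds 256 four-byte entries, so the handler address for `int n` is stored at `n * 4`. The helper names below are hypothetical, and the code only computes addresses and bytes, since actually writing to physical memory needs code running on the bare machine.

```python
VGA_TEXT_BUFFER = 0xB8000   # start of VGA text-mode memory
VGA_COLUMNS = 80            # standard 80x25 text mode
LIGHT_GREY_ON_BLACK = 0x07  # attribute: low nibble = foreground, high bits = background


def vga_cell_address(row: int, col: int) -> int:
    """Physical address of the character cell at (row, col); each cell is 2 bytes."""
    return VGA_TEXT_BUFFER + 2 * (row * VGA_COLUMNS + col)


def vga_cells(text: str, attribute: int = LIGHT_GREY_ON_BLACK) -> bytes:
    """Bytes to copy into the buffer: a character byte, then an attribute byte, per cell."""
    out = bytearray()
    for ch in text.encode("ascii"):
        out += bytes([ch, attribute])
    return bytes(out)


def ivt_entry_address(vector: int) -> int:
    """Real-mode interrupt vector table: 256 entries of 4 bytes (offset, segment) at address 0."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("interrupt vectors range from 0x00 to 0xFF")
    return vector * 4


print(hex(vga_cell_address(1, 0)))   # 0xb80a0: second row, first column
print(vga_cells("Hi"))               # b'H\x07i\x07'
print(hex(ivt_entry_address(0x13)))  # 0x4c: where the disk-services handler is listed
```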

Modes and Memory

Well, BIOS interrupts are great, but they come at a price. When the computer boots up, it is in what is called Real mode, or 16-bit mode: you have access to every part of the computer with no protections, and you can use BIOS interrupts. The problem is memory access. In Real mode, registers and addresses are 16 bits wide, so a single address can only reach 0 to 65535, or 64 KiB. Segmentation stretches this to about 1 MiB, but that is still a problem: what if an app needs more RAM than that? The solution is called Protected mode, or 32-bit mode. In Protected mode, you have access to 32-bit addresses, letting you read and write up to 4 GiB of RAM. You can also set privilege levels on memory so that only certain code, such as the kernel, can access it. The price of this is that you cannot call BIOS interrupts anymore, since they are written for Real mode. Many older operating systems like DOS run in Real mode, but modern ones run in Protected mode or Long mode, which uses 64-bit addresses (current CPUs actually implement 48 or 57 bits of them).
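The memory sizes for each mode come straight from the width of an address: n bits can name 2^n distinct bytes. This small sketch (the helper and the table of modes are just for illustration) does the arithmetic with binary units, where 1 KiB is 1024 bytes. The 20-bit row reflects Real mode's segment:offset scheme, which reaches about 1 MiB even though each register is only 16 bits, and the 48-bit row is the virtual address width most current 64-bit CPUs implement.

```python
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human_size(n: int) -> str:
    """Format a byte count using 1024-based units, e.g. 65536 -> '64 KiB'."""
    unit = 0
    while n >= 1024 and n % 1024 == 0 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"


ADDRESS_WIDTHS = {
    "Real mode, one 16-bit offset": 16,
    "Real mode, segment:offset (about)": 20,
    "Protected mode": 32,
    "Long mode, typical implemented virtual width": 48,
}

for mode, bits in ADDRESS_WIDTHS.items():
    print(f"{mode:46} {bits:2} bits -> {human_size(1 << bits)}")
```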

Segmentation, Paging, and the GDT

Why is Real mode limited this way? The reason is that it accesses memory through something called segmentation. An address is written as a segment plus an offset within that segment. For example, you might have the segment 0x1000 and the offset 0x1234, written 0x1000:0x1234. To convert this to a physical address, we shift the segment left 4 bits (multiply it by 16) and add the offset. Applying this, we get (0x1000 << 4) + 0x1234 = 0x10000 + 0x1234 = 0x11234. Since both parts are 16 bits, the highest reachable address is (0xFFFF << 4) + 0xFFFF = 0x10FFEF, just over 1 MiB. The CPU keeps segments in segment registers: cs, ds, ss, es, fs, and gs. These are, respectively: the code segment, where the program runs; the data segment, where data is stored; the stack segment, for the stack; the extra segment, for general use; and two more extra segments, fs and gs, which have no fixed purpose and which modern OSes often use for things like per-thread data.

Protected mode does things differently. When you enter protected mode, you must set up something called a GDT, or Global Descriptor Table. Segment registers no longer hold a value to be shifted: they hold a selector, an index into the GDT. Each GDT entry describes a segment: its base address, its size (limit), and its access rights, such as whether it holds code or data and which privilege level may use it. This is where you define how you want to access memory: for example, you could set up a segment that only covers addresses 0x10000 to 0x90000. Segmentation cannot be turned off in protected mode, but in practice most OSes set up flat segments that cover all 4 GiB and do the real work with something called paging.

In paging, you use what is called virtual memory. This lets each process think that it has its own 4 GiB address space, so every program can be loaded at the same address even though they live in different places in physical memory. In paging, virtual and physical memory are divided into pages, usually 4 KiB each. Each page in virtual memory is mapped to a page in physical memory. In classic 32-bit paging, these mappings are stored in page tables, and a page directory lists the page tables; the CPU finds the page directory through the cr3 register. Paging is optional in Protected mode but required in Long mode.
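Both kinds of address translation described here can be sketched in a few lines. `real_mode_address` applies the shift-and-add rule for segment:offset addresses. `translate` walks a classic 32-bit, two-level page table the way the CPU's memory management unit does: the top 10 bits of a virtual address pick a page directory entry, the next 10 bits pick an entry in that page table, and the low 12 bits are the offset within the 4 KiB page. This is a simplified, hypothetical model: directories and tables are Python dicts instead of 4 KiB arrays in memory, only the present bit is checked, and large pages, permission bits, and the TLB are ignored.

```python
PRESENT = 0x1  # bit 0 of a directory or table entry: the mapping is valid


def real_mode_address(segment: int, offset: int) -> int:
    """Real mode: physical address = (segment << 4) + offset."""
    return (segment << 4) + offset


def split_virtual_address(vaddr: int) -> tuple[int, int, int]:
    """32-bit paging: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def translate(vaddr: int, directory: dict[int, int],
              tables: dict[int, dict[int, int]]) -> int:
    """Translate a virtual address to a physical one, raising LookupError on a miss.

    directory: directory index -> entry (page table's physical address | flags)
    tables:    page table's physical address -> {table index -> entry (frame address | flags)}
    """
    dir_index, table_index, offset = split_virtual_address(vaddr)
    pde = directory.get(dir_index, 0)
    if not pde & PRESENT:
        raise LookupError(f"page fault at {vaddr:#010x}: no page table")
    pte = tables[pde & ~0xFFF].get(table_index, 0)  # low 12 bits are flags
    if not pte & PRESENT:
        raise LookupError(f"page fault at {vaddr:#010x}: page not mapped")
    return (pte & ~0xFFF) + offset


print(hex(real_mode_address(0x1000, 0x1234)))  # 0x11234

# Map the virtual page at 0x00400000 to the physical frame at 0x00123000,
# using a page table that lives at physical address 0x00005000.
directory = {1: 0x00005000 | PRESENT}
tables = {0x00005000: {0: 0x00123000 | PRESENT}}
print(hex(translate(0x00400ABC, directory, tables)))  # 0x123abc
```

On real hardware a missing mapping triggers a page fault that the kernel handles, which is what the `LookupError` stands in for here.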

A Final Note: Emulators

If you want to write your own OS for x86, you will want to learn more about assembly language, the human-readable form of machine code, and C. Then, you will need to write a bootloader and kernel, or set up a kernel to boot with GRUB or Syslinux. Finally, you will have a bootable image, which is basically just a file containing your OS as machine code. What do you do now? You need to know if your OS works! You could write the image to a flash drive, reboot your computer, and boot from the flash drive. But this is messy and wastes time. Instead, most OS developers use an emulator for testing. An emulator is a program that pretends to be a whole computer: it imitates the x86 hardware, so your OS thinks it is running on real hardware, when in reality it is just running as an app. Two good choices are QEMU and VirtualBox. VirtualBox is geared more towards those who want to run other OSes on their machine, say, to run Linux without installing it. QEMU is easier to use for testing your OS during development, since you can boot an image straight from the command line.
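For the QEMU workflow, a small helper script like the one below is enough. The function names and the image name `os.img` are hypothetical; `-drive format=raw,file=...` and `-m` are standard QEMU options, and `run_in_qemu` assumes `qemu-system-x86_64` is installed and on your PATH.

```python
import shutil
import subprocess


def qemu_command(image_path: str, memory_mb: int = 128) -> list[str]:
    """Build a QEMU command line that boots a raw disk image."""
    return [
        "qemu-system-x86_64",
        "-drive", f"format=raw,file={image_path}",  # attach the image as a raw disk
        "-m", str(memory_mb),                        # guest RAM in MiB
    ]


def run_in_qemu(image_path: str) -> int:
    """Boot the image in QEMU and return QEMU's exit code once it quits."""
    if shutil.which("qemu-system-x86_64") is None:
        raise FileNotFoundError("qemu-system-x86_64 is not installed or not on PATH")
    return subprocess.run(qemu_command(image_path)).returncode


print(" ".join(qemu_command("os.img")))
# qemu-system-x86_64 -drive format=raw,file=os.img -m 128
```

Calling `run_in_qemu("os.img")` opens a QEMU window running your image, and closing that window returns control to the script, so you can rebuild and retest in seconds instead of rebooting.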

The End

This concludes my tutorial on x86. If you liked it, be sure to check back later, as I plan to write other tutorials on Python and perhaps even writing a basic OS. Thanks for reading!

Highwayman (1038)

@V3rmillionNet
F. Ok now someone really does have to make a replOS.

sugarfi (267)

@Highwayman I tried, but QEMU doesn't run on repl.it

Highwayman (1038)

@sugarfi I mean, repl.it doesn’t necessarily have to be the ide you develop it on, it can just be the site upon which you distribute the code. It’d just be geared towards repl.it developers I guess. But I guess we’ll just have to go for a replShell. 😞 that’s too bad.

sugarfi (267)

@Highwayman I have tried writing an OS before, but I always gave up or got bored before it went anywhere. I am working on another one now, though

Highwayman (1038)

@sugarfi tbh I’d probably get bored of it even if I had the ability lol, but cool. What do you think it’ll be like?(your os)

sugarfi (267)

@Highwayman I am aiming for a DOS or Unix like thing, with a basic command line and app support

sugarfi (267)

@Highwayman you could download and run apps

Highwayman (1038)

@sugarfi oh lol my brain how did I not get that.

V3rmillionNet (53)

@Highwayman damn, i hate the fact that internet is able of teaching us anything

Anyways time to code a os

Highwayman (1038)

@sugarfi
-..-
-..
I was gonna do that one earlier, but I wasn’t sure how to make it clear that it was base 8 lol.

Highwayman (1038)

@sugarfi what’s that using? Never seen that before.

Highwayman (1038)

@sugarfi oh lol, that’s actually the one I was gonna do. hm... this is hard finding new encodings...hm.. this might take a bit lol I think I’ve lost 😂

Highwayman (1038)

@sugarfi O.o
I searched for an hr. Wat. Ok I did lose lol.

sugarfi (267)

@Highwayman you could still use base 32, or better yet base 69....

Highwayman (1038)

@sugarfi lol 69. I saw a post on SO the other day now that I think about it that was just about reducing the amount of space raw binary would take up when encoded. The problem was everything else was horribly slow, so I think eventually they just went with base64.
I think if I can figure out base 32....

Highwayman (1038)

@sugarfi nope. I just don’t get how they work 😓

sugarfi (267)

@Highwayman just use python int(s, base=32)

sugarfi (267)

@LiamDonohue cool, i would love to see what you have so far!

Highwayman (1038)

@LiamDonohue this seems to be a shell more than an os?

LiamDonohue (206)

yeah, that's why I said kinda because it's kinda a weird breed lol @Highwayman

sugarfi (267)

@Highwayman @LiamDonohue yeah, this is not a bootable program, it is a console app

LiamDonohue (206)

it's kinda an example of what I might make for an OS @sugarfi

LiamDonohue (206)

@highwayman @sugarfi do yall want to get on a multiplayer and start one? lol also try out https://repl-mail.mreconomical.repl.co/mail

Highwayman (1038)

@LiamDonohue I’m not good at collab, strange as that sounds.

LiamDonohue (206)

do you have a repl mail accnt? im curious how many people do @Highwayman

Highwayman (1038)

@LiamDonohue I would, but repl mail’s blocked for me :(

LiamDonohue (206)

? how is repl.co blocked? @Highwayman

Highwayman (1038)

@LiamDonohue everything is blocked for me except for repl.it. Maybe if I can figure out if they also host the servers under the repl.it domain....

Highwayman (1038)

@Highwayman well actually not everything but most things

LiamDonohue (206)

ahh ok pretty much the same with me I have to request unblocks from the administrator @Highwayman

Highwayman (1038)

@LiamDonohue same. Except my admin is my mother, so that never gets anywhere lol.

LiamDonohue (206)

lol mine is my father so it's a bit different @Highwayman

Highwayman (1038)

@LiamDonohue XS I hate whitelist restrictions :(

LiamDonohue (206)

I had an idea for a programming language: uwuscript @Highwayman

Highwayman (1038)

@LiamDonohue UWU that’d be fun, all owos and uwus and idk something else lol

LiamDonohue (206)

yeah ima start working on it lol @Highwayman

Highwayman (1038)

@LiamDonohue I’ve just been trying to make head or tail of networking C++ rn, so I’m just too dead inside to make up good ideas like that lol. How’s it coming along?

LiamDonohue (206)

well im working on something called THAIL (Technical High-Level Abstract Language) but i just had the uwuscript idea @Highwayman

LiamDonohue (206)

here's an example:

UwU 1.0
hewwo "Hello world!"

@Highwayman
(UwUScript)

Highwayman (1038)

@LiamDonohue that’s actually nice lol. What’s THAIL? it sounds super cool.

sugarfi (267)

@Highwayman for unblocking websites, sometimes you can use https://replbox.repl.it/data/web_hosting_1/MrEconomical/repl-mail or something similar, but it looks like it doesn't load stylesheets.

Highwayman (1038)

@sugarfi it does load style sheets, but it doesn’t host servers. I think. I’m still trying to figure out the servers.. 🤷‍♂️

sugarfi (267)

@Highwayman what do you mean? that is how repl.it does their hosting as far as i know.

Highwayman (1038)

@sugarfi well... idk, let me try it again. Maybe it’s under a different web_hosting folder or something. Idk. Let me see.

Highwayman (1038)

@sugarfi wow. Yeah not stylesheets and I remember why. You’d have to figure out how to resolve the link for replbox.repl.it too. And everything else doesn’t work too because of all the missing js and links and stuff. Yeesh. It’s terrifying.

LiamDonohue (206)

a programming language im working on, i just have to finish the documentation then i can start coding it @Highwayman

Highwayman (1038)

@LiamDonohue oh! Well it sounds super cool :3

LiamDonohue (206)

ikr? if you would like to help, just tell me @Highwayman

Highwayman (1038)

@LiamDonohue :/ I’ve never done that kinda thing before and I’d just end up dragging you down since I’m so bad at collab. thank you for the offer though :)

sugarfi (267)

@LiamDonohue you're making a language too? what kind?

LiamDonohue (206)

a programming language called THAIL (stands for Technical High-Level Abstract Intermediate Language) @sugarfi

LiamDonohue (206)

yeah, the idea randomly came to me in class the other day @sugarfi

sugarfi (267)

@LiamDonohue what does the syntax look like?

LiamDonohue (206)

so its like a weird combination between Visual Basic, Python, and JavaScript
here's a hello world program

print = "hello".

the period at the end of the statement is required or a syntax error would be thrown
@sugarfi