Lab 1: Booting a PC (csail.mit.edu)
149 points by luu on Jan 7, 2015 | 29 comments



This was one of my favorite classes, so much so that a few of us were crazy enough to offer a January term version of it last year. The material is the same, but the website formatting for the labs is a bit different, and may be more readable to some: https://sipb.mit.edu/iap/6.828/. It's doable to finish all of the labs in an intensive month, and a great experience. I do recommend going through the exercises.


hey thanks for posting a link to the archive you made. your efforts to archive and display the material are much appreciated.


We used this set of labs as the basis for the master's OS course at UW. They're fantastic and fairly straightforward, providing scaffolding and then having you implement basic versions of interesting OS facilities like context switching, exception handling, interrupts, scheduling, RPC. Really rewarding (and sometimes quite painful - debugging an errant OS can be quite a bitch).


I took the class (6.828) last semester at MIT, and the labs do a great job of handling a lot of the annoying x86 stuff so that you can focus on the more interesting parts of writing an OS.

I know this because for the final project my partner and I ported the OS to 64-bit. Although the number of changes was not too large, we spent hours debugging low-level x86_64 stuff. We learned a lot, but I think it is a good idea that the labs provide the boilerplate.


Agreed on the fantastic-ness -- the first 4-5 in the sequence in particular give you a very nice flavor of working in kernel-space, and they are doable. I taught a senior ugrad OS course and experimented with using this as the project sequence.

Some folks excelled (though they put in 15+ hours per week on average, which is quite high), some cratered and I needed a backup plan for them to make progress.


This is only tangentially related to the article, which is interesting in its own right, if a bit over my head presently. I noticed that they were using tools like git and make to manage distribution of the lab assignment and submission of the lab work. This is cool.

When I was in college, lab assignments were typically printed or photocopied sheafs of paper, and handed out. To submit one's lab work one printed out the results, and handed them in, or called the instructor over to watch one's demo, etc. I would have loved to have been able to use existing tools I already knew how to use to manage all that automatically.


> I would have loved to have been able to use existing tools I already knew how to use to manage all that automatically.

Even better for students that don't have a lot of coding experience. Rather than slogging through some terrible Blackboard-esque environment that they'll never see again to submit assignments, they're actually becoming comfortable using versioning and development tools that are the current industry standard. Very useful for everyone.


There are a lot of things here that stand out to me, as a recent grad of just a regular state school.

Things like "Using an API Key" and "Using git" cause students to HAVE to go above and beyond the assignment, because they will undoubtedly run into a problem with some of these extra steps along the way.

I think this type of learning can be more beneficial than the assignment itself. It's real problem solving using the computers.

That's just cool to see.


Back before I went to university, I spent a lot of time taking programming and systems classes at the local junior college while in high school. That, coupled with the tinkering I did on my own as part of the local BBS scene and helping others with computers, taught me a lot more about the practical aspects of computers at the time -- assembly, TSRs, device drivers, different compilers, different operating systems, etc. -- than was taught at the school I went to.

If a course with this level of detail had been available to me, I would have taken it in a heartbeat. Mmm...boot loaders.


Agreed, I am currently attending a state school and very few of the professors emphasize the importance of self study. Learning to learn is more important than any piece of information college has to offer. The fact that MIT puts part of the learning burden on the student is another reason that the students go on to do great things. The professors are in part readying them for the real world.


Part of the explicit ethos of MIT undergraduate education is that some significant fraction of what you're learning will be obsolete in due course, or irrelevant to what you end up doing, but at the very least the school is teaching you how to learn stuff quickly.


IME, most CS departments will have tutors or training sessions available for things like learning git or POSIX that there is no time for during class time.


In my experience, people who hold this attitude would look down their noses at a basketball coach who asked for a hundred and ten percent from his players. But this assignment is no different: if you're required to go above and beyond, you aren't really going above and beyond anything in particular. You're just doing what's required to finish an assignment that happens to be specified imprecisely.


While BIOS-booting does seem a bit dated, this still looks like an interesting lab to go through just to understand the bits and pieces of how PCs used to boot before UEFI.

On the flip side, I can also understand why you wouldn't want to start an Operating-system engineering course with UEFI.

Unlike BIOS (a set of historically accumulated hacks, which work in a given way because they just do), UEFI is quite a big thing: a big spec and an OS unto itself. It would probably involve quite a bit of indirection.

On the flip side (again), experimenting with the boot process on (real) UEFI machines would probably be much simpler than trying to program boot PROMs and similar stuff they resort to when experimenting with QEMU.


For practical operating systems development, you want to use a bootloader on top of BIOS or UEFI. In particular, the Multiboot[0] protocol is very useful, because it is supported out-of-the-box by bootloaders (like GRUB) and there's built-in support for multiboot in emulators like QEMU and Bochs (qemu -kernel mykernel.elf).

It is rather easy to build a multiboot-capable ELF file that can be easily booted on real iron as well as emulators.

This will directly set up your computer in protected (32 bit) mode (NOTE: working in 64 bit "long" mode is a bit more difficult, so for educational projects it's best to stick with 32 bit mode). The protocol also has specs to set up a video mode and provide you with a pointer to the framebuffer (otherwise you'd need to fall back to 16 bit real mode and use BIOS interrupts for setting up a video mode), and it passes some information about the machine in a structure provided to the kernel on boot.

Booting directly with UEFI might become a more and more viable option as time goes by, but I think that multiboot is still more viable (things might have changed, I haven't done OS development in a few years).

[0] https://www.gnu.org/software/grub/manual/multiboot/multiboot...
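
For anyone wondering what "multiboot-capable" means at the byte level, here is a rough sketch in Python, assuming the Multiboot 1 header layout: three little-endian 32-bit words (magic, flags, checksum) that sum to zero modulo 2^32, placed 4-byte aligned within the first 8192 bytes of the image. The helper names `build_header` and `find_header` are made up for this illustration and are not part of GRUB or any library. In a real kernel the header would normally come from an assembly or linker-script section rather than from a script like this.

```python
import struct

MULTIBOOT_MAGIC = 0x1BADB002   # Multiboot 1 header magic
SEARCH_WINDOW = 8192           # header must lie within the first 8 KiB of the image
HEADER_SIZE = 12               # magic + flags + checksum, 4 bytes each


def build_header(flags: int = 0) -> bytes:
    """Return the 12 mandatory header bytes (hypothetical helper)."""
    # The checksum is chosen so that magic + flags + checksum == 0 (mod 2**32).
    checksum = (-(MULTIBOOT_MAGIC + flags)) & 0xFFFFFFFF
    return struct.pack("<III", MULTIBOOT_MAGIC, flags, checksum)


def find_header(image: bytes) -> int:
    """Return the offset of the first valid header, or -1 if there is none."""
    limit = min(len(image), SEARCH_WINDOW) - HEADER_SIZE
    for offset in range(0, limit + 1, 4):      # header is 4-byte aligned
        magic, flags, checksum = struct.unpack_from("<III", image, offset)
        if magic == MULTIBOOT_MAGIC and (magic + flags + checksum) & 0xFFFFFFFF == 0:
            return offset
    return -1
```

A loader does roughly what `find_header` does: it scans the start of the file for the magic value and checks the sum before trusting the rest of the header.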


We had these labs as a base for our course on OS development. The original tasks seemed to be very good: even if you did something wrong, the tests would probably catch it. But those who made our course decided to introduce some changes, which led to the following split of time: about 0.5-1 hours for the task itself and a continuously growing number of hours for debugging, first the mistakes in their code, then the bugs we introduced ourselves. My first hacking experience, indeed.


A fun thing to do is write a bootloader for Raspberry Pi. It doesn't have some of the cruft of x86 while still being a real bootloader.


But the VideoCore IV is kind of a weird vector processor, there's still a lot of official documentation we don't have, there are those ring buffers, you have to wake the—

Oh, you meant the ARM? Oh, my, no, the Pi doesn't boot from that. The ROM loader starts the SDCard and loads BOOTCODE.BIN to the VC4 with cache-in-RAM mode; that's its equivalent of the bootblock. The ARMv6 isn't even turned on until the next chain later (used to be two, from memory?).

So, you know, if you fancy a much more fun, more advanced, challenge, it's right there waiting for you, under the crust. Have fun. Let me know if you find a diagonal vector addressing mode or a good way of shuffling words! <grin>

(PS: There is actually someone at Broadcom working on open-source firmware for the Raspberry Pi, or was last I checked. They're going to have to rewrite it: the ThreadX RTOS they used is commercial. But he has firmware documentation, at least. You have… some of it.)


A problem with all ARM systems on chip is that they don't have any kind of standard like the PC is for x86. There's some work going on in that area, but at the moment each ARM SoC has its own conventions for booting. It is typical that there is a smaller ARM bootstrap core on the SoC that does the early boot process and kicks up the actual (bigger) ARM core(s) and then hands over control to a bootloader (like u-boot) or boots the OS using device-specific bootstrap code.

In other words: it would not be fun to do this using a RasPi (depending on your definition of "fun" of course). You would be spending lots of time with scarce documentation and practically no debugging aids: the only way to know if your code is working is whether an LED lights up, and there is no way to debug failures. If you had a development board, perhaps you'd have JTAG or other debugging means available, but I'm not sure if you can find this on a RasPi. You'd end up with very little transferable knowledge from this project.

Of course, working with scarce documentation in a harsh early boot environment is a valuable skill in itself, but not the kind of stuff I'd voluntarily get into. This stuff is hard enough to do on a well documented, widely available platform like the x86 PC.


The ARM core on the Raspberry Pi is accessible via JTAG. It requires a board with an FT4232H or FT2232H chip. See https://github.com/dwelch67/raspberrypi/tree/master/armjtag for details.


Ok, nice to know that it has a JTAG even on the production device.

However, if that plugs into the "big" ARM core, that's of no use with the early boot process (which uses the VideoCore, according to the post below).


This reminds me of the first lab assignment of my OS class. Ah, here it is (but don't trust this link: the professor takes it down and puts it back up again with slight changes every term): http://eecs.wsu.edu/~cs460/LAB1.html

Interesting notes about this class:

1. This is the second of two one-semester systems programming classes. In the first semester, you learn how to implement ext2. The bootloader assignment is mostly a minimal refresher of the same subject, so you're expected to have remembered from last semester how to traverse the filesystem.

2. You have to write the whole boot loader yourself, aside from some assembly code you've been provided. It runs in 16-bit mode, so you can't use gcc, at least not easily. You use bcc, Bruce's C compiler, which doesn't really read ANSI C.

2a. This actually continues throughout the entire class, during which you will eventually build an entire kernel, in 16 bit x86, using Bruce's C compiler. I don't know who Bruce is, but the professor apparently thought that 32-bit mode did too much of the work for you and wanted us to understand how to do some of the stuff (i.e. switching between kernel and user mode, I think?) ourselves.

2b. Bruce's C compiler lacks a lot of interesting features of more modern C compilers, like function signatures. In Bruce's C compiler, much as in early dialects of C, you can call any function you want and pass whatever arguments you want; the only thing the function does is read it off the stack. Downsides: whatever paltry type safety you get from C is gone; you can pass a long to a function that expects a short and it'll just grab enough bits off the stack to make a short. Upsides: variadic functions, like printf, are marginally simpler to implement; when they released some of the very earliest C/Unix code, I found that my printf had a very close resemblance to the original.

3. You have to fit the whole bootloader in 1 KB, which is not the kind of programming challenge I had yet faced.

4. Since the entire class is taught using ancient technology, there's obviously no emphasis on things like git or API keys. I personally took this class as an opportunity to teach myself git in order to maintain my own sanity.

The professor who teaches these classes is a bit of a legend among CS alumni at my school, partially for these lab assignments and partially for his lectures, which involve extended metaphors, and in which he doesn't erase anything he writes on the whiteboard, instead diagramming or writing over whatever he diagrammed or wrote before.
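
To make point 2b concrete, here is a toy model in Python of what happens when a caller pushes a long and the callee reads a short. It treats the stack as a flat little-endian byte buffer, which is a simplification and not bcc's actual calling convention. `push_long` and `read_short` are hypothetical names used only for this illustration.

```python
import struct


def push_long(stack: bytearray, value: int) -> None:
    """Caller side: place a 32-bit value on the toy stack, little-endian as on x86."""
    stack.extend(struct.pack("<i", value))


def read_short(stack: bytes, offset: int = 0) -> int:
    """Callee side: read only 16 bits from where it expects its argument."""
    return struct.unpack_from("<h", stack, offset)[0]


stack = bytearray()
push_long(stack, 0x12345678)
value = read_short(stack)   # only the low 16 bits, 0x5678, survive
```

Nothing complains: without prototypes there is nobody to notice that the two sides disagree about the argument's size, so the upper half of the value is silently ignored.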


> 3. You have to fit the whole bootloader in 1 KB, which is not the kind of programming challenge I had yet faced.

If you were doing it in Asm, 1KB is a lot of code. The regular DOS bootsector is 512 bytes, of which only ~400 are available for actual code. It doesn't take all that much code to read in a few sectors from disk into memory and jump to it, which is all that a bootloader does.
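
To illustrate the size budget, here is a small Python sketch that pads some machine code out to a 512-byte sector and appends the 0x55 0xAA signature the BIOS looks for in the last two bytes. It assumes a bare sector with no filesystem metadata, so 510 bytes are free; a DOS boot sector also carries a parameter block, which is where the ~400 figure comes from. `make_boot_sector` is a made-up helper name, and the two-byte "jmp $" loop merely stands in for a real loader.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"   # stored at offsets 510 and 511


def make_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros and append the boot signature (hypothetical helper)."""
    room = SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError(f"code is {len(code)} bytes, only {room} fit")
    return code.ljust(room, b"\x00") + BOOT_SIGNATURE


# EB FE is "jmp $", a two-byte infinite loop used here as placeholder code.
sector = make_boot_sector(b"\xeb\xfe")
```

Writing `sector` to a file gives an image that an emulator should accept as bootable, even though all it does is spin.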


I tweaked the style a bit for the site for "better" readability (though it's subjective): https://userstyles.org/styles/109116/mit-operating-system-en...


Also found a nice ebook "xv6: a simple Unix-like teaching OS" elsewhere on that site.

[1]: http://pdos.csail.mit.edu/6.828/2014/xv6/book-rev8.pdf


Nice that one does not have to deal manually with that on each boot :)

The stuff that just happens to get stuff working is amazing.


Oh hey. I had a fun time doing this Operating Systems Engineering course at Technion. Great labs!


> NASM uses the so-called Intel syntax while GNU uses the AT&T syntax

Which is a great reason not to ever use GNU assembler for anything.

> we will be using the GNU assembler.

AAAAAAGGGGGHHHHHHHHH


While I think most people/projects use at&t syntax with gas, it actually seems to support intel syntax pretty well, see eg:

https://github.com/0xAX/asm/pull/2/files

I used to think it was "nasm or nothing" - but gas+intel seems rather pleasant too.



