Hacker News new | past | comments | ask | show | jobs | submit login
Robot Game: Comparing 6502 C, Assembly, and Forth (calc6502.com)
94 points by druzyek on July 12, 2020 | hide | past | favorite | 35 comments



The 6502 is still doing hundreds of millions of units a year, and is still a terrific introductory processor that is comparatively easy to wrap your head around, especially if you are interested in board bring up from scratch. (Compare the requirements to boot a 68000 to a 6502 into a nop loop, for example.)

The 6502 is far from dead.


Indeed, it is highly useful for learning projects. See Ben Eater's "Building a 6502 Computer" series[1], and Quinn Dunki's "Veronica"[2], each a from-the-ground-up creation of a system around the 6502.

[1] https://eater.net/6502

[2] https://blondihacks.com/veronica-all-articles/


Glad to hear it! I first learned assembly for 6502 on the Apple ][, and found it indeed easy to wrap my head around. Great machine, great architecture, TED II Editor/assembler was amazing...


The great thing about the Apple ][ was that it was largely the work of Woz and thus it was very much possible for a single person to fully understand the whole thing. Even the addition of AppleSoft BASIC and DOS didn't keep it from being comprehensible.


There are microcontrollers that you can program on an easy-to-use USB-based development board, that lets you debug C code with a breakpoint debugger. Your program is stored in flash memory on the chip: you can just unplug it, and plug it into your project board.

If you're going to use a microprocessor with external peripherals for everything, I'd just bite the bullet and go for a MC68000.

I have that Microprocessor Systems Design somewhere (Alan Clements), from many years ago. I remember the MC68000 examples circuits and code in it are not too bad.

Here is a possible consideration: the MC68000 has DTACK line with wait states for accommodating slower peripherals. The 6502 is synchronous: it expects peripherals to respond according to its clock, on time.


Crazy. Less than a U.S. dollar each on AliExpress.


This nicely provides empircal evidence on the slowdown of using Forth, which seems to be around 10 to 20 times slower than optimized assembly. I was initially surprised by how poorly Forth performed on all counts of speed, memory usage and development time. For something as complex as a game the lack of a type system and postfix nature make Forth quite unsuitable.

When I use Forth, it's often for high-level applications[0] with the performance critical words written in assembly. So this keeps the complexity relatively low as opposed to writing the whole thing in Forth.

[0] https://github.com/siraben/zkeme80/blob/master/src/bootstrap...


If you program Forth like it's C, and then use a non-optimizing compiler, you're going to have a bad time (e.g. 10/20x slowdowns).

I've written Forth compilers that generated output that was ~25% slower than optimized assembly, without putting that much work into it. There are companies that will sell you a much better compiler [0], at prices even approachable to hobbyists.

[0] - https://www.mpeforth.com/software/pc-systems/vfx-forth-for-l...


FWIW: The URL is being truncated right at the "...for-linux" part. I was mildly surprised.

This particular distribution also ships with GTK+.

That being said, a simple "3 2 1 ?" (I was unsure what word would dump the stack) SIGSEGVs quite consistently.

It's also 32-bit, and the backtrace handler produces "XXXX:XXXX" style addresses (?).


You’re dereferencing an invalid pointer with a variable lookup; try “3 2 1 .s” ... there’re some good ANS Forth gems here [0].

[0] - https://1scyem2bunjw1ghzsf1cjwwn-wpengine.netdna-ssl.com/wp-...


I think the writer use case takes no benefit from Forth. He already has some code and is trying to translate and optimize it. In this case, none of the advantages of Forth matter because he is not doing the development interactively. He was also tied to an existing code architecture that is alien to Forth. A Forth programmer starting from scratch would certainly design things in a way that makes more sense to Forth. Instead, his design clearly benefits a C implementation and goes against the way Forth code is structured.


OP here. This is a really interesting point. My strategy was to keep the highest level functions/words like DrawTile or DrawMenu while writing the underlying Forth and subwords from scratch focusing on performance. Do you have any suggestions that would make "more sense to Forth" as you say?


Not the GP, but I'd recommend the latest edition of _Thinking Forth_[0]; there's a PDF but the soft-cover is worth picking up. It helped change the way I think about Forth (hah), and really, programming in general.

[0] - http://thinking-forth.sourceforge.net/


Although it does not directly help, it gives the general idea:

http://www.ultratechnology.com/levels.htm


Yes, FORTH is an elegant alternative to BASIC on very small systems, not something you should expect high performance from.


To mirror asguy's comment, this only tells us about the performance of this particular FORTH engine, it doesn't tell us about the performance inherit to FORTH (which of course doesn't exist, it's down to the engine).

Whether anyone has made a sophisticated optimising FORTH for 6502, I don't know.


>this only tells us about the performance of this particular FORTH engine

Yes, exactly. Whatever its other qualities may be, I suspect this particular Forth has overlooked some pretty obvious low hanging fruit, performance-wise. In this[1] post we learn that LOOP puts a 1 on the stack then falls into +LOOP. Although there's elegance (and a memory saving) to that approach, I'm startled that they didn't provide a dedicated definition for LOOP instead. AIUI, implementing LOOP as an instance of +LOOP substantially and needlessly increases the complexity of what gets executed. Yes, I know premature optimization should be viewed with suspicion, but if profiling were performed it's hard to believe LOOP wouldn't be a hot spot! So, I constructively suggest that in this respect at least (and perhaps there are others) this Forth engine could benefit from some tuning up.

[1] http://forum.6502.org/viewtopic.php?p=76849#p76849


When I used to write games in 6502 assembly, I'd use a more native style rather than modeling the code after a high-level language. For example, I'd maintain global settings instead of passing lots of parameters, often having these live across many function calls. I wouldn't pass more parameters than fit in A/X/Y. The main loop was sometimes a big run of inline code instead of having many routines that only get called once. And so on.

Nice project! Looks like it was fun to work on.


I wonder if the author is familiar with the PLASMA [1] language and bytecode VM for 6502-based Apple computers. That might be another interesting option to add to the comparison.

[1]: https://github.com/dschmenk/PLASMA


PLASMA is a really neat project! I considered it and a few other systems before settling on C, assembly and Forth. Ultimately I left it out because I was focusing on performance not size, so a byte-code VM would not be a good fit.


Interesting that OP ignored ca65 and went to NASM for the second assemble option especially since they were already using cc65.

Nice use of High Level Assembly with the macros!


Unfortunately, it seems that for all practical purposes, and as much as we love it, 6502 in all its (still existing) incarnations is all but dead.Even MIPS has been dying, albeit slowly and painfully. The unpleasant truth is that as we now all know that ARM is currently the only sane choice for a hobbyist project (see, for example, Color Maximite 2), and we can only hope that RISC-V will someday replace it in the hands of CPU aficionados, but in the times when for a few pennies we can have an 8-pin ARM microcontroller, this is a no-brainer. (ESP32, being quite popular, due to its highly integrated nature and the focus on IoT is a somewhat different proposition.)


"What is sane for a hobbyist project" depends on the project and the wills of the designer.

I've built a couple of 6502 based systems now and added VGA output and floppy support which probably would've been easier with an ESP32 or something but my goal was to learn how to do this the hard way.


I suppose it depends on the kind of hobbyist. If you want to learn how to code for the 6502, for nostalgia's sake -- a powerful motivator for many hobbyists -- then learning to code for ARM will not scratch that itch.

You can always emulate the 6502, it doesn't have to be the real hardware.


A big part of the appeal of the 6502 is with the retro community. If your goal is write an NES game (and there’s quite an active community doing just that) then you have no other choice than to learn the 6502.

That’s fine though. The 6502 is a classic CISC processor with a fairly high level instruction set. This makes it fun for humans to program. I think where people get into trouble is when they try to bring up these complicated tool chains and high level languages. That’s a mistake. Program the 6502 directly in assembly with a nice macro assembler. It’s the best way to go.


Is the mult5 example an actual piece of code of your program?

Any multiplication where one multiplicand is a constant should be much faster with shifts and adds than with a loop. mult5, e.g. is: x+(x<<2)


Right, I do this in the assembly versions everywhere I multiply by a constant. mult5 is just an example.


I don't know Forth on the 6502 but I think the statement that STC is the fastest form of Forth is not always true. But it will lead to larger programs.


It's kind of a funny coincidence that I also wrote a very similar looking game which I also called "Robot Game" at some point, and I also rewrote it at multiple points in time in different languages:

* Starting in Visual Basic 3 in 1999 (http://www.az2000.de/projects/robot/),

* then later in Object Pascal / Lazarus in 2006 (http://www.az2000.de/projects/robot2/),

* and recently in Python in 2017 (https://github.com/albertz/PyOverheadGame).


> This brings us to one of the main shortcomings of Forth. If you need to access more than three local variables at once (in the body of a loop for example) there is just no convenient way to do so. [...] Consider the variables used in this C function that draws 1 bit-per-pixel images with optional rounded corners: > int DrawTile1bpp

Looking at the C version, the argument "tile" is used once in the function, to get the pointer to the tile. The pointer could be passed directly: one less local. x and y are passed only to calculate an address, this address could be passed directly - there are only three calls to this function, it might be worth the DRY exception.

t0 seems to be always 0, except when it is undefined.

The t_height, t_with are just offsets in the tile structure. This locals can go away too - probably eating the cost of fetching them each time is more effective than the cost of all the stack juggling done to avoid it. This reminds me a word from a Chess great master: "don't waste time trying to save one".

trans_row only exists to set skip_pixel, it seems one of them can go away. The logic seems to be "unless trans_row and something, do something". The C version might actually be more verbose than necessary.

Finally, the edge_style complicates a lot the logic. Using one function to do different things because it is so simple to "just add another parameter" is typical in many HLL, and often result in awful spaghetti code.

What a Forth programmer would do is to write one function for each edge style and see what the have in common. More often than not, they share a lot of subwords, so that having one function for each case is not more expensive than one function for all cases.

I suggest doing that with the C version and see what happens.

> Without parenthesis or commas, there is no way to tell just by looking at the code which of those are functions, which ones are variables, which of them are inputs to the others, or what any of them return. Forth fans will say that that information is available in the stack comment for the word's declaration, but that assumes that the programmer bothered to create one. Even if they did, you have to search in several places in the file to figure out what is instantly obvious if the same code were written in a language like C or Python. It's easy to see why people criticize Forth for being "write only." If you translated the line above into C, it could be doing any of the following:

Actually even in C some follow certain naming conventions - like #defines being all caps, member things being prefixed by m_, etc. And you can do that in Forth too. Once again, "write only" has more to do with the author than wit the language.


Hi astrobe_, interesting points! I disagree with you on a few things:

> Looking at the C version, the argument "tile" is used once in the function, to get the pointer to the tile. The pointer could be passed directly: one less local.

The argument "tile" is an 8-bit index, which is faster to pass than the 16-bit pointer to the tile. More importantly, that index is used for things other than looking up the tile pointer, such as finding color pairs to recolor the tile. In other words, you could pass the 16-bit tile pointer directly, but you'd still need a way to correlate that pointer with its matching color data, and that system would be slower than using the same 8-bit index for both.

> x and y are passed only to calculate an address, this address could be passed directly

That would be more efficient and save a handful of cycles when the function is called. This does not answer the original criticism, however. As I mentioned on the page, x and y go away after the pointer is calculated, so I'm not counting those as variables that need to be available in the loop body. You still have over a dozen variables that you can't manage efficiently in Forth.

> t0 seems to be always 0, except when it is undefined.

Good point!

> The t_height, t_with are just offsets in the tile structure. This locals can go away too

I don't think I see your point. If you're going for performance, doing one array access and saving the value saves a lot of cycles compared to indexing into the array each iteration, which eats a lot of cycles.

> trans_row only exists to set skip_pixel, it seems one of them can go away. The logic seems to be "unless trans_row and something, do something". The C version might actually be more verbose than necessary.

You could get rid of one of them at the expense of readability. Or maybe there is a good name that would indicate both purposes.

> Finally, the edge_style complicates a lot the logic. Using one function to do different things because it is so simple to "just add another parameter" is typical in many HLL, and often result in awful spaghetti code.

I don't think this function is awful spaghetti code. Yes, the logic is complicated. I mention on the page that I encoded the data for some tiles as 1 bit per pixel to save room and decided to make rounded transparent corners in code depending on a flag stored for each tile. This function is a compromise to save memory. Otherwise, you could just encode the tile like any other and have three colors including transparency but take up 8 times more memory.

> Actually even in C some follow certain naming conventions - like #defines being all caps, member things being prefixed by m_, etc. And you can do that in Forth too. Once again, "write only" has more to do with the author than wit the language.

I really disagree on this one! All caps and so on for constants is nice but still does not tell you how many values, if any, a word pops and how many it leaves behind. That alone earns the "write only" label. For example: CONST_A m_foo CONST_B m_bar. What does m_foo return and what arguments does m_bar take?

Thanks for the comments!


To be clear, I don't think this function is spaghetti code.

I believe you are victim of premature optimization. You knew Forth would be slow because interpreted, so you feared for performance. Fear often makes you do the wrong things.

The nice thing about Forth is that speed is often correlated to the number of words in a definition, so you can start writing for source/memory compactness all while partitioning the program into meaningful words as much as you can. Then assess the situation on the performance side.

> I really disagree on this one! All caps and so on for constants is nice but still does not tell you how many values, if any, a word pops and how many it leaves behind

Just like in Lua or Go you cannot tell how many values a function returns, and in most dynamically typed languages, what are the types of the input arguments. Furthermore the name alone cannot tell you about the behavior for corner cases either, like what happens if you pass a null pointer as a second argument to a string concatenation function: no-op or crash? You cannot tell until you check out the docs or read the source of the function.

Chuck Moore even claims that "stack pictures" (comments describing the stack effects of a word) are unnecessary; the effects should be obvious from the definition or if it needs more elaborate commentary, it should be documented somewhere else.

Documentation is something people don't take seriously. When you see that the whole documentation of a project consists in Doxygen files this is usually a bad sign. It might be pretty and nice at first glance, but there is more to documentation than documenting the interface of functions and dropping a few lines of introduction on top of that. Documentation should tell a story. And for this story to be well told, you cannot follow the layout of the code.

So to answer your question, what a word does should be obvious from the name or from its definition. Someone said naming things is one of the two hard problems in CS. This is where Forth shows you the real problems and demands thoughtful solutions. 47 characters-long names are certainly not the solution. Documented naming conventions and making sure a word does not do too many things are.


> I believe you are victim of premature optimization. You knew Forth would be slow because interpreted, so you feared for performance. Fear often makes you do the wrong things.

As I explain on the page, the Forth I'm using is not interpreted. It generates STC, and if the body of the word is smaller than a particular size which you set, it will inline the code. Someone measured the dispatch overhead for fetching the next word in FIG-Forth for the 6502 at over 80 cycles. STC only needs 12 for a JSR/RTS pair and potentially 0 if the word is inlined. In any case, if I'm trying to make each version as fast as I know how, it wouldn't matter if I knew that it was interpreted or not. On the other hand, I'm happy to look at any place in the source you find where it seems fear made me do the wrong thing.

> The nice thing about Forth is that speed is often correlated to the number of words in a definition, so you can start writing for source/memory compactness all while partitioning the program into meaningful words as much as you can. Then assess the situation on the performance side.

Speed is correlated with number of words in that it goes down when you do a lot of factoring. Admittedly this depends on the architecture and penalty for subroutine calls. I was all set up to factor all my words down to a line or two after reading Starting Forth but switched to longer but much faster words after starting this project.

> Just like in Lua or Go you cannot tell how many values a function returns, and in most dynamically typed languages, what are the types of the input arguments. Furthermore the name alone cannot tell you about the behavior for corner cases either, like what happens if you pass a null pointer as a second argument to a string concatenation function: no-op or crash? You cannot tell until you check out the docs or read the source of the function.

I've never used Lua or Go so I can't comment on that. This does not answer the original criticism. Sure, there are pieces of information in every language you don't get until you look at the documentation. The point still stands that you can tell what arguments a C function takes and what it's returning at a glance while you can't in Forth.

> Documentation is something people don't take seriously. When you see that the whole documentation of a project consists in Doxygen files this is usually a bad sign. It might be pretty and nice at first glance, but there is more to documentation than documenting the interface of functions and dropping a few lines of introduction on top of that. Documentation should tell a story. And for this story to be well told, you cannot follow the layout of the code.

Good point

> So to answer your question, what a word does should be obvious from the name or from its definition. Someone said naming things is one of the two hard problems in CS. This is where Forth shows you the real problems and demands thoughtful solutions. 47 characters-long names are certainly not the solution. Documented naming conventions and making sure a word does not do too many things are.

This is something different than what I'm pointing out. Yes, good naming will tell you what it's doing, but it doesn't tell you HOW it's doing it. Even if the word is kept short and doesn't do too much, the programmer has a lot of freedom in how they choose to use the stack, so even in well-written Forth, the word name gives no indication of what the word does to the stack. It's understandable then that you don't want a word taking 13 arguments, but once we get to the point where a valid argument could be made to have a word either take one item off the stack or two, we are stuck in the same morass where we can't tell what anything takes or returns.

astrobe_, I picked DrawTile1bpp as an example where Forth is especially unwieldy. Would you like to rewrite it as an example? Maybe if someone experienced like you took a swing at it, it would better reflect what Forth can do on the 6502.


> Someone measured the dispatch overhead for fetching the next word in FIG-Forth for the 6502 at over 80 cycles. STC only needs 12 for a JSR/RTS pair and potentially 0 if the word is inlined.

I have implemented STC with cod inlining for 8086 a long time ago, as well as DTC and bytecode, in assembly, C and Forth (sic), so I did my share of cycle counting.

Sorry if I project my younger self on you, but cycle count is not the best approach to factoring. Factoring in Forth is about spotting redundancies. This is a bit different from what people mean today by "(re)factoring", which is more about how one splits a task, a program, into functions or classes. This should be called "restructuring" instead. Refactoring in Forth is really a compression process, even when you could not care less about the size of your object code.

The best approach is to write your definitions the way you like, mostly ignoring speed and cycles and byte count. This should result in clean, elegant definitions. This first step reveals some short words like your @+1 that could be good candidates for an implementation in assembler. Some other words can be inlined using IMMEDIATE. Some others can be de-factored as a result of this refactoring-for-speed process. Doing it that would have made you realize t0_ppp is always 1, just like t0 is in the C version.

The Python basis for this port needs some optimizations to begin with. If I am not mistaken, all tiles are square. No need for a t_height or a t_width. No need for both in the tiles structure either.

The C port of the Python program can be improved. Tiles are defined, then there's a big array of pointers to those tiles. This is necessary because the tile data does not have a constant length, but it seems to me preferable to have an array of structures defining the geometry and colors of the tiles, and a pointer to the pixel data. The size is the same, just a different layout, but I expect that delaying the indirection from fetching the tile to fetching the data of the tile to be more convenient for the code.

Spotting unneeded complexity is harder than one would expect. Past the point of complexity being the consequence of laziness or lack of skill or lack of time, you find out that complexity also results from (bad) habits. You need to configure a bunch of paths for your application, you start writing routines to read them from an initialization file. Then after a while you realize you are using an interpreted language, so you could have sourced the configuration file directly, duh. I'm still making that kind of mistake after years of practice.

Simplification is a progressive process. Jeff Fox explains it better than I would in the third chapter of his essay on Forth [0].

> we are stuck in the same morass where we can't tell what anything takes or returns

I am puzzled to hear this complain from someone who can program in assembly. Forth and assembly are both un-typed, "no declaration needed" languages. What makes those properties acceptable in asm but not in Forth? Your expectations of Forth being a higher level language maybe?

In any case, all I can do is tell you that with practice, it is not as a big deal as you think, just like the weird symbols in APL are not a big deal, just like parenthesis in Lisp are not a big deal.

> Would you like to rewrite it as an example?

Sorry but no. I have no motivation to do that - I dislike coding challenges and prefer to code things that are actually useful for me - and nothing to gain from it.

[0] http://www.ultratechnology.com/forth.htm


> The best approach is to write your definitions the way you like, mostly ignoring speed and cycles and byte count. This should result in clean, elegant definitions. This first step reveals some short words like your @+1 that could be good candidates for an implementation in assembler.

Yikes! When I get into debates about this, writing part of the program in assembly usually comes up eventually. This is a good example of how inefficient the system is when you have to write assembly to do what would be *ptr++ in C. Another example is something like "swap 5 + swap" which burns all kinds of cycles. Someone recommended I rewrite this in assembly as well although it would only be something like "x+=5" in C. The absurdity of that speaks for itself.

>The Python basis for this port needs some optimizations to begin with. If I am not mistaken, all tiles are square. No need for a t_height or a t_width. No need for both in the tiles structure either.

Sure, but storing those as one value instead of two would also slightly speed up the C and assembly versions. Forth would still be just as slow and inefficient compared to the other languages if you made that change.

> The C port of the Python program can be improved. Tiles are defined, then there's a big array of pointers to those tiles. This is necessary because the tile data does not have a constant length, but it seems to me preferable to have an array of structures defining the geometry and colors of the tiles, and a pointer to the pixel data. The size is the same, just a different layout, but I expect that delaying the indirection from fetching the tile to fetching the data of the tile to be more convenient for the code.

No, the color data varies as well since some tiles have no color pairs and the rest have a varying number. Also, different color pairs can be applied to the same tile depending on the situation, so there is no 1:1 correspondence that would make your suggestion make sense here. In any case, adding one level of indirection to save memory and keep the system organized by passing an 8-bit index instead of a 16-bit pointer adds a few dozen cycles to a function that takes around 20,000 to draw the tile. Do you mean that if the C and Forth versions were reorganized, there wouldn't be such a large speed disparity between the two? That is highly doubtful.

>I am puzzled to hear this complain from someone who can program in assembly. Forth and assembly are both un-typed, "no declaration needed" languages. What makes those properties acceptable in asm but not in Forth? Your expectations of Forth being a higher level language maybe?

The original criticism is that you can't tell what anything takes or returns. If you look at the assembly, you'll see that the function calls there are just as clear as in C since we have named arguments and can instantly see what happens with the return value after the subroutine call. It's very telling that it's a lot easier to implement a usable scheme for local variables in assembly, primitive as it is, while Forth still lacks this.

> Sorry but no. I have no motivation to do that - I dislike coding challenges and prefer to code things that are actually useful for me - and nothing to gain from it.

Fair enough. How about an existing example you could point to then? There are a lot of unbelievable claims about the performance of Forth compared to C and assembly, but no one seems to be able to prove any of this. That's one of the main motivations for doing this project.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: