For a launch title on a certain console I had a nasty bug report from QA that took 20+ hours to reproduce. Finally (with 24 hours left before console launch) I tracked it down to the audio drivers in the firmware, which were erroneously writing 1 random byte "somewhere" at random times, and that "somewhere" was always in executable code space. Luckily, I eventually figured out that for any given build of the game, the "somewhere" was always the same place. The 1st party said sorry, we can't fix it in time because we don't know what's causing it! So I shipped that game with stub code at the very start of main that immediately saved off the 1 byte of the freshly loaded executable at the place I knew would get overwritten for that particular version of the exe. There was then code that ran each frame, after audio had run, and restored that byte to what it should be, just in case it had been stomped that frame. Good times! We hit launch.
To this day I still feel very very dirty about this hack, but it was needed to achieve the objectives and harmed no-one :)
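In rough outline the hack looked something like this (a reconstruction from memory; the address and names are made up, and on that hardware the executable pages happened to be writable):

#include <stdint.h>

// hypothetical address of the byte the audio driver kept stomping for that
// particular build of the executable
static uint8_t *const g_stomped_addr = (uint8_t *)0x0042F37C;
static uint8_t g_good_byte;

void save_stomped_byte(void)        // very first thing in main()
{
    g_good_byte = *g_stomped_addr;  // grab the byte while it's still pristine
}

void restore_stomped_byte(void)     // every frame, after audio has run
{
    if (*g_stomped_addr != g_good_byte)
        *g_stomped_addr = g_good_byte;  // put the code byte back
}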
We once found that the floating-point precision setting (the precision-control bits in the FPU control word) could change at unspecified times in our game. This destroyed any hope of deterministic computation for networked lockstep games or game replays. We tracked it down to some driver or DLL hooking into the full-screen switching process for DirectX (could be a mouse driver, or a gfx driver, or who knows what else). It only seemed to happen on pre-NT versions of Windows.
We just forcefully reset the bit after any call to the Windows message pump, and that was the last of our network desyncs gone for good. Ship it!
(I recall DirectX later added some flag related to this, but memory is hazy)
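The workaround amounted to something like this (a sketch, assuming an x87-era Windows build and the MSVC CRT; _controlfp, _PC_53 and _MCW_PC are the CRT's names for the precision-control field):

#include <windows.h>
#include <float.h>

void pump_messages(void)
{
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    // something in the pump (a hooked driver, a full-screen switch, ...) may
    // have knocked the FPU down to single precision; force it back so the
    // lockstep simulation stays deterministic
    _controlfp(_PC_53, _MCW_PC);
}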
There are many neat tricks for branchless algorithms. Once you get a feel for what makes them work, they're easy to invent on your own. Two representative examples:
// branchless octree traversal: pack the three half-space tests into a
// 3-bit child index, then index straight into the child array
i = x <= t->mx;     // bit 2: which side of the x split plane
i <<= 1;
i |= y <= t->my;    // bit 1: which side of the y split plane
i <<= 1;
i |= z <= t->mz;    // bit 0: which side of the z split plane
t = t->child[i];
// unrolled branchless binary search of a 64-element sorted array:
// returns a pointer to the last element <= x (or the first element if none is)
if (p[32] <= x) p += 32;
if (p[16] <= x) p += 16;
if (p[ 8] <= x) p += 8;
if (p[ 4] <= x) p += 4;
if (p[ 2] <= x) p += 2;
if (p[ 1] <= x) p += 1;
return p;
(In the binary search, you'd implement the if statements with conditional moves or predicated adds, depending on what your platform offers.)
Both of these examples turn control dependencies into data dependencies - array indexing in the first, conditional pointer arithmetic in the second - which is a recurring pattern.
Branchless algorithms are often a performance win on consoles and embedded systems. But they really shine when you're writing code for a wide-SIMD architecture with gather-scatter support, where divergent memory accesses are generally much less costly than divergent branches, GPUs being the most widespread example.
> (In the binary search, you'd implement the if statements with conditional moves or predicated adds, depending on what your platform offers.)
It's usually a bad idea to use arithmetical trickery in an effort to get the compiler to generate conditional moves or predicated instructions. If you want to make sure the compiler does the right thing, use macros that are conditionally expanded to the appropriate intrinsics on each compiler/platform.
Since I didn't want to get into that, I used if-statements with the above caveat. Makes sense?
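Concretely, the macro structure would be something like this (COND_ADD and the config switch are made-up names; where a platform has a real conditional-move or predicated-add intrinsic you'd plug it into the first branch - the mask-arithmetic version here is just a portable stand-in):

#include <stddef.h>

#if defined(BRANCHLESS_VIA_MASK)   /* hypothetical per-platform switch */
  /* mask arithmetic: adds n when cond is nonzero, with no branch */
  #define COND_ADD(ptr, cond, n) \
      do { (ptr) += (ptrdiff_t)(n) & -(ptrdiff_t)!!(cond); } while (0)
#else
  /* plain form; a compiler targeting a predicated ISA can do the right thing */
  #define COND_ADD(ptr, cond, n) do { if (cond) (ptr) += (n); } while (0)
#endif

// one step of the unrolled binary search then reads:
//   COND_ADD(p, p[32] <= x, 32);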
I admit I overlooked the caveat while posting the response, yet even with the caveat this does not make much sense, as any code can be turned into branchless code when you use predicated instructions (a bit less so with conditional moves). This is what GPUs often do (many architectures don't have branch instructions, so all branches are always executed with predication).
> I admit I overlooked the caveat while posting the response, yet even with the caveat this does not make much sense, as any code can be turned into branchless code when you use predicated instructions (a bit less so with conditional moves).
Now you're just being intentionally obtuse. When I write if (x) y += z and say that it should be implemented with conditional moves or predicated adds per your platform, there really isn't much room for confusion.
> This is what GPUs often do (many architectures don't have branch instructions, so all branches are always executed with predication).
Actually, all modern desktop-class GPU architectures have branch instructions. They only revert to predication (which in NVIDIA's case is implicit, i.e. not controlled by microcode-visible mask registers, though they of course also have CMOV-like instructions) when the different SIMD lanes take divergent branches. That's been the case since the G80/Tesla microarchitecture, which the GeForce 8800 from 2006 was the first product to use. Mostly-coherent dynamic branching is heavily used in modern shader code, to say nothing of GPGPU code, and is almost free. Incoherent branching is the big issue. Replacing that with incoherent memory accesses using branchless code can be a huge win, even though incoherent memory access is far from free.
>Now you're just being intentionally obtuse. When I write if (x) y += z and say that it should be implemented with conditional moves or predicated adds per your platform, there really isn't much room for confusion.
Maybe I am obtuse, though not intentionally. If you wanted to say that if's can be replaced with predicated instructions, why such a convoluted example (which is a nice piece of code, by the way)?
>Actually, all modern desktop-class GPU architectures have branch instructions.
Indeed, and as you say, they are not always executed, because a single SIMD core executes threads in lock-step and it's only possible to branch when all the threads yield the same condition. Besides there are still GPUs without branches, e.g. PS3 fragment shaders.
> If you wanted to say that if's can be replaced with predicated instructions, why such a convoluted example (which is a nice piece of code, by the way)?
That's not what I wanted to say; it was a side note to a piece of code. Anyway, sorry if it was confusing. It was a spur-of-the-moment comment on Hacker News, not a carefully wrought essay.
> Besides there are still GPUs without branches, e.g. PS3 fragment shaders.
The RSX is ancient at this point, a turd in a box. It was a turd even at the time when, in the eleventh hour of the PS3 project, Sony begged NVIDIA to give them a GPU after their fantasy of Cell-based rasterization had proved ludicrous. It's so bad that you have to offload a ton of work on the SPUs to get acceptable graphics performance.
I worked on the port of Metal Gear Solid from the PlayStation 1 to the PC, released in 2000 by Microsoft.
On the PSX, every triangle you draw uses integer screen coordinates. This creates a lot of "shaking", which was okay for the regular PS1 user, but when such a game is ported to the PC it looks even worse.
What I did was add a global array "float numbers[65536]" that, for every major GTE operation (matrix rotation, scaling, vector multiply, etc.), kept a higher-precision number. For example, after a coordinate was projected to 123.434 on screen, it would write numbers[123] = 123.434 (I can't remember whether I used float or fixed point).
So later, when the triangles were drawn, if I had to draw a triangle from an X or Y coordinate of 123, I would reuse the number 123.434.
Now this is not accurate, but it's good enough - after all, not many things end up on screen at coordinate 123, and most likely they would have been the same calculated coordinate anyway, just for different tris... in fact it might even have helped certain things stick together...
I dunno - but the shaking effect was gone :)
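Roughly, the idea looked like this (a reconstruction, not the original code; whether it stored float or fixed point, and exactly which GTE paths fed it, I no longer remember):

static float numbers[65536];   // indexed by the integer screen coordinate

// called wherever a projection produces a screen coordinate
static void remember_precise(int screen_coord, float precise)
{
    numbers[screen_coord & 0xFFFF] = precise;
}

// called by the PC rasterizer when handed an integer PSX coordinate
static float recall_precise(int screen_coord)
{
    float f = numbers[screen_coord & 0xFFFF];
    // if nothing useful was recorded for this coordinate, the integer value
    // itself is still a safe fallback
    if ((int)f != screen_coord)
        return (float)screen_coord;
    return f;
}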
There was a lot to learn from Hideo's team too - we had an artist who doubled the resolution of all the textures (eyes, clothes, etc.). Hideo specifically forbade the better-looking eyes. He said they had deliberately blurred all the eyes because they could not put eye animation in the cut scenes - since the eyes are blurred, you can't really see where they are looking, which prevents the Uncanny Valley effect.
Another trick from Hideo's team was storing gameplay-related info in one of a pointer's bits. On the PSX, if a certain bit of a pointer was set, it still pointed to the same memory, just with uncached access. So they had pointers to C4 bombs where a bomb planted on the ground had the bit clear, and one planted on a wall had it set.
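Something like this, in spirit (a sketch; that the 0x20000000 bit is what selects the uncached mirror on the PSX is my recollection of its memory map, so treat the constant as an assumption):

#include <stdint.h>

#define PLANTED_ON_WALL_BIT 0x20000000u   /* the cached/uncached alias bit */

struct C4Bomb;   /* whatever the game's bomb struct was */

static struct C4Bomb *tag_on_wall(struct C4Bomb *p)
{
    return (struct C4Bomb *)((uintptr_t)p | PLANTED_ON_WALL_BIT);
}

static int is_on_wall(const struct C4Bomb *p)
{
    return ((uintptr_t)p & PLANTED_ON_WALL_BIT) != 0;
}

/* either alias points at the same bytes on the PSX; mask the bit off if you
   want the cached view back */
static struct C4Bomb *untag(struct C4Bomb *p)
{
    return (struct C4Bomb *)((uintptr_t)p & ~(uintptr_t)PLANTED_ON_WALL_BIT);
}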
They also had a nice fix for T-junctions (they happen a lot with that kind of geometry). Rather than drawing an extra polygon to fill the gap, they stretched the polygons a little so they overlapped by a pixel or two.
The "rainy day memory pool" is interesting, I've run into something similar in the web-programming context.
I used to work for a company that had a horrific hardware requisition policy. If your team needed a server, it had to go through a lengthy and annoying approvals process - and even then, it took months before Infrastructure would actually provide said servers.
In other words, when a project got handed down from above to launch in, say, 3 months, there was no way in hell you could get the servers requisitioned, approved, and installed in that time. It became standard practice for each team to slightly over-request server capacity with each project and throw the excess hosts into a rainy-day pool, immediately available and repurposable as required.
New servers would still get requested for these projects, but since they took so long to approve, odds were they'd go right into the pool whenever they actually arrived, which sometimes took up to a year.
Of course, it was horrifyingly inefficient. Just on my team alone I think we had easily 50 boxes sitting around doing nothing (and powered on to boot) waiting to pick up the slack of a horrendously broken bureaucracy.
It was basically standard in the industry for everyone to work weekends and nights for a month before release. It's calmed down perhaps and more people are hourly so at least they get paid for it, but it's still expected that you will put in significant weekend and evening hours before a milestone.
One issue is that a lot of the games don't seem to develop well under more Agile practices. Games are very large projects and it can be hard to build them incrementally, so they always end up with the standard software-management over-budget and deadline slips.
At least, this is what I'm told by my wife and friends in the industry. I still believe that their requirement of overtime is due to poor management, pressure by execs, poor specs, and inflated goals.
> I still believe that their requirement of overtime is due to poor management, pressure by execs, poor specs, and inflated goals.
It's only a requirement because game developers have tended to put up with it. Especially in the earlier days, game development attracted people who were extremely passionate about making games. This meant they were willing to put up with a lot of shit. You don't have to ascribe much malice or incompetence to companies to explain crunch time under those conditions--it would be more surprising if it hadn't happened. Fortunately, things have changed for the better, although there's still a long way to go.
I'm fine with crunch time so long as (1) it is surgically applied, (2) employees know what they're getting into, and (3) employees are rewarded for going above and beyond the call of duty.
> Because the camera was using velocity and acceleration and was collidable, I derived it from our PhysicalObject class, which had those characteristics. It also had another characteristic: PhysicalObjects could take damage. The air strikes did enough damage in a large enough radius that they were quite literally "killing" the camera.
Um, no? If their camera wasn't a PhysicalObject, they'd have had to write custom code for handling the camera case for anything that handles PhysicalObjects. All this means is that perhaps a PhysicalObject shouldn't be the thing that takes damage.
The reason is that inheritance is too coarse-grained and promotes bundling together lots of unrelated concepts. So it's likely that with a finer grained system, they would have separated physics and damage in different classes. They might have still made the mistake of not separating them, or of copy-pasting the damage object into the camera anyway, so the point and punchline of the story still stands.
> They might have still made the mistake of not separating them, or of copy-pasting the damage object into the camera anyway, so the point and punchline of the story still stands.
The possibility that a programmer could make a mistake in the implementation of a paradigm is not an argument against that paradigm. The first part of your comment is very real and very valid, but copy-pasting is a potential issue regardless of your programming language.
I am convinced that this is not just a programmer's mistake. This is an illustration of the problem with the paradigm itself. Inheritance makes you want to glue unrelated things together.
Inheritance views the world as fundamentally hierarchical; you can divide things up into branches of a tree. A node on the tree either has all of a set of functions or none of them. Any code you put into Reptile will never be needed by a Mammal, and will never be inappropriate for an object that needs other things in Reptile.
But of course, that's unrealistic. Where would you put swim()? Where would you put fly()? Code reuse can't always be decomposed into a tree. And in an inheritance paradigm, you get ugly workarounds in that case -- cut and pasted code, nasty multiple inheritance hacks, or util classes. All bad things.
What you want are things coupled, not by convenient proximity in current function, but by logical relationship. You want a class that has swimming-related code and nothing else, that any swimmer can use without preconceptions about whether it's a Reptile or a Mammal. You want, in short, Traits.
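A sketch of what I mean, with made-up class names (in C++ you'd approximate traits with small mixin classes):

struct Swimmer { void swim() { /* shared swimming code, nothing else */ } };
struct Flyer   { void fly()  { /* shared flying code, nothing else   */ } };

struct Reptile { /* reptile-specific code only */ };
struct Mammal  { /* mammal-specific code only  */ };

struct Crocodile : Reptile, Swimmer {};   // swims, never flies
struct Whale     : Mammal,  Swimmer {};   // reuse across branches of the tree
struct Bat       : Mammal,  Flyer   {};   // without contorting the hierarchy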
Inheritance is not a design methodology, it’s a tool. As a tool it does not “view the world”, unless it’s meant in the sense that when you only have a hammer, every problem looks like a nail to you. When you only have inheritance in your toolbox, things are going to end badly indeed. But that does not mean that inheritance is a bad tool.
Actually, I want multiple inheritance. That way I can express that a crocodile is a reptile and a swimmer, or that a camera is a game object and is collidable but not damageable.
Yes, it can make your code more complex, but you can also use it to make the code cleaner.
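For example (invented names, just to sketch the split the parent comments are getting at):

struct GameObject { /* position, velocity, update(), ... */ };
struct Collidable { /* bounding volume, collision response */ };
struct Damageable { int hitpoints; /* takeDamage(), ... */ };

struct Tank   : GameObject, Collidable, Damageable {};
struct Camera : GameObject, Collidable {};   // collides, but an air strike can't "kill" it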
I think you should be very careful about using this kind of argument. It's the mindset of a cult: why didn't X work? Well, you just didn't use enough X! I have a kind of visceral reaction of disgust when this is used, because it encourages the retention of silly ideas without really using reason, logic, or evidence to defend them.
It's only the mindset of a cult when the theory in question is not demonstrated to work or is not predicated on reason. Suppose I wrote a catapult simulator to test the validity of designs for a competition, and then my simulator ignored physical attributes such as wind speed, material strength, elevation, and weight. The physics being used have been known by the modern world for centuries, but I haven't applied enough of that theory to create a valid simulator.
I do not make any similar claim about the validity of object-oriented programming, but I find your argument, as a reaction to my comment, rather silly. People make mistakes in paradigms all the time. Read The Daily WTF if you want to observe how badly people can mangle code. I'm not suggesting that all your problems with OOP can be solved by using more OOP, just as I'm not saying that the chance that a programmer will misunderstand for-loops in Lisp means we should throw out functional programming.
Your physics example is not a good analogy, because your critique of the simulation is not that the hypothetical simulator didn't use enough physics; instead, you point out a number of very specific physical attributes that were missing, and you would have been able to explain the effect each of those attributes has on the simulation. That is evidence- and reason-based argument. It is NOT like the argument you used for object-oriented programming.
I don't know why you think I'm defending OOP. I'm not. My original point was that copy-pasting is not an argument against any programming paradigms. The possibility of programmer error is not a valid argument against using a programming language, just as a person being unskilled at spelling is not a valid reason to criticize English, Mandarin, French, etc.
That's why you use delegation and traits instead of inheritance. Sometimes the alternative to a bad idea isn't the even stupider thing you did before you found out about it.
FYI, the game was 'Force 21', "a Real-time Strategy game made by Red Storm Entertainment in 1999 that features a storyline which has US forces fighting PRC (Chinese) forces in the year 2015."
"You view the game mainly from the Tactical 3D window. The camera is, sadly, always situated just above the lead vehicle in the selected platoon. A free roaming camera would have been better as it would have given you more opportunities to view the action. Still, the camera is very useful and really succeeds at drawing you into the action. The camera is very, very easy to use--just bump the edges of your screen with the mouse...well, the mouse cursor, not the actual mouse itself. The camera then spins around the lead unit of the platoon. Vertical movement isn't as good. The angle of the camera doesn't allow you to look straight down on your units or get down on the ground beside them. Oh well, c'est la guerre. There's also a Strategic view that's better for getting an overall impression of the terrain elevation, but it's no good for seeing the action. For the overall impression of the battle, I preferred to use the tiny strategic window in the corner of the screen."
I can confirm the first trick, reserving a block of memory - I learned this in the early 90s coding on the Mac as the "rainy day fund" memory allocation.
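The shape of it is about as simple as it sounds (names and the 2 MB figure are illustrative):

#include <stdlib.h>

static void *g_rainy_day_fund = NULL;

void reserve_rainy_day(void)            /* at startup, before anyone budgets */
{
    g_rainy_day_fund = malloc(2 * 1024 * 1024);   /* memory nobody may touch */
}

void spend_rainy_day(void)              /* only when the ship date is at risk */
{
    free(g_rainy_day_fund);
    g_rainy_day_fund = NULL;
}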
I haven't seen it with memory, but I work with a graphics programmer who swears he's done the same trick with CPU cycles: just insert a busy loop that does nothing for a couple of milliseconds every frame (a busy loop, not a sleep, so it still appears to the casual observer that your rendering engine is working as hard as it can). Then, when the game is getting ready to launch and the framerate is inevitably dropping, remove the loop.
I don't think he has a loop like that in our current code base. But it's possible.
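Presumably something along these lines (a sketch of the idea, not his actual code): spin rather than sleep, so the frame still looks fully loaded to anyone measuring casually.

#include <chrono>

void burn_reserved_frame_time(double milliseconds)   // e.g. 2.0 per frame
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now()
        + std::chrono::duration<double, std::milli>(milliseconds);
    while (clock::now() < deadline) {
        // intentionally empty: this is the headroom you "find" before ship
    }
}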
Another method that occurred to me, which might sort of work on the PC, would be to finish a game and then delay the release for a year or so, during which time (thanks to Moore's law) everyone's graphics cards and processors will have gotten faster, and hence the average user's game playability will have improved.
I think the bigger titles already account for hardware advancements and aim for the beefy, top-of-the-line machines that will be common after the several years it takes to develop the title or engine.
Although there is still something to be said for having a low memory footprint, even on modern hardware. If you can fit all of your resources reasonably into memory, then you don’t need to have any load delay, or worry about asynchronous loading. Half the reason retro games are so fun to make is that you can easily do everything you always wanted to do on the Super NES or the Genesis or whatever.
The first one can't actually be true. It might make the rest of the team all cheerful and happy when you do this and then, two days before release, call them together and show them, but it doesn't actually help a bit.
It seemed in this first point that nobody really gave memory usage a lot of thought ('write the solution first, optimize second'), and thus they were way over the limit. Let's say the goal was 120MB and they were at 160. Then they started compressing and optimizing everything, and after a while they got very close; 121.5MB. So the experienced programmer removes the 2MB allocation and saves the day.
If the experienced programmer hadn't done this, I don't think (seeing how much they cared about memory usage before) the memory usage would have been any higher. They might have been at 158.5 MB before optimization just the same, and gotten under the limit with the same optimizations they ended up having to do anyway.
So as far as I can see, it seems the only value of doing this is the psychological value. Might still be worth something, though!
Sure it can; it happens all the time, in fact. Many console development kits have extra memory for development purposes that isn't available on retail units. This is hugely beneficial during development because it means you can actually run a build that has asserts enabled, or use special memory allocators that pad allocations for debugging, etc. I have seen, and been involved with, the mad scrambles to bring the memory footprint down as a project edges toward completion.
It's not as fatal to blow your memory budget on the PC as it is on a console, but if you're trying to hit a certain memory footprint so that the game is playable on the min-spec machines defined by your publisher, then it could very well be an issue.
Not only is it true; in fact, I have never seen a project that didn't do this in one way or another, and I've been making games for 25 years. You need to have some spare room for last-minute, unforeseen issues.
The doing it in secret and the cheering in this story are quite funny though (and yeah I have seen those as well).
As someone who's read the Gamasutra article before, I found this really annoying to read, because the new content is mixed in with the content from the Gamasutra article without anything to distinguish what was new and what wasn't.
I did not have an issue with having them all on one page. I had an issue with the Gamasutra ones not being grouped together; instead they've been mixed, i.e. nr 1, 4 and 6 are from the Gamasutra article, while 2, 3 and 5 were new.
I think #15 - attacking a bug directly instead of going about it with patches (although there could be a fine line between "direct" and using patches) - can be especially relevant for anyone, even outside of games.
I think the 'best' trick I've seen is using pointer tagging on an object's virtual function table pointer to squeeze in an extra flag during garbage collection.
Adding another variable was thrashing the cache, so instead the GC would tag the VFT pointer (making it unusable obviously) and then untag it before GC ended, fixing the object.
I wasn't sure if I should be horrified or applaud when I found out about this.
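Roughly like this (a sketch; it leans on the vtable pointer being the first word of the object and on at least 2-byte alignment, both of which are compiler-specific):

#include <stdint.h>

static void set_mark(void *obj)           // object is unusable while marked
{
    *reinterpret_cast<uintptr_t *>(obj) |= 1u;
}

static bool is_marked(const void *obj)
{
    return (*reinterpret_cast<const uintptr_t *>(obj) & 1u) != 0;
}

static void clear_mark(void *obj)         // must run before the GC pass ends
{
    *reinterpret_cast<uintptr_t *>(obj) &= ~static_cast<uintptr_t>(1);
}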
I can't recall the exact details, but just days or maybe a week before the gold master of one title I worked on, there was a case where an object's virtual table was getting munged somehow. I do remember that I managed to figure out exactly what was happening, but the amount and type of work it would have taken to fix properly would likely have delayed shipping.
So I wrote some code that constructed a new object of the same type on the stack, then copied its freshly built vtable pointer over the munged one to patch the object back up.
I once used a similar trick, combining C++ placement new and multiple classes to mimic the possible values of a refcounter and simulate reference counting without a data member (for instance, the object is created as C1; AddRef placement-news it in place to C2, then C3; Release news it back down to C2). When there are only a few possible refcount values and object reuse warrants it (for instance, if the objects represent oft-used values), this can be used to save memory...
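In sketch form it looked something like this (class names follow my description above; the classes hold no data beyond the vptr, so they are all the same size, and yes, this lives deep in implementation-defined territory):

#include <new>

struct C1 {
    virtual ~C1() {}
    virtual void AddRef();
    virtual void Release() { /* refcount hit zero: destroy / return to pool */ }
};
struct C2 : C1 {
    void AddRef() override;
    void Release() override { new (this) C1; }   // count 2 -> 1
};
struct C3 : C2 {
    void AddRef() override { /* saturate or assert */ }
    void Release() override { new (this) C2; }   // count 3 -> 2
};

void C1::AddRef() { new (this) C2; }   // count 1 -> 2
void C2::AddRef() { new (this) C3; }   // count 2 -> 3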
Just a warning to programmers implementing the first technique: Time your release of that memory carefully, and coordinate with other programmers.
Your designers and artists know about the technique and grudgingly accept it, BUT having multiple programmers independently pop up having "found" some memory AFTER I spent all day cutting everything from my level instead of fixing bugs...
> We cut megabyte after megabyte, and after a few days of frantic activity, we reached a point where we felt there was nothing else we could do. Unless we cut some major content, there was no way we could free up any more memory. Exhausted, we evaluated our current memory usage. We were still 1.5 MB over the memory limit!
This made me laugh, as the first PC game I worked on had system requirements of 570K of RAM and 2M of hard drive space. (That was in 1992, and he's talking about a "late-90's" title, so things had already changed a lot.)
(To be honest, I just found the system requirements by looking them up online now; I don't remember them myself, and my personal copy of the game is at work. I'll come back here and correct it if I was wrong.)
One of these tricks is that you should fix 'the root of the problem' while a lot of the other tricks are hacks to ship the game on time. I'm a bit confused.
His patches comment was basically: early on we realized the collision detection code was horribly broken, so we just started patching every edge case we could find. That's practically an endless treadmill. On the other hand, when there is a very specific problem really late in the production cycle that has little to do with the rest of the game, you can just patch that specific problem and ship the code. When an audio driver you have no control over corrupts a single byte of your EXE, that is the root problem, not a symptom.
The first one reminds me of a story I was told by my embedded-engineer friends. They had recently been hired to maintain a video hard-disk storage application and had found it horrifyingly slow. Looking into the code, they found that the core sort was little better than bubble sort (and the data was on disk, not in memory), so it was quickly rewritten as a merge sort.
Oddly, people started to complain that the application was _too fast_. After much argument (much of it with management, who thought that because it was too fast it couldn't be working properly), they came upon a solution.
The solution, in all its glory, was a loop at the end of the sort, with a sleep in it.
The best thing was that when they were feeling lazy, they could put out a release that was, for example, 33% faster - just by reducing the sleep time.