My favorite example of a "this should never happen" error was when I got a call ...

return0 · on March 31, 2016

Did you tell Brian?

FiatLuxDave · on March 31, 2016

Of course I did. He didn't remember putting in that error condition, but he loved the story!

viperscape · on March 31, 2016

He doesn't remember? That's scary hah. Seems like something worth remembering. Great story, made me laugh

sliverstorm · on March 31, 2016

Do you remember all the trap code you've ever written?

alttab · on March 31, 2016

Traps are for the insecure. I just write to memory without checking bounds or asking questions. No one has ever tried to call me.

amagumori · on March 31, 2016

"Hi alttab, i tried to use your program but it closed and displayed a message saying 'segregation fault' or something...i'm not a racist, i love all people, please give me a call back"

vitd · on March 31, 2016

I saw one like this once. Back in the early 90s I was working at a computer lab at my university. We had just gotten in a 300MHz DEC Alpha, and that thing was a screamer! It was so fast that X-windows didn't feel slow on it! (And this was in the day of 25-50MHz 386s and 486s.)

I was compiling some tiny test program on it, and it spit out an error message that said something to the effect of "This shouldn't happen. Email Dave and tell him what you did - david<something>@digital.com." I ended up forwarding it to our IT department, who I assume sent it on to DEC. I don't know if Dave ever saw it or not, though.

FigmentEngine · on March 31, 2016

Dave Cutler?

vitd · on March 31, 2016

Could be. I never got to find out because, as I said, I sent it off to our IT department.

fit2rule · on March 31, 2016

I remember getting this message myself back in those days, on my brand-spanking new DEC Alpha, which shipped with a 'pre-beta' compiler to those of us who were avid recipients of DEC's first batch of Alpha workstations in anticipation of a strong porting effort to get away from the "MIPS situation" at the time .. heady days indeed!

zodPod · on March 31, 2016

Yeah, honestly, as a one-off sort of thing, this sounds awesome haha. You could search the code for it, find the relevant piece immediately, and the user was prompted to call you guys quickly to get it resolved!

z3t4 · on March 31, 2016

It's a good idea to have a check for memory corruption at regular intervals to keep your sanity.

brianwawok · on March 31, 2016

How do you do that? I get that on bootup you could do a little ditty, but how would you know if random bits are getting flipped? Seems tricky for an embedded device...

gerbilly · on March 31, 2016

Not quite for memory _corruption_ but back when I was writing API code in C, I would place 'sentinels' at each end of my structs.

  struct somestruct {
    int s1;           /* leading sentinel, set to 0xDEADBEEF */
    int data;
    char * moreData;
    int s2;           /* trailing sentinel, set to 0xDEADBEEF */
  };

When the caller of the API needed to call my code, it had to first call a function to get an instance of the struct. This constructor-like code would allocate the memory for the struct, and then set s1 and s2 to 0xDEADBEEF.

The user would then fill out the rest of the struct and pass it back in as an argument to another call.

If either s1 or s2 wasn't 0xDEADBEEF, I would throw an error to the caller.

It helped me catch a lot of cases where the caller to the API had overrun some string while filling out the inputs.
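A minimal sketch of how the constructor and the check described above might look. The names `SENTINEL`, `somestruct_new`, and `somestruct_check` are hypothetical, and the sentinels are declared as `uint32_t` here (the original used `int`) so that 0xDEADBEEF fits without relying on implementation-defined conversion.

```c
#include <stdint.h>
#include <stdlib.h>

#define SENTINEL 0xDEADBEEFu   /* magic value named in the comment above */

struct somestruct {
    uint32_t s1;       /* leading sentinel */
    int data;
    char *moreData;
    uint32_t s2;       /* trailing sentinel */
};

/* Constructor-like call (hypothetical name): allocate a zeroed struct
 * and arm both sentinels before handing it to the caller. */
struct somestruct *somestruct_new(void)
{
    struct somestruct *p = calloc(1, sizeof *p);
    if (p != NULL) {
        p->s1 = SENTINEL;
        p->s2 = SENTINEL;
    }
    return p;
}

/* Run at the top of every API entry point (hypothetical name).
 * Returns 0 if both sentinels are intact, -1 otherwise. */
int somestruct_check(const struct somestruct *p)
{
    if (p == NULL || p->s1 != SENTINEL || p->s2 != SENTINEL)
        return -1;
    return 0;
}
```

An API function would call `somestruct_check` first and return an error to the caller on -1, so the failure surfaces on the caller's side instead of as a crash deeper in the library.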

Negitivefrags · on March 31, 2016

This reminds me of something a friend of mine did once.

He had a structure that was getting overwritten with garbage due to an overrun somewhere else in the code. Rather than debugging and trying to find out what was doing it he just put "char temp[1000];" at the top of the struct to "absorb the damage".

I believe it's still running like that in production to this day.

gerbilly · on March 31, 2016

> Absorb the damage

That's funny.

The code above got written that way because at my first job, I inherited a godawful business charting API written by the lead developer.

The input to the API was a struct with 70-80 members that the caller had to fill in and there were no defaults for anything! Naturally there were not just scalars, but lots of arrays and strings in the struct, which could easily be overrun or often left null.

The users, quite understandably, didn't fill out everything, which led to frequent crashes in _my_ code because that's where the pointers would get dereferenced.

When they would see that the crash was not in their code, the users of the API would punt the error to me even though it was their bad input that caused the problem. This would happen 10-12 times a day.

I rewrote the entire thing in a paranoid style, employing the trick above and others to try and ensure that if there was bad input, it would always crash on their side of the fence.

After I was done I got one legitimate bug report for the code, even though it was in use worldwide in our medium-sized company.

brianwawok · on April 6, 2016

This is kind of an extension of the "throw more hardware at poor performance"... but it's throwing more bytes at bad code ;)

mayank · on March 31, 2016

Neat trick. Add a 'crc' field after 's2' and you just made it work for memory corruption too.

chopin · on March 31, 2016

This one is compelling.

However this might not have caught the error condition described upthread. That condition might have overwritten data or moreData without touching s1 or s2.

Otherwise, great!

onetimePete · on April 1, 2016

Reminds me of stack canaries, stuff you put at the foot of a stack and check from the scheduler.
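A rough sketch of that idea, assuming a downward-growing stack simulated with a plain array. The `task` struct, the pattern, and the function names are all hypothetical; a real RTOS would paint the actual task stack memory and run the check on each context switch.

```c
#include <stddef.h>
#include <stdint.h>

#define CANARY_WORD  0xC0FFEE00u  /* hypothetical fill pattern */
#define CANARY_WORDS 4            /* words reserved at the foot of the stack */
#define STACK_WORDS  64

struct task {
    /* Assumption: the stack grows downward, toward stack[0]. */
    uint32_t stack[STACK_WORDS];
};

/* Paint the lowest words of the stack with the canary pattern. */
void task_arm_canary(struct task *t)
{
    for (size_t i = 0; i < CANARY_WORDS; i++)
        t->stack[i] = CANARY_WORD;
}

/* Meant to be called by the scheduler.
 * Returns 1 if the canary is intact, 0 if the task has run into it. */
int task_canary_intact(const struct task *t)
{
    for (size_t i = 0; i < CANARY_WORDS; i++) {
        if (t->stack[i] != CANARY_WORD)
            return 0;
    }
    return 1;
}
```

This only catches an overflow after the fact, and only if the overflow actually writes over the canary words, but it turns silent corruption into something the scheduler can report.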

Also, to all those ready to do a checksum on a struct, remember that struct layout is platform dependent (padding bytes).
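To illustrate the padding concern, here is a small sketch with a hypothetical `sample` struct and a toy hash (not a real CRC). Hashing the raw bytes of the struct pulls in whatever happens to sit in the padding, while hashing member by member does not.

```c
#include <stddef.h>
#include <stdint.h>

struct sample {
    char tag;        /* most ABIs insert padding bytes after this member */
    uint32_t value;
};

/* Toy multiplicative hash over a byte range; for illustration only. */
static uint32_t hash_bytes(uint32_t acc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        acc = acc * 31u + p[i];
    return acc;
}

/* Fragile: covers padding bytes, whose contents are unspecified. */
uint32_t checksum_raw(const struct sample *s)
{
    return hash_bytes(0, s, sizeof *s);
}

/* Skips padding by hashing each member separately. */
uint32_t checksum_fields(const struct sample *s)
{
    uint32_t acc = 0;
    acc = hash_bytes(acc, &s->tag, sizeof s->tag);
    acc = hash_bytes(acc, &s->value, sizeof s->value);
    return acc;
}
```

Two structs holding the same member values always agree under `checksum_fields`, but may disagree under `checksum_raw`. The member-wise version still depends on byte order, so it is not portable across machines with different endianness.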

agoetz · on March 31, 2016

SW Solution:

1. Store embedded system state in data structure.

2. Calculate a checksum for that data structure.

3. Verify that checksum is correct.

HW Solution:

Lockstep Execution/ECC memory, etc.
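The three software steps above could be sketched roughly as follows. The `device_state` layout and the function names are hypothetical, and Fletcher-16 is used only because it is short; the comment does not say which checksum was meant.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: system state kept in one structure (hypothetical fields). */
struct device_state {
    uint16_t mode;
    uint16_t setpoint;
    uint32_t uptime_ticks;
    uint16_t checksum;     /* covers the three members above */
};

/* Fold a byte range into a running Fletcher-16 pair of sums. */
static void fletcher16_update(uint16_t *sum1, uint16_t *sum2,
                              const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        *sum1 = (uint16_t)((*sum1 + p[i]) % 255u);
        *sum2 = (uint16_t)((*sum2 + *sum1) % 255u);
    }
}

/* Step 2: checksum computed member by member, so padding is skipped. */
uint16_t state_checksum(const struct device_state *s)
{
    uint16_t sum1 = 0, sum2 = 0;
    fletcher16_update(&sum1, &sum2, &s->mode, sizeof s->mode);
    fletcher16_update(&sum1, &sum2, &s->setpoint, sizeof s->setpoint);
    fletcher16_update(&sum1, &sum2, &s->uptime_ticks, sizeof s->uptime_ticks);
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Call after every legitimate update to the state. */
void state_seal(struct device_state *s)
{
    s->checksum = state_checksum(s);
}

/* Step 3: call at regular intervals; returns 1 if the state is intact. */
int state_verify(const struct device_state *s)
{
    return s->checksum == state_checksum(s);
}
```

Every code path that changes the state has to call `state_seal` afterwards, otherwise the next `state_verify` reports a false alarm. A short checksum like this catches single flipped bits but can miss some multi-byte corruption.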

julie1 · on March 31, 2016

ECC + checksum have a slight flaw.

If too many errors happen, the checksum can be correct even though the content is corrupted.

Hum... I know what you think: ThisShouldNeverHappen

When exploited by a human it is called a collision attack. It works pretty well, because so many people trust but never check.

mnicky · on April 1, 2016

Seems like Brian :) https://github.com/OpenLiveWriter/OpenLiveWriter/commit/1236...

mchahn · on March 31, 2016

I often code errors that say to contact support. I have assumed it would help get problems reported more often.

devgutt · on March 31, 2016

You should have kept that message. Much more authentic and refreshing ;)