
My favorite example of a "this should never happen" error was when I got a call from a customer, who started the conversation by asking, "Who is Brian?".

I was caught a bit off guard, but I assumed the customer must know someone at the company, since Brian was the name of the previous electrical engineer/firmware programmer. So, I told them that Brian didn't work here any more, but was there anything that I could help them with? The customer said, "Well, the device says that I should call Brian". I was confused by this, and asked a lot of questions until I determined that the device was actually displaying "CALL BRIAN" on the LCD display.

This was quite unusual, and at first I didn't believe the customer, until he sent a picture of the device showing the message.

So, I dug into the code, and quickly found the "Call Brian" error condition. It was definitely one of those "this should never happen" cases. I presume that Brian had put that in during firmware development to catch an error case he was afraid might happen due to overwriting valid memory locations.

I got the device back, and found out that the device had a processor problem (I don't remember exactly what) that would write corrupted data to memory. So, really, it should never happen.

That particular device has now been in production for 10 years, and that is the only time that error has ever appeared.




Did you tell Brian?


Of course I did. He didn't remember putting in that error condition, but he loved the story!


He doesn't remember? That's scary hah. Seems like something worth remembering. Great story, made me laugh


Do you remember all the trap code you've ever written?


Traps are for the insecure. I just write to memory without checking bounds or asking questions. No one has ever tried to call me.


"Hi alttab, i tried to use your program but it closed and displayed a message saying 'segregation fault' or something...i'm not a racist, i love all people, please give me a call back"


I saw one like this once. Back in the early 90s I was working at a computer lab at my university. We had just gotten in a 300MHz DEC Alpha, and that thing was a screamer! It was so fast that X-windows didn't feel slow on it! (And this was in the day of 25-50MHz 386s and 486s.)

I was compiling some tiny test program on it, and it spit out an error message that said something to the effect of "This shouldn't happen. Email Dave and tell him what you did - david<something>@digital.com." I ended up forwarding it to our IT department, who I assume sent it on to DEC. I don't know if Dave ever saw it or not, though.


Dave Cutler?


Could be. I never got to find out because, as I said, I sent it off to our IT department.


I remember getting this message myself back in those days, on my brand-spanking new DEC Alpha, which shipped with a 'pre-beta' compiler to those of us who were avid recipients of DEC's first batch of Alpha workstations in anticipation of a strong porting effort to get away from the "MIPS situation" at the time .. heady days indeed!


Yeah, honestly, as a one-off sort of thing, this sounds awesome haha. You could search the code for it, find the relevant piece immediately, and the user was prompted to call you guys quickly to get it resolved!


It's a good idea to have a check for memory corruption at regular intervals to keep your sanity.


How do you do that? I get that on bootup you could do a little ditty, but how would you know if random bits are getting flipped? Seems tricky for an embedded device...


Not quite for memory _corruption_ but back when I was writing API code in C, I would place 'sentinels' at each end of my structs.

  struct somestruct {
    unsigned int s1;   /* leading sentinel; unsigned so 0xDEADBEEF fits */
    int data;
    char * moreData;
    unsigned int s2;   /* trailing sentinel */
  };

When the caller of the API needed to call my code, it had to first call a function to get an instance of the struct. This constructor-like code would allocate the memory for the struct, and then set s1 and s2 to 0xDEADBEEF.

The user would then fill out the rest of the struct and pass it back in as an argument to another call.

If either s1 or s2 wasn't 0xDEADBEEF, I would throw an error to the caller.

It helped me catch a lot of cases where the caller of the API had overrun some string while filling out the inputs.
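For anyone who wants to try it, here is a minimal sketch of the pattern described above. The names `struct request`, `request_new` and `request_is_valid` are made up for illustration, and I've assumed an inline `char name[16]` buffer (instead of the `char *` in the struct above) so that a string overrun would actually land on the trailing sentinel.

```c
#include <stdint.h>
#include <stdlib.h>

#define SENTINEL 0xDEADBEEFu

/* Hypothetical request struct guarded by a sentinel at each end. */
struct request {
    uint32_t s1;        /* leading sentinel */
    int      data;
    char     name[16];  /* caller-filled buffer that could be overrun */
    uint32_t s2;        /* trailing sentinel */
};

/* Constructor-like call: allocate, zero, and stamp both sentinels. */
struct request *request_new(void)
{
    struct request *r = calloc(1, sizeof *r);
    if (r != NULL) {
        r->s1 = SENTINEL;
        r->s2 = SENTINEL;
    }
    return r;
}

/* Returns 1 if both sentinels are intact, 0 otherwise. */
int request_is_valid(const struct request *r)
{
    return r != NULL && r->s1 == SENTINEL && r->s2 == SENTINEL;
}
```

Every API entry point would call `request_is_valid` first and return an error to the caller if it fails, so the blame lands on the side that did the overrun.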


This reminds me of something a friend of mine did once.

He had a structure that was getting overwritten with garbage due to an overrun somewhere else in the code. Rather than debugging and trying to find out what was doing it he just put "char temp[1000];" at the top of the struct to "absorb the damage".

I believe it's still running like that in production to this day.


> Absorb the damage

That's funny.

The code above got written that way because at my first job, I inherited a godawful business charting API written by the lead developer.

The input to the API was a struct with 70-80 members that the caller had to fill in and there were no defaults for anything! Naturally there were not just scalars, but lots of arrays and strings in the struct, which could easily be overrun or often left null.

The users, quite understandably, didn't fill out everything, which led to frequent crashes in _my_ code because that's where the pointers would get dereferenced.

When they would see that the crash was not in their code, the users of the API would punt the error to me even though it was their bad input that caused the problem. This would happen 10-12 times a day.

I rewrote the entire thing in a paranoid style, employing the trick above and others to try to ensure that if there was bad input, it would always crash on their side of the fence.

After I was done I got one legitimate bug report for the code, even though it was in use worldwide in our medium sized company.


This is kind of an extension of the "throw more hardware at poor performance"... but it's throwing more bytes at bad code ;)


Neat trick. Add a 'crc' field after 's2' and you just made it work for memory corruption too.
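A rough sketch of what that could look like, under a few assumptions of mine: the payload is held inline (a CRC over a `char *` would only protect the pointer value, not what it points to), the CRC is a plain bitwise CRC-32, and it is computed field by field so that compiler-inserted padding never enters the sum. `struct guarded`, `crc32_update`, `guarded_seal` and `guarded_check` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_SENTINEL 0xDEADBEEFu

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320); pass 0 as crc to start. */
uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical struct: sentinels at both ends plus a CRC after s2. */
struct guarded {
    uint32_t s1;
    int32_t  data;
    char     name[16];
    uint32_t s2;
    uint32_t crc;       /* CRC of data and name */
};

/* CRC over the payload fields only, one field at a time (skips padding). */
uint32_t guarded_crc(const struct guarded *g)
{
    uint32_t c = crc32_update(0, &g->data, sizeof g->data);
    return crc32_update(c, g->name, sizeof g->name);
}

/* Call after every legitimate update of the payload. */
void guarded_seal(struct guarded *g)
{
    g->crc = guarded_crc(g);
}

/* Returns 1 if sentinels and CRC all match, 0 otherwise. */
int guarded_check(const struct guarded *g)
{
    return g->s1 == GUARD_SENTINEL && g->s2 == GUARD_SENTINEL
        && g->crc == guarded_crc(g);
}
```

The cost is that every legitimate write has to be followed by `guarded_seal`, otherwise the next check reports a false corruption.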


This one is compelling.

However this might not have caught the error condition described upthread. That condition might have overwritten data or moreData without touching s1 or s2.

Otherwise, great!


Reminds me of stack canaries: values you put at the foot of a stack and check from the scheduler.

Also, to all those ready to do a checksum on a struct: remember that struct layout is platform dependent (padding bytes).


SW Solution:

1. Store embedded system state in data structure.

2. Calculate a checksum for that data structure.

3. Verify that checksum is correct.

HW Solution:

Lockstep Execution/ECC memory, etc.
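
To make the software steps above concrete, here is a small sketch. I'm assuming Fletcher-16 as the checksum (any checksum would do) and a state that is kept as a flat byte array so there is no struct padding to worry about. `struct sys_state`, `state_commit` and `state_verify` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16 checksum over a byte buffer. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Step 1: hypothetical system state stored in one data structure. */
struct sys_state {
    uint8_t  bytes[32];
    uint16_t checksum;
};

/* Step 2: recompute the checksum after every legitimate state change. */
void state_commit(struct sys_state *s)
{
    s->checksum = fletcher16(s->bytes, sizeof s->bytes);
}

/* Step 3: returns 1 if the stored checksum still matches, 0 otherwise. */
int state_verify(const struct sys_state *s)
{
    return s->checksum == fletcher16(s->bytes, sizeof s->bytes);
}
```

On a device, `state_verify` would typically run from a timer tick or the main loop, and a failure would trigger whatever the safe reaction is (reset, fall back to defaults, or tell the user to call Brian).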


ECC + checksum have a slight flaw.

If too many errors happen, the checksum can be correct even though the content is corrupted.

Hum... I know what you think: ThisShouldNeverHappen

When exploited by a human, it is called a collision attack. It works pretty well, since so many people trust but never check.
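
A toy illustration of the point, using an 8-bit additive checksum that I picked only because its collisions are easy to see (real CRCs are much harder to fool by accident, but the principle is the same). `sum8` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 8-bit additive checksum: sum of all bytes modulo 256. */
uint8_t sum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}
```

With this checksum, `{10, 20, 30, 40}` and `{11, 19, 30, 40}` produce the same value: one byte went up by one, another went down by one, and the two errors cancel. A single error on its own would have been caught.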



I often code errors that say to contact support. I have assumed it would help get problems reported more often.


You should have kept that message. Much more authentic and refreshing ;)



