A bug I won't forget (paulasmuth.com)
74 points by paulasmuth on May 28, 2012 | 25 comments



"I hadn't noticed this before since the parser is written in a way that it will ignore everything that doesn't look like JSON."

and this is precisely why you want to fail hard if you encounter invalid input. Yes. It's annoying in the cases of "nearly valid" input or "valid input but with some garbage". Yes it's more work to deal with the error.

But it also means that something like this blows up before you end up in a "sometimes it works, sometimes it doesn't" situation.

Yes. I have been dealing with and even preferring the silently-failing kind of functionality, but over the years I've been bitten by it too many times to still be able to prefer it with a good conscience.

You might still spend more time overall dealing with bitchy libraries, but at least you will hopefully never have to deal with bugs that happen only sometimes, as those are really hard to track down and fix (if it's at all possible).

Sure. Sometimes you can get away with "yea - it fails at times - that's an unfortunate fact of life", but the moment that rarely appearing issue costs your customers' money or your own, it all becomes really important and "it fails at times" just doesn't do. Of course, by then, the problem needs to be fixed right then - which just doesn't go very well with "it usually works".

At that point you spend the hours it takes to track the problem down and you will curse your decision to fail silently once.
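
To make the contrast concrete, here is a minimal Java sketch of the two policies. It is not the parser from the blog post, and it does not validate JSON; the class and method names (`StrictFraming`, `extractLenient`, `extractStrict`) are made up for illustration. The lenient version skips anything that doesn't look like part of a message, the strict version throws as soon as there are unexpected bytes around it.

```java
final class StrictFraming {
    /** Lenient: silently drops everything before the first '{' and after the last '}'. */
    static String extractLenient(String input) {
        int start = input.indexOf('{');
        int end = input.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null; // nothing message-like found; the caller never learns why
        }
        return input.substring(start, end + 1);
    }

    /** Strict: accepts only input that is exactly one brace-delimited message. */
    static String extractStrict(String input) {
        boolean framed = input.length() >= 2
                && input.charAt(0) == '{'
                && input.charAt(input.length() - 1) == '}';
        if (!framed) {
            // Fail loudly so stray bytes are noticed on the first occurrence.
            throw new IllegalArgumentException("unexpected bytes around message: " + input);
        }
        return input;
    }
}
```

With an input such as `xyz{"a":1}`, the lenient method returns `{"a":1}` and hides the stray prefix, while the strict method throws an `IllegalArgumentException` that points straight at the bad input.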


I'm currently having issues with em-http-request, resque and resque-retry. It's still sometimes dropping work before the retry limit and not behaving nicely with the retry timespans. Also the async http request is not using the timeout value, randomly...

It only happens under heavy traffic: about 0.01% of requests just fail in the wrong way. It's not much, but it's still our money and our customers' money. I hope tomorrow's get-together to solve this problem helps.


I've noticed the same issue with that gem. It only happens randomly and with high density asynchronous traffic.


I can argue this both ways, mainly based on who I focus on.

If I focus on minimizing pain for the end user, I want things to blow up as little as possible. If I focus on minimizing my pain, I tend to go for hard failure.

For things that are important enough to spend the time on, I go for both: a system that is maximally kind to its users, but is internally a fussbudget. But that requires building a decent infrastructure for logging and alerting, plus an organization disciplined enough to take the alerts seriously.


This bug reminds me of what my dad taught me when I got my driver's license. He taught me that knowing how to drive carefully wasn't enough to prevent accidents. I had to drive for myself and for everyone else. I clearly remember "you don't know what kind of drunk will be blowing a red light".

In this bug, Paul was driving carefully - he relied on the parser to do a good job. But relying on the parser is like crossing on a GREEN light without checking. 99.9% of the time you should be OK. Until that one time with the drunk blowing the RED light.

My dad was right. Had Paul not relied entirely on the parser and done accurate memory allocation (checked for that drunk blowing the RED light) - everything would have been fine.


Sidenote: I'm pretty sure the author is Paul Asmuth, not Paula Smuth.


Ah, thanks.

I wonder how many people even know that dashes are legal in DNS names. (I mean, of course, the ASCII character that serves as hyphen, en-dash, and minus sign.)

I think there are lots of domain names that would benefit from a well-placed dash -- the most amusing example I've seen being Pen Island's.


Expert Sex Change. (Familiar to almost everyone here, I think.) Power Genitalia. (An Italian battery company.) Whore Presents. (A service for finding out about people's publicity agents, etc.)


It's a pity I can't read this blog post. On an iPad it specifically prevents me from zooming and the font is too small.

Please don't use "ipad-specific" or "mobile" themes. They break the web.


Zooming works fine here, ipad3, safari.


Would it be awfully smug to point out that Valgrind would've pointed this bug out in mere minutes? That's exactly why I make a habit of running my tests under Valgrind regularly during development; there's no point wasting hours debugging the classes of problem that tools can pinpoint in minutes.


Tangent -- it's crazy to me that println passes as debugging.


So, wait. When you allocate a new array in the JVM, it's filled with random data instead of zeroes? That seems like a fundamental security model error. Or are these 'buffers' special native IO primitives that break all the Java security rules and guidelines? I haven't used Java in a while...


Heh no, this was the actual bug (that it was reading "random" data from memory on the first iteration). I just hadn't noticed the issue until this "random memory" contained fragments of invalid json.


Were you or the network library reusing buffer objects (to avoid reallocating them), so the random data was left over from an earlier socket read? I'm surprised the JVM would allocate a new buffer object with non-zero data.


Yeah, he was almost certainly reusing his byte[]s. My takeaway is that if you program a high-level language as if it's C, expect C-like bugs.


Likely reusing his directly allocated ByteBuffer and not checking the number of bytes that he filled it with.

From what I remember, directly allocated ByteBuffers are not guaranteed to be zeroed.
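
A minimal Java sketch of the failure mode guessed at here, assuming a reused `byte[]` and a plain `InputStream` rather than the author's actual code (which the thread doesn't show). The names `BufferReuseDemo`, `decodeWholeBuffer` and `decodeBytesRead` are hypothetical. The first method ignores the count returned by `read` and decodes the whole buffer; the second decodes only the bytes that were actually filled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferReuseDemo {
    /** Buggy: ignores how many bytes were read, so stale bytes from earlier reads leak in. */
    static String decodeWholeBuffer(InputStream in, byte[] buf) {
        try {
            in.read(buf); // return value (number of bytes filled) is thrown away
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** Fixed: decodes only the prefix of the buffer that this read actually filled. */
    static String decodeBytesRead(InputStream in, byte[] buf) {
        try {
            int n = in.read(buf);
            return n <= 0 ? "" : new String(buf, 0, n, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a 16-byte buffer first receives `{"first":12345}` and is then reused for `{"b":2}`, the buggy method returns the new message followed by the tail of the old one, which is exactly the kind of invalid JSON fragment a forgiving parser would skip most of the time.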


Could you elaborate on this? If the random data isn't caused by the JVM not filling a buffer with zeros (which I'm sure it does) how is the data actually leaking? Do you share byte[] arrays between threads or recycle them once a thread dies?


JVM objects are always zero-allocated, but Java libraries typically don't make any guarantees about memory that sits outside the JVM heap, which is probably where the socket was reading from.


By default, the JVM initialises arrays as appropriate for their type. Presumably this was an array of String, so the initialisation values are nulls, not zeros. Arrays of boolean are initialised to false, etc.
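
A tiny Java check of those defaults, for anyone who wants to see it rather than take it on faith. The class and method names are made up for illustration.

```java
final class ArrayDefaults {
    /** Builds a summary of the first element of freshly allocated arrays. */
    static String describeDefaults() {
        byte[] bytes = new byte[1];        // numeric elements start at 0
        boolean[] flags = new boolean[1];  // boolean elements start at false
        String[] strings = new String[1];  // reference elements start at null
        return "byte=" + bytes[0] + " boolean=" + flags[0] + " String=" + strings[0];
    }
}
```

Calling `ArrayDefaults.describeDefaults()` returns `byte=0 boolean=false String=null`, so a freshly allocated heap array cannot be the source of "random" data; a reused one can.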


We don't know anything about it, because the blog post is slightly meager.

That said, this reads more like a byte[] array or similar to me, since you are reading data from the net/a stream. Somewhere there will be a process to interpret these bytes as a string in a specific encoding, but the error 'sounds' like it is related to a raw buffer whose size is a power of 2 bytes.


Just out of curiosity: How much Scala are you using at DaWanda? And for what use cases?


We currently use it extensively for analytics, our product recommendation engine (blog on that soon!) and we are writing a custom scala based http proxy for our new API. In general, we are trying to progressively do more scala and less ruby.

If you (or anybody else) are interested in hacking with us, please drop me a note (link to your github profile is enough) at paul@dawanda.com :)


Oh, Paul. ;( Less Ruby??! What happened? Are you angry at her? Did she cheat on you? Did you catch her in bed with Mikael? I warned you about him.


Oh, Don. No, everything is fine with ruby, I just got bored. And scala... she is so much faster! See you soon in AMS; I'll bring Mikael for fun and profit ;)



