Ask HN: How do people really test their code?

chipsy · on Aug 30, 2010

I'll try to answer the general question.

It's kind of a quantum physics problem, at least with respect to testing in general. Code that is completely untested hides all errors, and code never put through its paces won't show a performance problem. As it gets closer and closer to the "real world" (compiling, initial tests, beta, 1.0, 2.0, etc.) bugs and scaling issues will show up(often somewhat "magically"). The best you can do to counter this is simply to make the most efficient use of your time given the quantity/complexity of features, schedule, and available manpower - so that the remaining time can go towards a thorough test cycle.

Code built within restricted computational models(stronger type systems, garbage collected memory, functional-style code, relational logic...) can eliminate entire classes of errors. This doesn't eliminate the benefits of tests, but it makes it possible to focus your tests on a smaller subset of all errors.

Code with extensive ongoing review processes(e.g. space shuttle code, or perhaps the Linux kernel) can eliminate a different class of errors from regular tests or restricted models, because it uses the power of human minds to reason through the concepts repeatedly; a mistake made by one programmer is not likely to be repeated in exactly the same way by ten or twenty of them.

Also worth consideration is test scaffolding and debugging tools. In a large codebase, errors can appear farther and farther from their origin. This leads to a "test suite" (unit tests, functional tests, example datasets) run more-or-less independently of the application. For some kinds of applications, relatively elaborate debugging features may be necessary to display and step through core data structures while the app runs. Debugging-related features are easy to overlook, but are often well worth the time spent, and I have taken towards adding them whenever I encounter a class of bugs that they would help address, rather than to just muddle through the first instance and say "hope THAT doesn't happen again!"

Also, to a large extent, language and environment dictates debugging methods - C code benefits from a machine-level debugging system like gdb, but in languages with runtime reflectivity like Python or Ruby, you rarely need more than a print statement to uncover a problem. If you are working with an embedded device instead of a desktop OS you may have a remote monitor system or an emulator. If you're working on a webapp, you have server logs and browser-level tools. Et cetera.

Bluem00 · on Aug 30, 2010

In order of decreasing importance, I recommend you: Constantly check that you're building the right thing. Get some kind of testing framework. Look into Test Driven Development as a methodology.

With a testing framework, you won't waste time rolling your own. For Java, you could start with JUnit.

TDD is a formalized approach to writing test functions as you work, which you already do, while simultaneously designing your code. There are many sources on it around the internet, but Kent Beck's book took the magic out of it (a good thing) for me: http://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/0...

Of course, there are many other specific techniques you can use to test that your algorithms do the right thing, all dependent on what you've made. Regardless, I try to make sure that the a real user gets their hands on an up to date build as often as possible, and they're sure to show me all of the ways that my software is technically excellent, yet in no way solves their actual problem.

steveklabnik · on Aug 30, 2010

Also, please check out 'Behavior Driven Development's or BDD, which is basically "TDD done correctly." It alters the description of the process a bit, and in the end, you write better tests.

Since I'm a Rubyist, I use Cucumber and RSpec to do testing. http://cukes.info http://rspec.info (note that you can use cucumber to test any language, and I bet you could use JRuby to test java.

kanak · on Aug 30, 2010

Could you recommend some resources for learning BDD?

steveklabnik · on Aug 30, 2010

Sure. Now that I'm not on my phone, it's much easier to grab some links.

Here's the original work on the subject, by Dan North: http://blog.dannorth.net/introducing-bdd/

A little Rails specific, but this post by Sarah Mei is pretty awesome: http://www.sarahmei.com/blog/2010/05/29/outside-in-bdd/

These two Railscasts on Cucumber are good: http://railscasts.com/episodes/155-beginning-with-cucumber http://railscasts.com/episodes/159-more-on-cucumber

But really, conceptually, BDD isn't that complicated. The hardest bit is figuring out the tooling for whatever language you're using, getting comfortable with it, and practicing. I know the Ruby side of this well, but if you're using another language, I'm not sure I can be of much help.

uxp · on Aug 30, 2010

Adding on to your comment, a friend let me read through a beta copy of The Rspec Book last year. At the time, only the first couple chapters were written and it was lacking a lot of content. It was a great resource though, and I would highly recommend it. The final release is due "Soon", and ordering it now gives you access to the current revision of the Beta, which I hear is complete sans-editing. I'm holding out until I can buy a paper copy.

http://www.pragprog.com/titles/achbd/the-rspec-book

Your links to the RailsCasts screencasts are great also. PeepCode also has their Cucumber Screencast that is a little more detailed than RailsCasts.

http://peepcode.com/products/cucumber

These are all Ruby or Rails specific. Cucumber itself is a DSL to be used along with another language, and there are projects underway to implement Cucumber or Cucumber-like frameworks for other languages like Python, Java and .NET

steveklabnik · on Aug 30, 2010

Thanks. I almost linked to the RSpec Book, but since I haven't read it myself, I didn't feel comfortable endorsing it. Good to know it's shaping up! Maybe I'll have to pick up a copy.

PKeeble · on Aug 30, 2010

Personally I use a triplet of tests.

1) The first test I right is a functional end to end test that assumes the application is deployed and running with all configuration in place. That means there is a database and all the other parts necessary to run the application. The test follows BDD format (Given ... when ... then). Its purpose is to cross the bridge between high level functional requirement (I want X) to an executable spec for the aspects of that.

2) I then develop a series of unit tests. These don't require a database or even the file system, they are purely in the language of the application. I use a mocking framework to isolate to units and TDD out all the aspects.

3) Finally I write a performance test. A preloaded set of known data is inserted into an environment looks very similar to production. Any additional specialist data for the individual test is then loaded on top and the test is run and asserted against an expected maximum time.

The combination of the three is working OK for me but there are still gaps. Its hard to get the automated performance testing right as there are so many types of tests you actually want to run which are very hard to automatically verify.

That is how I do it IRL.

aufreak3 · on Aug 30, 2010

TDD folks will ask you to write tests first, then code. At least, for the example you used to make your question concrete, I'd first collect data I can use for testing before writing test code. You can even ask around for data.

Doing this helps me a lot 'cos by the time I have the data collected, I usually have a rather good idea of what code to write and edge cases even before writing a slew of tests for it.

If you're writing a data structure + algos for it, then think of a data format for storing and loading that structure so you can do the data collection in that format first. It'll also help write tests greatly.

Scripting languages are great ways to test your code, even if it is in C/C++. Learning to bind native code to some scripting language such as python helps a ton.

wilhelm · on Aug 30, 2010

For web applications? Watir. Unit tests are great and all, but not that useful compared to high-level functional tests that tests the stuff users are actually interested in doing. With limited time and limited resources, write Watir tests first.

Test #1: Sign up. Did it work? Were all parameters set correctly? Test #2: Log in. Did it work? Are you logged in as the correct user? Test #3: Something that touches your most important business logic. Can the logged in user add an item to her shopping cart? Post in her blog?

Just a hundred tests like that gets you a long way, and doesn't take that long to write.

brown · on Aug 30, 2010

Since you mentioned data structures... The nice thing about most data structures is that they are deterministic, and therefore there is (usually) a clear definition of "correct". Write your unit tests, make sure they work, and then make sure they work after all future changes.

For each function, you should have a handful of unit tests. Test the normal/expected values. Test edge cases (zero, infinity, negative infinity). Test possible error cases (null values, uninitialized data, etc). Basically, you want to test all the possible "buckets" of possible inputs.

"Correctness" is typically assumed. If your code isn't correct, then fix it.

"Robust" is the ideal. Most seasoned devs will have their toolbox of classes, snippets, and nuggets that they have incrementally tweaked and improved. Over time, they become ridiculously stable, full of years of bug fixes and all sorts of edge cases. They'll then use those snippets in project after project.

Testing becomes a lot more challenging when the function is less deterministic. If it's a heuristic based function, you will constantly be balancing the tradeoffs of accuracy vs. performance. For example, I used to work on a driving directions algorithm. Our team had a library of ~30k routes. After every code change, we would compare "accuracy" (average driving time) vs. "performance" (routes calculated per minute).

Testing becomes even more challenging when you're playing at the system level. Then you have load testing, stress testing, endurance testing, etc. That's when you need full time testers.

crcarlson · on Aug 30, 2010

In addition to essentially input-output test driven development at the unit, subsystem and system levels, I am a huge fan of "design by contract" for identifying both design and implementation errors. In my personal experience no test suite has been fully comprehensive and rigorously placed assertions, invariants, etc. pick up all kinds of design and implementation flaws. I find they also help for debugging by taking the guesswork out of many intermediate possibilities.

emmett · on Aug 30, 2010

The only real answer is "it depends" for the correct way to do testing. You're never going to get out of thinking it through, there's no substitute for human judgement. If you care a lot about the memory profile of your data structure, you need to test that really carefully. But you might not care very much about that, in which case it would be a waste of time.

However, I think I can lay your fears to rest about "running a bunch of functions" - in the end all testing frameworks come down to running a bunch of functions. All that you get out of a framework is a (sometimes) shorter way to write those functions in a domain specific way. In the case of a data structure, a test framework doesn't buy you very much because the domain of the problem is basically "a computer program", which computer programming languages are pretty good at making statements about already.

So in short, it sounds like you're already doing the right thing.

mhansen · on Aug 30, 2010

I'm interested in what you mean by the 'really' in the title. Do you think there's some kind of secret to how people test that they're not revealing?

If you want to see some 'IRL' test code, have a look at your favorite open source project. The test code will all be there, usually in a folder just off the root called 'tests/'.

pilom · on Aug 30, 2010

It very much depends on what you are doing. How many errors per KLOC can you deal with? A website isn't going to have the same requirements that NASA does. On the one end, try acceptance testing and user testing. Its cheap and fast. On the other end, provably bug free code is difficult (not quite impossible http://www.ece.cmu.edu/~koopman/ballista/index.html) but requires better documentation than anyone should ever write.

wheaties · on Aug 30, 2010

Have you tried looking or searching on www.StackOverflow.com ? There's a ton of information, often very specific to the types of technology you could be using.

known · on Aug 30, 2010

http://en.wikipedia.org/wiki/Test-driven_development

prototype56 · on Aug 30, 2010

TDD has nothing to do with testing.

kranner · on Aug 30, 2010

No no, let's keep it general, please.

dsc · on Aug 30, 2010

I know you! you're the guy that wants to take the good advice and use it elsewhere!

^5s

kranner · on Aug 30, 2010

OK, I'll bite. 'Code' for me, at the moment, is a web app in beta that needs functional and cross-browser UI testing.

So until I script all flows into Selenium, twill, windmill or something else, and until I can figure out how to automate screenshots of all screens in the OSs and browsers I care about (http://stackvm.com ?), I have checklists, and I go through them manually.

It helps that my web app's UI is short and sweet, but that's probably because I'm trying to avoid thinking about the combinatorial explosion it's hiding.

epall · on Aug 30, 2010

check out http://saucelabs.com/! Write your scripts with Selenium RC, run them against any browser, and get screenshots of every important steup.

kranner · on Aug 30, 2010

Thanks, I'll have a look.

steveklabnik · on Aug 30, 2010

Check out http://browsershots.org/

kranner · on Aug 30, 2010

Thanks, I've used that before but I meant with cookies, etc.

mkramlich · on Aug 30, 2010

Rough rule of thumb that works wonders:

Write/modify code.

Run program and/or ensure that codepath is executed.

Did it do what you intended?

If yes, figure out what's wrong about it, fix it, retry.

If no, move on to next item on your agenda. (Possibly first doing a quick refactor to improve readability, etc.)

Tyrannosaurs · on Aug 30, 2010

If you refactor you should always retest. I've lost track of the number of times things have been broken by someone tweaking things to make them better.

silentbicycle · on Aug 30, 2010

The flip side of this is that you can do deep refactoring with more confidence if you have thorough test coverage (and/or a smart type system).