Other than unit testing, what tools and methods do you use for software testing? Do you do any other kinds of testing at all? Do you use commercial tools?
We have Hudson running continuous tests on svn checkins: unit tests plus some other tests run through JUnit, JUnitPerf, etc., which amounts to a kind of functional testing.
We then also do full-build performance tests with The Grinder at night. This is currently not very automated and involves processing the results by hand in the morning. These are short runs, with multiple samples averaged. It would be good to know how others run their performance testing automatically, and also how they monitor the systems during the exact time window of the test.
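The morning processing is basically just averaging the short sample runs by hand; something like the sketch below is what we'd like to end up scripting (the file layout and column names here are invented for illustration, not anything The Grinder produces natively):

```python
# Rough sketch: average throughput across several short nightly sample runs.
# Assumes each run has already been reduced to a one-line CSV summary
# (run_id, mean_tps, mean_response_ms) -- an invented format, not The
# Grinder's native log layout.
import csv
import glob
import statistics

def summarise(pattern="results/run_*.csv"):
    tps, latency = [], []
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                tps.append(float(row["mean_tps"]))
                latency.append(float(row["mean_response_ms"]))
    if not tps:
        raise SystemExit("no sample runs found")
    print("samples:            %d" % len(tps))
    print("mean throughput:    %.1f tps (stdev %.1f)"
          % (statistics.mean(tps), statistics.pstdev(tps)))
    print("mean response time: %.1f ms" % statistics.mean(latency))

if __name__ == "__main__":
    summarise()
```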
Prior to release we also run some serious stress testing, again using The Grinder, followed by soak tests at the expected load for a week. This gives us the data we need for a release.
I'm doing performance testing nightly through Ant, just running a custom package that exercises the server, grabs metrics through a diagnostic call, and writes them out periodically.
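The "grabbing metrics" part is nothing fancier than a polling loop against the diagnostic call; a minimal sketch of the idea (the /diagnostics URL and its fields are placeholders, not the server's real API):

```python
# Minimal sketch: poll a diagnostic endpoint during a test run and log it.
# The /diagnostics URL and its JSON fields are placeholders for the server's
# own custom diagnostic call.
import json
import time
import urllib.request

def poll(url="http://localhost:8080/diagnostics", interval=30,
         duration=3600, out_path="server_metrics.csv"):
    deadline = time.time() + duration
    with open(out_path, "w") as out:
        out.write("timestamp,heap_used_mb,requests_handled\n")
        while time.time() < deadline:
            with urllib.request.urlopen(url, timeout=10) as resp:
                stats = json.load(resp)
            out.write("%d,%s,%s\n" % (time.time(),
                                      stats.get("heap_used_mb"),
                                      stats.get("requests_handled")))
            out.flush()
            time.sleep(interval)

if __name__ == "__main__":
    poll()
```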
Do you mean metrics from your program under test, the program testing it or the system?
We have the test metrics from the tester and the logs from the software under test, but what we haven't managed yet is properly monitoring the machines and tying that data to specific test runs. The tests start from around 8 at night on a separate 10-machine subnet. We are looking into setting up munin in a controller-slave arrangement, with one machine aggregating data from the other 9 test machines, somehow tying it to a specific test run, and producing report graphs alongside the throughput data from The Grinder/JMeter. This is the bit I can't seem to find any information about.

I've seen a lot of documentation and tech talks saying you must monitor the system as well as the software, showing throughput against memory usage, throughput against CPU usage, file I/O, etc. We do a lot of this manually: after we see a spike in system usage stats during the night, we go digging in logs, comparing timestamps to work out what was running and what went wrong. It's painful and it's slow.
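Roughly the kind of glue we have in mind, for what it's worth: record each run's start/end timestamps under a run id, then afterwards pull just that window out of munin's RRD files on the controller. A sketch only - the paths, RRD file names and runner callable below are guesses:

```python
# Sketch: record each run's time window under a run id, then pull that
# window out of munin's RRD files afterwards with rrdtool fetch.
# Paths, RRD file names and the runner callable are illustrative guesses.
import subprocess
import time

def timed_run(run_id, start_run):
    start = int(time.time())
    start_run()                          # kick off the grinder run however you normally do
    end = int(time.time())
    with open("runs.log", "a") as log:   # one line per run: id, start, end
        log.write("%s %d %d\n" % (run_id, start, end))
    return start, end

def fetch_window(rrd_path, start, end):
    # rrdtool fetch prints one sample per step for just the requested window
    result = subprocess.run(
        ["rrdtool", "fetch", rrd_path, "AVERAGE",
         "--start", str(start), "--end", str(end)],
        capture_output=True, text=True, check=True)
    return result.stdout

# e.g. afterwards, for every RRD of interest:
#   start, end = (values read back from runs.log for run "run-042")
#   print(fetch_window("/var/lib/munin/testnet/web01-cpu-user-d.rrd", start, end))
```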
It would be great to read how some other people solved this problem.
Most metrics come from a diagnostic call on the server being tested (memory use, etc.), with throughput data (requests per second, etc.) coming from the program testing it. Only limited data about the system is collected - primarily the JVM's memory use. Each test run starts fresh on 2 EC2 instances: one for the performance test package, which replays abstracted logs, and one for the server being tested. Each instance saves the data for its test run, which I grab before shutting the instance down. On the assumption that each instance is clean at the start of a test, I haven't given much thought to how the performance of the system itself impacts the tests - but perhaps I should revisit this.
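For illustration only, one run has roughly this shape (sketched here with Python and boto3 just to show the flow; AMI ids, key names and paths are made up):

```python
# Illustrative shape of one run: launch two fresh instances, run the test,
# pull the results off, terminate. AMI ids, key names and paths are made up.
import subprocess
import boto3

ec2 = boto3.resource("ec2")

def launch(ami):
    inst = ec2.create_instances(ImageId=ami, MinCount=1, MaxCount=1,
                                InstanceType="m1.large", KeyName="perf")[0]
    inst.wait_until_running()
    inst.reload()  # refresh so public_dns_name is populated
    return inst

server = launch("ami-00000001")   # server under test
driver = launch("ami-00000002")   # test package that replays abstracted logs

try:
    # ... start the replay on the driver instance and wait for it to finish ...
    # then pull each instance's saved run data before it goes away
    for inst in (server, driver):
        subprocess.run(["scp", "-i", "perf.pem",
                        "ec2-user@%s:results/*.csv" % inst.public_dns_name,
                        "collected/"], check=True)
finally:
    server.terminate()
    driver.terminate()
```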
Most of our stuff is custom code for a number of reasons, the biggest external library being HttpClient. I'm not sure how other people have solved this particular problem; I just got to the point that clean EC2 instances were generating pretty comparable data and then set about putting out fires in the code (:
RSpec is my current favorite for testing. I still do unit tests, but RSpec is so semantically intuitive and helpful when running suites against large apps.
+1 for RSpec. I use it in a Rails project and it integrates seamlessly. I've been doing unit testing with all sorts of xUnit frameworks on and off for several years, and I've never found anything as enjoyable to use as RSpec.
I rely on Python's built-in unittest module. While this is most seamless for Python code, with some customization you can make it manage anything (e.g. running external test programs and writing assertions on the results).
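For example, wrapping an external test program is just a matter of shelling out inside a TestCase and asserting on the exit code and output (a small sketch; the program name and expected output are made up):

```python
import subprocess
import unittest

class ExternalToolTest(unittest.TestCase):
    """unittest managing a non-Python test program by asserting on its output."""

    def test_external_selfcheck(self):
        # './legacy_selfcheck' stands in for whatever external binary you run
        result = subprocess.run(["./legacy_selfcheck", "--quick"],
                                capture_output=True, text=True)
        self.assertEqual(result.returncode, 0, result.stderr)
        self.assertIn("ALL CHECKS PASSED", result.stdout)

if __name__ == "__main__":
    unittest.main()
```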
JUnit, FlexUnit and nightly build scripts with Ant. Everything is complicated by working on an OSGi-based project, so the Eclipse Test Framework is a core piece; reports from ETF are formatted with an XSLT and mailed out.
I use the standard Ruby Test::Unit framework with ThoughtBot's Shoulda extensions. I enjoy the standardization of the xUnit family, but the RSpec-style contexts and definitions are great too. Best of both worlds!
We write a lot of what we call "smoke tests," which are essentially integration tests in most people's parlance. It's a long story, but we have a custom web framework that's widget-based, and one thing it allows us to do is write UI-level tests directly in-process against that widget model, rather than having to use an external client to deal with the HTML output (so they can be run and written like unit tests). Our internal language has an open type system, allowing us to do essentially type-safe metaprogramming over the set of pages and widgets, so that if someone changes a page (say, removing a button) the test will cease to compile. Maintenance is generally a huge problem for UI-level tests, so that's been a huge win for us.
To test the framework itself, we mainly use Selenium to test the actual HTML and javascript output. To test things that Selenium can't easily handle, we've also used the Watir library (http://wtr.rubyforge.org/).
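For anyone unfamiliar with it, a Selenium check against the rendered HTML boils down to something like the sketch below (shown with Selenium's Python bindings for brevity; the URL and element ids are placeholders, not our app):

```python
# Placeholder URL and element ids; shown with Selenium's Python bindings.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("http://localhost:8080/login")
    driver.find_element(By.ID, "username").send_keys("demo")
    driver.find_element(By.ID, "password").send_keys("secret")
    driver.find_element(By.ID, "submit").click()
    assert "Dashboard" in driver.title   # check the rendered result
finally:
    driver.quit()
```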
Our real unit tests are basically built on top of JUnit, though we have our own extensions to it that are specific to our platform.
To run the tests and report results, we have a custom application developed on our web framework and the rest of our software stack. Originally we used CruiseControl, but we outgrew that fairly quickly. Our application does a lot of stuff for us, but the key abilities are 1) parceling the tests out to a farm of servers and 2) assigning test breaks to users (basically a guess based on who changed which files in source control since the last successful test run); if you have more than a few people on the team, then unless you actually make each test break one person's problem to resolve, you end up with massive diffusion of responsibility where no one thinks it's their break.
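The break-assignment guess is nothing sophisticated; conceptually it's along the lines of the sketch below (illustrative Python, not our actual implementation):

```python
# Shape of the "who probably broke it" guess, not the actual implementation:
# score everyone who committed since the last green run by how much their
# changes overlap with the failing area, so every break lands on someone.

def guess_culprit(commits, failing_prefixes):
    """commits: list of (author, set_of_changed_paths) since the last green run."""
    scores = {}
    for author, changed in commits:
        overlap = sum(1 for path in changed
                      if any(path.startswith(p) for p in failing_prefixes))
        # a small base score means a break is always *somebody's* problem
        scores[author] = scores.get(author, 0) + overlap + 0.1
    return max(scores, key=scores.get) if scores else None

commits = [("alice", {"web/widgets/Button.java"}),
           ("bob", {"db/Schema.java"})]
print(guess_culprit(commits, ["web/widgets/"]))   # -> alice
```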
We also have an internal performance testing harness for running load tests, which we write using the same framework as the smoke tests I described above (though in that case the clients are naturally remote).
Like anything else, it depends on how the test code was written to begin with! :-) SilkTest supports class inheritance, so if you have a good set of core classes, then when application functionality changes you can just change your inherited class implementations, which is easier than having to reinvent the wheel every time.
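A rough analogue of that idea in Python (not 4Test/SilkTest code; the driver object here is a stand-in for whatever your core classes wrap):

```python
# Python analogue of the point about inheritance (not 4Test code): keep the
# flow in a base class so a UI change only touches an overridden detail.
# The 'driver' object is a stand-in for whatever your core classes wrap.

class LoginPage:
    USERNAME_FIELD = "username"
    PASSWORD_FIELD = "password"
    SUBMIT_BUTTON = "submit"

    def __init__(self, driver):
        self.driver = driver

    def log_in(self, user, password):
        # the flow lives here once; tests call this and never touch locators
        self.driver.type(self.USERNAME_FIELD, user)
        self.driver.type(self.PASSWORD_FIELD, password)
        self.driver.click(self.SUBMIT_BUTTON)

class LoginPageV2(LoginPage):
    # the app renamed a control in the new release: override one attribute
    # and every test that goes through log_in() keeps working
    SUBMIT_BUTTON = "sign-in"
```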