Run automated tests as often as possible

I’ve been working with a large scale legacy system (in the Michael Feathers’ sense). Strangely enough there are a small handful of tests, unfortunately most of them falling by the way side having not been added into an Continuous Integration process. Many of them no longer look relevant, others, unable to run without a specific person’s machine (due to configuration files or special databases). Others look like a mechanism for people to simply diagnose issues with plenty of tracing statements. All of this was an inevitable system without someone running all the tests all the time.

Any serious team serious about continuous integration will think hard about their development testing strategy, considering approaches that allow you to run tests all the time. The strategy needs to consider a good way of preventing side effects from other tests (global state, global configuration, or left over state in external resources) balancing out speed with good quality feedback. Staggering build processes into smaller chunks, such as using build pipelines are a great way of achieving this.

Please don’t let leave your tests to die a sad and lonely death. It will take discipline and effort to keep them running yet the results will often pay back for themselves very quickly on any system with a reasonable lifespan.

Working in Scenarios

I’ve been busy retrofitting tests to a legacy system. We think our current best bang-for-buck are integration tests, at a service layer since there is plenty of state, lots of valueobjects and most of the interesting business behaviour cross cuts several services. We’ve been trying to build up a suite of test assitants in the form of Builder/Factories yet most unit tests don’t offer much value because the way the services interact with each other (a code smell yes, but will remain for a while given the sheer amount of it).

Instead, we’ve been wrapping service methods in a more convenient wrapper class to take the noise out of the extremely chatty APIs, wrapping away logic in something that we will call xxxServiceScenario. Just as Mark has been playing around with GetBuilderFor on his projects, we’ve hung out all of our specific service wrappers off a class called Scenario.

We then pass builders into scenarios to create the different test state that we need before calling a particular service to test its behaviour. The benefits of this approach include that it’s a bit clearer to users what we’re testing (anything associated with a Scenario is part of a GIVEN block), and we can choose to change the implementation (we’ve moved from using SQL to delegating to a service once we had more confidence in those services working properly) and our test methods are shorter without resorting to setup methods.

An example test might look like:

[Test]
public void ShouldDoSomethingInterestingGivenAParticularScenario()
{
   User firstUser = Scenario.Users().CreateMeARealUser(new UserBuilder().Build());
   User secondUser = Scenario.Users().CreateMeARealUser(new UserBuilder().Build());
   Scenario.Settings().EstablishDefaults(firstUser);

   Event onLineEvent = EventBuilder.For(firstUser).ViewedFrom(secondUser).Build()
   new InterestingService().ReactToInterestingEvent(onLineEvent);

   // some assertions here.
   ...
}

Some developers seem to be liking this concept, so thought I’d share it to learn what we’ve been doing.

Improving Collaboration Between Developers and Testers

One of the biggest divides you need to cross, even on agile teams, is the chasm between testers and developers. By the nature of their different roles there will always be tension with developers focusing on creation, change, and construction, and testers focusing on breaking, destructive, and exposing questionable system behaviours. It’s essential that both organisations and, in particular, individuals realise what they can do to ensure this tension doesn’t dissolve into an exclusively confrontational relationship.

catdogfight

Photo taken from Sephiroty’s Flickr stream under the Creative Commons licence

Recently, a new QA person highlighted this very issue for me. I’d finished some functionality with my pair and the QA had been testing it after we declared it ready to test. They came along to my desk and said, “I’ve found some defects with your code.” Internally I winced as I noticed myself wanting to say, “There’s nothing wrong with my code.” It got me thinking about why.

Evaluating people puts them on the defensive
Just as giving effective feedback should focus on behaviour, poor feedback makes a person feel criticised. When the QA said that there are some “defects”, it implied a broken state that you had to fix it. Made worse is the way that they said it made it feel like they were blaming you, and it’s very hard not to feel defensive in a situation like this. A very natural outcome is to pretty much deny the “evaluation” which I’m sure anyone in our industry would have witnessed at least once.

Avoid terms like “defect”, “broken”, “bugs”
One of the biggest differences working with agile testers versus testers who come from traditional backgrounds is the terms that they use. Traditional testers constantly use the words above. Agile testers focus on discussing the behaviour of the system and what expected behaviour they would see. They only use the words above once they have agreement on both of the both the current and expected behaviours. I definitely recommend you do not start a conversation with the words above as they all carry some connotation of “evaluation” and I’m yet to meet someone who truly feels comfortable being “evaluated”

Focus on Effective Feedback
Effective feedback follows a neat and simple pattern:

  1. What behaviour did you see?
  2. What impact did it have?
  3. What do we want to change?

Testers should use a similar series of questions (in order):

  1. What behaviour did you see?
  2. What behaviours did you expect to see?
  3. What are the consequences of the current system behaviour?
  4. Is that desired or undesired?
  5. Do we need to change it?

Apply the guideline above and watch the collaboration improve!

Controlling Time: How to deal with periodic gaps

Some application logging is important enough to put regressions tests in place. For instance, when you shell out to other programs and you want to monitor the amount of time it takes. In this entry I’m going to demonstrate how you might test that you are correctly logging out the time taken to do some long running work. I’m going to assume that you already have a Clock concept in the system.

Let’s assume we’re trying to test the following block of code:

public void doSomeWork(Clock clock) {
  DateTime startTime = clock.now();
  executeSomeLongRunningWork();
  DateTime endTime = clock.now();
  long timeTaken = endTime.minus(startTime.getMillis()).getMillis();
  LOG.log(Level.INFO, "Time taken was %sms", timeTaken);
}

One approach would be refactor the above code, replacing the Clock with a StopWatch but I’m going to show a technique for testing this code without changing it. Instead, we are going to inject an AutomaticallyForwardingClock. Here’s the implementation:

public class AutomaticallyMovingForwardClock implements Clock {
  private final int periodToMoveForwardsInMs;
  private DateTime now;

  public AutomaticallyMovingForwardClock(DateTime initialStartTime,
      int periodToMoveForwardsInMs) {
    this.now = initialStartTime;
    this.periodToMoveForwardsInMs = periodToMoveForwardsInMs;
  }

  public DateTime now() {
    DateTime result = now;
    now = now.plusMillis(periodToMoveForwardsInMs);
    return result;
  }
}

So whenever anyone asks the current time from the Clock, it automatically shifts time forward by the specified amount. Now we can write a test that looks like this:

@Test
public void shouldLogTimeTakenForProgramToExecute() throws Exception {
  int periodToMoveForwards = 5;
  Clock clock = new AutomaticallyMovingForwardClock(new DateTime(), periodToMoveForwards);
  doSomeWork(clock);

  long expectedDuration = periodToMoveForwards;
  assertThat(fetchLogEntry(), containsString("Time taken " + expectedDuration + "ms"));
}

A few notes from above:

  • The test uses the Joda DateTime class.
  • The test uses the Hamcrest API
  • The test assumes you know how to retrieve the log message from a logger

Although I think that injecting a StopWatch is probably a better mechanism, I hope that this technique still proves useful for some people.

Controlling Time: Threaded Execution

Update (Feb 21): Thanks to some feedback from Taras and Lowell, the ThreadedRunner is the same as the java.util.concurrent.Executor class! I’ve updated the post to reflect this below. Thanks for the suggestion!

Unit testing threaded software can sometimes prove slightly cumbersome. Many of the examples on the net often show examples that tie a number of responsibilities together. It might be easy to see something like this:

  public class SomeExample {
    public void doSomeWork() {
      Runnable runnable = new Runnable() {
        public void run() {
          // do some long running work
        }
      };
      new Thread(runnable).start();
    }
  }

The thing about this class above is to realise what responsibilities it is mixing, including the new work they would like to (the item in Runnable), as well as how they would like to run it (the separate thread). It’s difficult to tease these two responsibilities apart the way it’s declared. The first thing I’d do is remove the dependency on the new thread. I’ll start by making use of the java.util.concurrent.Executor :

public interface Executor {
  void execute(Runnable runnable);
}

public class SingleThreadedExecutor implements Executor {
  public void execute(Runnable runnable) {
    runnable.run();
  }
}
public class NewThreadedExecutor implements Executor {
  public void execute(Runnable runnable) {
    new Thread(runnable).start();
  }
}

We’ll inject the SingleThreadedExecutor in our unit tests, so we only ever have one thread of execution, and we can rely on tests being consistently reproducible. We’ll inject the NewThreadedExecutor when we need to assemble our application for production. Here’s the step by step refactoring.

Wrap inside our new implementation

public class SomeExample {
  public void doSomeWork() {
    Runnable runnable = new Runnable() {
      public void run() {
        // do some long running work
      }
    };
    new NewThreadedExecutor().start(runnable);
  }
}

Inject instance via the constructor

public class SomeExample {
  private final NewThreadedExecutor executor;
  public SomeExample(NewThreadedExecutor executor) {
    this.executor = executor;
  }  

  public void doSomeWork() {
    Runnable runnable = new Runnable() {
      public void run() {
        // do some long running work
      }
    };
    executor.start(runnable);
  }
}

Replace specific with interface

public class SomeExample {
  private final Executor executor;
  public SomeExample(Executor executor) {
    this.executor = executor;
  }  

  public void doSomeWork() {
    Runnable runnable = new Runnable() {
      public void run() {
        // do some long running work
      }
    };
    executor.start(runnable);
  }
}

In the tests, you can now inject an instance of SingleThreadedExecutor, whilst in production mode, you use the NewThreadedExecutor.

Disclaimers
Of course, this still leaves your testing strategy incomplete where you do want to test your software with the number of threads that you do expect, particularly when you need to share state across different processes.

Controlling Time: How to deal with infinity

It’s been a while since I’ve had to slice and dice legacy code. It reminds me how easily, non test driven code, slips into the abyss of tight coupling and the effort of retrofitting (unit) tests increasing in effort with time. Of course, I don’t believe TDD should be done all the time, but for most production code I do. In this entry, I’m going to use the Sprout Inner Class mechanism to demonstrate how to start making something that would have been untestable much more manageable.

The scenario
What we have below is the entry point into our program (i.e. a class with a main method). When I first encountered this class, it had too many responsibilities including, and not limited to, parsing command line options, some application logic, handling terminal signals, and wiring up all the objects for the program to start. Combined with an infinite working loop, you can imagine what a nightmare it was to unit test. Here’s the modified class, called RunningProgram representing the important parts of the class related to this walk through.

Our task: To reduce the number of responsibilities and test the body of the start method (we couldn’t simply extract method immediately due to the number of fields it modified)

package com.thekua.examples;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class RunningProgram implements SignalHandler {
	private boolean running = true;
	// some other fields

	public void start() {
		running = true;
		while (running) {
			// do some work (that we want to unit test)
			// it changes about ten fields depending on what condition
			// gets executed
		}
		// Finished doing some work
	}

	// it also had plenty of other methods

	public static void main(String [] args) {
		RunningProgram program = new RunningProgram();
		Signal.handle(new Signal("TERM"), program);
		Signal.handle(new Signal("INT"), program);
		program.start();
	}

	public void handle(Signal signal) {
	    // Program terminated by a signal
		running = false;
	}
}

Step 1: Understand the dependencies that make this test difficult to test
To get some level of functional testing around this program we had a number of timed triggers, watching for state modifications with a given timeout. Our problem was infinity itself, or rather, not sure when to interrupt infinity to watch for when the body of work inside the loop had finished its work once, twice, and not too early. We could have made it more complicated by introducing lifecycle listeners yet we hesitated at that option because we thought it would complicate the code too much.

We noticed the use of the running flag. We noticed it was the condition for whether or not we continued looping, and was also the trigger for a graceful shutdown using the sun.misc.Signal class. Notice that the running program implements the SignalHandler interface as a result. We thought that running behaviour of the RunningProgram could be extracted into a separate aspect.

Step 2: Encapsulate, encapsulate, encapsulate
Our first task, was to remove direct access to the running flag since the class modified it in two places. Sprout an inner class, and simply delegate to getters and setters and we might find we have a class that looks like:

package com.thekua.examples;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class RunningProgram implements SignalHandler {
    public static class RunningCondition {
	    private boolean running = true;

		boolean shouldContinue() {
		    return running;
		}

		public void stop() {
		    running = false;
		}
    }

	private RunningCondition condition = new RunningCondition();

	// some other fields

	public void start() {
		while (condition.shouldContinue()) {
			// do some work (that we want to unit test)
			// it changes about ten fields depending on what condition
			// gets executed
		}
		// Finished doing some work
	}

	// it also had plenty of other methods

	public static void main(String [] args) {
		RunningProgram program = new RunningProgram();
		Signal.handle(new Signal("TERM"), program);
		Signal.handle(new Signal("INT"), program);
		program.start();
	}

	public void handle(Signal signal) {
	    // Program terminated by a signal
		condition.stop();
	}
}

Moving the running flag to a separate class gives us a number of benefits. It lets us hide the implementation of how we handle running, and puts us down the road of clearly teasing apart the overloaded responsibilities.

Step 3: Consolidate related behaviour
It bothered me that the main program had the shutdown hook. That behaviour definitely felt strongly related to the RunningCondition. I felt it was a good thing to move it to the that class. We now have something that looks like:

package com.thekua.examples;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class RunningProgram {
    public static class RunningCondition implements SignalHandler {
	    private boolean running = true;

		boolean shouldContinue() {
		    return running;
		}

		public void handle(Signal signal) {
	        // Program terminated by a signal
			running = false;
	    }
    }

	private RunningCondition condition = new RunningCondition();

	// some other fields

	public void start() {
		while (condition.shouldContinue()) {
			// do some work (that we want to unit test)
			// it changes about ten fields depending on what condition
			// gets executed
		}
		// Finished doing some work
	}

	// it also had plenty of other methods

	public static void main(String [] args) {
		RunningProgram program = new RunningProgram();
		Signal.handle(new Signal("TERM"), program.condition);
		Signal.handle(new Signal("INT"), program.condition);
		program.start();
	}

}

Note that it is now the RunningCondition that now implements the SignalHandler interface (that we are using to register with the Signal

Step 3: Remove dependency chain
The difficulty with this class still exists. We cannot modify the RunningCondition of this program since it creates one for itself. Since I prefer Constructor Based Dependency Injection, I’m going to apply Introduce Parameter to Constructor, moving the field declaration to the constructor itself. Here it is what the class looks like now:

package com.thekua.examples;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class RunningProgram {
    public static class RunningCondition implements SignalHandler {
	    private boolean running = true;

		boolean shouldContinue() {
		    return running;
		}

		public void handle(Signal signal) {
	        // Program terminated by a signal
			running = false;
	    }
    }

	private final RunningCondition condition;

	public RunningProgram(RunningCondition condition) {
	    this.condition = condition;
	}

	// some other fields

	public void start() {
		while (condition.shouldContinue()) {
			// do some work (that we want to unit test)
			// it changes about ten fields depending on what condition
			// gets executed
		}
		// Finished doing some work
	}

	// it also had plenty of other methods

	public static void main(String [] args) {
		RunningProgram program = new RunningProgram(new RunningCondition());
		Signal.handle(new Signal("TERM"), program.condition);
		Signal.handle(new Signal("INT"), program.condition);
		program.start();
	}

}

Note that we are still bound to the implementation of the specific RunningCondition, so it’s time to apply Extract Interface, and understand what role that RunningCondition has. We chose the name, RunStrategy for the role name, and to help keep the names more aligned, we ended up renaming RunningCondition to RunLikeADaemonStrategy. Our code now looks like this:

package com.thekua.examples;

import sun.misc.Signal;
import sun.misc.SignalHandler;

public class RunningProgram {
    public static interface RunStrategy {
	    boolean shouldContinue();
	}

    public static class RunLikeADaemonStrategy implements SignalHandler, RunStrategy {
	    private boolean running = true;

		public boolean shouldContinue() {
		    return running;
		}

		public void handle(Signal signal) {
	        // Program terminated by a signal
		    running = false;
	    }
    }

	private final RunStrategy runStrategy;

	public RunningProgram(RunStrategy runStrategy) {
	    this.runStrategy = runStrategy;
	}

	// some other fields

	public void start() {
		while (runStrategy.shouldContinue()) {
			// do some work (that we want to unit test)
			// it changes about ten fields depending on what condition
			// gets executed
		}
		// Finished doing some work
	}

	// it also had plenty of other methods

	public static void main(String [] args) {
		RunLikeADaemonStrategy strategy = new RunLikeADaemonStrategy();
		RunningProgram program = new RunningProgram(strategy);
		Signal.handle(new Signal("TERM"), strategy);
		Signal.handle(new Signal("INT"), strategy);
		program.start();
	}
}

The best thing is that our RunningProgram no longer needs to know what happens if a terminate or interrupt signal is sent to the program. With simply a dependency on the RunStrategy we can know inject a fixed run strategy for tests that we ended up calling a RunNumberOfTimesStrategy. We also promoted the specific RunLikeADaemonStrategy to a full class (not an inner class).

Thanks for getting this far! Please leave a comment if you thought this was useful.

Controlling Time

Disclaimer: This technique does not work outside of programming. Do not try this on your neighbours, kids or pets…

What’s wrong with time dependent tests?
It’s easy to write tests that are far too flaky and intermittent. Worse yet, a quick fix often results in putting a pause to tests that make them drag out longer and longer. Alternatively, unneeded complexity is added to try to do smart things to poll for tests to time out.

What can we do about it?
I’m all about solutions, so I’m out about to outline the secret to controlling time (in at least unit tests). Here’s a situation you might recognise. Or not. Sorry to anyone vegetarian reading this entry.

We have a class called Beef that knows when it’s past its prime using the Joda Time libraries.

package com.thekua.examples;

import org.joda.time.DateTime;

public class Beef {
	private final DateTime expiryDate;

	public Beef(DateTime expiryDate) {
		this.expiryDate = expiryDate;
	}

	public boolean isPastItsPrime() {
		DateTime now = new DateTime(); // Notice this line?
		return now.isAfter(expiryDate);
	}
}

Surprise, surprise. We also have a unit test for it:

package com.thekua.examples;

import static org.junit.Assert.assertTrue;
import org.joda.time.DateTime;
import org.junit.Test;

public class BeefTest {
	@Test
	public void shouldBePastItsPrimeWhenExpiryDateIsPast() throws Exception {
		int timeToPassForExpiry = 100;

		Beef beef = new Beef(new DateTime().plus(timeToPassForExpiry));

		Thread.sleep(timeToPassForExpiry * 2); // Sleep? Bleh...

		assertTrue(beef.isPastItsPrime());
	}
}

Step 1: Contain time (in an object of course)
The first step is to contain all the use of time concepts behind an object. Don’t even try to call this class a TimeProvider. It’s a Clock okay? (I’m sure I used to call it that in the past as well, so don’t worry!). The responsibility of the Clock is to tell us the time. Here’s what it looks like:

package com.thekua.examples;

import org.joda.time.DateTime;

public interface Clock {
	DateTime now();
}

In order to support the system working as normally, we are going to introduce the SystemClock. I sometimes call this a RealClock. It looks a bit like this:

package com.thekua.examples;

import org.joda.time.DateTime;

public class SystemClock implements Clock {
	public DateTime now() {
		return new DateTime();
	}
}

We are now going to let our Beef now depend on our Clock concept. It should now look like this:

package com.thekua.examples;

import org.joda.time.DateTime;

public class Beef {
	private final DateTime expiryDate;
	private final Clock clock;

	public Beef(DateTime expiryDate, Clock clock) {
		this.expiryDate = expiryDate;
		this.clock = clock;
	}

	public boolean isPastItsPrime() {
		DateTime now = clock.now();
		return now.isAfter(expiryDate);
	}
}

If you wanted to, the step by step refactoring would look like:

  1. Replace new DateTime() with new SystemClock().now()
  2. Replace new instance with field
  3. Instantiate new field in constructor

We’d use the RealClock in both the code that creates the Beef as well as our test. Our test should look like…

package com.thekua.examples;

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class BeefTest {
	@Test
	public void shouldBePastItsPrimeWhenExpiryDateIsPast() throws Exception {
		int timeToPassForExpiry = 100;
		SystemClock clock = new SystemClock();

		Beef beef = new Beef(clock.now().plus(timeToPassForExpiry), clock);

		Thread.sleep(timeToPassForExpiry * 2);

		assertTrue(beef.isPastItsPrime());
	}
}

Step 2: Change the flow of time in tests
Now that we have the production code dependent on an abstract notion of time, and our test still working, we now want to substitute the RealClock with another object that allows us to shift time for tests. I’m going to call it the ControlledClock. Its responsibility is to control the flow of time. For the purposes of this example, we’re only going to allow time to flow forward (and ensure tests use relative times instead of absolute). You might vary it if you needed very precise dates and times. Note the new method forwardTimeInMillis.

package com.thekua.examples;

import org.joda.time.DateTime;

public class ControlledClock implements Clock {
	private DateTime now = new DateTime();

	public DateTime now() {
		return now;
	}	

	public void forwardTimeInMillis(long milliseconds) {
		now = now.plus(milliseconds);
	}
}

Now we can use this new concept in our tests, and replace the way that we previously forwarded time (with the Thread.sleep) with our new class. Here’s what our final test looks like now:

package com.thekua.examples;

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class BeefTest {
	@Test
	public void shouldBePastItsPrimeWhenExpiryDateIsPast() throws Exception {
		int timeToPassForExpiry = 100;
		ControlledClock clock = new ControlledClock();

		Beef beef = new Beef(clock.now().plus(timeToPassForExpiry), clock);

		clock.forwardTimeInMillis(timeToPassForExpiry * 2);

		assertTrue(beef.isPastItsPrime());
	}
}

We can even further improve this test to be more specific by forwarding time by simply adding one rather than multiplying twice.

Step 3: Save time (and get some real sleep)
Although this is a pretty trivial example of a single use of time dependent tests, it shouldn’t take too much effort to introduce this concept to any classes that depend on time. Not only will you save yourself heart-ache with either flaky, broken tests, but you should also save yourself the waiting time you’d otherwise need to introduce, leading to faster test execution, and that wonderful thing of fast feedback.

Enjoy! Let me know what you thought of this by leaving a comment.

The world mocks too much

One of my biggest annoyances when it comes to testing is when anyone reaches for a mock out of habit. The “purists” that I prefer to call “zealots”, are overwhelming in numbers, particularly in the TDD community (you do realise you can test drive your code without mocks?) Often I see teams use excuses like, “I only want to test a single responsibility at a time.” It sounds reasonable, yet it’s generally the start of something more insidious, one where suddenly mocks become the hammer and people want to apply it to everything. I sense it through signals like when people say “I need to mock a class”, or “I need to mock a call to a superclass”. Alternatively it’s quite obvious when the number of “stub” or “ignored” calls outnumber the “mocked” calls by an order of magnitude.

Please don’t confuse my stance with “classical state based testing purists” who don’t believe in mocking. Though Martin Fowler describes two, apparently opposing, styles to testing, Classical and Mockist Testing, I don’t see them as mutually exclusive options, rather two ends of a sliding scale. I typical gauge risk as a factor for determining which approach to take. I believe in using the right tool for the right job (easy to say, harder to do right), and believe in using mocking to give me the best balance of feedback when things break, enough confidence to refactor with safety, and as a tool for driving out a better design.

Broken Glass

Image of broken glass taken from Bern@t’s flickr stream under the creative commons licence

Even though I’ve never worked with Steve or Nat, the maintainers of JMock, I believe my views align quite strongly with theirs. When I used to be on the JMock mailing list, it fascinated me to see how many of their responses focused on better programming techniques rather than caving into demands for new features. JMock is highly opinionated software and I agree with them that you don’t want to make some things too easy, particularly those tasks that lend themselves to poor design.

Tests that are difficult to write, maintain, or understand are often a huge symptom for code that is equally poorly designed. That’s why that even though Mockito is a welcome addition to the testing toolkit, I’m equally frightened by its ability to silence the screams of test code executing poorly designed production code. The dogma to test a single aspect of every class in pure isolation often leads to excessively brittle, noisy and hard to maintain test suites. Worse yet, because the interactions between objects have been effectively declared frozen by handfuls of micro-tests, any redesign incurs the additional effort of rewriting all the “mock” tests. Thank goodness sometimes we have acceptance tests. Since the first time something is written is often not the best possible design, writing excessively fine grained tests puts a much larger burden on developers who need to refine it in the future.

So what is a better approach?
Interaction testing is a great technique, with mocks a welcome addition to a developer’s toolkit. Unfortunately it’s difficult to master all aspects including the syntax of a mocking framework, listening to the tests, and responding to appropriately with refactoring or redesign. I reserve the use of mocks for three distinct purposes:

  1. Exploring potential interactions – I like to use mocks to help me understand what my consumers are trying to do, and what their ideal interactions are. I try to experiment with a couple of different approaches, different names, different signatures to understand what this object should be. Mocks don’t prevent me from doing this, though it’s my current preferred tool for this task.
  2. Isolating well known boundaries – If I need to depend on external systems, it’s best to isolate them from the rest of the application under a well defined contract. For some dependencies, this may take some time to develop, establish, stabilise and verify (for which I prefer to use classical state based testing). Once I am confident that this interface is unlikely to change, then I’m happy to move to interaction testing for these external systems.
  3. Testing the boundaries of a group of collaborators – It’s often a group of closely collaborating objects to provide any useful behaviour to a system. I err towards using classical testing to test the objects in that closely collaborating set, and defer to mocking at the boundary of these collaborators.

Automated Acceptance Tests: What are they good for?

A long time ago, I wrote about the differences between the types of tests I see, yet a lot of people don’t appreciate where acceptance tests fit in. I won’t be the first to admit that at first glance, automated acceptance tests seem to have lots of problems. Teams constantly complain about the build time, the maintenance burden they bring, and the difficulty of writing them in the first place. Developers specifically complain about duplicating their effort at different levels, needing to write assertions more than once (at a unit or integration level), and that they don’t get too much value from from them.

I wish everyone would realise that tests are an investment, a form of insurance against risk (the system not working), and most people don’t even know what risk level they are willing to take. That’s why, on experimental code (i.e. a spike), I don’t believe in doing test driven development. I’ve been fortunate enough, or is that learn some lessons from, seeing both extremes.

Maintaining a system only with acceptance tests (end to end)
I worked on one project where the architect banned unit tests, and we were only allowed integration and acceptance tests. He believed (rightly so) that acceptance tests let us change the design of the code without breaking functionality. His (implicit) choice of insurance was a long term one – ensure the state of the system constantly works over time, even with major architectural redesign. During my time on this project, we even redesigned some major functionality to make the system more understandable and maintainable. I doubt that without the acceptance tests, we would have had the confidence to move so quickly. The downside to this style of testing is that the code-test feedback was extremely slow. It was difficult to get any confidence that a small change was going to work. It felt so frustrating to move at such a slow pace without any faster levels of feedback.

Scenario driven acceptance tests (opposed to the less maintainable, story-based acceptance tests) also provide better communication for end users of the system. I’ve often used them as a tool for aiding communication with end users or customer stakeholders to get a better understanding about what it is they think the system should be doing. It’s rare that you achieve the same with unit or integration tests because they tell you more how a particular aspect is implemented, and rarely lacks the system context acceptance tests have.

Maintaining a system only with unit tests
On another project, I saw a heavy use of mocks, and unit tests. All the developers moved really fast, enjoyed refactoring their code, yet on this particular project, I saw more and more issues where basic problems meant that starting up the application failed because all those tiny, well refactored objects just didn’t play well together. Some integration tests caught some of these, but I felt like this project could have benefited from at least a small set of acceptance tests to prevent the tester continuously deploying a broken application despite a set of passing unit tests.

What is your risk profile when it comes to testing?
I think every team developer must understand that different types of test give us different levels of feedback (see the testing aspect to the V-Model), and each has a different level of cost determined by constraints of technology and tools. You’re completely wrong if you declare all automated acceptance tests bad, or all unit tests are awful. Instead you want to choose the right balance of tests (the insurance) that match system’s constraints for its entire lifetime. For some projects, it may make more sense to invest more time in acceptance tests because the risk of repeated mistakes is significantly costly. For others, the cost of manual testing mixed with the right set of unit and integration tests may make more sense. Appreciate the different levels of feedback tests bring, and understand the different levels of confidence over the lifetime of the system, not just the time you spend on the project.

Automated story-based acceptance tests lead to unmaintainable systems

Projects where the team directly translates story-level acceptance criteria into new automated test cases set themselves up for a maintenance nightmare. It seems like an old approach (I’m thinking WinRunner-like record-play back scripts), although at least the teams probably feel the pain faster. Unfortunately not many teams seem to know what to do. It sounds exactly like the scenarios that my colleagues, Phillip and Sarah are experiencing or experienced recently.

Diagnosing this style of testing is easy. If I see the story number or reference in the title of the test class or test case name, chances are, your team experiences automated story-based acceptance tests.

Unfortunately the downfall to this approach has more to do with the nature of stories than it does with the nature of acceptance tests (something I’ll get to later). As I like to say, stories represent the system in a certain snapshot of time. The same thing that lets us deliver incremental value in small chunks just doesn’t scale if you don’t consolidate the new behaviour of the system, with its already existing behaviour. For developers, the best analogy is like having two test classes for a particular class, one that reflected the behaviours and responsibilities of the system at the start, and one that represents the new behaviours and responsibilities of the system right now. You wouldn’t do this at a class level, so why should you do it at the system level?

Avoid temporal coupling in the design of your tests. The same poor design principle of relating chunks of code together simply because someone asked for them at the same time, also apply to how you manage your test cases. In terms of automated story-based acceptance tests, avoid spreading similar tests around the system just because they were needed at different times.

What is a better way? A suggested antidote…

On my current project, I have been lucky enough to apply these concepts early to our acceptance test suites. Our standard is to group tests, not based on time, but on similar sets of functionality. When picking up new stories, we see if any existing tests need to change, before adding new ones. The groupings in our system are based on the system level features, allowing us to reflect the current state of the system as succinctly as possible.

Next Page »