Measuring Effectiveness: The Software Industry’s Conundrum

On “Measuring Effectiveness”

I’ve been saying for seemingly decades, this is (one of?) our nascent software industry’s biggest conundrums (maybe even an enigma?).

Just how do you prove a team is “doing better?” Or that a new set of processes are “more effective?”

My usual reaction when asked for such proof:

“Ok, show me what you are tracking today for the team’s performance over the past few years, so we have a baseline.”


But seriously, wouldn’t it be awesome if we could measure something? Anything?

For my money:

Delivering useful features on top of a reasonably quality architecture, with good acceptance and unit test coverage, for a good price, is success.

Every real engineering discipline has metrics (I think, anyway, it sounds good — my kids would say I am “Insta-facting™”).

If we were painting office building interiors, or paving a highway, we could certainly develop a new process or a new tool and quantitatively estimate the expected ROI, and then prove the actual ROI after the fact. All day long.

In engineering a new piece of hardware, we could use costing analysis, and MTBF to get an idea on the relative merits of one design over another.

We would even get a weird side benefit — being relatively capable at providing estimates.

In software, I posit this dilemma (it’s a story of two teams/processes):

Garden A:

  • Produces 15 bushels (on average) per month over the growing season
  • Is full of weeds
  • Does not have good soil management
  • Will experience exponential (or maybe not quite that dramatic) production drop off in ensuing years, requiring greater effort to keep the production going. Predictability will wane.
  • Costs $X per month to tend

Garden B:

  • Produces 15 bushels (on average) per month over the growing season
  • Is weed free and looks like a postcard
  • Uses raised bed techniques, compost, and has good soil management
  • Will experience consistent, predictable, production in ensuing years
  • Costs $Y per month to tend

I could make some assertions that $Y is a bit more costly than $X… Or not. Let’s assume more costly is the case for now.

To make it easier to grok, I am holding the output of the gardens constant. This is reflected by the exorbitant rise in cost in the weedy Garden A to keep producing the same bushels per month… (I could have held the team or expense constant, and allowed production to vary. Or, I could have tried to make it even more convoluted and let everything vary. Meh. Deal with this simple analogy!)


Year 1 2 3 4 5 6 7 8 9 10
Garden A 100 102 105 110 130 170 250 410 730 1600
Garden B 120 120 120 120 120 120 120 120 120 120

If we look at $X and $Y in years 1 through 10, we might see some numbers that would make us choose B over A.

But if we looked at just the current burn rate (or even through year 4), we might think Garden A is the one we want. (And we can hold our tongue about the weeds.)

But most of the people asking these questions are at year 5-10 of Garden A, looking over at their neighbor’s Garden B and wanting a magic wand. The developers are in the same boat… Wishing they could be working on the cooler, younger, plot of Garden B.

What’s a business person/gold owner to do? After all, they can’t really even see the quality of the garden, they just see output. And cost. Over time. Unless they get their bonus and move on to the next project before anyone finds out the mess in Garden A. Of course, the new person coming into Garden A knows no different (unless they were fools and used to work in Garden B, and got snookered into changing jobs).

Scenario #2

Maybe we abandon Garden A, and start anew in a different plot of land every few years? Then it is cheaper over the long haul.

Year 1 2 3 4 5 6 7 8 9 10
Garden A 100 102 105 100 102 105 100 102 105 100
Garden B 120 120 120 120 120 120 120 120 120 120

I think the reason it is so challenging to get all scientific about TQM, is that what we do is more along the lines of knowledge work and craftwork, compared to assembly line.

The missing piece is to quantify what we produce in software. Just how many features are in a bushel?

I submit: ask the customer what to measure. And maybe the best you can do is periodic surveys that measure satisfaction (sometimes known as revenue).

Matt Snyder (@msnyder) tweeted me a nice video: Metrics, Metrics, Everywhere – Coda Hale