User Stories and FDD

FDD?

I bet you never heard of Feature-Driven Development, eh?

Well, Mike Cohn wrote this recent post:

Not Everything Needs to Be a User Story: Using FDD Features

Having worked with Peter Coad since the early 90s, and Jeff De Luca in the late 90s, I’ve been a fan of FDD and naturally turn to that style when “user stories” are not so user-centric. And yes, those are typically the minority items on our backlogs.

Software development is working through a prioritized to-do list. Most of the to-dos should be about addressing user needs. Call them user stories, call them features, maybe even call them requirements. Whatever works best to help you organize and communicate what needs to be built.

Another element of FDD is breaking down (or building up) the system into

  • Major Feature Sets (Quote Management), and their
    • Feature Sets (Clone Quotes, Create Quotation Documents), and their
      • Features (create a quote, edit a quote, archive a quote).

Major Feature Sets might loosely equate to epics 🙂

One of the keys to successful software development, is to combine the list of features with a domain model (and some UI mockups don’t hurt). The domain model need not be to the nth degree of UML detail. But one that clearly describes — in just enough detail — what your problem domain is all about. This eliminates the need to write all sorts of detail in the development issues, leaving that to the model. Then the feature list become more about the order in which we are building up various aspects of the product feature sets.

Thanks for paying a bit of homage to FDD. A blast from the past!

Measuring Effectiveness: The Software Industry’s Conundrum

On “Measuring Effectiveness”

I’ve been saying for seemingly decades, this is (one of?) our nascent software industry’s biggest conundrums (maybe even an enigma?).

Just how do you prove a team is “doing better?” Or that a new set of processes are “more effective?”

My usual reaction when asked for such proof:

“Ok, show me what you are tracking today for the team’s performance over the past few years, so we have a baseline.”

Crickets…

But seriously, wouldn’t it be awesome if we could measure something? Anything?

For my money:

Delivering useful features on top of a reasonably quality architecture, with good acceptance and unit test coverage, for a good price, is success.

Every real engineering discipline has metrics (I think, anyway, it sounds good — my kids would say I am “Insta-facting™”).

If we were painting office building interiors, or paving a highway, we could certainly develop a new process or a new tool and quantitatively estimate the expected ROI, and then prove the actual ROI after the fact. All day long.

In engineering a new piece of hardware, we could use costing analysis, and MTBF to get an idea on the relative merits of one design over another.

We would even get a weird side benefit — being relatively capable at providing estimates.

In software, I posit this dilemma (it’s a story of two teams/processes):

Garden A:

  • Produces 15 bushels (on average) per month over the growing season
  • Is full of weeds
  • Does not have good soil management
  • Will experience exponential (or maybe not quite that dramatic) production drop off in ensuing years, requiring greater effort to keep the production going. Predictability will wane.
  • Costs $X per month to tend

Garden B:

  • Produces 15 bushels (on average) per month over the growing season
  • Is weed free and looks like a postcard
  • Uses raised bed techniques, compost, and has good soil management
  • Will experience consistent, predictable, production in ensuing years
  • Costs $Y per month to tend

I could make some assertions that $Y is a bit more costly than $X… Or not. Let’s assume more costly is the case for now.

To make it easier to grok, I am holding the output of the gardens constant. This is reflected by the exorbitant rise in cost in the weedy Garden A to keep producing the same bushels per month… (I could have held the team or expense constant, and allowed production to vary. Or, I could have tried to make it even more convoluted and let everything vary. Meh. Deal with this simple analogy!)

 

Year 1 2 3 4 5 6 7 8 9 10
Garden A 100 102 105 110 130 170 250 410 730 1600
Garden B 120 120 120 120 120 120 120 120 120 120

If we look at $X and $Y in years 1 through 10, we might see some numbers that would make us choose B over A.

But if we looked at just the current burn rate (or even through year 4), we might think Garden A is the one we want. (And we can hold our tongue about the weeds.)

But most of the people asking these questions are at year 5-10 of Garden A, looking over at their neighbor’s Garden B and wanting a magic wand. The developers are in the same boat… Wishing they could be working on the cooler, younger, plot of Garden B.

What’s a business person/gold owner to do? After all, they can’t really even see the quality of the garden, they just see output. And cost. Over time. Unless they get their bonus and move on to the next project before anyone finds out the mess in Garden A. Of course, the new person coming into Garden A knows no different (unless they were fools and used to work in Garden B, and got snookered into changing jobs).

Scenario #2

Maybe we abandon Garden A, and start anew in a different plot of land every few years? Then it is cheaper over the long haul.

Year 1 2 3 4 5 6 7 8 9 10
Garden A 100 102 105 100 102 105 100 102 105 100
Garden B 120 120 120 120 120 120 120 120 120 120

I think the reason it is so challenging to get all scientific about TQM, is that what we do is more along the lines of knowledge work and craftwork, compared to assembly line.

The missing piece is to quantify what we produce in software. Just how many features are in a bushel?

I submit: ask the customer what to measure. And maybe the best you can do is periodic surveys that measure satisfaction (sometimes known as revenue).

Matt Snyder (@msnyder) tweeted me a nice video: Metrics, Metrics, Everywhere – Coda Hale


Easing New Developer Ramp-up Time

On a recent healthcare start-up team, it grew from my buddy (who moved on after 6 months or so) and I, to a handful of developers/sub-contractors.

Here is how we tried to make it fairly efficient. We just started tracking what was needed, getting feedback as each new developer went through the process, and improving the instructions and process along the way. If something was missing, we added it. If something was not clear, the new developer could amend the wiki.

By the 3rd new developer, or so, we had it down to where they could get started and begin a legitimate issue in less than a half day — from getting set up to being able to commit and deploy a new “feature.”

There was a section at the top that shared a good team chat room session with a new remote developer:

Getting Started Chat Room Conversation

That was followed by the FAQ-like list of links:

One of the first reasons I wanted to make it easy for a new team member to get rolling, was so that our friend Max — who would be doing our QA from Russia — could get started. As we added the first couple of devs, we probably decreased the start-up time as follows:

As part of “Getting Started,” I would include a simple Jira issue that helped them ensure that everything was working and that they followed our dev process:

  • Git and Dev and database (MongoDB) environment obviously had to be set up
  • Access to Jira to assign themselves to the issue, and move it — Kanban style — to the In Progress state.
  • Commit the work and the passing tests
  • Drag the Jira issue to “Done”

 

Since Atlassian’s Confluence Wiki does a stellar job at versioning pages, I actually looked back to see how the page grew and morphed over time. It started out rather modestly (and empty):

After it grew a bit bloated:

It was successively refactored into its current state, here is a snippet of the 70 versions that this page underwent from March 2011 through July 2012.

Wikis, like code, need to be tended to, nurtured, and refactored to provide the best value.

The Cost of Using Ruby’s Rescue as Logic

[notice]
If you use this sort of technique, you may want to read on.

node = nodes.first rescue return

[/notice]

 

[important]

Nov 2012 Update:

Though this post was about the performance cost of using a ‘rescue’ statement, there is a more insidious problem with the overall impact of such syntax. The pros and cons of using a rescue are well laid out in Avdi’s free RubyTapas: Inline Rescue

[/important]

Code like this:

unless nodes.nil?
  nodes.first
else
  return
end

Can be written using the seemingly more elegant approach with this ruby trick:

node = nodes.first rescue return

But then, that got me to thinking… In many languages I have used in the past (e.g., Java and C++), Exception handling is an expensive endeavor.

So, though the rescue solution works, I am thinking I should explore whether there are any pros/cons to allowing a “rescue” to act as logic. So I did just that…

Here are the two methods I benchmarked, one with “if” logic, and one with “rescue” logic:

def without_rescue(nodes)
  return nil if nodes.nil?
  node = nodes.first
end
def with_rescue(nodes)
  node = nodes.first rescue return
end

Using method_1, below, I got the following results looping 1 million times:

                  user     system      total        real
W/out rescue  0.520000   0.010000   0.530000 (  0.551359)
With rescue  22.490000   0.940000  23.430000 ( 26.487543)

Yikes. Obviously, rescue is an expensive choice by comparison!

But, if we look at just one or maybe 10 times, the difference is imperceptible.

Conclusion #1 (Normal Usage)

  • It doesn’t matter which method you chose to use if the logic is invoked infrequently.

Looking a bit Deeper

But being a curious engineer at heart, there’s more… The above results are based on worst-case, assuming nodes is always nil. If nodes is never nil, then the rescue block is never invoked. Yielding this (rather obvious) timing where the rescue technique (with less code) is faster:

                  user     system      total        real
W/out rescue  0.590000   0.000000   0.590000 (  0.601803)
With rescue   0.460000   0.000000   0.460000 (  0.461810)

However, what if nodes were only nil some percentage of the time? What does the shape of the performance curve look like? Linear? Exponential? Geometric progression? Well, it turns out that the response (see method_2, below) is linear (R2= 0.99668):

Rescue Logic is Expensive
Rescue Logic is Expensive

Conclusion #2 (Large Data Set):

In this example use of over a million tests, the decision on whether you should use “rescue” as logic boils down to this:

  • If the condition is truly rare (like a real exception), then you can use rescue.
  • If the condition is going to occur 5% or more, then do not use rescue technique!

In general, it would seem that there is considerable cost to using rescue as pseudo logic over large data sets. Caveat emptor!

Sample Code:

My benchmarking code looked like this:

require 'benchmark'

include Benchmark

def without_rescue(nodes)
  return nil if nodes.nil?
  node = nodes.first
end

def with_rescue(nodes)
  node = nodes.first rescue return
end

TEST_COUNT = 1000000

def method_1
  [nil, [1,2,3]].each do |nodes|
    puts "nodes = #{nodes.inspect}"
    GC.start
    bm(12) do |test|
      test.report("W/out rescue") do
        TEST_COUNT.times do |n|
          without_rescue(nodes)
        end
      end
      test.report("With rescue") do
        TEST_COUNT.times do |n|
          with_rescue(nodes)
        end
      end
    end
  end
end

def method_2
  GC.start
  bm(18) do |test|
    nil_nodes = nil
    real_nodes = nodes = [1,2,3]
    likely_pct = 0
    10.times do |p|
      likely_pct += 10
      test.report("#{likely_pct}% W/out rescue") do
        TEST_COUNT.times do |n|
          nodes = rand(100) > likely_pct ? real_nodes : nil_nodes
          without_rescue(nodes)
        end
      end
      test.report("#{likely_pct}% With rescue") do
        TEST_COUNT.times do |n|
          nodes = rand(100) > likely_pct ? real_nodes : nil_nodes
          with_rescue(nodes)
        end
      end
    end
  end
end

method_1
method_2

Sample Output

                  user     system      total        real
W/out rescue  0.520000   0.010000   0.530000 (  0.551359)
With rescue  22.490000   0.940000  23.430000 ( 26.487543)
nodes = [1, 2, 3]
                  user     system      total        real
W/out rescue  0.590000   0.000000   0.590000 (  0.601803)
With rescue   0.460000   0.000000   0.460000 (  0.461810)
                        user     system      total        real
10% W/out rescue    1.020000   0.000000   1.020000 (  1.087103)
10% With rescue     3.320000   0.120000   3.440000 (  3.825074)
20% W/out rescue    1.020000   0.000000   1.020000 (  1.036359)
20% With rescue     5.550000   0.200000   5.750000 (  6.158173)
30% W/out rescue    1.020000   0.010000   1.030000 (  1.105184)
30% With rescue     7.800000   0.300000   8.100000 (  8.827783)
40% W/out rescue    1.030000   0.010000   1.040000 (  1.090960)
40% With rescue    10.020000   0.400000  10.420000 ( 11.028588)
50% W/out rescue    1.020000   0.000000   1.020000 (  1.138765)
50% With rescue    12.210000   0.510000  12.720000 ( 14.080979)
60% W/out rescue    1.020000   0.000000   1.020000 (  1.051054)
60% With rescue    14.260000   0.590000  14.850000 ( 15.838733)
70% W/out rescue    1.020000   0.000000   1.020000 (  1.066648)
70% With rescue    16.510000   0.690000  17.200000 ( 18.229777)
80% W/out rescue    0.990000   0.010000   1.000000 (  1.099977)
80% With rescue    18.830000   0.800000  19.630000 ( 21.634664)
90% W/out rescue    0.980000   0.000000   0.980000 (  1.325569)
90% With rescue    21.150000   0.910000  22.060000 ( 25.112102)
100% W/out rescue   0.950000   0.000000   0.950000 (  0.963324)
100% With rescue   22.830000   0.940000  23.770000 ( 25.327054)

Manual Cucumber Tests?

there was some discussion over on the cucumber list about manual testing.

cucumber is great at BDD, but it doesn’t mean it is the only test technique (preaching to choir) we should use.

i have learned it is critical to understand where automated tests shine, and where human testing is critical — and to not confuse the two.

as far as cuking manual tests, keeping the tests in one place seems like a good advantage (as described in Tim Walker’s cucum-bumbler wiki <g>).

the cucumber “ask” method looks interesting. maybe your testers could use the output to the console as-is, or (re-)write your own method to store the results somewhere else/output them differently.

From the cucumber code (cucumber-1.1.4/lib/cucumber/runtime/user_interface.rb):

# Suspends execution and prompts +question+ to the console (STDOUT).
# An operator (manual tester) can then enter a line of text and hit
# <ENTER>. The entered text is returned, and both +question+ and
# the result is added to the output using #puts.
# ...
def ask(question, timeout_seconds)
...

Sample Feature:

    ...
Scenario: View Users Listing
  Given I login as "Admin"
  When I view the list of users
  Then I should check the aesthetics

Step definition:

Then /^I should check the aesthetics$/ do
  ask("#{7.chr}Does the UI have that awesome look? [Yes/No]", 10).chomp.should =~ /yes/i
end

The output to the console looks like this:

Thanks for the pointer, Matt!

[notice]NOTE: it doesn’t play well with running guard/spork.[/notice]
The question pops up over in the guard terminal 🙁

Of course, if you are running a suite of manual tests, you probably don’t need to worry about the Rails stack being sluggish :-p

    Spork server for RSpec, Cucumber successfully started
    Running tests with args ["features/user.feature", "--tags", "@wip:3", "--wip", "--no-profile"]...
    Does the UI have that awesome look? [Yes/No]
    Yes
    ERROR: Unknown command Yes
    Done.

Supporting SSL in Rails 2.3.4

Somehow, moving a perfectly happy production app to Rackspace and nginx caused URLs to no longer sport the SSL ‘s’ in “https” — bummer.

Link_to’s were fine… But a custom “tab_to” (responsible for highlighting the current tab) was not so happy (even though it used an eventual link_to).

Turns out, that it is the url_for method as I learned from here.

I also blended it with some ideas I found here.

# config/environment/production.rb
# Override the default http protocol for URLs
ROUTES_PROTOCOL = "https"
...
# config/environment.rb
# Specifies gem version of Rails to use when vendor/rails is not present
RAILS_GEM_VERSION = '2.3.5' unless defined? RAILS_GEM_VERSION
# Use git tags for app version
APP_VERSION = `git describe --always --abbrev=0`.chomp! unless defined? APP_VERSION
# The default http protocol for URLs
ROUTES_PROTOCOL = 'http'
...
# application_controller.rb
  # http://lucastej.blogspot.com/2008/01/ruby-on-rails-how-to-set-urlfor.html
  def default_url_options(options = nil)
     if ROUTES_PROTOCOL == 'https'
       { :only_path => false, :protocol => 'https' }
     else
       { :only_path => false, :protocol => 'http' }
     end
  end
  helper_method :url_for

Now the real kicker… Since I do not have SSL set up locally, I had to do some dev on our staging server to tweak the code and test that “https” showed up. so I turned off class caching: config.cache_classes = false.

However, when I cap deployed with it set back to “true” https did not show up. @#$$##@!!!!%%$% AARGH.

I suspect it might have something to do with not being able to open up a cached class and redefine it? I don’t know… I am going to have to go explore this oddity next…

Anatomy of a MongoDB Profiling Session

This particular application has been collecting data for months now, but hasn’t really had any users by design. At 33GB of data, pulling up a list of messages received was taking f-o-r-e-v-e-r!

So I decided to document how to go about and fix a running production system… Hope it helps.

Log into mongo console and turn on profiling (the ‘1’) to monitor slow queries. I entered >10 seconds, which really stinks (!). You should adjust it to suit your app’s definition of “slow”—maybe 500ms:

> db.setProfilingLevel(1,10000)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

Next I went back to the webapp and executed the page request that exhibits the slow response …

Once the page returns, go in and look for any slow responses that the profiler logged:

> db.system.profile.find()
{
  "ts" : ISODate("2012-02-18T15:34:02.967Z"), 
  "op" : "command", 
  "command" : { "count" : "messages", 
    "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" }, 
      "fields" : null }, 
  "ntoreturn" : 1, 
  "responseLength" : 48, 
  "millis" : 119051, 
  "client" : "192.168.100.67", 
  "user" : "" 
}
{
  "ts" : ISODate("2012-02-18T15:35:51.704Z"),
  "op" : "query", 
  "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" },
  "ntoreturn" : 25,
  "ntoskip" : 791025,
  "nscanned" : 791051,
  "nreturned" : 25,
  "responseLength" : 49956,
  "millis" : 108720,
  "client" : "192.168.100.67",
  "user" : "" 
}

You can see there was a count query and a query for the data itself (we are using pagination). Sure enough, look here:

  • “ntoreturn” : 25,
  • “nscanned” : 791051,

 

Wow, that’s nasty… to return 25 records, we scanned 791,051! Gulp. Looks like a full table scan. Never a good thing (unless you have very small amounts of data).

Let’s see what sorts of indexes exist for the messages collection:

db.system.indexes.find( { ns: "production-alerts.messages" } );
{ "name" : "_id_", "key" : { "_id" : 1 }, "v" : 0 }
{ "v" : 1, "key" : { "created_at" : -1 }, "name" : "created_at_-1" }
{ "v" : 1, "key" : { "_type" : 1 }, "name" : "_type_1" }
{ "v" : 1, "key" : { "recv_app" : 1 }, "name" : "recv_app_1" }
{ "v" : 1, "key" : { "created_at" : -1, "recv_app" : 1 }, "name" : "created_at_-1_recv_app_1" }
{ "v" : 1, "key" : { "message_type" : 1 }, "name" : "message_type_1" }
{ "v" : 1, "key" : { "trigger_event" : 1 }, "name" : "trigger_event_1" }

Well, as expected, there is no index covering the multiple keys that we are searching on. So let’s add a multi-key index to match the query used by the controller!

db.messages.ensureIndex({_type:1, recv_app:1});

Now the app FLIES!! We dropped from 100+ seconds to 1.5 seconds (look at the “millis”) w00t!

db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER"}).explain();
> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER"}).explain();
{
	"cursor" : "BtreeCursor _type_1_recv_app_1",
	"nscanned" : 791153,
	"nscannedObjects" : 791153,
	"n" : 791153,
	"millis" : 1546,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"_type" : [
			[
				"HL7Message",
				"HL7Message"
			]
		],
		"recv_app" : [
			[
				"CAREGIVER",
				"CAREGIVER"
			]
		]
	}
}

To prevent this sort of thing, you can consider adding indexes when you create new queries. But the best way to do this is to be empirical and know whether you should add the index through some testing. I’ll leave that for another day!

The Bizarro Manifesto

Let’s try a little Bizarro1 test (if you agree to these, I’ll poke you with a hot krypton stick):

We are uncovering better ways to provide the illusion of developing software by listening to others talk about watching people try. Through this (dare I call it?) work, we have come to value:

  • Dogmatic process and CASE-tool-like automation over inspiring quality individuals to interact with the team and the clients
  • Sufficient up-front comprehensive design specifications over seeing frequent, tangible, working results.
  • Writing detailed Statements of Work and negotiating changes over collaborating to do our collective best with the time and money at hand
  • Driving toward the original project plan over accommodating the client changing their mind, or a path turning into a dead end

To elaborate:

  • We prefer to focus on building software with lock-step process and tools — and reduce our need to worry about quality individuals and having conversations amongst developers or with the client.
    • that way we don’t need to worry about people issues and effective communication.
    • That way we can hire any individual regardless of skill, and forgo all verbal/personal interactions in favor of solely written words. Even better if those written words are automatically transformed into code. Maybe we can get non-coder tools! After all, people are merely fungible assets/resources, and software is factory-work — with processes and tools, and a horde of low-paid serfs, we can crank it out!
  • We prefer to spend a lot of time up-front ensuring we have the requirements and design specs fully determined —  rather than have tangible, working results early on.
    • We start with complete requirements specifications (often 400 pages), that follow our company standard template.
    • Even our Use Cases follow a mandatory 3-level deep path, with proper exception and alternate paths worked out.
    • We link the requirements items into detailed design documents — which include system design diagrams, interface specifications, and detailed object models.
    • If we don’t write it all down now, we’re likely to forget what we wanted. And if we don’t do it to the n-th degree, the developers might screw it up.
    • Writing it all down up front allows us to go on vacation while the process and tools “write” the code from the detailed specs/diagrams. Sweet.
    • In addition, we love to be rewarded by reaching meaningless intermediate deadlines that we place on our 1500-node Gantt chart.
    • When we combine all of the upfront work with important deadlines, many of the senior managers can get promoted due to their great commitment to generating reams of cool-looking documents. By the time the sh!t hits the fan when folks realize the “ship it” deadline is missed, the senior managers are no longer involved.
    • Besides, if we actually built software instead of writing all sorts of documents thinking about building software, our little ruse would be exposed!
  • We prefer to work under rigid Statements of Work — rather than actually work towards a “best-fit” solution given changing conditions of understanding and needs.
    • The agreement is based on the 400-page, fully-specified requirements document, and we pad the cost estimate with a 400% profit margin.
    • We then hire dozens of people to argue during the Change Control Review Board monthly meetings about re-writing code to deliver what you wanted versus what you asked for when you thought you knew what you wanted (and wrote it down in that 400-page doc that was signed off by 6 execs).
    • Contract negotiation pissing matches are such great uses of our collective resources and always result in perfect software! We love our fine print 🙂
    • With a 400% padding, the projects are too big to fail.
    • Once we are in it for 1 or 2 million and 50% done and 2x schedule overrun, who would ever say “No” to a contract extension? Who better to get you to the goal line than the same folks who squandered away your treasure, pissed away the calendar, and delivered no working software yet?
    • We like to appear like we’re just about done… Asymptote? Never heard of one.
  • We prefer to be driven by our initial plan — rather than dealing with change and having to re-print the Gantt.
    • Especially a Gantt chart that has been built with tender loving care to include resource allocations, inter-project dependencies, and partial resource allocation assignments for matrix-style organizations.
    • We love hiring a small army to ensure that we drive the entire team to meet every micro-task deadline even when they no longer make any sense.
    • The real fun is collecting the “actuals” data from the developers assigned to each task so we can compare it to their estimated hours.
    • And nothing sweeter than seeing 90% of our tasks being started, and 75% of those being 67% resolved;  and 25% of the resolved actually being complete — the roll-up summary to management is such a compelling story of success.
    • Changing such a beautiful plan that took 4 man-years to develop, that incorporates all of the comprehensive non-code documents, and is an appendix in the contract, is no small feat!
    • Better to produce the software according to plan even if nobody wants it that way. That’s our motto, and we’re not going to change!
    • We love the illusion of activity over the truth of delivered features.

Feel free to sign the manifesto below. It’s free to be certified.


1

    Credit goes to Superman and Bizarro World.