
Uncle Bob Challenges The Architecture of a Rails App

Uncle Bob gave a very interesting keynote at the Ruby Midwest 2011 conference.

I have developed apps with the following fundamental architecture, where dependencies only cross one layer:

——
UI Layer
——
Business Object Layer
——
Data Mgt. Layer
——

It was born from the one pattern that is king of the hill in my book: “Separation of Concerns.” The above was my architecture for all projects since the early 90s (C++, Java…), but so far not so much in Rails.

I have thought about trying it, but I am not sure whether it will pay off.

Essentially, it is about cleaving the Rails model classes into two parts:

  1. Business: methods, attributes, and business rules
  2. Persistence: methods, attributes, and all knowledge of the DBMS details

In general, the UI deals with BOs, but sometimes we create dumb “Data Transfer Objects” that are lightweight versions of the business objects to be thrown about the system.
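
To make the split concrete, here is a minimal sketch of how it could look in a MongoMapper app. The Patient/PatientRecord names, keys, and the long_stay? rule are hypothetical, purely for illustration:

class Patient
  # Business object: behavior and rules only, no knowledge of the DBMS.
  attr_reader :name, :admitted_on

  def initialize(name, admitted_on)
    @name        = name
    @admitted_on = admitted_on
  end

  # A business rule lives here, not in the persistence layer.
  def long_stay?
    (Date.today - admitted_on) > 30
  end
end

class PatientRecord
  # Persistence class: all knowledge of the DBMS details.
  include MongoMapper::Document

  key :name,        String
  key :admitted_on, Date

  # Hand the UI a business object (or a lighter-weight DTO).
  def to_business_object
    Patient.new(name, admitted_on)
  end
end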

As a side note: in general, moving code to a more “object-oriented” state ends up with about the same number of lines of code, and often a bit more, due to the boilerplate of creating additional classes.

In a current project, we have pulled the business objects out into a separate gem, mostly because they need to be used by both our web app and an EventMachine app.

The thing that shocked me the most about Rails, when Corey Haines introduced me to it in 2009, was how much it was like the “Model Driven Architecture” I had worked with for a few years: given an architecture and a vertical slice through the app, you weave the model through the architecture generator and out comes an application with a consistent architecture, mostly the same for the bulk of the app (save for model/property names). Commercially, that MDA technology was a failure, last time I checked. Even though I thought it was the smartest way to develop apps, few others did, except for Rails developers, largely because most Rails devs probably have a very different mindset than other devs.

See the blast-from-the-past presentation here.

Though Bob pokes fun at Rails’ high-level directory structure for not revealing the business domain, I am totally fine with that. It’s a good thing. Yeah, sure, it reveals that this is an MVC-style app designed to deliver web apps; so what? No matter which architecture is used, I look to the domain classes to tell me what the system is doing…

In my handful of Rails apps to date, I have only used MongoDB and MongoMapper, and this is the closest I have gotten to the good old days when I used the POET object-oriented database with C++ back in the late 90s. It is the closest I have been to nirvana: I basically *almost* don’t need to care that there even is a database…

One of these days, I’ll compare and contrast a Rails/MongoMapper app with and without Business Objects separated from Data Management classes.

MongoDB Group Map-Reduce Performance

I wanted to get some aggregated count data by message type for a collection with over 1,000,000 documents.

The model is more or less this (a lot removed for simplicity):

class MessageLog
  include MongoMapper::Document

  # Attributes ::::::::::::::::::::::::::::::::::::::::::::::::::::::
  # Message's internal timestamp / when it was sent
  key :time, Time
  # Message type
  key :event_type, String, :default => "Not Set"
  # The message ID
  key :control_id, String, :index => true

  # Add created_at, updated_at
  timestamps!

  # Indexes :::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  def self.create_indexes
    MessageLog.ensure_index([[:created_at,1]])
    MessageLog.ensure_index([[:event_type,1]])
  end
end
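
As an aside, the create_indexes helper is meant to be run on demand, e.g. from a console or a rake task; MongoDB’s ensureIndex only builds an index if it does not already exist:

MessageLog.create_indexes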

Distinct

Though I originally hard-coded the message types (as they do not change very often, and are meaningless without other code changes anyway), I figured I would test gathering the distinct types dynamically. MongoDB supports a distinct function. From the MongoDB console:

> db.message_logs.distinct("event_type")
[
	"Bed Order",
	"Cactus Update",
	"ED Release",
	"ED Summary",
	"Inpatient Admit",
	"Inpatient Discharge Summary",
	"Not Set",
	"Registration",
	"Registration Update",
	"Unknown Message Type"
]

Though I saw distinct in MongoMapper, I had trouble getting it to work (this is an older app on < v2.0, and I got a method-missing error).

However, a very powerful technique within MongoMapper worked perfectly. Every MongoMapper model can return its underlying collection in the form MongoDB understands (the db.collection.blah format), which helps when you need to execute MongoDB-style commands:

class MessageLog
  # @return [Array] a list of unique types (strings)
  def self.event_types
    MessageLog.collection.distinct("event_type")
  end
end
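
Calling the helper returns the same list the console produced above:

MessageLog.event_types
# => ["Bed Order", "Cactus Update", ..., "Unknown Message Type"]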

Simple Count

I used a simple technique to iterate over each type and get the associated count:

class MessageLog
  # Perform a group aggregation by event type.
  #
  # @return [Hash] the number of message logs per event type.
  def self.count_by_type
    results = {}
    MessageLog.event_types.each {|type| results[type] = MessageLog.count(:event_type => type)}
    results
  end
end
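
This issues one indexed count query per event type (ten of them here) rather than scanning every document. The result is a plain hash; the values shown are from the benchmark run below:

MessageLog.count_by_type
# => {"Registration Update" => 852189, "Registration" => 123064, ...}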

Map-Reduce Too Slow

In this instance, it turned out that Map-Reduce was significantly slower, and I am not exactly sure why, other than to suppose that iterating over every document is more expensive than calling count with a filter on the event_type key (which is covered by an index).

class MessageLog
  # Perform a group aggregation by event type.
  # Map-Reduce was slow by comparison (20 seconds vs 2.3 seconds)
  #
  # @return [Hash] the number of message logs per event type.
  def self.count_by_type_mr
    results = {}
    counts = MessageLog.collection.group(
      :key     => :event_type,
      :cond    => {},
      :initial => {:count => 0},
      :reduce  => 'function(doc, prev) { prev.count += 1; }'
    )
    counts.each {|r| results[r["event_type"]] = r["count"]}
    results
  end
end
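
A quick sanity check that the simple counts ride the event_type index is explain() from the MongoDB console. This is a sketch with illustrative numbers; the BtreeCursor is what confirms the index is being used:

> db.message_logs.find({event_type: "Registration"}).explain()
{
	"cursor" : "BtreeCursor event_type_1",
	"n" : 123064,
	"nscanned" : 123064,
	...
}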

Performance Results

As you can see, Map-Reduce took about 10 times longer: ~21 seconds versus ~2.3 seconds.

And this is over 1,129,519 documents, so it is a non-trivial test, IMO.

> measure_mr = Benchmark.measure("count") { results = MessageLog.count_by_type_mr }
> measure    = Benchmark.measure("count") { results = MessageLog.count_by_type }
> puts measure_mr
  0.000000   0.000000   0.000000 ( 20.794720)
> puts measure
  0.020000   0.000000   0.020000 (  2.340708)
> results.map {|k,v| puts "#{k} #{v}"}
Not Set                            1
Inpatient Admit                4,493
Unknown Message Type           1,292
Bed Order                      6,948
Registration Update          852,189
Registration                 123,064
ED Summary                    94,933
Cactus Update                 10,145
Inpatient Discharge Summary   18,150
ED Release                    18,304

Summary

For simple aggregate commands, you may get better performance using simpler techniques; Map-Reduce may well shine on more complex computations/queries.

But your best bet is to test it out with meaningful data samples.

Hiring a Team Member

I read the very good post “The Number One Trait of a Great Developer” by Tammer Saleh of Engine Yard, and it made me think…

I used to rant about “It’s the Business, Stupid” at conferences, implying that we should be solving problems for our clients, not simply playing with the next shiny toy. Sure, I love shiny toys, and I love to play with them, especially when they make sense within the context of solving a problem. But the scenario he describes is an oft-spotted pattern where folks lack the engineering skills required to build right-sized solutions that meet the here-and-now needs.

And of course, anyone can come up with a complex solution that sprawls across multiple cubicle walls on E-size plotter paper (that’s easy). Only a rare few can come up with the minimalist solution that meets the needs of the business and can easily grow over time.

As far as hiring goes, I like to look for the “engineering” mind. After all, engineers put man on the moon, not scientists.

MongoDB Index Performance

As part of this (unintended) mini-series on MongoDB and indexing, I wrote a little test to see if I could document the performance gains from indexing. I used real-world data, albeit only 50,000 records, to query out a handful of documents (24 being the most).


Here is the code:

require 'test_helper'

class EncounterListingTest < Test::Unit::TestCase

  context "Indexing" do
    ProfileStats2 = Struct.new(:doctor_num, :count, :timing1, :timing2, :timing3)

    should "profile assorted doctor patient retrievals" do
      stats = []
      doctor_nums = ["602490", "603324", "212043", "602938"]
      doctor_nums.each_with_index do |doctor_num, i|
        # Baseline: no indexes (average of three runs)
        MongoMapper.database.collection('encounters').drop_indexes
        show_indexes if i == 0
        timing1 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        # Single index on the private_physician key (average of three runs)
        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician, 1]])
        show_indexes if i == 0
        timing2 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        # Compound index on all three queried keys (average of three runs)
        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician, 1], [:notify_physician, 1], [:visible_count, 1]])
        show_indexes if i == 0
        timing3 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        n_count = Encounter.count(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0)
        stats << ProfileStats2.new(doctor_num, n_count, timing1, timing2, timing3)

      end

      File.open("test/performance/index_stats_results-#{Time.now.strftime("%d-%m-%Y")}.csv", 'w') do |f|
        puts "%10s  %6s  %5s  %5s  %5s" % ["doctor", "count", "None", "Phys", "Phys/Ntfy/Vis"]
        f.puts "doctor, count, None, Phys, PhysNtfyVis"
        stats.each do |s|
          results = "%10d, %6d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
          puts results
          f.puts "%d, %d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
        end
      end

    end

  end

  private
  def show_stats(stats)
    stats.each do |s|
      puts "%6d, %5.3f, %s" % [s.count, s.timing, s.index_type]
    end
  end

  def measure_performance(doctor_num = "99602326")
    start = Time.now
    n_public = Encounter.where(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0).all
    delta = Time.now - start
    delta
  end

  def show_indexes
    puts "%s INDEXES %s" % ["*"*12, "*"*12]
    Encounter.collection.index_information.each { |index| puts "    #{index[0]}" }
  end

  def add_index(new_index)
    coll = MongoMapper.database.collection('encounters')
    coll.drop_index(new_index) if !coll.index_information.detect { |index| index[0] == new_index }.nil?
    Encounter.ensure_index(new_index)
  end

end

Results:

[Graph: The effect of adding indexes on query performance]

The results are shown in the accompanying graph. Except for the query that returned 24 documents, the general trend was that three indexes were better than one, and one was w-a-a-a-y better than none (of course, you already knew that). The odd outlier was at count = 6, where the single index did not perform as well as it did in all the other tests.