
MongoMapper Query Overview

There was a question on the MongoMapper Google Group from a Mongoid user about how MongoMapper handles associations. Brandon was surprised that this query returned an Array:

Product.first.releases.where(something)

Let’s break it down, one bit at a time and clear things up:

# This would be an instance of Product
Product.first # An instance of the Product class.

This simply gets the first element in the Array that is returned by the default “All” query on Product. Of course, without sorting, you probably would not want to do this.

# This would be a return value of an array, assuming Product <>----> * Release
Product.first.releases # Array.

In Brandon’s example, I assume “releases” is a many association. That means, an Array. Unless the association has been tweaked to have default sorting via an Association Extension, getting the “first” one might be adventurous.

# This doesn't change the above... merely adds a restrictive query clause
Product.first.releases.where(something) # Array.

Here we simply get the releases of the first product, narrowed down by the “something” query.

Capisce?

I am not sure why, but for me it seems more logical to start my clauses with the where, and narrow them down further, or modify them… In MongoMapper, I find querying rigor is much more “loose” than say a SQL SELECT query that requires things in proper order… I would tend to write my queries in more or less this fashion:

ModelClass.where(some criteria).[sort | order | another where clause | fields | limit].[all | first | paginate]

In addition, it is important to note that MongoMapper returns a query and does not actually perform the query until you add something that needs the results, for example: all, first, or paginate. (Adding a sort just gives you another query, as the console session below shows.)
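
To make the "nothing runs until you ask for results" idea concrete, here is a toy sketch in plain Ruby. LazyQuery and its runs counter are hypothetical names invented for this illustration; this is not how Plucky is implemented, only the same shape: chainable calls collect criteria, terminal calls do the work.

```ruby
# Toy stand-in for a lazy query object; NOT Plucky's real implementation.
class LazyQuery
  attr_reader :runs

  def initialize(rows)
    @rows = rows
    @filters = []
    @sort_key = nil
    @runs = 0 # how many times the "database" was actually hit
  end

  # Chainable: records a criterion and returns the query itself.
  def where(&criterion)
    @filters << criterion
    self
  end

  # Chainable: records the sort key; nothing is fetched yet.
  def sort(key)
    @sort_key = key
    self
  end

  # Terminal: runs the query and returns an Array.
  def all
    @runs += 1
    result = @rows.select { |row| @filters.all? { |f| f.call(row) } }
    @sort_key ? result.sort_by { |row| row[@sort_key] } : result
  end

  # Terminal: runs the query and returns a single row (or nil).
  def first
    all.first
  end
end

releases = [
  { :major => 1, :minor => 2 },
  { :major => 2, :minor => 0 },
  { :major => 1, :minor => 0 }
]

query = LazyQuery.new(releases).where { |r| r[:major] == 1 }.sort(:minor)
runs_before = query.runs # only criteria have been collected so far
lowest_minor = query.first # now the query actually runs
```

The point of the design is that where and sort can be stacked in any order, because each one only adds to the description of the query.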

I can picture one of those “man page” or SQL style of fancy ways to show you how you can construct a mongomapper query given all the combinations of options for each “position” in the query…

My (unsolicited) advice is to make the query look as “natural” as possible in terms of how you might read it aloud.

Product.first.releases.where(:major => 1).sort(:minor.desc).first # Get the latest 1.x release

(And, if the releases where clause query is common, you can create an Association Extension)

Use the Console

You can always just output the queries to the console:

>> Patient.where(:last_name=>/john/i).class
=> Plucky::Query
>> Patient.where(:last_name=>/john/i).all.class
=> Array
>> Patient.where(:last_name=>/john/i).all.count
=> 1
>> Patient.where(:last_name=>/john/i).first.class
=> Patient
>> Patient.sort(:created_at.desc).first.class
=> Patient

Association Extension

And to show an example of an extension (when you use it frequently, for example):

class Encounter
  include MongoMapper::Document
  ...
  # Associations :::::::::::::::::::::::::::::::::::::::::::::::::::::
  many :events, :limit => 30, :order => 'msg_timestamp desc' do
    ...
    def images
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::IMAGE)]).order(:created_at.desc).all
    end

    def charts
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ED_SUMMARY)],
            :file_version.in => ["P", "F"]).order(:created_at.desc).all
    end

    def admits
      all(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ADMIT)])
    end
  end
  ...
end

# For a given encounter
enc=Encounter.find('4dadad188951a20727000160')
>> enc.events.images.count
=> 7
>> enc.events.images.class
=> Array
>> enc.events.images.first
=> #

Named Scope

If you will need dynamic querying, you could use a Named Scope as follows:

scope :by_days_old,  lambda { |age| where(:msg_timestamp.gt => age.days.ago) }

This can be used as follows:

Encounter.by_days_old(10)
=> #<Plucky::Query msg_timestamp: {"$gt"=>Fri Apr 15 03:35:53 UTC 2011}>
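
If the scope macro feels like magic, this plain-Ruby sketch shows the general idea: a named lambda becomes a class method that takes the same arguments. ToyEncounter, RECORDS, and the hand-rolled scope method are all made up for the illustration, and unlike a real MongoMapper scope (which returns a chainable query) this toy version returns a plain Array.

```ruby
require 'date'

# Toy model with a hand-rolled `scope` macro; a sketch, not MongoMapper's code.
class ToyEncounter
  RECORDS = [] # stands in for the collection

  attr_reader :msg_timestamp

  def initialize(msg_timestamp)
    @msg_timestamp = msg_timestamp
    RECORDS << self
  end

  # Defines a class method `name` that forwards its arguments to the lambda.
  def self.scope(name, body)
    define_singleton_method(name) { |*args| body.call(*args) }
  end

  scope :by_days_old, lambda { |age|
    cutoff = Date.today - age
    RECORDS.select { |e| e.msg_timestamp > cutoff }
  }
end

recent = ToyEncounter.new(Date.today - 3)
old    = ToyEncounter.new(Date.today - 30)
matches = ToyEncounter.by_days_old(10)
```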

Factory Girl and MongoMapper

You were probably hoping for some Rosie the Riveter poster…

Factory Folder

Instead, I am going to extend my small MongoMapper example to include Factory Girl. The steps are pretty simple:

  1. Go here to install…
  2. Create your factories
  3. Use the factories in Cucumber/RSpec

Factory Construction

I created a new “factories” folder under the spec folder:

The factories for User and Event are quite simple:

Factory.define :user do |u|
  u.name ('a'..'z').to_a.shuffle[0..7].join.capitalize
end

and

require 'factory_girl'
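def dummy_word(len=6)
  # Take exactly len random letters (0...len excludes the end index)
  ('a'..'z').to_a.shuffle[0...len].join.capitalize
end

def dummy_date
  secs_in_day = 24*60*60
  # A random whole-day offset, from 30 days back to 29 days ahead
  Time.now + (rand(60) - 30) * secs_in_day
end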

Factory.define :event do |e|
  e.title "#{dummy_word} #{dummy_word 3} #{dummy_word 10}"
  e.date  dummy_date
end
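
One detail worth keeping in mind with factories like these: an attribute given as a plain value is computed once, when the definition runs, while an attribute given as a block is computed again for every object. The sketch below shows that difference in plain Ruby. ToyFactory is a hypothetical stand-in, not Factory Girl's implementation.

```ruby
# Toy attribute table: a value is either stored as-is or, when given as a
# block, re-evaluated for every object built. Hypothetical names throughout.
class ToyFactory
  def initialize
    @attributes = {}
  end

  def attribute(name, value = nil, &block)
    @attributes[name] = block || value
  end

  def build
    @attributes.each_with_object({}) do |(name, value), built|
      built[name] = value.respond_to?(:call) ? value.call : value
    end
  end
end

counter = 0
factory = ToyFactory.new
factory.attribute(:static_title, "Event #{counter += 1}")  # evaluated once, right here
factory.attribute(:lazy_title) { "Event #{counter += 1}" } # evaluated on every build

first  = factory.build
second = factory.build
```

So random helpers such as dummy_word only give per-object randomness when they are wrapped in a block or passed in as overrides at the call site.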

Refactor Original Setup

Instead of using this style of test data creation:

@event = Event.create(:title => "Code Retreat Timbuktoo", :user => @fred)

We will use the new factory as follows:

@event = Factory(:event, :title => "Code Retreat Timbuktoo", :user => @fred)

Refactor Cucumber

The given went from this:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => fred)
  end
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => harry)
  end
  Event.count.should == 20
end

to this – including refactoring out dummy_title, and reducing it to one loop:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => fred)
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => harry)
  end
  Event.count.should == 20
end

Subtle Details

The beauty of having tests is that I could easily mess around with getting some of the Factory Girl configuration stuff in the right place. Try something, run the test, adjust as needed until all are back to green.

The file features/support/env.rb got some additions so that Cucumber could find the factories:

$LOAD_PATH << File.expand_path('../../../app/model' , __FILE__)
require 'user'
require 'event'
require 'spec/factories/events.rb'
require 'spec/factories/users.rb'
load 'config/mongo_db.rb'

All the tests still pass!

More Complicated Example

For a project I work on, my factories look like this, with auto-creation of random IDs:

def random_months(months)
  day_in_secs = (24*60*60)
  (1+rand(months))*30*day_in_secs
end

# ----------- GROUP -----------
Factory.sequence :group_num do |n|
  "99#{n}#{rand(n)}"
end

Factory.define :group do |g|
    g.group_num {Factory.next(:group_num)}
    g.name "Greatest Group"
end
# ----------- ACCOUNT -----------
Factory.sequence :doctor_num do |n|
  "999992#{n}#{rand(200+n)}"
end

Factory.sequence :login do |n|
  "AB#{rand(n*68)}bx#{rand(200+n)}"
end

Factory.sequence :msid do |n|
  "CQ987Z12#{n}#{rand(n)}"
end

Factory.define :account do |a|
  pw = 'password'
  a.msid { Factory.next(:msid) }
  a.doctor_num { Factory.next(:doctor_num) }
  a.first_name "James"
  a.last_name "Jones"
  a.role 'user'
  a.password pw
  a.password_confirmation pw
  a.email Setting.get("AutoEmail")
  a.login { Factory.next(:login) }
end

# ----------- PATIENT -----------
Factory.sequence :patient_num do |n|
  "#{n}#{rand(300+n)}"
end

Factory.define :patient do |pt|
#  pt.patient_num "10000009"
  pt.patient_num {Factory.next(:patient_num)}
  pt.emr_num "1853286"
  pt.first_name "John"
  pt.last_name "Johnson"
  pt.dob {(Time.now - random_months(36))}
  pt.count_public_encounters 1
  pt.count_public_events 2
end
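
For anyone wondering what a sequence boils down to, here is a minimal plain-Ruby sketch: a counter that is bumped on every call and handed to a formatting block. ToySequence and next_value are names invented here, not Factory Girl API.

```ruby
# Toy sequence: each call to next_value passes the next counter to the block.
class ToySequence
  def initialize(&formatter)
    @n = 0
    @formatter = formatter
  end

  def next_value
    @n += 1
    @formatter.call(@n)
  end
end

# Same shape as the :patient_num sequence above: counter plus a random suffix.
patient_num = ToySequence.new { |n| "#{n}#{rand(300 + n)}" }
ids = Array.new(5) { patient_num.next_value }
```

Note that gluing the counter and the random number together without a separator does not strictly guarantee uniqueness (counter 1 with suffix 23 and counter 12 with suffix 3 both give "123"), which is usually fine for small test runs.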

 

Cucumber, RSpec, MongoMapper, Git, Oh My!

Nick left a question on my many-to-many associations post. He wanted to know more about sorting by date and querying…

So I decided to (overachieve and) show how I would approach that as if I were adding a new feature:

  • Adding a “date” key field and an index
  • Using Cucumber to drive the new feature from the desired behavior à la BDD.
  • Querying with a date sort tacked on…

You can follow my progress from the earlier version to this one by examining the commit history:

Commit History for Adding Dates to the Event Class

Looking at the “added initial event date functionality” commit, you can see how I added some new files to allow for Cucumber and MongoMapper:

Commits for Event Dates

For Cucumber, I added the “/features” stuff.

And since I wanted to start testing the code, I had to make the database functional, so I added “mongo_db.rb” – and I assume you have MongoDB installed and running locally.

##### MONGODB SETTINGS #####
MongoMapper.connection = Mongo::Connection.new('localhost', 27017, :pool_size => 5)
MongoMapper.database = "event-development"

I am never quite sure if there is a “perfect” way to wire up small, non-Rails apps like this one to use MongoDB. But what I have done works well enough to allow a simple example to run.

BDD Cycle

So I began the BDD cycle by creating a “feature branch” off the current master branch and switching to it:

git checkout -b event_dates master

Next I wrote the feature for the new behavior:

Scenario: Sort Events by Date
  Given A set of events
  When I display the events
  Then I should see them sorted by latest date first

When you run Cucumber, you will get the default code for steps – all pending of course.  Naturally, I did one step at a time, to take each one from “pending” to green. Working on the Given, then the When, and finally the Then, I came up with these steps:

Given

Here I wanted to generate a set of data so that we could see if the list was sorted properly. You can check out the code on github for the randomness baked into the “dummy_*” helper methods. And I wanted to create the events for two users.

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => fred)
  end
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => harry)
  end
  Event.count.should == 20
end

When

Sort of playing along as if this is a web request, I coded this step to return a “response” that is generated by the “list all events” class method. In true BDD fashion, this is the code I wish I had 🙂

When /^I display the events$/ do
  @response = Event.list_all
end

Then

The proof is in scanning the resultant “response” object to ensure date order is correct:

Then /^I should see them sorted by latest date first$/ do
  first = Date.parse(@response.third[0..10])
  last = Date.parse(@response.last[0..10])
  first.should > last
end

Outside – In

As soon as I ran the “When” I got a failure due to Event.list_all not existing. So off to the RSpec-land we go, to write the expectations for list_all. This is known as the “Outside-In” approach. (I learned this term from the excellent RSpec book, and it looks like you can watch a video about it here.)

The behavior expressed above (in Cucumber) can be thought of as more of an “outer,” acceptance/integration style of test. Typically it would be the User Interface (UI), but I have been known to blur that line, since not all code is about UI and since Cucumber is so darn fun to use. Working at this outer level often leads to expressing what is expected of our actual code; in this case, that Event has a class-level method that returns a list of its instances (a.k.a. documents). Since we are talking about the behavior of a class, that is more of the “inside” of the application: not something that an external user might care so much about directly, but rather something that supports the end behavior in an indirect fashion. For the “inside” we turn to RSpec (basically a better-than-unit-test unit test tool).

  • Outside ≈ Feature ≈ Cucumber
  • Inside ≈ unit test ≈ RSpec

  describe "#list_all" do
    it "should show each event, ordered by date" do
      response = Event.list_all
      response.should_not be_empty
      response.class.should == Array
      response.size.should == Event.count + 2 #for title and column header
      # Yes, you should not output stuff as part of your tests, but this *is* our UI :-)
      puts response
    end
  end

Many times, my initial pass at a new method is to simply return what is expected. Then write another test to make that fail. Sort of “sneak up on the answer.” But here it was easy enough to simply output some real text from the get-go:

  def self.list_all
    response = []
    response << "%s %s %s" % ["*"*10, "LIST OF EVENTS", "*"*10]
    response << "%6s %15s               %s" % ["Date", "TITLE", "Attendees/Interested/Likes"]
    events = Event.all(:order => 'date desc')
    events.each {|e| response << e.to_summary}
    response
  end

Oops. LOL (:-D) While writing this post, I found a mistake when testing my code a bit further than my initial commit.

I decided to remove the order part of the query, which revealed that the Cucumber feature still passed. Crap! So, it wasn’t so easy after all! Dope.

    events = Event.all

Second Attempt

My initial way of generating records resulted in the documents magically being in the right date order by default. Tests passed, but the test was wrong: the testing was not rigorous enough!

Note to self:
no matter how trivial things seem, write failing tests
that contradict each other – so to speak.

So, I tweaked the document generator to better randomize the list of events such that we won’t accidentally have them all in proper order by default:

  ...
  fred = User.find_or_create_by_name("fred")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => Time.now + (rand(60) - 30) * secs_in_day,
                 :user => fred)
  end
  ...

And, instead of just spot-checking the order, here is a new RSpec test to ensure each event is in proper order, date-wise:

    it "should show each event, ordered by date" do
      response = Event.list_all
      response.should_not be_empty
      response.class.should == Array
      response.size.should == Event.count + 2 #for title and column header
      r_prior_date = Date.parse(response[2][0..10])
      response[3..response.size].each do |r|
        date = Date.parse(r[0..10])
        date.should < r_prior_date
        r_prior_date = date
      end
      # Yes, you should not output stuff as part of your tests, but this *is* our UI :-)
      puts response
    end

Now we’re talking! A failed test:

'Event#list_all should show each event, ordered by date' FAILED
expected: < Wed, 20 Apr 2011,
     got:   Wed, 20 Apr 2011

And similarly, I rewrote the Cucumber test. Funny thing: further testing revealed that the error above was not actually a legitimate failure. I discovered I needed “<=” instead of just “<”, because two events can land on the same date. Sometimes the simplest things aren’t so simple after all, especially when it comes to setting up sample data.

Then /^I should see them sorted by latest date first$/ do
  last_date = Date.parse(@response.third[0..10])
  @response[3..@response.size].each do |r|
    date = Date.parse(r[0..10])
    date.should <= last_date
    last_date = date
  end
end
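
The same "each row is no later than the one before it" check can be written without the running variable by comparing adjacent pairs. This is a plain-Ruby sketch over made-up summary lines; the only assumption carried over from the steps is that each line starts with a parseable date.

```ruby
require 'date'

# Hypothetical summary lines: each starts with a date, as the steps assume.
lines = [
  "2011-05-02 Code Retreat",
  "2011-04-20 Ruby Meetup",
  "2011-04-20 Mongo Night",
  "2011-03-15 Hack Day"
]

dates = lines.map { |line| Date.parse(line[0..10]) }

# Every adjacent pair must be non-increasing (ties allowed, hence >=).
sorted_latest_first = dates.each_cons(2).all? { |prev, curr| prev >= curr }

# The strict version fails as soon as two events share a date.
strictly_descending = dates.each_cons(2).all? { |prev, curr| prev > curr }
```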

And I got the above test to fail by “stepping back” and removing the “order by” clause to get me back to an original, non-sorted listing. Good! Now we can step forward again and try to get the functionality that we are looking for to work.

Cucumber Failing Tests

I re-enabled the order clause to see if the tests would now pass:

events = Event.all(:order => 'date desc')

And, fortunately, the tests are indeed passing:

Cucumber Passing Tests

Commit on Green

Once you get the bits of functionality working, commit (even if you still have pendings). Committing locally has no downside 🙂 Here I will commit and push to the repo (the “$” is my prompt (well, not really), and the #comments are not part of the command line!):

$ git status  # You can see your changes
$ git commit -a -m "added initial event date functionality"  # commit your changes
$ git checkout master  # switch to the master branch
$ git merge --no-ff event_dates  # merge all of your local feature branch commits, preserving each
$ git push origin master  # Pump it up to the repo
$ git branch -d event_dates  # Get rid of the feature branch

This rhythm gets to be very familiar.

More on Querying

You may have noticed some of the queries above, and this was one of Nick’s questions…

With MongoMapper, you can chain Plucky queries as follows:

  def self.list_all(a_user=nil)
    response = []
    response << "%s %s %s" % ["*"*10, "LIST OF EVENTS", "*"*10]
    response << "%6s %15s               %s" % ["Date", "TITLE", "Attendees/Interested/Likes"]
    events = nil
    if a_user.nil?
      events = Event.all(:order => 'date desc')
    else
      # Sort while it is still a query, then fetch; user_id is the belongs_to key
      events = Event.where(:user_id => a_user.id).sort(:date.desc).all
    end
    events.each {|e| response << e.to_summary}
    response
  end

I added a new feature that shows off the above query, quickly ran through the entire process again, from git checkout to git push, with Cucumber and RSpec and code in between. You can find it all in the source code.

Cucumber Show Events For User

Git is really an amazing revision control tool… I can’t imagine using anything else now. Here is an example of looking at the “Network Graph” of my little project:

Github Network Graph

Multiple Many-to-Many Associations in MongoMapper

There was a question in the Google Group for MongoMapper, so I decided to post an answer in the form of a simple demo. You can find the source code on Github.com here.

The basic shape of the problem was this:

Users Sponsor and Attend Events

And the tricky part was, more or less, how to handle the multiple many-to-many associations.

For a simple one-to-many, MongoMapper has the normal:

  • User has many :events
  • Event belongs_to :user (its Owner)

But how to do the other associations? A given User can be involved with many events in different capacities:

  • Attending
  • Interested in attending
  • Likes

There are different ways to tackle these many-to-many associations.

  1. You can use a Set (to obtain the uniqueness factor) of Users that are attending or are interested.
  2. You can use an Array of instance IDs (I think this is probably the more standard technique).

class Event
  include MongoMapper::Document

  key :title, :required => true

  key :user_id
  belongs_to :user

  # One way to do it...
  key :attendees, Set
  key :interested, Set

  # Another way to do it...
  key :like_ids, Array
  many :likes, :class_name => 'User', :in => :like_ids

  def attending(a_user)
    # self.push_uniq(:attendees => a_user.id)
    attendees << a_user.id
    save
  end

  def interested_in(a_user)
    interested << a_user.id
    save
  end
...
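
The practical difference between the two key types shows up when the same user is added twice. This plain-Ruby sketch uses only the standard library, and the id string is made up. It says nothing about how MongoMapper stores either type, only how the Ruby objects behave.

```ruby
require 'set'

attendees = Set.new # like `key :attendees, Set`
like_ids  = []      # like `key :like_ids, Array`

user_id = "user-1" # made-up id, for illustration only

2.times { attendees << user_id } # Set silently ignores the repeat
2.times { like_ids  << user_id } # Array keeps both copies

like_ids_unique = like_ids.uniq # one way to de-duplicate by hand
```

With the Array approach, uniqueness is something you enforce yourself. With the Set you get it for free.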

Which direction you allow the association to be made from is up to your application’s needs. For example, above you can see that an Event instance can be sent the user to indicate attending or interested_in. The “likes” association is immediately accessible from an Event, or I could have added a wrapper method (def likes(a_user)).

And the User class has some simple retrieval methods to see what a User likes, what they are attending, and what they are interested_in:

class User
  include MongoMapper::Document

  key :name, :required => true

  many :events

  def likes
    Event.where(:like_ids => id).all
  end

  def attending
    Event.where(:attendees => id).all
  end

  def interested_in
    Event.where(:interested => id).all
  end

  def likes_event(event)
    event.likes << self
    event.save!
  end
end

To see how the different styles are used, you can check out the specs. For example:

Adding users who are interested in, or who like, an Event:

  it "should track interested" do
    expect {
      @event.interested_in(@jared)
      @event.interested_in(@sally)
    }.to change {@event.interested.size}.by(2)
  end

  it "should allow 'likes'" do
    expect {
      @event_2.likes << @martha
    }.to change {@event_2.likes.size}.by(1)
  end

Or from the User perspective:

  it "should allow me to add an event I like" do
    @fred.likes_event(@event_2)
    @event_2.likes.size.should > 0
    @fred.likes.count.should > 0
  end

Or attendees, from the Event:

  it "should list the events I am attending" do
    [@fred, @harry].each {|u| @event.attending(u)}
    @fred.attending.count.should > 0
  end

Ruby Metaprogramming in the Small

I have been thoroughly enjoying working with Ruby this past year (thanks Lee!). However, only recently have I been getting brave/comfortable/wise enough to try out some metaprogramming. Okay, so maybe I am a little slow… The scourge of making deadlines and releasing software meant I sometimes just had to give up trying to get to an elegant solution that I thought was possible, but was unable to make work in the time allotted. I try to be pragmatic, if nothing else.

But here is a little example of how easy it is to exploit Ruby’s omnipresent metaprogramming system.

Background

The project is a Rails app using MongoMapper. I needed to enhance the way we create User Accounts to accommodate importing a user from a CSV file (dumped from the hospital’s account management system).

First up: “Insert New Record” — which went pretty smoothly. Next, I wanted to permit merging import data with an existing account. (For example, changing the last name when married status changes.)

Round 1: Get it to work

Using a TDD approach and RSpec to flesh out the low-level class behavior, I “snuck up on the answer,” one small test at a time. I tend to use a “get it to work with brute force” approach at the outset, which sometimes leads to bulky code, as can be seen below. So, one by one, I kept adding to the merge code for each new attribute that should be updatable.

NOTE: I added some (all?) of the code comments below for this blog post. Plus I took liberties to not show everything…

  class Account
    include MongoMapper::Document
    # Rest of class omitted
    def self.create_or_merge(fields)
      raise ArgumentError if fields.nil?

      # Some code omitted

      # There are 2-3 legal ways to identify a unique account :-(
      account = Account.find_by_identifiers(login, msid, doctor_num)

      if account.nil?
        # Create
        if login.blank?
          login = generate_login(fields[:first_name], fields[:last_name], msid, doctor_num)
          fields[:login] = login
        end
        account = Account.create(fields)
        account.save!
      else
        # Merge
        first_name = fields[:first_name]
        last_name  = fields[:last_name]
        phone      = fields[:phone]
        email      = fields[:email]
        doctor_num = fields[:doctor_num]

        account.last_name = last_name unless last_name.blank?
        account.first_name = first_name unless first_name.blank?
        account.phone = phone unless phone.blank?
        account.email = email unless email.blank?
        account.save!
      end
      account
    end

  end

Round 2: Extract Method

The first step to making “create_or_merge” simpler was to yank out the blob of merge code into its own method. So the logic in create_or_merge looks a bit cleaner:

  • If can’t find account,
    • create new one;
  • else
    • merge this data into existing account.

  class Account
    include MongoMapper::Document
    # Rest of class omitted
    def self.create_or_merge(fields)
      raise ArgumentError if fields.nil?
      # Some code omitted

      # There are 2-3 legal ways to identify a unique account :-(
      account = Account.find_by_identifiers(login, msid, doctor_num)

      if account.nil?
        #Create
        if login.blank?
          login = generate_login(fields[:first_name], fields[:last_name], msid, doctor_num)
          fields[:login] = login
        end
        account = Account.create(fields)
        account.save!
      else
        #merge
        account.merge(fields)
      end
      account
    end
    def merge(fields)
      puts "Merging #{fields.inspect}"
      first_name = fields[:first_name]
      last_name  = fields[:last_name]
      phone      = fields[:phone]
      email      = fields[:email]
      doctor_num = fields[:doctor_num]

      self.last_name = last_name unless last_name.blank?
      self.first_name = first_name unless first_name.blank?
      self.phone = phone unless phone.blank?
      self.email = email unless email.blank?
      self.doctor_num = doctor_num unless doctor_num.blank?
      save!
    end
  end

Round 3: Introduce dynamic method calls

It is plain to see the repeating nature…based on the fields being passed in:

self.KEY = VALUE unless VALUE.blank?

If only there were a way to not have to write out repeating lines of code for each attribute we need to merge. Well, enter the ability to invoke instance methods by name, and passing in parameters:

Normal:

self.last_name = last_name unless last_name.blank?

Metaprogramming:

self.send("last_name=", "Franklin")

And getting the name from the fields hash

self.send("#{k.to_s}=", v)  unless v.blank?

Also, there is a need to tailor which fields are allowed to be merged, or overwritten; hence, the introduction of this bit of “allow_overwrite” complexity.

  def merge(merge_fields)
    allow_overwrite = [:first_name, :last_name, :doctor_num, :phone, :email]
    # Only merge fields that are permitted... tossing out any illegal fields
    # (reject returns a new Hash, so the caller's hash is left untouched)
    fields = merge_fields.reject {|k,v| !allow_overwrite.include?(k.to_sym)}
    fields.each_pair do |k,v|
      # Update the field with new data, if available
      self.send("#{k.to_s}=", v)  unless v.blank?
    end
    save!
  end

Round 4: Compress slightly, removing one iteration loop

Instead of looping twice through the fields, I reduced it to a single pass.

  def merge(merge_fields)
    allow_overwrite = [:first_name, :last_name, :doctor_num, :phone, :email]
    merge_fields.each do |k,v|
      next unless allow_overwrite.include?(k.to_sym)
      self.send("#{k.to_s}=", v)  unless v.blank?
    end
    save!
  end
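
If you want to play with the technique outside of Rails and MongoMapper, here is a self-contained sketch. ToyAccount is a plain Ruby class invented for the example; the blank check is spelled out by hand because blank? comes from ActiveSupport, and public_send is used instead of send so that only public setters can be reached.

```ruby
# Plain-Ruby stand-in for the Account document; nothing here touches MongoMapper.
class ToyAccount
  ALLOW_OVERWRITE = [:first_name, :last_name, :phone, :email].freeze

  attr_accessor :first_name, :last_name, :phone, :email, :role

  def merge(fields)
    fields.each do |key, value|
      # Skip anything that is not on the allow-list.
      next unless ALLOW_OVERWRITE.include?(key.to_sym)
      # Hand-written stand-in for ActiveSupport's blank?
      next if value.nil? || value.to_s.strip.empty?
      # Build the setter name ("last_name=") and call it dynamically.
      public_send("#{key}=", value)
    end
    self
  end
end

account = ToyAccount.new
account.last_name = "Madison"
account.role = "user"
account.merge("last_name" => "Mattson", :phone => "", :role => "admin")
```

String and symbol keys both work because of the to_sym on the allow-list check, and the :role entry is ignored because it is not in ALLOW_OVERWRITE.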

The Tests

Here is a snippet from my RSpec tests:

  # Uses metaprogramming too...
  def check_merging(field, new_value)
    @fields[field.to_sym] = new_value
    expect {
      account = Account.create_or_merge(@fields)
    }.to_not change { Account.count }
    act = Account.find(@account.id)
    act.instance_eval(field).should == new_value
  end

  describe "being merged" do
    before do
      @group_num = "009015"
      group_name = "Country Doctor Pediatrics"
      grp = Group.find_by_group_num_or_create(@group_num, group_name)
      @doctor_num = "6709#{rand(20)}"
      @fields = {:login      => "jmadison",
                 :email      => "johns@CountryPedDocs.com",
                 :doctor_num => @doctor_num,
                 :name       => 'Dr. John Madison',
                 :first_name => "John",
                 :last_name  => "Madison",
                 :group_name => group_name,
                 :group_num  => @group_num
      }
      @account = nil
      expect {
        @account = Account.create_or_merge(@fields)
      }.to change{ Account.count }.by(1)
    end

    it "should merge new last name" do
      new_value = "Mattson"
      field = "last_name"
      check_merging(field, new_value)
    end

    it "should merge new first name" do
      new_value = "Mary Lou"
      field = "first_name"
      check_merging(field, new_value)
    end

    it "should merge new phone" do
      new_value = "123-321-1234"
      field = "phone"
      check_merging(field, new_value)
    end

    it "should merge new email" do
      new_value = "some_good_email@humptyfratz.biz"
      field = "email"
      check_merging(field, new_value)
    end

    it "should merge new doctor_num" do
      new_value = @doctor_num.reverse
      field = "doctor_num"
      check_merging(field, new_value)
    end

    after do
      if @account
        @account.destroy
        @account.save
      end
      Account.hard_delete
    end
  end

MongoMapper vs MongoDB Cursor Stats

I just love developing with MongoMapper and MongoDB… This weekend I had an easy opportunity to test out the performance between iterating through a collection via MongoMapper or MongoDB cursor. (I had to fix up a field that I munged by screwing up some production code — oops)

My findings showed that the cursor approach was ~1.8x faster.

There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).

Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”

The two flavors of iteration look basically like this:

  • cursor = coll.find({:doctor_num => /^staff_id_number\d{6}/})
  • error_accounts = Account.all(:doctor_num => /^staff_id_number\d{6}/)

The findings were based on “correcting” 5,929 of the 9,002 total accounts.

            Time      Memory
Cursor      166 sec   104K
MM Array    293 sec   175K


The scientist in me says do a test across a larger number of accounts (90K, 900K, 9M) and see what the trend looks like for the cursor; I would expect pretty flat. The pragmatist says I have more important work to do on our V2 of the production app <g>.

From this little bit of data (see the second figure), it seems that the cursor’s lead in the speed department diminished with increasing record counts. However, the memory consumption stays pretty flat for the cursor approach. I’m sure that the array approach will run out of memory at some point when you try and process a lot of records — never a good thing. (Maybe I should do some research on our message log — ~400k per month.)
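
The memory side of this can be illustrated without a database at all. In the sketch below, record_stream is a hypothetical stand-in for a cursor: it hands over one record at a time, so the loop never holds more than one. Calling to_a on it plays the role of the array approach, where every record exists before the first one is processed. It only shows the shape of the two approaches, not the numbers above.

```ruby
# Hypothetical record source that yields one record at a time, like a cursor.
def record_stream(total)
  Enumerator.new do |yielder|
    total.times do |i|
      yielder << { "_id" => i, "doctor_num" => format("%06d", i) }
    end
  end
end

# Cursor style: only the current record is referenced inside the loop.
streamed = 0
record_stream(10_000).each { |_rec| streamed += 1 }

# Array style: every record is materialized before any processing starts.
materialized = record_stream(10_000).to_a
```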

Compare Processing Speed and Memory Usage for Cursor and Array Approach

The code for the Cursor way is shown here, with the lines of interest highlighted:

def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  error_accounts = coll.find({:doctor_num => /^staff_id_number\d{6}/})
  error_accounts.each do |rec|
    new_doctor_num = rec["doctor_num"].match(/(\d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})
    end
  end
end
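
The string surgery in the middle of that loop is easy to try on its own. This sketch assumes the bad values look like the literal prefix staff_id_number followed by six digits, with \d as the digit class; the sample values and the corrected_doctor_num helper are made up for the illustration.

```ruby
# Returns the six-digit number buried in a bad value, or nil if the value
# does not look like one of the bad ones. Hypothetical helper.
def corrected_doctor_num(raw)
  return nil unless raw =~ /^staff_id_number\d{6}/
  # MatchData#to_s gives the whole matched text, here just the six digits.
  raw.match(/(\d{6})/).to_s
end

fixed     = corrected_doctor_num("staff_id_number123456")
untouched = corrected_doctor_num("654321")
```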

The code for the MongoMapper way looked like this:

def self.fix_errors_array_style
  error_accounts = Account.all(:doctor_num => /^staff_id_number\d{6}/)
  error_accounts.each do |a|
    new_doctor_num = a.doctor_num.match(/(\d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      a.update_attributes( :doctor_num => new_doctor_num )
      result = a.save
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}
      end
    end
  end
end

In case this counts for completeness of information presented… The rough numbers of the collection look like this:

  • “count”=>9002,
  • “size”=>5829364,
  • “avgObjSize”=>647.56,
  • “storageSize”=>13880064,
  • “numExtents”=>5,
  • “nindexes”=>4,
  • “lastExtentSize”=>10420224,
  • “paddingFactor”=>1.01,
  • “flags”=>1,
  • “totalIndexSize”=>1679360,
  • “indexSizes”=>{“_id_”=>385024, “login_1”=>352256, “msid_1”=>352256, “doctor_num_1”=>589824},
  • “ok”=>1.0

MongoMapper $or and Set keys

Found an interesting issue (?) with $or and mongomapper — and a work-around.

Given (lots omitted):

class Patient
  include MongoMapper::Document
  ...
  key :count_public_encounters, Integer, :default => 0
  # Provide maps to the related "doctor_num(s)" unique IDs.
  key :doctor_num_list, Set, :index => true
  # ...and related "group_nums(s)"
  key :group_num_list, Set, :index => true
  ...
end

In testing, the following basic syntax worked for an AND condition to find public patient count for doctors in a given list:

n = by_docs = Patient.where( :count_public_encounters.gt =>0, :doctor_num_list.in => doc_list).count

Since patients could be associated with a doctor, a group, or both, I wanted to find public patient counts where the group number matched, OR the doctor numbers were in the list.

n = Patient.where(:count_public_encounters.gt => 0,
        :$or =>[{:group_num_list.in => [grp.group_num]},
                {:doctor_num_list.in =>[doc_list]}]).count

After all, this syntax for array query works as a solo part of an AND query. But no luck 🙁

Turns out, the following syntax worked:

n = Patient.where(:count_public_encounters.gt => 0,
        :$or => [{'group_num_list' => { '$in' =>[grp.group_num]} },
                {'doctor_num_list'  => { '$in' =>doc_list}}]).count

Not gonna ask why…
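
I can't say for certain what the cause was, but there is one difference between the two versions that is visible in plain Ruby: the failing query wraps doc_list in another pair of brackets, and the working one does not. A small sketch with made-up numbers shows what that wrapping does to the list, and how the working criteria look when built as an ordinary hash with string keys.

```ruby
doc_list  = ["670901", "670902"] # made-up doctor numbers
group_num = "009015"             # made-up group number

# Wrapping doc_list again produces a one-element list containing a list.
nested = { "$in" => [doc_list] }
flat   = { "$in" => doc_list }

# The working criteria, written out with plain string keys and operators.
criteria = {
  "count_public_encounters" => { "$gt" => 0 },
  "$or" => [
    { "group_num_list"  => { "$in" => [group_num] } },
    { "doctor_num_list" => flat }
  ]
}
```

Whether the symbol operators (such as :doctor_num_list.in) are also expanded inside a nested $or is a separate question I have not verified. Spelling the operators out as strings sidesteps it either way.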

Class- and Field-Level Custom Validations

This is a simple example I put together for someone asking how this validation stuff works…
In this example, you can see field-level validation:

  • team must be chosen
  • name is required

And you can see a class/base-level validation:

  • at least one method of contact, a phone or an email:

Simple class example that employs some conditional validations and some regular validations:

class Registrant
  include MongoMapper::Document

  # Attributes ::::::::::::::::::::::::::::::::::::::::::::::::::::::
  key :name, String, :required => true
  key :email, String
  key :phone, String
  # Parent Info
  key :parent_name, String
  key :parent_email, String
  key :parent_phone, String
  key :street, String
  key :street2, String
  key :city, String
  key :state, String
  key :postal_code, String

  # Associations :::::::::::::::::::::::::::::::::::::::::::::::::::::
  key :team_id, ObjectId
  belongs_to :team
...
  # Validations :::::::::::::::::::::::::::::::::::::::::::::::::::::
  validate :validate_team_selection
  validate :validate_parent_contact_method
  validates_presence_of :parent_name, :street, :city, :state, :postal_code,
                                    :if => :parent_info_required?

...

  private

  def parent_info_required?
    season.parent_info_required?
  end

  def validate_parent_contact_method
    # one or the other must be provided (blank? also covers nil, unlike empty?)
    if parent_phone.blank? and parent_email.blank?
      errors.add_to_base("At least one form of contact must be entered for the parent: phone or email" )
    end
  end

  def validate_contact_method
    # one or the other must be provided (blank? also covers nil, unlike empty?)
    if phone.blank? and email.blank?
      errors.add_to_base("At least one form of contact must be entered: phone or email" )
    end
  end

  def validate_team_selection
    if registration_setup.require_team_at_signup
      if team_id.nil?
        errors.add(:team, "must be selected" )
      end
    end
  end
end

The above will result in always checking the name, the contacts, and the team assignment. It is actually pulled from a slightly more complex context in that a few conditionals exist to further complicate the validation logic. If you need checking only at certain points in the object life cycle, you can look to things like validate_on_create.

In the case of “Parent Info” being required, additional validations take place.

The “name” validation is a standard, out of the box field-level validation due to “:required => true” being added to the MongoMapper key.
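
To see the field-level versus base-level distinction in isolation, here is a toy version in plain Ruby. ToyRegistrant and its errors hash are invented for the sketch and do not use MongoMapper's validation machinery. The only point is where each message gets attached.

```ruby
class ToyRegistrant
  attr_accessor :name, :phone, :email
  attr_reader :errors

  def initialize(attrs = {})
    attrs.each { |key, value| public_send("#{key}=", value) }
    @errors = Hash.new { |hash, key| hash[key] = [] }
  end

  def valid?
    @errors.clear
    # Field-level: the message is attached to one attribute.
    @errors[:name] << "can't be empty" if blank?(name)
    # Class/base-level: the message belongs to the object as a whole.
    if blank?(phone) && blank?(email)
      @errors[:base] << "At least one form of contact must be entered: phone or email"
    end
    @errors.empty?
  end

  private

  # Hand-written stand-in for ActiveSupport's blank?
  def blank?(value)
    value.nil? || value.strip.empty?
  end
end

missing_contact = ToyRegistrant.new(:name => "Pat")
complete        = ToyRegistrant.new(:name => "Pat", :email => "pat@example.com")
missing_ok  = missing_contact.valid?
complete_ok = complete.valid?
```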

Enjoy!