MongoMapper vs MongoDB Cursor Stats

I just love developing with MongoMapper and MongoDB… This weekend I had an easy opportunity to test out the performance between iterating through a collection via MongoMapper or MongoDB cursor. (I had to fix up a field that I munged by screwing up some production code — oops)

My findings showed that the cursor approach was ~1.8x faster.

There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).

Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”

The two flavors of iteration look basically like this:

  • cursor = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  • error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)

The findings were based on “correcting” 5,929 of the 9,002 total accounts.

Time Memory
Cursor 166 sec 104K
MM Array 293 sec 175K

The scientist in me says do a test across a larger number of accounts: 90K, 900K 9M — and see what the trend looks like for the cursor — I would expect pretty flat. The pragmatist says I got more important work to do on our V2 of the production app <g>.

From this little bit of data (see the second figure), it seems that the cursor’s lead in the speed department diminished with increasing record counts. However, the memory consumption stays pretty flat for the cursor approach. I’m sure that the array approach will run out of memory at some point when you try and process a lot of records — never a good thing. (Maybe I should do some research on our message log — ~400k per month.)

Compare Processing Speed and Memory Usage for Cursor and Array Approach

The code for the Cursor way is shown here, with the lines of interest highlighted:

def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  error_accounts = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  error_accounts.each do |rec|
    new_doctor_num = rec["doctor_num"].match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})

The code for the MongoMapper way looked like this:

def self.fix_errors_array_style
  error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)
  error_accounts.each do |a|
    new_doctor_num = a.doctor_num.match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
      a.update_attributes( :doctor_num => new_doctor_num )
      result =
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}

In case this counts for completeness of information presented… The rough numbers of the collection look like this:

  • “count”=>9002,
  • “size”=>5829364,
  • “avgObjSize”=>647.56,
  • “storageSize”=>13880064,
  • “numExtents”=>5,
  • “nindexes”=>4,
  • “lastExtentSize”=>10420224,
  • “paddingFactor”=>1.01,
  • “flags”=>1,
  • “totalIndexSize”=>1679360,
  • “indexSizes”=>{“_id_”=>385024, “login_1″=>352256, “msid_1″=>352256, “doctor_num_1″=>589824}, “ok”=>1.0}

MongoMapper $or and Set keys

Found an interesting issue (?) with $or and mongomapper — and a work-around.

Given (lots omitted):

class Patient
  include MongoMapper::Document
  key :count_public_encounters, Integer, :default => 0
  # Provide maps to the related "doctor_num(s)" unique IDs.
  key :doctor_num_list, Set, :index => true
  # ...and related "group_nums(s)"
  key :group_num_list, Set, :index => true

In testing, the following basic syntax worked for an AND condition to find public patient count for doctors in a given list:

n = by_docs = Patient.where( =>0, => doc_list).count

Since patients could be associated with a doctor, a group, or both, I wanted to find public patient counts where the group number matched, OR the doctor numbers were in the list.

n = Patient.where( => 0,
        :$or =>[{ => [grp.group_num]},
                { =>[doc_list]}]).count

After all, this syntax for array query works as a solo part of an AND query. But no luck 🙁

Turns out, the following syntax worked:

n = Patient.where( => 0,
        :$or => [{'group_num_list' => { '$in' =>[grp.group_num]} },
                {'doctor_num_list'  => { '$in' =>doc_list}}]).count

Not gonna ask why…