MongoMapper vs MongoDB Cursor Stats

I just love developing with MongoMapper and MongoDB… This weekend I had an easy opportunity to test out the performance between iterating through a collection via MongoMapper or MongoDB cursor. (I had to fix up a field that I munged by screwing up some production code — oops)

My findings showed that the cursor approach was ~1.8x faster.

There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).

Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”

The two flavors of iteration look basically like this:

  • cursor = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  • error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)

The findings were based on “correcting” 5,929 of the 9,002 total accounts.

Time Memory
Cursor 166 sec 104K
MM Array 293 sec 175K


The scientist in me says do a test across a larger number of accounts: 90K, 900K 9M — and see what the trend looks like for the cursor — I would expect pretty flat. The pragmatist says I got more important work to do on our V2 of the production app <g>.

From this little bit of data (see the second figure), it seems that the cursor’s lead in the speed department diminished with increasing record counts. However, the memory consumption stays pretty flat for the cursor approach. I’m sure that the array approach will run out of memory at some point when you try and process a lot of records — never a good thing. (Maybe I should do some research on our message log — ~400k per month.)

Compare Processing Speed and Memory Usage for Cursor and Array Approach

The code for the Cursor way is shown here, with the lines of interest highlighted:

def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  error_accounts = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  error_accounts.each do |rec|
    new_doctor_num = rec["doctor_num"].match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})
    end
  end
end

The code for the MongoMapper way looked like this:

def self.fix_errors_array_style
  error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)
  error_accounts.each do |a|
    new_doctor_num = a.doctor_num.match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      a.update_attributes( :doctor_num => new_doctor_num )
      result = a.save
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}
      end
    end
  end
end

In case this counts for completeness of information presented… The rough numbers of the collection look like this:

  • “count”=>9002,
  • “size”=>5829364,
  • “avgObjSize”=>647.56,
  • “storageSize”=>13880064,
  • “numExtents”=>5,
  • “nindexes”=>4,
  • “lastExtentSize”=>10420224,
  • “paddingFactor”=>1.01,
  • “flags”=>1,
  • “totalIndexSize”=>1679360,
  • “indexSizes”=>{“_id_”=>385024, “login_1″=>352256, “msid_1″=>352256, “doctor_num_1″=>589824}, “ok”=>1.0}

Leave a Reply