I just love developing with MongoMapper and MongoDB… This weekend I had an easy opportunity to compare the performance of iterating through a collection with MongoMapper versus a raw MongoDB cursor. (I had to fix up a field that I had munged by screwing up some production code. Oops.)
My findings showed that the cursor approach was ~1.8x faster.
There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).
Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”
The two flavors of iteration look basically like this:
- `cursor = coll.find({:doctor_num => /^staff_id_number\d{6}/})`
- `error_accounts = Account.all(:doctor_num => /^staff_id_number\d{6}/)`
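Both flavors hand MongoDB the same Ruby regex. Here is a minimal sketch of what that pattern selects and how the six-digit number is pulled back out, assuming the munged values are the literal prefix `staff_id_number` followed by the six-digit doctor number. The `MUNGED_DOCTOR_NUM` constant, the `repair_doctor_num` helper and the sample values are hypothetical. Note that the `\d` matters: without the backslash, `d{6}` would match six literal `d` characters instead of six digits.

```ruby
# Selector passed to MongoDB by both approaches. Assumes munged values are
# "staff_id_number" followed by the six-digit doctor number.
MUNGED_DOCTOR_NUM = /^staff_id_number\d{6}/

# Hypothetical helper: pull the first six-digit run back out.
# String#match returns nil when nothing matches, and nil.to_s is "".
def repair_doctor_num(value)
  value.match(/(\d{6})/).to_s
end

# Hypothetical sample values, not real data
samples  = ["staff_id_number123456", "654321", "staff_id_number12"]
munged   = samples.grep(MUNGED_DOCTOR_NUM)          # only the first one matches
repaired = munged.map { |v| repair_doctor_num(v) }

p munged
p repaired
```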
The findings were based on “correcting” 5,929 of the 9,002 total accounts.
| Approach | Time | Memory |
|---|---|---|
| Cursor | 166 sec | 104K |
| MM Array | 293 sec | 175K |
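The "~1.8x" figure falls straight out of the table. This short calculation also derives the memory ratio and a rough per-document time, under the assumption that each run's wall time is spread evenly over the 5,929 corrected accounts (that time also includes the `find_dupes` lookups and any merges, so it is not pure iteration cost). The memory figures are used as reported, without assuming a unit.

```ruby
docs_fixed   = 5_929
cursor_sec   = 166.0
array_sec    = 293.0
cursor_mem_k = 104.0
array_mem_k  = 175.0

# Ratio of array-style time to cursor-style time (the "~1.8x" above)
speedup   = (array_sec / cursor_sec).round(2)
mem_ratio = (array_mem_k / cursor_mem_k).round(2)

# Rough wall-clock milliseconds per corrected document
cursor_ms_per_doc = (cursor_sec / docs_fixed * 1000).round(1)
array_ms_per_doc  = (array_sec / docs_fixed * 1000).round(1)

puts "speedup: #{speedup}x, memory ratio: #{mem_ratio}x"
puts "per document: cursor #{cursor_ms_per_doc} ms, array #{array_ms_per_doc} ms"
```

That works out to about 1.77x on time, about 1.68x on memory, and roughly 28 ms versus 49 ms per corrected document.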
The scientist in me says to run the test across a larger number of accounts (90K, 900K, 9M) and see what the trend looks like for the cursor; I would expect it to be pretty flat. The pragmatist says I've got more important work to do on V2 of our production app <g>.
From this little bit of data (see the second figure), it seems that the cursor's lead in the speed department diminished as the record count increased. However, memory consumption stays pretty flat for the cursor approach. I'm sure the array approach will run out of memory at some point when you try to process a lot of records, which is never a good thing. (Maybe I should do some research on our message log, which runs ~400K per month.)
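Why the memory curve stays flat for the cursor can be shown with a toy model. This is a hypothetical sketch, not the Mongo driver: it only counts how many documents the client holds at once. The cursor style receives documents one batch at a time and drops each batch after processing it, while the array style (what `Account.all` does) materializes every matching document before the loop starts. The batch size of 100 is an arbitrary assumption; the real driver and server decide batch sizes, and actual memory also depends on document size.

```ruby
# Cursor style: only the current batch is resident at any moment.
def peak_resident_cursor(total, batch_size)
  peak = 0
  total.times.each_slice(batch_size) do |batch|
    peak = [peak, batch.size].max
    batch.each { |_doc| }  # process each document; the batch is then released
  end
  peak
end

# Array style: every matching document exists before iteration begins.
def peak_resident_array(total)
  docs = Array.new(total) { |i| { "_id" => i } }
  docs.size
end

[9_002, 90_000].each do |n|
  puts "#{n} docs: cursor holds #{peak_resident_cursor(n, 100)}, " \
       "array holds #{peak_resident_array(n)}"
end
```

The cursor's peak stays at one batch no matter how large the collection gets, while the array's peak grows with the number of matching documents.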
The code for the cursor way is shown here:
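```ruby
def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  # Raw driver query: returns a cursor that fetches documents from the server in batches
  error_accounts = coll.find({:doctor_num => /^staff_id_number\d{6}/})
  error_accounts.each do |rec|
    # Pull the six-digit doctor number back out of the munged value
    new_doctor_num = rec["doctor_num"].match(/(\d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      # Update the raw document directly, bypassing MongoMapper
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})
    end
  end
end
```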
The code for the MongoMapper way looked like this:
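```ruby
def self.fix_errors_array_style
  # Account.all loads every matching document into an Array of Account objects
  error_accounts = Account.all(:doctor_num => /^staff_id_number\d{6}/)
  error_accounts.each do |a|
    # Pull the six-digit doctor number back out of the munged value
    new_doctor_num = a.doctor_num.match(/(\d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      a.update_attributes( :doctor_num => new_doctor_num )
      # update_attributes already saves, so this is a second save of the same document
      result = a.save
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}
      end
    end
  end
end
```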
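Both methods make the same per-record decision; only the iteration and the write path differ. As a hypothetical refactor (the `plan_fix` name is invented here, and its `dupe_count` argument stands in for `Account.find_dupes(new_doctor_num).size`), that decision can be pulled out into a pure function and checked without a database:

```ruby
# Hypothetical helper, not part of MongoMapper or the original code.
# Returns the action to take and the repaired doctor number.
def plan_fix(doctor_num, dupe_count)
  new_doctor_num = doctor_num.match(/(\d{6})/).to_s
  action = dupe_count > 1 ? :merge : :update
  [action, new_doctor_num]
end

p plan_fix("staff_id_number123456", 2)  # prints [:merge, "123456"]
p plan_fix("staff_id_number654321", 1)  # prints [:update, "654321"]
```

With the decision isolated, a benchmark of the two styles ends up comparing only how documents are fetched and written.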
For completeness, here are the rough stats for the collection:
- `"count"=>9002`
- `"size"=>5829364`
- `"avgObjSize"=>647.56`
- `"storageSize"=>13880064`
- `"numExtents"=>5`
- `"nindexes"=>4`
- `"lastExtentSize"=>10420224`
- `"paddingFactor"=>1.01`
- `"flags"=>1`
- `"totalIndexSize"=>1679360`
- `"indexSizes"=>{"_id_"=>385024, "login_1"=>352256, "msid_1"=>352256, "doctor_num_1"=>589824}`
- `"ok"=>1.0`
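These stats hang together: `avgObjSize` is the data size divided by the document count, and `totalIndexSize` is the sum of the per-index sizes. A quick check, with the values copied from the list above:

```ruby
# Values copied from the collection stats listed above
stats = {
  "count"          => 9002,
  "size"           => 5829364,
  "avgObjSize"     => 647.56,
  "nindexes"       => 4,
  "totalIndexSize" => 1679360,
  "indexSizes"     => { "_id_" => 385024, "login_1" => 352256,
                        "msid_1" => 352256, "doctor_num_1" => 589824 }
}

# Average document size in bytes: data size / document count
avg = (stats["size"].to_f / stats["count"]).round(2)

# Total index size should equal the sum of the individual indexes
index_total = stats["indexSizes"].values.sum

puts "avgObjSize: #{avg} (reported #{stats['avgObjSize']})"
puts "index total: #{index_total} (reported #{stats['totalIndexSize']})"
```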