Anatomy of a MongoDB Profiling Session

This particular application has been collecting data for months now, but hasn’t really had any users by design. At 33GB of data, pulling up a list of messages received was taking f-o-r-e-v-e-r!

So I decided to document how to go about and fix a running production system… Hope it helps.

Log into mongo console and turn on profiling (the ‘1’) to monitor slow queries. I entered >10 seconds, which really stinks (!). You should adjust it to suit your app’s definition of “slow”—maybe 500ms:

> db.setProfilingLevel(1,10000)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

Next I went back to the webapp and executed the page request that exhibits the slow response …

Once the page returns, go in and look for any slow responses that the profiler logged:

> db.system.profile.find()
{
  "ts" : ISODate("2012-02-18T15:34:02.967Z"), 
  "op" : "command", 
  "command" : { "count" : "messages", 
    "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" }, 
      "fields" : null }, 
  "ntoreturn" : 1, 
  "responseLength" : 48, 
  "millis" : 119051, 
  "client" : "192.168.100.67", 
  "user" : "" 
}
{
  "ts" : ISODate("2012-02-18T15:35:51.704Z"),
  "op" : "query", 
  "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" },
  "ntoreturn" : 25,
  "ntoskip" : 791025,
  "nscanned" : 791051,
  "nreturned" : 25,
  "responseLength" : 49956,
  "millis" : 108720,
  "client" : "192.168.100.67",
  "user" : "" 
}

You can see there was a count query and a query for the data itself (we are using pagination). Sure enough, look here:

  • “ntoreturn” : 25,
  • “nscanned” : 791051,

 

Wow, that’s nasty… to return 25 records, we scanned 791,051! Gulp. Looks like a full table scan. Never a good thing (unless you have very small amounts of data).

Let’s see what sorts of indexes exist for the messages collection:

db.system.indexes.find( { ns: "production-alerts.messages" } );
{ "name" : "_id_", "key" : { "_id" : 1 }, "v" : 0 }
{ "v" : 1, "key" : { "created_at" : -1 }, "name" : "created_at_-1" }
{ "v" : 1, "key" : { "_type" : 1 }, "name" : "_type_1" }
{ "v" : 1, "key" : { "recv_app" : 1 }, "name" : "recv_app_1" }
{ "v" : 1, "key" : { "created_at" : -1, "recv_app" : 1 }, "name" : "created_at_-1_recv_app_1" }
{ "v" : 1, "key" : { "message_type" : 1 }, "name" : "message_type_1" }
{ "v" : 1, "key" : { "trigger_event" : 1 }, "name" : "trigger_event_1" }

Well, as expected, there is no index covering the multiple keys that we are searching on. So let’s add a multi-key index to match the query used by the controller!

db.messages.ensureIndex({_type:1, recv_app:1});

Now the app FLIES!! We dropped from 100+ seconds to 1.5 seconds (look at the “millis”) w00t!

db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER"}).explain();
> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER"}).explain();
{
	"cursor" : "BtreeCursor _type_1_recv_app_1",
	"nscanned" : 791153,
	"nscannedObjects" : 791153,
	"n" : 791153,
	"millis" : 1546,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"_type" : [
			[
				"HL7Message",
				"HL7Message"
			]
		],
		"recv_app" : [
			[
				"CAREGIVER",
				"CAREGIVER"
			]
		]
	}
}

To prevent this sort of thing, you can consider adding indexes when you create new queries. But the best way to do this is to be empirical and know whether you should add the index through some testing. I’ll leave that for another day!

MongoDB Index Performance

As part of this (unintended) mini-series on MongoDB and indexing, I had written a little test to see if I could document performance gains through indexing. I used realworld data, albeit only 50,000 records, to query out a handful or documents (24 being the most).

Related posts:

Here is the code:

require 'test_helper'

class EncounterListingTest < Test::Unit::TestCase

  context "Indexing" do
    ProfileStats2 = Struct.new(:doctor_num, :count, :timing1, :timing2, :timing3)

    should "profile assorted doctor patient retrievals" do
      stats = []
      doctor_nums = ["602490", "603324", "212043", "602938"]
      doctor_nums.each_with_index do |doctor_num, i|
        MongoMapper.database.collection('encounters').drop_indexes
        show_indexes if i == 0
        timing1 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician, 1]])
        show_indexes if i == 0
        timing2 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician,1], [:notify_physician,1], [:visible_count,1]])
        show_indexes if i == 0
        timing3 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        n_count = Encounter.count(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0)
        stats << ProfileStats2.new(doctor_num, n_count, timing1, timing2, timing3)

      end

      File.open("test/performance/index_stats_results-#{Time.now.strftime("%d-%m-%Y")}.csv", 'w') do |f|
        puts "%10s  %6s  %5s  %5s  %5s" % ["doctor", "count", "None", "Phys", "Phys/Ntfy/Vis"]
        f.puts "doctor, count, None, Phys, PhysNtfyVis"
        stats.each do |s|
          results = "%10d, %6d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
          puts results
          f.puts "%d, %d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
        end
      end

    end

  end

  private
  def show_stats(stats)
    stats.each do |s|
      puts "%6d, %5.3f, %s" % [s.count, s.timing, s.index_type]
    end
  end

  def measure_performance(doctor_num = "99602326")
    start = Time.now
    n_public = Encounter.where(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0).all
    delta = Time.now - start
    delta
  end

  def show_indexes
    puts "%s INDEXES %s" % ["*"*12, "*"*12]
    Encounter.collection.index_information.collect { |index| puts "    #{index[0]}" }
  end

  def add_index(new_index)
    coll = MongoMapper.database.collection('encounters')
    coll.drop_index(new_index) if !coll.index_information.detect { |index| index[0] == new_index }.nil?
    Encounter.ensure_index(new_index)
  end

end

Results:

The effect of adding indexes on query performance

The results are shown in the accompanying graph. Except for the query that returned 24 documents, the general trend was that 3 indexes were better than one. And one was w-a-a-a-y better than none (of course, you already knew that). The odd outlier being for count = 6, in that a single index did not perform as well as it did in all the other tests.