This particular application has been collecting data for months now, but hasn’t really had any users by design. At 33GB of data, pulling up a list of messages received was taking f-o-r-e-v-e-r!
So I decided to document how to go about fixing a running production system… Hope it helps.
Log into the mongo console and turn on profiling (the ‘1’) to log slow queries. I set the threshold to 10,000ms, i.e. anything slower than 10 seconds, which really stinks (!). You should adjust it to suit your app’s definition of “slow”, maybe 500ms:
> db.setProfilingLevel(1, 10000)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
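For reference, here is how you would set that friendlier 500ms threshold, verify it, and switch the profiler back off when you are done (standard shell helpers; the status output shown in the comment is a sketch):

> db.setProfilingLevel(1, 500)   // log anything slower than 500ms
> db.getProfilingStatus()        // e.g. { "was" : 1, "slowms" : 500 }
> db.setProfilingLevel(0)        // disable profiling when finished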
Next I went back to the webapp and executed the page request that exhibits the slow response …
Once the page returns, go in and look for any slow responses that the profiler logged:
> db.system.profile.find()
{
    "ts" : ISODate("2012-02-18T15:34:02.967Z"),
    "op" : "command",
    "command" : { "count" : "messages", "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" }, "fields" : null },
    "ntoreturn" : 1,
    "responseLength" : 48,
    "millis" : 119051,
    "client" : "192.168.100.67",
    "user" : ""
}
{
    "ts" : ISODate("2012-02-18T15:35:51.704Z"),
    "op" : "query",
    "query" : { "_type" : "HL7Message", "recv_app" : "CAREGIVER" },
    "ntoreturn" : 25,
    "ntoskip" : 791025,
    "nscanned" : 791051,
    "nreturned" : 25,
    "responseLength" : 49956,
    "millis" : 108720,
    "client" : "192.168.100.67",
    "user" : ""
}
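Since system.profile is just a (capped) collection, you can filter it like any other. For example, to pull only the very slow operations, newest first, using the millis and ts fields shown above:

> db.system.profile.find({ millis : { $gt : 10000 } }).sort({ ts : -1 })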
You can see there was a count query and a query for the data itself (we are using pagination). Sure enough, look here:
- "ntoreturn" : 25,
- "nscanned" : 791051,
Wow, that’s nasty… to return 25 records, we scanned 791,051! Gulp. Looks like a full collection scan (MongoDB’s version of a full table scan). Never a good thing (unless you have very small amounts of data).
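You can confirm a scan like this with explain(): when no usable index exists, the shell reports a BasicCursor. (Illustrative output, not captured from this session.)

> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER" }).explain();
{
    "cursor" : "BasicCursor",  // no index used; every document is scanned
    ...
}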
Let’s see what sorts of indexes exist for the messages collection:
db.system.indexes.find( { ns: "production-alerts.messages" } );
{ "name" : "_id_", "key" : { "_id" : 1 }, "v" : 0 }
{ "v" : 1, "key" : { "created_at" : -1 }, "name" : "created_at_-1" }
{ "v" : 1, "key" : { "_type" : 1 }, "name" : "_type_1" }
{ "v" : 1, "key" : { "recv_app" : 1 }, "name" : "recv_app_1" }
{ "v" : 1, "key" : { "created_at" : -1, "recv_app" : 1 }, "name" : "created_at_-1_recv_app_1" }
{ "v" : 1, "key" : { "message_type" : 1 }, "name" : "message_type_1" }
{ "v" : 1, "key" : { "trigger_event" : 1 }, "name" : "trigger_event_1" }
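The shell also has a helper that returns the same list without querying system.indexes yourself:

> db.messages.getIndexes()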
Well, as expected, there is no index covering both of the keys that we are searching on. So let’s add a compound index to match the query used by the controller!
db.messages.ensureIndex({_type:1, recv_app:1});
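One caution on a live system: ensureIndex builds the index in the foreground by default, which blocks other operations on the database until the build finishes. On a 33GB collection, you would probably want the non-blocking (if somewhat slower) background build instead:

db.messages.ensureIndex({ _type : 1, recv_app : 1 }, { background : true });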
Now the app FLIES!! We dropped from 100+ seconds to 1.5 seconds (look at the “millis”). w00t!
> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER" }).explain();
{
    "cursor" : "BtreeCursor _type_1_recv_app_1",
    "nscanned" : 791153,
    "nscannedObjects" : 791153,
    "n" : 791153,
    "millis" : 1546,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "_type" : [ [ "HL7Message", "HL7Message" ] ],
        "recv_app" : [ [ "CAREGIVER", "CAREGIVER" ] ]
    }
}
To prevent this sort of thing, consider adding indexes when you write new queries. But the best approach is to be empirical: measure whether an index actually helps before you add it. I’ll leave that for another day!
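As a starting point for that kind of testing, the shell’s hint() lets you force a specific index and compare explain() numbers between candidates (a sketch, using the indexes listed above):

> // force the new compound index and check the timing
> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER" }).hint({ _type : 1, recv_app : 1 }).explain().millis
> // compare against a single-key candidate
> db.messages.find({ _type : "HL7Message", recv_app : "CAREGIVER" }).hint({ _type : 1 }).explain().millis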