Author Archives: jon

The Bizarro Manifesto

Let’s try a little Bizarro¹ test (if you agree to these, I’ll poke you with a hot krypton stick):

We are uncovering better ways to provide the illusion of developing software by listening to others talk about watching people try. Through this (dare I call it?) work, we have come to value:

Dogmatic process and CASE-tool-like automation over inspiring quality individuals to interact with the team and the clients
Sufficient up-front comprehensive design specifications over seeing frequent, tangible, working results.
Writing detailed Statements of Work and negotiating changes over collaborating to do our collective best with the time and money at hand
Driving toward the original project plan over accommodating the client changing their mind, or a path turning into a dead end

To elaborate:

We prefer to focus on building software with lock-step process and tools — and reduce our need to worry about quality individuals and having conversations amongst developers or with the client.
- that way we don’t need to worry about people issues and effective communication.
- That way we can hire any individual regardless of skill, and forgo all verbal/personal interactions in favor of solely written words. Even better if those written words are automatically transformed into code. Maybe we can get non-coder tools! After all, people are merely fungible assets/resources, and software is factory-work — with processes and tools, and a horde of low-paid serfs, we can crank it out!
We prefer to spend a lot of time up-front ensuring we have the requirements and design specs fully determined — rather than have tangible, working results early on.
- We start with complete requirements specifications (often 400 pages), that follow our company standard template.
- Even our Use Cases follow a mandatory 3-level deep path, with proper exception and alternate paths worked out.
- We link the requirements items into detailed design documents — which include system design diagrams, interface specifications, and detailed object models.
- If we don’t write it all down now, we’re likely to forget what we wanted. And if we don’t do it to the n-th degree, the developers might screw it up.
- Writing it all down up front allows us to go on vacation while the process and tools “write” the code from the detailed specs/diagrams. Sweet.
- In addition, we love to be rewarded by reaching meaningless intermediate deadlines that we place on our 1500-node Gantt chart.
- When we combine all of the upfront work with important deadlines, many of the senior managers can get promoted due to their great commitment to generating reams of cool-looking documents. By the time the sh!t hits the fan when folks realize the “ship it” deadline is missed, the senior managers are no longer involved.
- Besides, if we actually built software instead of writing all sorts of documents thinking about building software, our little ruse would be exposed!
We prefer to work under rigid Statements of Work — rather than actually work towards a “best-fit” solution given changing conditions of understanding and needs.
- The agreement is based on the 400-page, fully-specified requirements document, and we pad the cost estimate with a 400% profit margin.
- We then hire dozens of people to argue during the Change Control Review Board monthly meetings about re-writing code to deliver what you wanted versus what you asked for when you thought you knew what you wanted (and wrote it down in that 400-page doc that was signed off by 6 execs).
- Contract negotiation pissing matches are such great uses of our collective resources and always result in perfect software! We love our fine print 🙂
- With a 400% padding, the projects are too big to fail.
- Once we are in it for 1 or 2 million and 50% done and 2x schedule overrun, who would ever say “No” to a contract extension? Who better to get you to the goal line than the same folks who squandered away your treasure, pissed away the calendar, and delivered no working software yet?
- We like to appear like we’re just about done… Asymptote? Never heard of one.
We prefer to be driven by our initial plan — rather than dealing with change and having to re-print the Gantt.
- Especially a Gantt chart that has been built with tender loving care to include resource allocations, inter-project dependencies, and partial resource allocation assignments for matrix-style organizations.
- We love hiring a small army to ensure that we drive the entire team to meet every micro-task deadline even when they no longer make any sense.
- The real fun is collecting the “actuals” data from the developers assigned to each task so we can compare it to their estimated hours.
- And nothing sweeter than seeing 90% of our tasks being started, and 75% of those being 67% resolved; and 25% of the resolved actually being complete — the roll-up summary to management is such a compelling story of success.
- Changing such a beautiful plan that took 4 man-years to develop, that incorporates all of the comprehensive non-code documents, and is an appendix in the contract, is no small feat!
- Better to produce the software according to plan even if nobody wants it that way. That’s our motto, and we’re not going to change!
- We love the illusion of activity over the truth of delivered features.

Feel free to sign the manifesto below. It’s free to be certified.

Credit goes to Superman and Bizarro World.

Jira vs Post-It Notes

Leave a reply

Had some recent discussions on the subject of using tools or just sticky notes…

for me, a digital board representation of the kanban style 3-column view is merely one v-e-r-y small aspect of using something like jira.

though i could probably live with just post-it notes, i bet it would be an interesting challenge for me. i have been:

doing only distributed work for so long (since late 90s)
using jira so long, it is as simple as a pen and paper (or post-its)
dissatisfied trying to do a project with something simpler like Basecamp, without the richness of jira
working on long-term projects that have different people rolling through, and years of life (4000+ issues)

and i ask myself what other benefits do i derive from jira (plus, admittedly, a companion wiki)? why might i be uncomfortable with just post-its? do i have an unnecessary “crutch” in the form of jira? hmmm… do i do more than is necessary in the way of using jira/wiki to add other pertinent documentation for the project?

well, here are the following things i can think of off the top of my head:

it is easy to

assign myself to an issue
move it to be in progress
move it to be resolved

our virtual chats hover around the greenhopper view

we’ll edit stuff on the fly as needed
quickly create a new issue if something pops up during the call
everybody refreshes their browser

we put a fair amount of supporting docs — details, helpful things — into the issue (or into the wiki and then that link goes in the issue)

that way you always know where to look 🙂
you can find it 24×7 and regardless of your GeoLoc
sometimes we’ll have skype chats and voice calls about the issue, maybe about design ideas. I’ll shove those things into the issue.

we can link related issues
we can use the search to look up old issues and refresh ourselves on what was asked for/done
we can have asynchronous, threaded conversations in the form of comments
we can track any myriad of other stuff against the issue
i log my hours against the issue

useful for billing if you need it…

i can use it to help generate pretty charts for management

Uploaded with Skitch!

we use the issue status to trigger QA (in addition to our chats)
i use jira to help ensure product version notes are up to date

jira manages the iterations

pretty easy to maintain a backlog, even chunking it up into groups if needed
easy to indicate that an issue has been rejected, and why

the point is, i find it incredibly useful to have modern technology at my fingertips…

in my experience, there is so much more to a project tracking “tool” than what post-it notes would seem to represent.

[important]but i have NO (zero, nada, zilch) experience being part of a co-located team using post-it notes. so take it with a grain of salt :-)[/important]

Use Agile Wisely

Leave a reply

For the past few years, I have been bothered by a nagging urge to write a pamphlet on software titled “Common Sense” — an homage to Thomas Paine’s great work.

This may be a stretch, and I may counter this point once I actually try and research and draw more parallels — or not. So here I am just thinking out loud, as it were. A risky means to formulate my ideas, I know.

The Agile Manifesto is akin to the Declaration of Independence and US Constitution. (Not in its impact on the world, and I make the comparison in complete deference to the greatness that were our Founders.)

Agile is about freedom and individual responsibility grounded in an agreed-upon framework of overarching ethics and morals. The beauty of our Founding Fathers, was that they got to the essence of human behavior — including the depravity and moral weakness, and corruption by power, that we humans suffer from. They designed a system of government that accounts for such weaknesses, and that provides a means of feedback and correction. Correcting in the small (local and state government), and in the large (amendments to the document itself).

The Agile Manifesto is similarly poised. The four main tenets of the Manifesto are irrefutable, getting to the essence of software development. The Manifesto sets the stage up for great success without the shackles of a tyrannical, command and control form of management. It works from the bottom up.

There are many in the software world that decry the freedom of agile developers practicing their craft. Some organizations still drive a tyrannical structure within a framework of bureaucratic control. While many “Freedom Fighters” see the errors in these organizations, those on the inside are somehow oblivious, and probably believe in the superiority of command-and-control.

The lure of the bureaucratic layers to many managers always confounded me. Is it a lack of education about what freedom means? Is it a lack of courage to be individually accountable? Is it an allegiance to a known and friendly tyrannical structure — versus the unknown and scary individualism? Is it a belief that somehow a complex human organization can be broken down into its constituent parts, such that if each cog does their small bit, then the whole achieves its ultimate goal? It is, undoubtedly, simply part of human nature. Some folks are comfortable being risk takers and sticking their necks out, while others enjoy the comforts of a more controlled system within which to toil.

Over the past few years, our community has been inundated with “enlightened” Scrum Masters. The Church of Scrum has literally popped up overnight, anointing new converts at an alarming pace. With a relatively trivial-to-acquire certification, many organizations seek out said experts to be their savior on the road to riches.

Scrum itself is not the issue, after all it is Agile. But much like freedom in the hands of the uneducated often has disastrous results, so too does being Scrumified in the absence of understanding Agile.

Our young democratic republic worked hard to teach children about the US form of government and the hard-fought liberties we were blessed to enjoy. The children all studied from primers that helped them be educated about just “how” our form of government works. And they learned about the context… the “why” the framers of the Constitution chose their forms of checks and balances to stem undersirable side effects of human nature.

In the absence of much formal education about what the intent of the Agile Manifesto means, Scrum filled the void. That’s the beauty of a free market. Ideas can compete.

Much like our venerable representative democracy, Agile isn’t the most perfect form of s/w development, but it’s the best that has ever been. There are no guarantees that being “agile” will result in fabulous wealth and riches through the killer app. Just like there are no guarantees of outcome in a free society (at least there shouldn’t be any).

As with the United States of America, with great freedom lies great responsibility. Use agile wisely.

Uncle Bob Challenges The Architecture of a Rails App

Leave a reply

Uncle Bob has a very interesting keynote at the Ruby Midwest 2011 conference.

I developed apps with the fundamental architecture of the following (with dependencies only crossing one layer):

——
UI Layer
——
Business Object Layer
——
Data Mgt. Layer
——

Born from the one pattern that is king of the hill in my book: “Separation of Concerns.” The above was my architecture for all projects since the early 90s… C++, Java… but so far not so much in Rails.

I have thought about trying it, but not sure if it will pay off or not.

Essentially, it is about cleaving the rails model classes into two parts:

Business Methods, attributes, business rules
Persistence Methods, attributes, all knowledge of the DBMS details

In general, the UI deals with BOs, but sometimes we create dumb “Data Transfer Objects” that are lightweight versions of the business objects to be thrown about the system.

As a side note, in general, moving code to a more “object-oriented” state often ends up with the same lines of code. And often a bit more due to the boiler plate of creating additional classes.

In a current project, we have pulled out the business objects into a separate gem — but mostly because it needs to be used by our web app and by an eventmachine app.

The thing that shocked me the most about Rails, when Corey Haines introduced me to it in 2009, was that it was a lot like “Model Driven Architecture” that I had worked with for a few years. Given an architecture, a vertical slice thru the app, weave the model thru the architecture generator and out comes an application with a consistent architecture for the bulk of the app that is mostly the same (save for model/property names). Commercially, this MDA technology was a failure, last time I checked. Even though I thought it was the smartest way to develop apps, few others did. Except for Rails developers — largely because most rails devs probably have a very different mindset than other devs.

See blast from the past presentation here.

Though Bob pokes fun at Rails high-level directory structure as not revealing the business domain, I am totally fine with that. It’s a good thing. Yea, sure, it is revealing that it is an MVC style app designed to deliver web apps, so what? No matter which architecture is used, I look for the domain classes to tell me what the system is doing…

In my handful of rails apps to date, I have only used MongoDB and MongoMapper — and this is the closest I have gotten to the good old days of when I used the POET Object-oriented Database with C++ back in the late 90s. It is the closest I have been to nirvana. I basically *almost* don’t need to care that there even is a database…

One of these days, I’ll compare and contrast a Rails/MongoMapper app with and without Business Objects separated from Data Management classes.

MongoDB Group Map-Reduce Performance

Leave a reply

I wanted to get some aggregated count data by message type for a collection with over 1,000,000 documents.

The model is more or less this (a lot removed for simplicity):

class MessageLog
  include MongoMapper::Document

  # Attributes ::::::::::::::::::::::::::::::::::::::::::::::::::::::
  # Message's internal timestamp / when it was sent
  key :time, Time
  # Message type
  key :event_type, String, :default => "Not Set"
  # The message ID
  key :control_id, String, :index => true

  # Add created_at, updated_at
  timestamps!

  # Indexes :::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  def self.create_indexes
    MessageLog.ensure_index([[:created_at,1]])
    MessageLog.ensure_index([[:event_type,1]])
  end
end

Distinct

Though I originally hard-coded the message types (as they do not change very often, and are meaningless without other code changes anyway), I figured I would test dynamically gathering the distinct types. MongoDb supports the distinct function. From the MongoDB console:

> db.message_logs.distinct("event_type")
[
	"Bed Order",
	"Cactus Update",
	"ED Release",
	"ED Summary",
	"Inpatient Admit",
	"Inpatient Discharge Summary",
	"Not Set",
	"Registration",
	"Registration Update",
	"Unknown Message Type"
]

Though I saw distinct in MongoMapper, I had trouble getting it to work (this is an older app on <v2.0, method missing error).

However, a very powerful technique within MongoMapper worked just perfect! Essentially, every collection in MongoMapper will return itself as a collection that MongoDB understands (in their db.collection.blah format) — helps when you need to execute MongoDB style commands:

class MessageLog
  # @return [Array] a list of unique types (strings)
  def self.event_types
    MessageLog.collection.distinct("event_type")
  end
end

Simple Count

I used a simple technique to iterate over each type and get the associated count:

class MessageLog
  # Perform a group aggregation by event type.
  #
  # @return [Hash] the number of message logs per event type.
  def self.count_by_type
    results = {}
    MessageLog.event_types.each {|type| results[type] = MessageLog.count(:event_type => type)}
    results
  end
end

Map-Reduce Too Slow

In this instance, it turned out that Map-Reduce was significantly slower, and I am not exactly sure why. Other than I suppose that iterating over each document is more expensive than calling count with a filter on the event_type key (which is covered by an index).

class MessageLog
  # Perform a group aggregation by event type.
  # Map-Reduce was slow by comparison (20 seconds vs 2.3 seconds)
  #
  # @return [Hash] the number of message logs per event type.
  def self.count_by_type_mr
    results = {}
    counts = MessageLog.collection.group( {:key => :event_type, :cond => {}, :reduce => 'function(doc,prev) { prev.count += 1; }', :initial => {:count => 0} })
    counts.each {|r| results[r["event_type"]] = r["count"]}
    results
  end
end

Performance Results

As you can see, Map-Reduce took about [notice]10 times longer,[/notice] ~21 seconds versus ~2.3 seconds.

And this is over 1,129,519 documents, so it is a non-trivial test, IMO.

> measure_mr = Benchmark.measure("count") { results = MessageLog.count_by_type_mr}
> measure = Benchmark.measure("count") { results = MessageLog.count_by_type }
ruby-1.8.7-p334 :010 > puts measure_mr
  0.000000   0.000000   0.000000 ( 20.794720)
> puts measure
  0.020000   0.000000   0.020000 (  2.340708)
> results.map {|k,v| puts "#{k} #{v}"}
Not Set                          1
Inpatient Admit              4,493
Unknown Message Type         1,292
Bed Order                    6,948
Registration Update        852,189
Registration               123,064
ED Summary                  94,933
Cactus Update               10,145
Inpatient Discharge Summary 18,150
ED Release                  18,304

Summary

You may get better performance using simpler techniques for simple aggregate commands. And maybe Map-Reduce shines better on more complex computations/queries.

[important]But your best bet is to test it out with meaningful data samples.[/important]

Hiring a Team Member

Leave a reply

I read this very good post “The Number One Trait of a Great Developer” by Tammer Saleh at Engine Yard, and it made me think…

I used to rant about “It’s the Business, Stupid” in conferences, implying that we should be solving problems for our clients, not simply playing with the next shiny toy. Sure, I love shiny toys, and I love to play with them — especially when they make sense within the context of solving a problem. But your scenario is an oft-spotted pattern where folks lack the engineering skills required to build right-sized solutions to meet the here-and-now needs.

And of course, anyone can come up with a complex solution that sprawls across multiple cubicle walls on e-size plotter paper (that’s easy). Only a rare few can come up with the minimalist solution that meets the needs of the business, and can easily grow over time.

As far as hiring, I like to look for the “engineering” mind. After all, engineers put man on the moon, not scientists.

MongoDB Index Performance

Leave a reply

As part of this (unintended) mini-series on MongoDB and indexing, I had written a little test to see if I could document performance gains through indexing. I used realworld data, albeit only 50,000 records, to query out a handful or documents (24 being the most).

Here is the code:

require 'test_helper'

class EncounterListingTest < Test::Unit::TestCase

  context "Indexing" do
    ProfileStats2 = Struct.new(:doctor_num, :count, :timing1, :timing2, :timing3)

    should "profile assorted doctor patient retrievals" do
      stats = []
      doctor_nums = ["602490", "603324", "212043", "602938"]
      doctor_nums.each_with_index do |doctor_num, i|
        MongoMapper.database.collection('encounters').drop_indexes
        show_indexes if i == 0
        timing1 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician, 1]])
        show_indexes if i == 0
        timing2 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        MongoMapper.database.collection('encounters').drop_indexes
        add_index([[:private_physician,1], [:notify_physician,1], [:visible_count,1]])
        show_indexes if i == 0
        timing3 = (measure_performance(doctor_num) + measure_performance(doctor_num) + measure_performance(doctor_num))/3

        n_count = Encounter.count(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0)
        stats << ProfileStats2.new(doctor_num, n_count, timing1, timing2, timing3)

      end

      File.open("test/performance/index_stats_results-#{Time.now.strftime("%d-%m-%Y")}.csv", 'w') do |f|
        puts "%10s  %6s  %5s  %5s  %5s" % ["doctor", "count", "None", "Phys", "Phys/Ntfy/Vis"]
        f.puts "doctor, count, None, Phys, PhysNtfyVis"
        stats.each do |s|
          results = "%10d, %6d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
          puts results
          f.puts "%d, %d, %5.3f, %5.3f, %5.3f" % [s.doctor_num, s.count, s.timing1, s.timing2, s.timing3]
        end
      end

    end

  end

  private
  def show_stats(stats)
    stats.each do |s|
      puts "%6d, %5.3f, %s" % [s.count, s.timing, s.index_type]
    end
  end

  def measure_performance(doctor_num = "99602326")
    start = Time.now
    n_public = Encounter.where(:private_physician => doctor_num, :notify_physician => 'Y', :visible_count.gt => 0).all
    delta = Time.now - start
    delta
  end

  def show_indexes
    puts "%s INDEXES %s" % ["*"*12, "*"*12]
    Encounter.collection.index_information.collect { |index| puts "    #{index[0]}" }
  end

  def add_index(new_index)
    coll = MongoMapper.database.collection('encounters')
    coll.drop_index(new_index) if !coll.index_information.detect { |index| index[0] == new_index }.nil?
    Encounter.ensure_index(new_index)
  end

end

Results:

The effect of adding indexes on query performance

The results are shown in the accompanying graph. Except for the query that returned 24 documents, the general trend was that 3 indexes were better than one. And one was w-a-a-a-y better than none (of course, you already knew that). The odd outlier being for count = 6, in that a single index did not perform as well as it did in all the other tests.

A Walk Through the Valley of Indexing in MongoDB

1 Reply

As you walk through the valley of MongoDB performance, you will undoubtedly find yourself wanting to optimize your indexes at some point or other.

How to Watch Your Queries

Run your database with profiling on. I have an alias for starting up mongo in profile mode (‘p’ stands for profile):

alias mongop="<mongodb-install>/bin/mongod
      --smallfiles --noprealloc --profile=1 --dbpath <mongodb-install>/data/db"

This will default to considering queries > 100ms being deemed “slow.”

Add a logger (if you are using MongoMapper) and tail the log file to see the queries.

##### MONGODB SETTINGS #####
# You can use :logger => Rails.logger, to get output of the mongo queries, or create a separate logger.
logger = Logger.new('mongo-development.log')
MongoMapper.connection = Mongo::Connection.new('localhost', 27017, {:pool_size => 5, :auto_reconnect => true, :logger => logger})
MongoMapper.database = "mdalert-development"

Run your query/exercise the app.

Examine the mongo log (trimmed for legibility), and look for the primary collection you were querying (bolded below).

['$cmd'].find(#"settings", "query"=>{:identifier=>"ItemsPerPage"}, "fields"=>nil}>).limit(-1)
['$cmd'].find(#"accounts", "query"=>{:state=>"active"}, "fields"=>nil}>).limit(-1)
['accounts'].find({:state=>"active"}).limit(15).sort(email1)
['settings'].find({:identifier=>"AutoEmail"}).limit(-1)

Use MongoDB’s Explain to dig deeper.

Open up the mongo shell (<mongodb-install>/bin/mongo) and enter the query that you want explained. Hint, you can take much of it from the query in the log.

Without any indexes, you can see the query is scanning the entire table basically. A bad thing! Another tip is the cursor type is “BasicCursor.”

> db.accounts.find({state: "active"}).limit(15).sort({email: 1}).explain();
{
  "cursor" : "BasicCursor",
  "nscanned" : 11002,
  "nscannedObjects" : 11002,
  "n" : 15,
  "scanAndOrder" : true,
  "millis" : 44,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
  }
}

Since I was doing a find on state, and a sort on email (or last_name), I added a compound index using MongoMapper (you could have just as easily done it at the mongo console).

Account.ensure_index([[:state,1],[:email,1]])
Account.ensure_index([[:state,1],[:last_name,1]])

Re-running the explain, you can see

the cursor type is now BtreeCursor (i.e., using an index)
the entire table is not scanned.
The retrieval went from 44 millis down to 2 millis
Success!!

> db.accounts.find({state: "active"}).limit(15).sort({email: 1}).explain();
{
  "cursor" : "BtreeCursor state_1_email_1",
  "nscanned" : 15,
  "nscannedObjects" : 15,
  "n" : 15,
  "millis" : 2,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : {
    "state" : [ [ "active", "active" ] ],
    "email" : [ [ {"$minElement" : 1}, {"$maxElement" : 1} ] ]
  }
}

Fiddling a Bit More – Using the Profiler

You can drop into the mongo console and see more specifics using the mongo profiler.

> db.setProfilingLevel(1,15)
{ "was" : 1, "slowms" : 100, "ok" : 1 }

For this example, I cleared the indexes on accounts. and I ran the following query, and examined its profile data.
Note: the timing can vary over successive runs, but it generally is fairly consistent — and it is close to the “millis” value you see in explain output.

> db.accounts.find({state: "active"}).limit(15).sort({email: 1})
> db.system.profile.find();
{ "ts" : ISODate("2011-11-27T21:09:04.237Z"),
  "info" : "query mdalert-development.accounts
  ntoreturn:15 scanAndOrder
  reslen:8690
  nscanned:11002
  query: { query: { state: "active" },
    orderby: { email: 1.0 } }
    nreturned:15 163ms", "millis" : 43 }

Now let’s add back the indexes… one at a time. First up, let’s add “state.”

> db.accounts.ensureIndex({state:1})
>db.accounts.find({state: "active"}).limit(15).sort({email: 1})
db.system.profile.find({info: /.accounts/})
{ "ts" : ISODate("2011-11-27T21:26:29.801Z"),
  "info" : "query mdalert-development.accounts
  ntoreturn:15 scanAndOrder
  reslen:546
  nscanned:9747
  query: { query: { state: "active" },
    orderby: { email: 1.0 }, $explain: true }
    nreturned:1 81ms", "millis" : 81 }

Hmmm. Not so good! Let’s add in the compound index that we know we need, and run an explain:

>db.accounts.ensureIndex({state:1,email:1})
> db.accounts.find({state: "active"}).limit(15).sort({email: 1}).explain()
{
	"cursor" : "BtreeCursor state_1_email_1",
	"nscanned" : 15,
	"nscannedObjects" : 15,
	"n" : 15,
	"millis" : 0,

And sure enough, we get good performance. The millis is so small, that this query will not show up in the profiler.

If you want to clear the profile stats, you’ll soon find out you can’t remove the documents. The only way I saw how to do it was as follows:

restart mongod in non-profiling mode
reopen the mongo console and type:
```
db.system.profile.drop()
```

You should now see the profile being empty:

> show profile
db.system.profile is empty

Now you can restart mongod in profiling mode and see your latest profiling data without all the ancient history.

Some Gotchas

Regex searches cannot be indexed

If your query is a regex, then the index can’t help. With regex, retrieval is 501 ms (not bad, given 317K records):

> db.message_logs.find({patient_name:/ben franklin/i}).sort({created_at:-1}).explain()
{
	"cursor" : "BtreeCursor patient_name_1 multi",
	"nscanned" : 317265,
	"nscannedObjects" : 27,
	"n" : 27,
	"millis" : 501,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"patient_name" : [ [ "", { } ], [ /ben franklin/, /ben franklin/ ] ]
	}
}

Without regex, it is essentially instantaneous:

> db.message_logs.find({patient_name:'Ben Franklin'}).sort({created_at:-1}).explain()
{
	"cursor" : "BtreeCursor patient_name_1",
	"nscanned" : 27,
	"nscannedObjects" : 27,
	"n" : 27,
	"scanAndOrder" : true,
	"millis" : 0,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"isMultiKey" : false,
	"indexOnly" : false,
	"indexBounds" : {
		"patient_name" : [ [ "Ben Franklin", "Ben Franklin" ] ]
	}
}

Tips for examining profiler output

Look at the most recent offenders:
show profile
Look at a single collection:
db.system.profile.find({info: /message_logs/})
Look at a slow queries:
db.system.profile.find({millis : {$gt : 500}})
Look at a single collection with a specific query param, and a response >100ms:
db.system.profile.find({info: /message_logs/, info: /patient_name/, millis : {$gt : 100}})

Summary

So here you have an example of how to see indexes in action, how to create them, and how to measure their effects.

References:

Kyle Banker has a great post that puts MongoDB indexes into perspective: “The Joy of Indexing.”
And Richard Kreuter has a nice presentation on “Indexing and Query Optimizer.”

Configuring MongoMapper Indexes in Rails App

1 Reply

Not quite sure where the best place is to define MongoDB indexes via MongoMapper in a Rails app… My progression has been:

as part of the key definition in the model class
in a rails initializer
hybrid between initializer and model
rake task invoking model methods

Define Indexes on the Keys

This works fine during development.

class Account
  include MongoMapper::Document
  ...
  # Attributes ::::::::::::::::::::::::::::::::::::::::::::::::::::::
  key :login, String, :unique => true, :index => true
  key :msid, String, :index => true
  key :doctor_num, String, :index => true
  ...
end

Define Indexes in an Initializer

When I wanted to trigger a new index creation, I would add it here. Only problem is that restarting a production server with tons of data gets held up by the create index task.

# Rails.root/config/initializers/mongo_config.rb
Account.ensure_index(:last_name)
Group.ensure_index(:name)
Group.ensure_index(:group_num)
...

Define Indexes in a Class Method, Invoke in Initializer

A small tweak to putting indexes into an initializer was to place the knowledge of the indexes back into the model classes themselves. Then, all you needed to do was invoke the model class method to create it’s own indexes.

The Initializer Code

 
# Rails.root/config/initializers/mongo_config.rb
Event.create_indexes
Encounter.create_indexes
Setting.create_indexes

The Model(s) Code

 
class Setting
  include MongoMapper::Document
  # Attributes ::::::::::::::::::::::::::::::::::::::::::::::::::::::
  # What the user sees as a label
  key :label, String
  # How we reference it in code
  key :identifier, String, :required => true
  ...
  # Indexes :::::::::::::::::::::::::::::::::::::::::::::::::::::::::
  def self.create_indexes
    self.ensure_index(:identifier, :unique => true)
    self.ensure_index(:label, :unique => true)
  end
  ...
end

Enter the Rake!

Of course, you could also invoke the index creation code in a rake task, as pointed out here.

The beauty behind a rake task as best I can tell is this:

You can run it at any time to update the indexes
You do not bring a deploy to a screeching halt because you are waiting for index creation

I was already standardizing on how I was creating indexes inside each model class — where better to keep on top of what the indexes for a class should be than in the class itself!

# app/models/setting.rb
class Setting
  ...
  def self.create_indexes
    self.ensure_index(:identifier, :unique => true)
    self.ensure_index(:label, :unique => true)
  end
  ...
end

I created a new class in the model directory (so that it is close to where the models are defined) that simply loops through each model class to generate the proper indexes:

# app/models/create_indexes.rb
class CreateIndexes
  def self.all
    puts "*"*15 + " GENERATING INDEXES" + "*"*15
    MongoMapper.database.collection_names.each do |coll|
      # Avoid "system.indexes"
      next if coll.index(".")

      model = coll.singularize.camelize.constantize
      model.create_indexes if model.respond_to?(:create_indexes)
      model.show_indexes if model.respond_to?(:show_indexes)
    end
  end
end

You can invoke it easily from the Rails console: CreateIndexes.all

Next I created a rake task (in lib/tasks/indexes.rake) that invoked the ruby code to do the indexing mojo.

namespace :db do
  namespace :mongo do
    desc "Create mongo_mapper indexes"
    task :index => :environment do
      CreateIndexes.all
    end
  end
end

Any tips/comments/insights appreciated…

PS: self.show_indexes Mix-in

I created a mix-in for the “show_indexes()” class method for each model. I could not add it directly to the MongoMapper::Document class unfortunately — I ran into errors and finally gave up. Here’s the mix-in that I defined in lib/mongo_utils.rb:

module MongoMapper
  module IndexUtils
    puts "Customizing #{self.inspect}"
    module ClassMethods
      def show_indexes
        puts "%s #{self.name} INDEXES %s" % ["*"*12, "*"*12]
        self.collection.index_information.collect do |index|
          puts "    #{index[0]}#{index[1].has_key?("unique") ? " (unique)":"x"}"
        end
      end
    end
    def self.included(base)
      #puts "#{base} is being extended'"
      base.extend(ClassMethods)
    end
  end
end

And you use it as follows:

require 'mongo_mapper'
require 'mongo_utils'
class Setting
  include MongoMapper::Document
  include MongoMapper::IndexUtils
  ...

Design Debates

Leave a reply

many times there are two (or more) seemingly viable approaches that people can be arguing for…

when in doubt, sketch it out… that is,

quick model diagrams, or
quick sequence diagrams,
compare and contrast

now let’s (hopefully continue to) presume this is about an exceedingly critical aspect, a make-or-break design decision. because surely, you aren’t spending precious project resources deciding whether or not to use 2 or 4 spaces per indent level!

so, if sketching out the ideas and comparing still did not reveal a clear winner, then code up the competing ideas and put them to the test. give the designs a day or two of effort, or more — in proportion to the critical nature of getting the decision right. then, have some a priori metrics by which to pick the winner via “testing” the designs.

better performance
less code
easier to grok
suitability to task
etc

and if you still can’t choose a clear winner, well, man-up, be a leader, and make a freaking decision (even if it is flipping a coin), and don’t look back.

Technical Debt

Jon Kern's ramblings on software development

Author Archives: jon

The Bizarro Manifesto

Jira vs Post-It Notes

Use Agile Wisely

Uncle Bob Challenges The Architecture of a Rails App

MongoDB Group Map-Reduce Performance

Distinct

Simple Count

Map-Reduce Too Slow

Performance Results

Summary

Hiring a Team Member

MongoDB Index Performance

Results:

A Walk Through the Valley of Indexing in MongoDB

How to Watch Your Queries

Run your query/exercise the app.

Use MongoDB’s Explain to dig deeper.

Fiddling a Bit More – Using the Profiler

Some Gotchas

Regex searches cannot be indexed

Tips for examining profiler output

Summary

References:

Configuring MongoMapper Indexes in Rails App

Define Indexes on the Keys

Define Indexes in an Initializer

Define Indexes in a Class Method, Invoke in Initializer

The Initializer Code

The Model(s) Code

Enter the Rake!

PS: self.show_indexes Mix-in

Design Debates