Category Archives: Database

Mongo Remembers All Keys

On the MongoMapper group list, Nick was wondering about getting key names from the model. But he noticed it remembered keys that had once been used… He wanted to only be able to see the current state of his MongoMapper class, I suppose… No dice, Nick!

Remember, MongoMapper Don’t Care! MongoMapper also does not forget! You can always see what keys were ever used as demonstrated here:

MongoMapper.database.collection('users').drop
class User
  include MongoMapper::Document

  key :name, String, :required => true
end
User.destroy_all
text = []
text << "After model with key :name, String"
text << User.keys.keys.inspect

text <<  'User.create(:name => "Fred")'
User.create(:name => "Fred")
text <<  User.keys.keys.inspect

text <<  'User.create(:name => "Fred", :email => "me@me.com")'
User.create(:name => "Fred", :email => "me@me.com")
text <<  User.keys.keys.inspect

text <<  'User.destroy_all'
User.destroy_all
text <<  User.keys.keys.inspect

text.each {|t| puts t}

You can see how the model keys reflect what is in the model class and in the actual document store (that is, dynamically added via a create):

After model with key :name, String
["name", "_id"]
User.create(:name => "Fred")
["name", "_id"]
User.create(:name => "Fred", :email => "me@me.com")
["name", "_id", "email"]
User.destroy_all
["name", "_id", "email"]

Now let’s extend the model class to add a new city key:

class User
  include MongoMapper::Document
  key :name, String, :required => true
  key :city, String
end
text = []
text <<  'Extended the class, adding city'
text <<  User.keys.keys.inspect
text.each {|t| puts t}

As expected: there is the new key:

Extended the class, adding city
["city", "name", "_id", "email"]

Removing Keys

If you accidentally added keys, then you should remove them. For example, I accidentally had an uppercase key in the model for a while (oops). Here is how I eradicated it from the database store:

  def self.purge_msid_key
    uppercase_msid_acts = Account.where(:MSID.exists => true).count
    if uppercase_msid_acts > 0
      Account.unset({}, :MSID)
    end
  end

Related MongoMapper Issue: Track Loaded Keys at the Instance Level

MongoDB Honey Badger

In case you don’t know about the Honey Badger—you have to watch this video. Then you will see why MongoDB is a close cousin to this feared and fearless animal!

Developing a new project where your domain classes/tables are changing rapidly?

MongoDB don’t care!

Tired of running rake db:migrate?

MongoDB don’t care!

Need to add a new “column” to your “table?”

MongoDB don’t care!

Want to query your “table” on “columns” that don’t exist?

MongoDB don’t care!

Need to add a new index on the fly?

MongoDB don’t care!

Welcome the Nastyass MongoDB into your development lair, you won’t give a shit about your database growing and changing!

MongoDB don’t care!

Find out more about Honey Badgers here — though Randall already taught us most of the salient points!

 

MongoMapper Query Overview

There was a question on the MongoMapper Google Group from a Mongoid user about how MongoMapper handles associations. Brandon was surprised that this query returned an Array:

Product.first.releases.where(something)

Let’s break it down, one bit at a time and clear things up:

# This would be an instance of Product
Product.first # Class.

This simply gets the first element in the Array that is returned by the default “All” query on Product. Of course, without sorting, you probably would not want to do this.

# This would be a return value of an array, assuming Product <>----> * Release
Product.first.releases # Array.

In Brandon’s example, I assume “releases” is a many association. That means, an Array. Unless the association has been tweaked to have default sorting via an Association Extension, getting the “first” one might be adventurous.

# This doesn't change the above... merely adds a restrictive query clause
Product.first.releases.where(something) # Array.

Here we simply get the first element of the releases array, narrowed down by the “something” query.

Capisce?

I am not sure why, but for me it seems more logical to start my clauses with the where, and narrow them down further, or modify them… In MongoMapper, I find querying rigor is much more “loose” than say a SQL SELECT query that requires things in proper order… I would tend to write my queries in more or less this fashion:

ModelClass.where(some criteria).[sort | order | another where clause | fields | limit].[all | first | paginate]

In addition, it is important to note that MongoMapper returns a query and does not actually perform the query until you add something that needs the results. For example: all, first, paginate, sort, etc.

I can picture one of those “man page” or SQL style of fancy ways to show you how you can construct a mongomapper query given all the combinations of options for each “position” in the query…

My (unsolicited) advice is to make the query look as “natural” as possible in terms of how you might read it aloud.

Product.releases.where(:major.gt => 1).sort(:minor.desc).first # Get the latest 1.x release

(And, if the releases where clause query is common, you can create an Association Extension)

Use the Console

You can always just output the queries to the console:

>> Patient.where(:last_name=>/john/i).class
=> Plucky::Query
>> Patient.where(:last_name=>/john/i).all.class
=> Array
>> Patient.where(:last_name=>/john/i).all.count
=> 1
>> Patient.where(:last_name=>/john/i).first.class
=> Patient
>> Patient.sort(:created_at.desc).first.class
=> Patient

Association Extension

And to show an example of an extension (when you use it frequently, for example):

class Encounter
  include MongoMapper::Document
  ...
  # Associations :::::::::::::::::::::::::::::::::::::::::::::::::::::
  many :events, :limit => 30, :order => 'msg_timestamp desc' do
    ...
    def images
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::IMAGE)]).order(:created_at.desc).all
    end

    def charts
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ED_SUMMARY)],
            :file_version.in => ["P", "F"]).order(:created_at.desc).all
    end

    def admits
      all(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ADMIT)])
    end
  end
  ...
end

# For a given encounter
enc=Encounter.find('4dadad188951a20727000160')
>> enc.events.images.count
=> 7
>> enc.events.images.class
=> Array
>> enc.events.images.first
=> #

Named Scope

If you will need dynamic querying, you could use a Named Scope as follows:

scope :by_days_old,  lambda { |age| where(:msg_timestamp.gt => age.days.ago) }

This can be used as follows:

Encounter.by_days_old(10)
=> #Fri Apr 15 03:35:53 UTC 2011}>

Factory Girl and MongoMapper

You were probably hoping for some Rosey the Riveter poster…

Factory Folder

Factory Folder

Instead, I am going to extend my small MongoMapper example to include Factory Girl. The steps are pretty simple:

  1. Go here to install…
  2. Create your factories
  3. Use the factories in Cucumber/RSpec

Factory Construction

I created a new “factories” folder under the spec folder:

The factories for User and Event are quite simple:

Factory.define :user do |u|
  u.name ('a'..'z').to_a.shuffle[0..7].join.capitalize
end

and

require 'factory_girl'
def dummy_word(len=6)
  ('a'..'z').to_a.shuffle[0..len].join.capitalize
end

def dummy_date
  secs_in_day = 24*60*60
  Time.now + (rand(60)*secs_in_day - 30)
end

Factory.define :event do |e|
  e.title "#{dummy_word} #{dummy_word 3} #{dummy_word 10}"
  e.date  dummy_date
end

Refactor Original Setup

Instead of using this style of test data creation:

@event = Event.create(:title => "Code Retreat Timbuktoo", :user => @fred)

We will use the new factory as follows:

@event = Factory(:event, :title => "Code Retreat Timbuktoo", :user => @fred)

Refactor Cucumber

The given went from this:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => fred)
  end
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => harry)
  end
  Event.count.should == 20
end

to this – including refactoring out dummy_title, and reducing it to one loop:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => fred)
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => harry)
  end
  Event.count.should == 20
end

Subtle Details

The beauty of having tests is that I could easily mess around with getting some of the Factory Girl configuration stuff in the right place. Try something, run the test, adjust as needed until all are back to green.

The file features/support/env.rb got some additions so that Cucumber could find the factories:

$LOAD_PATH << File.expand_path('../../../app/model' , __FILE__)
require 'user'
require 'event'
require 'spec/factories/events.rb'
require 'spec/factories/users.rb'
load 'config/mongo_db.rb'

All the tests still pass!

More Complicated Example

For a project I work on, my factories look like this, with auto-creation of random IDs:

def random_months(months)
  day_in_secs = (24*60*60)
  (1+rand(months))*30*day_in_secs
end

# ----------- GROUP -----------
Factory.sequence :group_num do |n|
  "99#{n}#{rand(n)}"
end

Factory.define :group do |g|
    g.group_num {Factory.next(:group_num)}
    g.name "Greatest Group"
end
# ----------- ACCOUNT -----------
Factory.sequence :doctor_num do |n|
  "999992#{n}#{rand(200+n)}"
end

Factory.sequence :login do |n|
  "AB#{rand(n*68)}bx#{rand(200+n)}"
end

Factory.sequence :msid do |n|
  "CQ987Z12#{n}#{rand(n)}"
end

Factory.define :account do |a|
  pw = 'password'
  a.msid { Factory.next(:msid) }
  a.doctor_num { Factory.next(:doctor_num) }
  a.first_name "James"
  a.last_name "Jones"
  a.role 'user'
  a.password pw
  a.password_confirmation pw
  a.email Setting.get("AutoEmail")
  a.login { Factory.next(:login) }
end

# ----------- PATIENT -----------
Factory.sequence :patient_num do |n|
  "#{n}#{rand(300+n)}"
end

Factory.define :patient do |pt|
#  pt.patient_num "10000009"
  pt.patient_num {Factory.next(:patient_num)}
  pt.emr_num "1853286"
  pt.first_name "John"
  pt.last_name "Johnson"
  pt.dob {(Time.now - random_months(36))}
  pt.count_public_encounters 1
  pt.count_public_events 2
end

 

Multiple Many-to-Many Associations in MongoMapper

There was a question in the Google Group for MongoMapper, so I decided to post an answer in the form of a simple demo. You can find the source code on Github.com here.

The basic shape of the problem was this:

Users Sponsor and Attend Events

Users Sponsor and Attend Events

And the solution issues were around the multiple many-to-many associations, more or less.

For a simple one-to-many, MongoMapper has the normal:

  • User has many :events
  • Event belongs_to :user (its Owner)

But how to do the other associations? A given User can be involved with many events in different capacities:

  • Attending
  • Interested in attending
  • Likes

There are different ways to tackle these many-to-many associations.

  1. You can use a Set (to obtain the uniqueness factor) of Users that are attending or are interested.
  2. You can use an Array of instance IDs (I think this is probably the more standard technique)
class Event
  include MongoMapper::Document

  key :title, :required => true

  key :user_id
  belongs_to :user

  # One way to do it...
  key :attendees, Set
  key :interested, Set

  # Another way to do it...
  key :like_ids, Array
  many :likes, :class_name => 'User', :in => :like_ids

  def attending(a_user)
    # self.push_uniq(:attendees => a_user.id)
    attendees << a_user.id
    save
  end

  def interested_in(a_user)
    interested << a_user.id
    save
  end
...

In which direction you allow making the association, that is up to your application’s needs. For example, above you can see that an Event instance could be messaged with the user to indicate attending or interested_in. The “likes” is immediately accessible from an Event, or I could have added a wrapper method (def likes(a_user)).

And the User class has some simple retrieval methods to see what a User likes, what they are attending, and what they are interested_in:

class User
  include MongoMapper::Document

  key :name, :required => true

  many :events

  def likes
    Event.where(:like_ids => id).all
  end

  def attending
    Event.where(:attendees => id).all
  end

  def interested_in
    Event.where(:interested => id).all
  end

  def likes_event(event)
    event.likes << self
    event.save!
  end

To see how the different styles are used, you can check out the specs. For example:

Adding users who like an Event:

  it "should track interested" do
    expect {
      @event.interested_in(@jared)
      @event.interested_in(@sally)
    }.to change {@event.interested.size}.by(2)
  end

it "should allow 'likes'" do
    expect {
      @event_2.likes << @martha
    }.to change {@event_2.likes.size}.by(1)
  end

Or from the User perspective:

  it "should allow me to add an event I like" do
    @fred.likes_event(@event_2)
    @event_2.likes.size.should > 0
    @fred.likes.count.should > 0
  end

Or attendees, from the Event:

  it "should list the events I am attending" do
    [@fred, @harry].each {|u| @event.attending(u)}
    @fred.attending.count.should > 0
  end

MongoMapper vs MongoDB Cursor Stats

I just love developing with MongoMapper and MongoDB… This weekend I had an easy opportunity to test out the performance between iterating through a collection via MongoMapper or MongoDB cursor. (I had to fix up a field that I munged by screwing up some production code — oops)

My findings showed that the cursor approach was ~1.8x faster.

There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).

Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”

The two flavors of iteration look basically like this:

  • cursor = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  • error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)

The findings were based on “correcting” 5,929 of the 9,002 total accounts.

Time Memory
Cursor 166 sec 104K
MM Array 293 sec 175K


The scientist in me says do a test across a larger number of accounts: 90K, 900K 9M — and see what the trend looks like for the cursor — I would expect pretty flat. The pragmatist says I got more important work to do on our V2 of the production app <g>.

From this little bit of data (see the second figure), it seems that the cursor’s lead in the speed department diminished with increasing record counts. However, the memory consumption stays pretty flat for the cursor approach. I’m sure that the array approach will run out of memory at some point when you try and process a lot of records — never a good thing. (Maybe I should do some research on our message log — ~400k per month.)

Compare Processing Speed and Memory Usage for Cursor and Array Approach

The code for the Cursor way is shown here, with the lines of interest highlighted:

def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  error_accounts = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  error_accounts.each do |rec|
    new_doctor_num = rec["doctor_num"].match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})
    end
  end
end

The code for the MongoMapper way looked like this:

def self.fix_errors_array_style
  error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)
  error_accounts.each do |a|
    new_doctor_num = a.doctor_num.match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      a.update_attributes( :doctor_num => new_doctor_num )
      result = a.save
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}
      end
    end
  end
end

In case this counts for completeness of information presented… The rough numbers of the collection look like this:

  • “count”=>9002,
  • “size”=>5829364,
  • “avgObjSize”=>647.56,
  • “storageSize”=>13880064,
  • “numExtents”=>5,
  • “nindexes”=>4,
  • “lastExtentSize”=>10420224,
  • “paddingFactor”=>1.01,
  • “flags”=>1,
  • “totalIndexSize”=>1679360,
  • “indexSizes”=>{“_id_”=>385024, “login_1″=>352256, “msid_1″=>352256, “doctor_num_1″=>589824}, “ok”=>1.0}

Data Modeler Meets Objects

On the Agile Modeling forum, there was a great post by a Data Modeler requesting help on how to interpret an Object Model in terms of Data Modeling concepts.

My reply:

Some DM people “get it” and others remain stymied by the OM world 😉

Biggest issue is that the relationships in OO can often be “backwards” to many in the DM camp. Conversely, in SQL, you can usually get at any data that you want, regardless of the “direction”… A Client can have Address(es). Or, you can look at an address and probably tell me which clients are at that address.

You should endeavor to think in Objects. And, object models are often in the eye of the beholder. The perspective for a given model needs to be understood. That is, what did the modeler intend to convey?

Cutting to the chase (you get what you pay for here):

  • Classes
    • Class == Table
    • (non-transient) Attribute == Column
    • Normally i come up with standards for column width for

      various strings in the model

      • Names are varchar 125
      • URLs are varchar 416 (making this up )
      • Descriptions are Text
      • etc.
  • Associations
    • typically are whole-part or containment concepts. A Client——-0..* Address.
    • if you see a filled-in diamond, think “Cascade Delete”
    • other associations are simply relationships
  • Association Class
    • This is simply an unfinished portion of the model — or a short cut.
    • It means that there is more “information” to be stored than

      simply the pure association

    • For example: Insurance Agent ——– Agency can have an

      association class like “Registered” that shows the dates

      when the agent was active (employed) at the Agency.

    • You have to eventually implement the Association class as a

      class/table in between, and connect the classes with the

      proper cardinality.

  • Generalization
    • do not bother doing LDM for this (kind of like unary operators should be avoided)
    • parent has some attributes, children probably add more
    • you have a choice of the following:
      • one superset table with parent and child columns. for

        different child classes, you may have unfilled

        columns. a porous table so to speak.

      • use joins of child tables to parent tables. suffer the

        cost of a join

      • you can determine which is better based on projected

        usage and run some plans both ways with loaded sample

        data and representative queries and measure

        performance to make up your mind which way is best for

        your specific application

  • Implements
    • This is for an interface, which is purely behavior (methods)
    • You may be able to ignore completely
    • Or, if it is being used in an association (like a one class

      holds a collection of things implementing the interface),

      then you have to pay attention to it:

      • It can serve as a “reference” to many other classes

        that implement that Interface

      • You may have to persist that association and figure

        out how all the underlying implementing classes/tables

        can be held in a collection (typically a join table

        sort of a thing suffices)

Hope this helps!