Category Archives: Database

Mongo Remembers All Keys

On the MongoMapper group list, Nick was wondering about getting key names from the model. But he noticed it remembered keys that had once been used… He wanted to only be able to see the current state of his MongoMapper class, I suppose… No dice, Nick!

Remember, MongoMapper Don’t Care! MongoMapper also does not forget! You can always see what keys were ever used as demonstrated here:

MongoMapper.database.collection('users').drop
class User
  include MongoMapper::Document

  key :name, String, :required => true
end
User.destroy_all
text = []
text << "After model with key :name, String"
text << User.keys.keys.inspect

text <<  'User.create(:name => "Fred")'
User.create(:name => "Fred")
text <<  User.keys.keys.inspect

text <<  'User.create(:name => "Fred", :email => "me@me.com")'
User.create(:name => "Fred", :email => "me@me.com")
text <<  User.keys.keys.inspect

text <<  'User.destroy_all'
User.destroy_all
text <<  User.keys.keys.inspect

text.each {|t| puts t}

You can see how the model keys reflect what is in the model class and in the actual document store (that is, dynamically added via a create):

After model with key :name, String
["name", "_id"]
User.create(:name => "Fred")
["name", "_id"]
User.create(:name => "Fred", :email => "me@me.com")
["name", "_id", "email"]
User.destroy_all
["name", "_id", "email"]

Now let’s extend the model class to add a new city key:

class User
  include MongoMapper::Document
  key :name, String, :required => true
  key :city, String
end
text = []
text <<  'Extended the class, adding city'
text <<  User.keys.keys.inspect
text.each {|t| puts t}

As expected: there is the new key:

Extended the class, adding city
["city", "name", "_id", "email"]

Removing Keys

If you accidentally added keys, then you should remove them. For example, I accidentally had an uppercase key in the model for a while (oops). Here is how I eradicated it from the database store:

  def self.purge_msid_key
    uppercase_msid_acts = Account.where(:MSID.exists => true).count
    if uppercase_msid_acts > 0
      Account.unset({}, :MSID)
    end
  end

Related MongoMapper Issue: Track Loaded Keys at the Instance Level

MongoDB Honey Badger

2 Replies

In case you don’t know about the Honey Badger—you have to watch this video. Then you will see why MongoDB is a close cousin to this feared and fearless animal!

Developing a new project where your domain classes/tables are changing rapidly?

MongoDB don’t care!

Tired of running rake db:migrate?

MongoDB don’t care!

Need to add a new “column” to your “table?”

MongoDB don’t care!

Want to query your “table” on “columns” that don’t exist?

MongoDB don’t care!

Need to add a new index on the fly?

MongoDB don’t care!

Welcome the Nastyass MongoDB into your development lair, you won’t give a shit about your database growing and changing!

MongoDB don’t care!

Find out more about Honey Badgers here — though Randall already taught us most of the salient points!

MongoMapper Query Overview

Use the Console

You can always just output the queries to the console:

>> Patient.where(:last_name=>/john/i).class
=> Plucky::Query
>> Patient.where(:last_name=>/john/i).all.class
=> Array
>> Patient.where(:last_name=>/john/i).all.count
=> 1
>> Patient.where(:last_name=>/john/i).first.class
=> Patient
>> Patient.sort(:created_at.desc).first.class
=> Patient

Association Extension

And to show an example of an extension (when you use it frequently, for example):

class Encounter
  include MongoMapper::Document
  ...
  # Associations :::::::::::::::::::::::::::::::::::::::::::::::::::::
  many :events, :limit => 30, :order => 'msg_timestamp desc' do
    ...
    def images
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::IMAGE)]).order(:created_at.desc).all
    end

    def charts
      where(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ED_SUMMARY)],
            :file_version.in => ["P", "F"]).order(:created_at.desc).all
    end

    def admits
      all(:type => [EventConstants::EventType.to_text(EventConstants::EventType::ADMIT)])
    end
  end
  ...
end

# For a given encounter
enc=Encounter.find('4dadad188951a20727000160')
>> enc.events.images.count
=> 7
>> enc.events.images.class
=> Array
>> enc.events.images.first
=> #

Named Scope

If you will need dynamic querying, you could use a Named Scope as follows:

scope :by_days_old,  lambda { |age| where(:msg_timestamp.gt => age.days.ago) }

This can be used as follows:

Encounter.by_days_old(10)
=> #Fri Apr 15 03:35:53 UTC 2011}>

Factory Girl and MongoMapper

Factory Construction

I created a new “factories” folder under the spec folder:

The factories for User and Event are quite simple:

Factory.define :user do |u|
  u.name ('a'..'z').to_a.shuffle[0..7].join.capitalize
end

and

require 'factory_girl'
def dummy_word(len=6)
  ('a'..'z').to_a.shuffle[0..len].join.capitalize
end

def dummy_date
  secs_in_day = 24*60*60
  Time.now + (rand(60)*secs_in_day - 30)
end

Factory.define :event do |e|
  e.title "#{dummy_word} #{dummy_word 3} #{dummy_word 10}"
  e.date  dummy_date
end

Refactor Original Setup

Instead of using this style of test data creation:

@event = Event.create(:title => "Code Retreat Timbuktoo", :user => @fred)

We will use the new factory as follows:

@event = Factory(:event, :title => "Code Retreat Timbuktoo", :user => @fred)

Refactor Cucumber

The given went from this:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => fred)
  end
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    Event.create(:title=>"#{dummy_word} #{dummy_word 3} #{dummy_word 10}",
                 :date => dummy_date,
                 :user => harry)
  end
  Event.count.should == 20
end

to this – including refactoring out dummy_title, and reducing it to one loop:

Given /^A set of events$/ do
  fred = User.find_or_create_by_name("fred")
  harry = User.find_or_create_by_name("harry")
  (1..10).each do
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => fred)
    evt = Factory(:event, :title => dummy_title,
                          :date  => dummy_date,
                          :user  => harry)
  end
  Event.count.should == 20
end

Subtle Details

The beauty of having tests is that I could easily mess around with getting some of the Factory Girl configuration stuff in the right place. Try something, run the test, adjust as needed until all are back to green.

The file features/support/env.rb got some additions so that Cucumber could find the factories:

$LOAD_PATH << File.expand_path('../../../app/model' , __FILE__)
require 'user'
require 'event'
require 'spec/factories/events.rb'
require 'spec/factories/users.rb'
load 'config/mongo_db.rb'

All the tests still pass!

More Complicated Example

For a project I work on, my factories look like this, with auto-creation of random IDs:

def random_months(months)
  day_in_secs = (24*60*60)
  (1+rand(months))*30*day_in_secs
end

# ----------- GROUP -----------
Factory.sequence :group_num do |n|
  "99#{n}#{rand(n)}"
end

Factory.define :group do |g|
    g.group_num {Factory.next(:group_num)}
    g.name "Greatest Group"
end
# ----------- ACCOUNT -----------
Factory.sequence :doctor_num do |n|
  "999992#{n}#{rand(200+n)}"
end

Factory.sequence :login do |n|
  "AB#{rand(n*68)}bx#{rand(200+n)}"
end

Factory.sequence :msid do |n|
  "CQ987Z12#{n}#{rand(n)}"
end

Factory.define :account do |a|
  pw = 'password'
  a.msid { Factory.next(:msid) }
  a.doctor_num { Factory.next(:doctor_num) }
  a.first_name "James"
  a.last_name "Jones"
  a.role 'user'
  a.password pw
  a.password_confirmation pw
  a.email Setting.get("AutoEmail")
  a.login { Factory.next(:login) }
end

# ----------- PATIENT -----------
Factory.sequence :patient_num do |n|
  "#{n}#{rand(300+n)}"
end

Factory.define :patient do |pt|
#  pt.patient_num "10000009"
  pt.patient_num {Factory.next(:patient_num)}
  pt.emr_num "1853286"
  pt.first_name "John"
  pt.last_name "Johnson"
  pt.dob {(Time.now - random_months(36))}
  pt.count_public_encounters 1
  pt.count_public_events 2
end

Multiple Many-to-Many Associations in MongoMapper

4 Replies

There was a question in the Google Group for MongoMapper, so I decided to post an answer in the form of a simple demo. You can find the source code on Github.com here.

The basic shape of the problem was this:

Users Sponsor and Attend Events

And the solution issues were around the multiple many-to-many associations, more or less.

For a simple one-to-many, MongoMapper has the normal:

User has many :events
Event belongs_to :user (its Owner)

But how to do the other associations? A given User can be involved with many events in different capacities:

Attending
Interested in attending
Likes

There are different ways to tackle these many-to-many associations.

You can use a Set (to obtain the uniqueness factor) of Users that are attending or are interested.
You can use an Array of instance IDs (I think this is probably the more standard technique)

class Event
  include MongoMapper::Document

  key :title, :required => true

  key :user_id
  belongs_to :user

  # One way to do it...
  key :attendees, Set
  key :interested, Set

  # Another way to do it...
  key :like_ids, Array
  many :likes, :class_name => 'User', :in => :like_ids

  def attending(a_user)
    # self.push_uniq(:attendees => a_user.id)
    attendees << a_user.id
    save
  end

  def interested_in(a_user)
    interested << a_user.id
    save
  end
...

In which direction you allow making the association, that is up to your application’s needs. For example, above you can see that an Event instance could be messaged with the user to indicate attending or interested_in. The “likes” is immediately accessible from an Event, or I could have added a wrapper method (def likes(a_user)).

And the User class has some simple retrieval methods to see what a User likes, what they are attending, and what they are interested_in:

class User
  include MongoMapper::Document

  key :name, :required => true

  many :events

  def likes
    Event.where(:like_ids => id).all
  end

  def attending
    Event.where(:attendees => id).all
  end

  def interested_in
    Event.where(:interested => id).all
  end

  def likes_event(event)
    event.likes << self
    event.save!
  end

To see how the different styles are used, you can check out the specs. For example:

Adding users who like an Event:

  it "should track interested" do
    expect {
      @event.interested_in(@jared)
      @event.interested_in(@sally)
    }.to change {@event.interested.size}.by(2)
  end

it "should allow 'likes'" do
    expect {
      @event_2.likes << @martha
    }.to change {@event_2.likes.size}.by(1)
  end

Or from the User perspective:

  it "should allow me to add an event I like" do
    @fred.likes_event(@event_2)
    @event_2.likes.size.should > 0
    @fred.likes.count.should > 0
  end

Or attendees, from the Event:

  it "should list the events I am attending" do
    [@fred, @harry].each {|u| @event.attending(u)}
    @fred.attending.count.should > 0
  end

MongoMapper vs MongoDB Cursor Stats

My findings showed that the cursor approach was ~1.8x faster.

There’s probably some underlying “but of course” comment waiting to come out of John Nunemaker (creator of the amazing MongoMapper) or Kyle Banker (MongoDB expert).

Like, “but of course letting the database server manage the work is always better than returning a big hunk of documents!”

The two flavors of iteration look basically like this:

cursor = coll.find({:doctor_num => /^staff_id_numberd{6}/})
error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)

The findings were based on “correcting” 5,929 of the 9,002 total accounts.

	Time	Memory
Cursor	166 sec	104K
MM Array	293 sec	175K

The scientist in me says do a test across a larger number of accounts: 90K, 900K 9M — and see what the trend looks like for the cursor — I would expect pretty flat. The pragmatist says I got more important work to do on our V2 of the production app <g>.

From this little bit of data (see the second figure), it seems that the cursor’s lead in the speed department diminished with increasing record counts. However, the memory consumption stays pretty flat for the cursor approach. I’m sure that the array approach will run out of memory at some point when you try and process a lot of records — never a good thing. (Maybe I should do some research on our message log — ~400k per month.)

Compare Processing Speed and Memory Usage for Cursor and Array Approach

The code for the Cursor way is shown here, with the lines of interest highlighted:

def self.fix_errors_cursor_style
  coll = MongoMapper.database['accounts']
  error_accounts = coll.find({:doctor_num => /^staff_id_numberd{6}/})
  error_accounts.each do |rec|
    new_doctor_num = rec["doctor_num"].match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      coll.update({"_id" => rec["_id"]}, {"$set" => {"doctor_num" => new_doctor_num}})
    end
  end
end

The code for the MongoMapper way looked like this:

def self.fix_errors_array_style
  error_accounts = Account.all(:doctor_num => /^staff_id_numberd{6}/)
  error_accounts.each do |a|
    new_doctor_num = a.doctor_num.match(/(d{6})/).to_s
    accounts = Account.find_dupes(new_doctor_num)
    if accounts.size > 1
      Account.merge_accounts new_doctor_num
    else
      a.update_attributes( :doctor_num => new_doctor_num )
      result = a.save
      if a.errors
        a.errors.each_pair {|k,e| puts ">>> #{k}: #{e}"}
      end
    end
  end
end

In case this counts for completeness of information presented… The rough numbers of the collection look like this:

“count”=>9002,
“size”=>5829364,
“avgObjSize”=>647.56,
“storageSize”=>13880064,
“numExtents”=>5,
“nindexes”=>4,
“lastExtentSize”=>10420224,
“paddingFactor”=>1.01,
“flags”=>1,
“totalIndexSize”=>1679360,
“indexSizes”=>{“_id_”=>385024, “login_1″=>352256, “msid_1″=>352256, “doctor_num_1″=>589824}, “ok”=>1.0}

Data Modeler Meets Objects

On the Agile Modeling forum, there was a great post by a Data Modeler requesting help on how to interpret an Object Model in terms of Data Modeling concepts.

My reply:

Some DM people “get it” and others remain stymied by the OM world 😉

Biggest issue is that the relationships in OO can often be “backwards” to many in the DM camp. Conversely, in SQL, you can usually get at any data that you want, regardless of the “direction”… A Client can have Address(es). Or, you can look at an address and probably tell me which clients are at that address.

You should endeavor to think in Objects. And, object models are often in the eye of the beholder. The perspective for a given model needs to be understood. That is, what did the modeler intend to convey?

Cutting to the chase (you get what you pay for here):

Classes

Class == Table
(non-transient) Attribute == Column
Normally i come up with standards for column width for
various strings in the model

Names are varchar 125
URLs are varchar 416 (making this up )
Descriptions are Text
etc.

Associations

typically are whole-part or containment concepts. A Client——-0..* Address.
if you see a filled-in diamond, think “Cascade Delete”
other associations are simply relationships

Association Class

This is simply an unfinished portion of the model — or a short cut.
It means that there is more “information” to be stored than
simply the pure association
For example: Insurance Agent ——– Agency can have an
association class like “Registered” that shows the dates

when the agent was active (employed) at the Agency.
You have to eventually implement the Association class as a
class/table in between, and connect the classes with the

proper cardinality.

Generalization

do not bother doing LDM for this (kind of like unary operators should be avoided)
parent has some attributes, children probably add more
you have a choice of the following:

one superset table with parent and child columns. for
different child classes, you may have unfilled

columns. a porous table so to speak.
use joins of child tables to parent tables. suffer the
cost of a join
you can determine which is better based on projected
usage and run some plans both ways with loaded sample

data and representative queries and measure

performance to make up your mind which way is best for

your specific application

Implements

This is for an interface, which is purely behavior (methods)
You may be able to ignore completely
Or, if it is being used in an association (like a one class
holds a collection of things implementing the interface),

then you have to pay attention to it:

It can serve as a “reference” to many other classes
that implement that Interface
You may have to persist that association and figure
out how all the underlying implementing classes/tables

can be held in a collection (typically a join table

sort of a thing suffices)

Hope this helps!

Technical Debt

Jon Kern's ramblings on software development

Category Archives: Database

Mongo Remembers All Keys

Removing Keys

MongoDB Honey Badger

MongoMapper Query Overview

Use the Console

Association Extension

Named Scope

Factory Girl and MongoMapper

Factory Construction

Refactor Original Setup

Refactor Cucumber

Subtle Details

More Complicated Example

Multiple Many-to-Many Associations in MongoMapper

MongoMapper vs MongoDB Cursor Stats

My findings showed that the cursor approach was ~1.8x faster.

Data Modeler Meets Objects