I’ll come back and describe this in more detail. I just wanted to toss it out here while I have it on hand…
I couldn’t find a tutorial that I grokked clearly enough, so I will make my own.
This is just the first baby step… eventually it will grow to multiple params and date ranges… (gulp).
Domain: I have a list of books/articles by author.
Goal: I want to have a count of books/articles by author.
Code here.
require File.dirname(__FILE__) + '/../spec_helper'
describe "MapReduce" do
before :all do
Article.delete_all
Article.create(:author => "John Knowles", :title => "A Separate Peace")
Article.create(:author => "Ayn Rand", :title => "Fountainhead")
Article.create(:author => "John Knowles", :title => "Peace Breaks Out")
Article.create(:author => "Ayn Rand", :title => "Atlas Shrugged")
Article.create(:author => "Pearl S. Buck", :title => "East Wind:West Wind (1930)")
Article.create(:author => "Pearl S. Buck", :title => "The House of Earth (1935)")
Article.create(:author => "Pearl S. Buck", :title => "The Good Earth (1931)")
Article.create(:author => "Pearl S. Buck", :title => "Sons (1933)")
Article.create(:author => "Pearl S. Buck", :title => "A House Divided (1935)")
Article.create(:author => "Pearl S. Buck", :title => "The Mother (1933)")
Article.create(:author => "Pearl S. Buck", :title => "This Proud Heart (1938)")
Article.create(:author => "Pearl S. Buck", :title => "The Patriot (1939)")
Article.create(:author => "Pearl S. Buck", :title => "Other Gods (1940)")
end
describe "setup" do
it "should have data" do
Article.count.should > 0
end
end
describe "Simple Stats" do
it "should compute article counts per author" do
results = ArticleStats.book_counts_by_author
results.should include({"_id"=>"Ayn Rand", "value"=>2.0})
results.should include({"_id"=>"John Knowles", "value"=>2.0})
results.should include({"_id"=>"Pearl S. Buck", "value"=>9.0})
end
end
end
class Article
include MongoMapper::Document
key :title, String
key :author, String
timestamps!
end
class ArticleStats
def self.book_counts_by_author
results = []
counts_cursor = ArticleStats.build.find()
# map_hash is an OrderedHash that looks like
# {"_id"=>"Ayn Rand", "value"=>2.0}
counts_cursor.each_with_index do |map_hash,i|
results << map_hash
puts "#{i+1}: #{map_hash["_id"]}: #{map_hash["value"]}"
end
# An array of map_hash results for each unique key
results
end
# Create a "record" for each document that has the author and a count of 1
def self.map
<<-MAP
function() {
emit(this.author, 1);
}
MAP
end
# When the map part is run, it will have bundled the unique authors up as keys
# and provided the value(s) that match each key. In this case, we are planning to
# run the map over each Article instance, essentially sorting by author, and then
# the values will reflect each title for that author.
def self.reduce
<<-REDUCE
function(key, values) {
var article_count = 0;
for (var i in values) {
article_count += 1;
}
return article_count;
}
REDUCE
end
def self.build
Article.collection.map_reduce(map, reduce, :out => "mr_results")
end
end
And the results:
1: Ayn Rand: 2.0 2: John Knowles: 2.0 3: Pearl S. Buck: 9.0

Pingback: Baby Steps with Map-Reduce » Technical Debt