# An ActiveRecord conundrum I'm going to describe a weird problem I faced when using ActiveRecord today. To protect the innocent, I'm not going to talk about the app I'm actually working on but will instead discuss a hypothetical but isomorphic database design for a clone of the popular blogging platform Tumblr. Tumblr lets you publish various different sorts of content. We might be tempted to shove all these types in a big STI table, but the types are all quite different from one another and so we give them their own tables. ```rb class CreateContentTables < ActiveRecord::Migration def change create_table :texts do |t| t.string :title t.text :body end create_table :quotes do |t| t.text :body t.string :source t.string :url end create_table :links do |t| t.string :title t.string :url end end end ``` These pieces of content each belong to a blog, which belongs to a user. However you'll notice the above tables have no foreign key for the blog. Instead, we'll use a join table called `posts` to link content items to blogs. There are various reasons for this in my actual app; in Tumblr it gives us cheap reblogging since we don't need to copy a content record, just create a new `posts` record. `posts` has a compound foreign key referring to content items. ```rb class CreateBlogsAndUsers < ActiveRecord::Migration def change create_table :users do |t| t.string :username end create_table :blogs do |t| t.belongs_to :user t.string :title end create_table :posts do |t| t.timestamps t.belongs_to :blog t.belongs_to :content, polymorphic: true end end end ``` Finally, each content item can have comments, which belong to a user and just like `posts` they have a compound foreign key for the content they relate to. ```rb class CreateComments < ActiveRecord::Migration def change create_table :comments do |t| t.timestamps t.belongs_to :user t.belongs_to :content, polymorphic: true t.text :body t.boolean :accepted end end end ``` We migrate the database, and Rails spits out the following SQL schema: ``` mysql> describe users; +----------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | username | varchar(255) | YES | | NULL | | +----------+--------------+------+-----+---------+----------------+ mysql> describe blogs; +---------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | user_id | int(11) | YES | | NULL | | | title | varchar(255) | YES | | NULL | | +---------+--------------+------+-----+---------+----------------+ mysql> describe posts; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | created_at | datetime | YES | | NULL | | | updated_at | datetime | YES | | NULL | | | blog_id | int(11) | YES | | NULL | | | content_id | int(11) | YES | | NULL | | | content_type | varchar(255) | YES | | NULL | | +--------------+--------------+------+-----+---------+----------------+ mysql> describe texts; +-------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | title | varchar(255) | YES | | NULL | | | body | text | YES | | NULL | | +-------+--------------+------+-----+---------+----------------+ mysql> describe quotes; +--------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | body | text | YES | | NULL | | | source | varchar(255) | YES | | NULL | | | url | varchar(255) | YES | | NULL | | +--------+--------------+------+-----+---------+----------------+ mysql> describe links; +-------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | title | varchar(255) | YES | | NULL | | | url | varchar(255) | YES | | NULL | | +-------+--------------+------+-----+---------+----------------+ mysql> describe comments; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | created_at | datetime | YES | | NULL | | | updated_at | datetime | YES | | NULL | | | user_id | int(11) | YES | | NULL | | | content_id | int(11) | YES | | NULL | | | content_type | varchar(255) | YES | | NULL | | | body | text | YES | | NULL | | | accepted | tinyint(1) | YES | | NULL | | +--------------+--------------+------+-----+---------+----------------+ ``` Like a good little Rails developer we go and create model classes for all these tables: ```rb class User < ActiveRecord::Base has_many :blogs has_many :comments end class Blog < ActiveRecord::Base belongs_to :user has_many :posts end class Post < ActiveRecord::Base belongs_to :blog belongs_to :content, polymorphic: true end class Text < ActiveRecord::Base has_many :posts, as: :content has_many :comments, as: :content end class Quote < ActiveRecord::Base has_many :posts, as: :content has_many :comments, as: :content end class Link < ActiveRecord::Base has_many :posts, as: :content has_many :comments, as: :content end class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true end ``` Now, let's create a couple of users: ```rb alice = User.create(username: 'alice') bob = User.create(username: 'bob') ``` Alice has a blog where she writes about MySQL optimisation. Here's a few posts she's added recently: ```rb alice = User.where(username: 'alice').first blog = alice.blogs.create(title: 'Not even once') text_entry = Text.create( title: 'How to bulk update a MySQL table', body: 'First, delete all your indexes.' ) quote_entry = Quote.create( body: 'There are two hard things in computer science: MySQL consistency.', source: 'CSS Perverts', url: 'https://medium.com/could-you-not' ) link_entry = Link.create( title: 'Web development that does not always hurt', url: 'http://rubyonrails.org/' ) [text_entry, quote_entry, link_entry].each do |entry| Post.create(blog: blog, content: entry) end ``` One day, along comes Bob and leaves some helpful comments on Alice's blog: ```rb bob = User.where(username: 'bob').first text_entry = Text.first quote_entry = Quote.first Comment.create( user: bob, content: text_entry, body: 'This sounds like a lot of work.' ) Comment.create( user: bob, content: text_entry, body: '4 hours later and I am still at my computer. Help!' ) Comment.create( user: bob, content: quote_entry, body: 'Never were truer words spoken. Such thought leadership.' ) ``` So, here's the problem. When Alice logs into Tumblr, she wants to see how many new comments there are, because she needs to manually moderate and accept each one. So, we want the count of all comments that relate to content that's posted on one of Alice's blogs. We begin by adding a scope to `Comment` so we can count unmoderated comments: ```rb class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true scope :for_moderation, -> { where(accepted: nil) } end ``` A naive way to count unmoderated comments would be to iterate over Alice's blogs, and for each post in each blog, count the number of unmoderated comments each post's content has. This loads a lot of unnecessary objects and runs a lot of queries. ``` >> alice = User.where(username: 'alice').first => # >> alice.blogs.inject(0) { |s,b| s + b.posts.inject(0) { |s,p| s + p.content.comments.for_moderation.count } } Blog Load (0.4ms) SELECT `blogs`.* FROM `blogs` WHERE `blogs`.`user_id` = 1 Post Load (0.3ms) SELECT `posts`.* FROM `posts` WHERE `posts`.`blog_id` = 1 Text Load (0.4ms) SELECT `texts`.* FROM `texts` WHERE `texts`.`id` = 1 ORDER BY `texts`.`id` ASC LIMIT 1 (0.3ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Text' AND `comments`.`accepted` IS NULL Quote Load (0.1ms) SELECT `quotes`.* FROM `quotes` WHERE `quotes`.`id` = 1 ORDER BY `quotes`.`id` ASC LIMIT 1 (0.6ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Quote' AND `comments`.`accepted` IS NULL Link Load (0.2ms) SELECT `links`.* FROM `links` WHERE `links`.`id` = 1 ORDER BY `links`.`id` ASC LIMIT 1 (0.6ms) SELECT COUNT(*) FROM `comments` WHERE `comments`.`content_id` = 1 AND `comments`.`content_type` = 'Link' AND `comments`.`accepted` IS NULL => 3 ``` We'd like to do this in one query. Initially this seems hard because we can't join across the polymorphic set of content tables to walk from posts to comments. But, we don't actually need to do that. We can find all the comments for a `Post` by finding the comments with the same `content_{id,type}` as the `Post`. In SQL, we can write: ``` mysql> SELECT COUNT(*) -> FROM comments AS c -> INNER JOIN posts AS p ON p.content_id = c.content_id AND p.content_type = c.content_type -> INNER JOIN blogs AS b on b.id = p.blog_id -> WHERE b.user_id = 1 AND -> c.accepted IS NULL; +----------+ | COUNT(*) | +----------+ | 3 | +----------+ ``` Same answer, one query, and we didn't touch the content tables at all. So, if we can tell Rails how to relate comments directly to posts, then we can use relations and scopes to efficiently compose the above query. (We will actually end up with a slightly different, but equivalent query.) We ultimately want to be able to chain some scopes on `Comment` and count the result: ```rb Comment.for_blog_author(alice).for_moderation.count ``` Let's start by creating the `for_blog_author` scope. We ultimately want to join to the `blogs` table and filter by `blogs.author_id`. In Rails we can express that like this: ```rb class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true scope :for_moderation, -> { where(accepted: nil) } scope :for_blog_author, -> (author) { joins(:blog).where(blogs: {user_id: author.id}) } end ``` This new scope demands that comments have an associated `blog`. We know logically that a `Comment` belongs to a `Post`, which in turn belongs to a `Blog`. We can use `has_one :blog, through: :post` for the latter, but the former is more complicated. Recall the structure of the `comments` and `posts` tables: ``` mysql> describe posts; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | created_at | datetime | YES | | NULL | | | updated_at | datetime | YES | | NULL | | | blog_id | int(11) | YES | | NULL | | | content_id | int(11) | YES | | NULL | | | content_type | varchar(255) | YES | | NULL | | +--------------+--------------+------+-----+---------+----------------+ mysql> describe comments; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | created_at | datetime | YES | | NULL | | | updated_at | datetime | YES | | NULL | | | user_id | int(11) | YES | | NULL | | | content_id | int(11) | YES | | NULL | | | content_type | varchar(255) | YES | | NULL | | | body | text | YES | | NULL | | | accepted | tinyint(1) | YES | | NULL | | +--------------+--------------+------+-----+---------+----------------+ ``` So rather than a condition like `posts.id = comments.post_id` to link posts and comments, we have `posts.content_id = comments.content_id AND posts.content_type = comments.content_type`. My first (wrong) guess at how to express that relation looked like this: ```rb class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true belongs_to :post, -> (comment) { where(content_id: comment.content_id, content_type: comment.content_type) } has_one :blog, through: :post scope :for_moderation, -> { where(accepted: nil) } scope :for_blog_author, -> (author) { joins(:blog).where(blogs: {user_id: author.id}) } end ``` This implementation leaves Rails still looking at `posts.id`, since we've not told it what the 'primary key' for this relation is. ``` >> comment = Comment.first Comment Load (0.4ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1 => # >> comment.blog Blog Load (0.4ms) SELECT `blogs`.* FROM `blogs` INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id` WHERE `posts`.`id` = NULL AND `posts`.`content_id` = 1 AND `posts`.`content_type` = 'Text' ORDER BY `blogs`.`id` ASC LIMIT 1 => nil ``` We need to make Rails think the 'primary key' to use on the `posts` table is `content_id` and `content_type`. Note: it's not the primary key of the `posts` table _in general_, it's only the 'primary key' in the terminology Rails has to express this particular join operation. This implementation gets us the query we want: ```rb class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true belongs_to :post, -> (comment) { where(content_type: comment.content_type) }, foreign_key: :content_id, primary_key: :content_id has_one :blog, through: :post scope :for_moderation, -> { where(accepted: nil) } scope :for_blog_author, -> (author) { joins(:blog).where(blogs: {user_id: author.id}) } end ``` Here it is in action: ``` >> comment = Comment.first Comment Load (0.4ms) SELECT `comments`.* FROM `comments` ORDER BY `comments`.`id` ASC LIMIT 1 => # >> comment.blog Blog Load (0.9ms) SELECT `blogs`.* FROM `blogs` INNER JOIN `posts` ON `blogs`.`id` = `posts`.`blog_id` WHERE `posts`.`content_id` = 1 AND `posts`.`content_type` = 'Text' ORDER BY `blogs`.`id` ASC LIMIT 1 => # ``` However, this snippet still doesn't work: ``` >> Comment.for_blog_author(alice).for_moderation.count app/models/comment.rb:5:in `block in ': undefined method `content_type' for # (NoMethodError) ``` This is because, when you access the `post` association within scope/join logic, the argument to the `belongs_to :post` lambda is no longer a `Comment`. To make the lambda work in both cases, I did this: ```rb class Comment < ActiveRecord::Base belongs_to :user belongs_to :content, polymorphic: true belongs_to :post, -> (object) { if object.is_a? Comment where(content_type: object.content_type) else where('posts.content_type = comments.content_type') end }, foreign_key: :content_id, primary_key: :content_id has_one :blog, through: :post scope :for_moderation, -> { where(accepted: nil) } scope :for_blog_author, -> (author) { joins(:blog).where(blogs: {user_id: author.id}) } end ``` This ends up running the query I want to count the unmoderated comments on Alice's blogs: ``` >> Comment.for_blog_author(alice).for_moderation.count (0.2ms) SELECT COUNT(*) FROM `comments` INNER JOIN `posts` ON `posts`.`content_id` = `comments`.`content_id` AND (posts.content_type = comments.content_type) INNER JOIN `blogs` ON `blogs`.`id` = `posts`.`blog_id` WHERE `blogs`.`user_id` = 1 AND `comments`.`accepted` IS NULL => 3 ``` Now, although it gets the right answer, and does so efficiently, that `belongs_to` code is incredibly ugly. Surely there is a better way to express this?