Forceps: import models from remote databases
I recently released a gem called forceps. It lets you copy data from remote databases using Active Record. It addresses a problem I have run into many times: selectively importing data from a production database into your local database so you can play with it safely. In this post, I would like to describe how the library works internally. You can check its usage in the README.
The idea
Active Record lets you change the database connection on a per-model basis using the method .establish_connection. Forceps takes each subclass of ActiveRecord::Base and generates a child class with the same name in the namespace Forceps::Remote. These remote classes also include a method #copy_to_local that copies the record and all the associated models automatically.
The main reason for managing remote Active Record classes is that I wanted to use its reflection and querying support for discovering associations and attributes. A nice side effect is that the library lets you explore remote databases in your local scripts with ease.
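To make the idea concrete, here is a minimal sketch that does not use Active Record at all. FakeBase, RemoteSketch, User and Post are hypothetical stand-ins (for ActiveRecord::Base, Forceps::Remote and two application models), and the fake establish_connection only records a name per class. It is meant to show the shape of the trick, not the actual forceps code.

```ruby
# Stand-in for ActiveRecord::Base: it only remembers a connection name per class.
class FakeBase
  class << self
    attr_reader :connection_name

    # Loosely mimics the per-class effect of establish_connection.
    def establish_connection(name)
      @connection_name = name
    end
  end
end

class User < FakeBase; end
class Post < FakeBase; end

# Hypothetical namespace playing the role of Forceps::Remote.
module RemoteSketch; end

User.establish_connection("local")
Post.establish_connection("local")

[User, Post].each do |local_class|
  remote_class = Class.new(local_class)                  # anonymous subclass
  RemoteSketch.const_set(local_class.name, remote_class) # now RemoteSketch::User, etc.
  remote_class.establish_connection("remote")            # only affects the subclass
end
```

After this runs, RemoteSketch::User inherits everything from User but carries its own connection setting, while User keeps pointing at the local one.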
Defining remote classes and remote associations
The definition of the child model classes with the remote connection is shown below:
def declare_remote_model_class(klass)
  class_name = remote_class_name_for(klass.name)
  new_class = build_new_remote_class(klass, class_name)
  Forceps::Remote.const_set(class_name, new_class)
  remote_class_for(class_name).establish_connection 'remote'
end

def build_new_remote_class(local_class, class_name)
  Class.new(local_class) do
    ...
    include Forceps::ActsAsCopyableModel
    ...
  end
end
With this definition, remote classes let you manipulate isolated remote objects, but the inherited associations still point to their local counterparts. I solved this problem by cloning each association and changing its internal class attribute to make it point to the proper remote class:
def reference_remote_class_in_normal_association(association, remote_model_class)
  related_remote_class = remote_class_for(association.klass.name)
  cloned_association = association.dup
  cloned_association.instance_variable_set("@klass", related_remote_class)
  cloned_reflections = remote_model_class.reflections.dup
  cloned_reflections[cloned_association.name.to_sym] = cloned_association
  remote_model_class.reflections = cloned_reflections
end
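The reason for duplicating both the association and the reflections hash is easier to see in isolation. The sketch below uses a hypothetical FakeReflection class in place of a real Active Record reflection, assuming only that it keeps its target class in @klass as the code above implies. LocalComment and RemoteComment are made-up names.

```ruby
# Hypothetical stand-in for an Active Record reflection: a name plus the
# class it resolves to, stored in @klass.
class FakeReflection
  attr_reader :name, :klass

  def initialize(name, klass)
    @name = name
    @klass = klass
  end
end

LocalComment = Class.new
RemoteComment = Class.new(LocalComment)

local_reflections = { comments: FakeReflection.new(:comments, LocalComment) }

# Copy the reflection, then retarget only the copy.
cloned = local_reflections[:comments].dup
cloned.instance_variable_set("@klass", RemoteComment)

# Copy the hash too, so the local class's table is left untouched.
remote_reflections = local_reflections.dup
remote_reflections[cloned.name] = cloned
```

Without the two dup calls, retargeting the association on the remote class would also retarget it on the local class, because both would share the same objects.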
Cloning trees of Active Record models
For copying simple attributes, I ended up invoking each setter directly. I intended to do it with mass assignment but disabling its protection in Rails 3 is pretty tricky, as it can be enabled in multiple ways. Rails 4 moved mass-assignment protection to the controllers, but I wanted forceps to support both versions.
def copy_attributes(target_object, attributes_map)
  attributes_map.each do |attribute_name, attribute_value|
    target_object.send("#{attribute_name}=", attribute_value)
  end
end
Cloning associations is done by fetching all the possible associations of each model class with .reflect_on_all_associations, and just copying the associated objects depending on their cardinality. For example, this method copies a has_many association:
def copy_associated_objects_in_has_many(local_object, remote_object, association_name)
  remote_object.send(association_name).find_each do |remote_associated_object|
    local_object.send(association_name) << copy(remote_associated_object)
  end
end
It uses a cache internally to avoid copying objects more than once.
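The cache matters because association graphs often contain shared records and cycles. The following is a sketch of one way such a cache could work, not the forceps implementation: Node and CachingCopier are hypothetical, and keying the cache on class and id is an assumption made for the example.

```ruby
# Hypothetical record type; `children` stands in for a has_many association.
Node = Struct.new(:id, :name, :children)

class CachingCopier
  def initialize
    @copies = {} # [remote class, remote id] => local copy
  end

  def copy(remote)
    key = [remote.class, remote.id]
    return @copies[key] if @copies.key?(key)

    local = remote.class.new(remote.id, remote.name, [])
    @copies[key] = local # register before recursing so cycles terminate
    remote.children.each { |child| local.children << copy(child) }
    local
  end
end

shared = Node.new(3, "shared", [])
root = Node.new(1, "root", [shared, Node.new(2, "middle", [shared])])
shared.children << root # cycle back to the root

copied = CachingCopier.new.copy(root)
```

Here the record reachable through two paths is copied once, and the cycle back to the root resolves to the copy already in the cache instead of recursing forever.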
Handling STI and polymorphic associations
Supporting Single Table Inheritance and polymorphic associations turned out to be one of the most challenging parts. Both features rely on a type column containing the model class to instantiate. This column is referenced in multiple places in the Rails codebase, such as in join queries or when instantiating records.
For example, when instantiating objects from queries, Rails uses the hash of attributes obtained from the database. To change the value of the type column, the .instantiate method is overridden in remote classes:
Class.new(local_class) do
  ...
  if Rails::VERSION::MAJOR >= 4
    def self.instantiate(record, column_types = {})
      __make_sti_column_point_to_forceps_remote_class(record)
      super
    end
  else
    def self.instantiate(record)
      __make_sti_column_point_to_forceps_remote_class(record)
      super
    end
  end

  def self.__make_sti_column_point_to_forceps_remote_class(record)
    if record[inheritance_column].present?
      record[inheritance_column] = "Forceps::Remote::#{record[inheritance_column]}"
    end
  end
  ...
end
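The effect of that rewrite can be tried on a plain hash. This is an illustrative sketch only: StiSketch::Remote is a hypothetical namespace standing in for Forceps::Remote, point_type_to_remote is a made-up helper, and the "type" column name is simply the Rails default.

```ruby
# Hypothetical namespace standing in for Forceps::Remote.
module StiSketch
  module Remote
    Admin = Class.new
  end
end

# Returns a copy of the row whose type column names the class inside the
# given namespace. Rows with a blank type column are returned unchanged.
def point_type_to_remote(record, namespace, inheritance_column = "type")
  value = record[inheritance_column]
  return record if value.nil? || value.empty?

  record.merge(inheritance_column => "#{namespace}::#{value}")
end

row = { "id" => 7, "type" => "Admin" }
rewritten = point_type_to_remote(row, "StiSketch::Remote")

# Resolving the rewritten name now finds the namespaced class.
resolved = Object.const_get(rewritten["type"])
```

Because the class is looked up from the string stored in the type column, prefixing that string is enough to make records load as the namespaced class instead of the local one.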
Testing against multiple Rails versions
Testing against multiple Rails versions was far easier than I expected. I used this approach by Richard Schneeman: using an environment variable to configure the Rails version in the .gemspec file:
if ENV['RAILS_VERSION']
  s.add_dependency "rails", "~> #{ENV['RAILS_VERSION']}"
else
  s.add_dependency "rails", "> 3.2.0"
end
And set the target versions in .travis.yml:
env:
  - "RAILS_VERSION=3.2.16"
  - "RAILS_VERSION=4.0.2"
The awesomeness of Travis will do the rest.
Conclusions
One thing I loved about this project is that I started with a very simple idea without knowing whether it would work with complex real-life models. I just wrote a very simple test and handled more and more cases incrementally. It ended up being more complex than I expected, but it is still a pretty compact library thanks to the wonders of Ruby, metaprogramming and Active Record.
The code for Forceps is available on GitHub. Pull requests are welcome.