Rationale behind not having support for foreign keys with multiple columns

Hi,

I’d like to know the reasoning behind the Rails way of not having support for multi-column foreign keys.

Let’s take a sample schema with the following entities: Apartment Building Tenant

These are the relationships: Apartment may have one or more buildings. Building belongs to an Apartment. Tenant belongs to an Apartment. Building will have one or more tenants. A tenant can live in one or more buildings.

The above relationships can be mapped into tables as follows: Building will have a foreign key for apartments. A tenant will have a foreign key for apartments. There will be a many-to-many between tenant and buildings.

Now, the issue here is: the building_id and tenant_id in the building_tenants table does not guarantee that those belong to the same apartment.

This consistency can be guaranteed by having an apartment_id column in the building_tenants table and then having foreign keys for (building_id, apartment_id) → buildings and (tenant_id, apartment_id) → tenants. However, rails does not support this by default. I just want to know what is the Rails rationale behind this. Should be enforce this constraint only on the application code itself and not on the database? Or, is there any other way we can solve this problem?

First, yes, you can have multi-column keys. Rails doesn’t recommend it, but you can and they do work. You’re on your own in SQL-land to make them, there’s no simple migration method to make them.

But second, you may be looking at this the wrong way around. Rails encourages you to think of your data as an expression of objects in the classical sense. If you get those objects and their relationships modeled correctly, then the database will take care of you.

From your question, I’m guessing you’re using the old-school “has and belongs to many” relationships, with the building_tenants table standing in for the join between buildings and tenants. Modern Rails use would have you model that relationship with the has_many_through macro instead, which offers you more opportunities to enrich that relationship, rather than the “dumb join” of HABTM.

Consider this set of relationships instead:

Apartment
belongs_to :building 
belongs_to :tenant

Building
has_many :apartments
has_many :tenants, through: :apartments 

Tenant
has_many :apartments #possible, I suppose, it's their money
has_many :buildings, through: :apartments

So in this relationship, Apartment has a building_id and a tenant_id, and you can happily make those true database foreign keys. They don’t need to be a compound foreign key, because either side of the three-way relationship can enforce that relationship to the appropriate side of the join.

Any attempt to destroy any side of this would cause the FK lock to come into play. To ensure that you could destroy the relationships intentionally, you would add in (it’s required, actually) the dependent: :destroy notation to the has_many side of these relationships. That would ensure that child records are properly destroyed before their parent is destroyed.

Walter

2 Likes

Thanks for answering. Yes, I think you are right with this one.

Although, I think, I did not choose a correct sample schema to explain my problem. Let’s take the schema which you have designed with the apartments being the table through with building and tenants have a many-to-many relationship.

Now, let’s add one more entity, say maintenance_staff (for property maintenance). A maintenance employee can work on multiple buildings and at the same time, a single building can have multiple maintenance employees associated with it. Also, a maintenance employee has a many to many relationship with an apartment through maintenance_apartments table.

maintenance_staff_buildings
 maintenance_staff_id
 building_id
 <attributes>

maintenance_apartments
 maintenance_staff_building_id (FK with maintenance_staff_buildings since we only want folks who are associated with the building in which this apartment exists ) 
 apartment_id
 <attributes>

Now, here, since we can infer the building_id from maintenance_staff_building_id as well as apartment_id, data is not consistent on the db level. This is where I think if we split the apartment_id into building_id, tenant_id and also maintenance_staff_building_id can be replaced with maintenance_staff_id, we can have data consistency. However, here is where we would need multi-column foreign keys. Is there any way I can avoid multi-column foreign keys and still enforce data consistency? What is the Rails recommended way in this case?

Consider adding an Assignment object between MaintenancePerson and Building (or Apartment, if you use multiple maintenance people per building). That way you isolate that relationship in its own object.

Walter

Can you elaborate please?

What will be the foreign keys in that assignment object? Also, how will that object fit into the current schema?

The maintenance_apartments relationship itself is that “Assignment” object, right?

The assignment would belong to the maintenance person and the apartment. The apartment could either have one or many maintenance people through that assignment, and the maintenance person would have many apartments through many assignments. I’m on my phone, so I can’t use the web interface to type this out in code, but hopefully you can translate from this prose to Rails association macros.

The critical thing I’m trying to help you learn here is to stop thinking in terms of how you might create a join table in a database, and start thinking in terms of named objects that model relationships. This makes it trivial to add useful data to the relationship, like maybe what kind of maintenance that person is doing (plumbing, electrical, painting) if you have multiple people assigned to one apartment.

If you wanted to keep this “maintenance_apartments” table, you could always add an id, time stamps, and any other metadata columns to it, and use that as the basis for your Assignment object by using the self.table_name method to change the defaults, but that’s just going to confuse people later. Best to follow the conventions whenever you can.

Walter

Okay. Interesting.

Let me try to think of it that way.

This is the schema I can think of (Correct me, if I am wrong):

apartments
  building_id
  tenant_id

maintenance_person_building_assignments
  maintenance_person_id
  building_id

maintenance_person_apartment_assignments
 maintenance_person_id
 apartment_id

Now, the thing here is: the maintenance_persons are assigned to one or more buildings as well as one or more apartments. So, for eg. Maintenance Person A, B are assigned to Building B1 and there is an apartment in that building called as say AP1. Now, only Maintenance Person A or B must be allowed to work on apartment AP1. This is not enforced by this schema.

How do I change the schema so that this is enforced?

You’ve gone one step too far. You don’t need to have two different tables to manage the two kinds of relationships. One of Rails’ primary objectives is to satisfy the Single Responsibility Principle. The Apartment already knows what Building it is in, and that fact is not likely to change. So instead of having a separate object to manage the relationship between the building and the maintenance person, simply ask the apartment which building it is in, if you need to know that.

Apartment
  belongs_to :building 
  has_many :leases
  has_many :tenants, through: :leases
  has_many :assignments
  has_many :maintenance_people, through: :assignments

Assignment
  belongs_to :apartment 
  belongs_to :maintenance_person

Building
  has_many :apartments
  has_many :maintenance_people, through: :apartments
  has_many :leases, through: :apartments
  has_many :tenants, through: :leases

Lease
  belongs_to :tenant
  belongs_to :apartment

MaintenancePerson
  has_many :assignments
  has_many :apartments, through: :assignments
  has_many :buildings, through: :apartments

Tenant
  has_many :leases
  has_many :apartments, through: :leases

So now you can get the building from the apartment with @apartment.building or the assignment with @assignment.apartment.building. No need for a different table to manage duplicate data. You’ve stored the data in the most specific manner possible, and you don’t have to update two tables if you want to change a maintenance person’s assignment to a different apartment.

Walter

In case you’re looking at this in mail, I’ve updated the Web version here with a little more detail about tenants: Rationale behind not having support for foreign keys with multiple columns - #8 by walterdavis

Walter

Thanks for the detailed answer.

The reason I had different relationships for maintenance person with apartments and buildings is because the set of attributes is different in both.

i.e. there are different attributes for the assignment between apartment and maintenance_person, for eg. <notifications_enabled, chats_enabled> and there are different attributes for the assignment between building and maintenance_person, for eg. <contract_duration, contract_date>

Note that: above is just an example of the attributes, there are several such attributes at both the levels.

To, sum it up,

Building has many tenants via apartments. Tenant has many buildings via apartments. Maintenance person has many assignments with buildings. Maintenance person has many assignments with Buildings. Maintenance Person has many assignments with Apartments.

I guess the question I have would be is what direction does this graph flow? Is the maintenance person more likely to be associated with an entire building, or with specific apartments in that building? If the former, then you could put all of the relationship on the building, and list the apartments within it as “belonging” to that maintenance person. What I’m trying to help you avoid is the madness that is double-entry adjustments. You really don’t want to go through all the apartments in a given building and assign the maintenance person to them after first assigning them to the building, right?

If these buildings aren’t very large, then I could see you using a nested form on the building to manage any exceptions to the rule that all apartments in a maintenance person’s assigned building “belong” to them, say by tracking an array of apartment_ids in the association record. By default it would equal @building.apartment_ids but if you un-checked some, you might end up with some subset of that array.

Anything like that gets you pretty far into the weeds of managing a relationship’s members, and Rails already wants to do that for you. I have seen a lot of well-intended code that tries to re-write ActiveRecord at some level or another, and it rarely ends well.

Another approach you might look at is a Polymorphic join. That allows you to assign to either a building or an apartment with one intermediate class. You put the superset of attributes on that join object, and only fill in the ones that apply to the kind of object you are linking.

Please explain to me why you want to have two separate relationships between three objects, when there is a clear ancestry relationship between two of them.

Walter

I actually want to do that. When a maintenance_person gets created, they first reside in the “global” maintenance_persons table. Then, they can be assigned to one or more buildings. After that, they can be assigned to one or more apartments. Only those maintenance person in that building can be assigned to any apartment which is in that building.

Yes, there is a clear ancestry between building and apartments. Building has one or many apartments inside it. The reason for having these relationships is because there are several attributes which needs to be captured at every level. When the maintenance person gets assigned to a building, there can be certain “building” level attributes which need to be captured for that maintenance person. Similarly, for apartment. And already, there is a relationship between apartment and building which also needs to capture several attributes.

This is the reason we have: maintenance_person_building_assignments, maintenance_person_apartment_assignments, apartments, and buildings

Building             ->                  MaintenancePersonBuildingAssignment

    ^                                                           ^
    |                                                           |

Apartment            ->                MaintenancePersonApartmentAssignment

So, the issue is: when I am creating this MaintenancePersonApartmentAssignment, it has:

      maintenance_person_building_assignment_id
      apartment_id
      <several attributes>

Here, the building can be inferred from both apartment_id, maintenance_person_building_assignment_id. Of course, this can be handled in the application code by making sure that it is the same building when I create a row in this table. However, this data would not be consistent on the database level. We would have many records in this table and data integrity would be very important.

Hence, I wanted to know the “Rails way”, wherein I can capture these attributes, have relationships and also maintain referential integrity.

Okay, the penny drops for me. You want to “layer” the attributes on one another, from the most general to the most specific. I would do this by having the building assignment be required, but the apartment assignment optional. In code, you could layer those attributes together. In the form where you assign a person to a specific apartment, you would build the list of apartments by getting only those in that person’s buildings.

Apartment
  belongs_to :building
  has_many :apartment_assignments
  has_many :maintenance_people, through: :apartment_assignments

Building
  has_many :apartments
  has_many :building_assignments
  has_many :maintenance_people, through :building_assignments
  
MaintenancePerson
  has_many :building_assignments
  has_many :buildings, through: :building_assignments
  has_many :apartments, through: :buildings
  has_many :apartment_assignments

BuildingAssignment
  belongs_to :maintenance_person
  belongs_to :building

ApartmentAssignment
  belongs_to :maintenance_person, optional: true
  belongs_to :apartment, optional: true
  delegate_missing_to :building_assignment

  def building_assignment
    apartment.building.building_assignments.find_by(maintenance_person: maintenance_person)
  end

That should get you pretty far along toward what you’ve described here. The trick is to start by trying to look up the assignment from the apartment, and then fall back to the building. Something like this, maybe:

@assignment = @apartment.apartment_assignments.find_by(maintenance_person: @maintenance_person) || @apartment.building.building_assignments.find_by(maintenance_person: @maintenance_person)

Now you have an @assignment which might be an ApartmentAssignment or a BuildingAssignment, but you can call all the building-specific methods on it either way. The apartment-specific methods will either be there or they won’t, so you’ll want to guard them with try or similar.

Walter

2 Likes

Wow, nice!

That’s a really neat way to do it in ActiveRecord.

Just one doubt: It still doesn’t guarantee referential integrity just as far as the database is concerned, right?
As far as the database is concerned, I could insert a row into apartment_assignment for a maintenance_person who still does not have any row in the building_assignments table, right?

I mean, if we were to bypass ActiveRecord, I could still insert invalid data in the apartment_assignments table, right?

Nothing will protect you from someone leaning into your database and spilling their Coke on it, either. In Rails, the database is just a part (a very dumb part) of the whole application. It’s a “box of bits” that your application logic can use to store and retrieve data. The real brains of the operation are in your Ruby files.

ApartmentAssignment
  validates :maintenance_person_id, inclusion: { in: building.building_assignments.maintenance_person_ids }

Or, you could lazy-create the building assignment in the action of adding an apartment assignment (if that’s the way you want to go). Recall also, I recommended that you build the picker of available “assignable” apartments for a given maintenance person by gathering up all of their buildings’ apartments into a list. That’s a user-interface nicety that would make it difficult to screw up the data. The validation would be the cherry on top that would keep out illegal values.

Walter

Ah! Okay!

So, that’s the recommended approach to follow in Rails. In this case, I’ll have to compromise on the data-integrity as far as just the database is concerned. And, it’s recommended for me to handle everything in the application code and just hope no one spills their coke :slight_smile:

Instead of both these approaches, wouldn’t it be nicer to just have the building_id in the apartment_assignments table as well? For example, In the apartment_assignments table, I could have:

apartment_assignments
  id
  building_id
  apartment_id
  maintenance_person_id
  <other attributes>

This will help me in having referential integrity across both my tables. i.e. FK(apartment_id, building_id) REFERENCES apartments(id, building_id) and FK(building_id, maintenance_person_id) REFERENCES building_assignments(building_id, maintenance_person_id).

Also, many of my Select queries are going to be based on the “building_id” in the apartment_assignments table. Hence, it will also avoid a JOIN operation for me.

Note that: I am not suggesting that we completely forego surrogate primary keys. Surrogate primary keys would be still used in all the other things like JOIN etc. and all the other places where the Rails magic applies. However, just for referential integrity, we could accept an array of “foreign_keys” which will help us also keep our data sane.

There’s a strong tradition in Rails to apply these sorts of optimizations in a second pass, after you’ve identified it as a pain point. Premature Optimization and all that…

You could cache the building_id in the apartment_assignments table if you wanted to, and I agree it could mean a shorter query that might be faster to execute, but the trade-off, as in all such optimizations, is additional maintenance effort. Instead of just using the framework to manage the relationships, you’ll have to engineer some sort of sidecar that keeps the building_id updated when the apartment_ids are changed.

I’d encourage you to try it the Rails way first, and then use New Relic or Skylight to probe the running application in production mode, and see where your actual slow-points are. In my many, many years of doing this, I can agree with the experts who say “it’s rarely where you expect it to be”.

Walter

Got it.

I’ll go with the Rails way first.

Thanks for your explanation and your time.

2 Likes