RFC: Finally support composite primary keys?

The composite_primary_keys gem has been around since 2006 and has done a pretty good job keeping up with Rails versions.

(aside: There are plenty of old threads on the same topic, but I didn’t see anything recently, nor any clear stand that Rails core is taking on the matter)

In latest Rails, the surface area to adjust seems pretty small.

Sometimes, there are good reasons to use composite primary keys. For example

I have a Job which belongs to a Project and to a Customer.
There’s a join-table entity called ProjectCustomer. It has an auto-increment primary key and a unicity constraint on project_id and customer_id. To make it possible to define a non-wonky and eager-loadable Job has_one :project_customer, foreign_key: [:project_id, :customer_id], primary_key: [:project_id, :customer_id] all I need to do is install the gem. I don’t even need to define ProjectCustomer self.primary_keys = :project_id, :customer_id.

This a very reasonable and non-fancy use-case, I think, without which, I’d need do something like has_one :project_customer, through: :project, ->(record) { where(customer: record.customer) } which is not great.

related code is something like a few small areas in

Is it time? Practically speaking, there are already maintainers of the feature…

3 Likes

Let me start by saying, I’m not against bringing this into Rails by any means. I’ve worked on a legacy Rails app that was built around composite primary keys and the gem really helped me.

I would just point out that it’s the use of composite primary keys in general present performance and usability issues. For example, it’s really difficult to build WHERE clauses.

I remember running into max amount of arel node errors because instead of WHERE some_id IN (?), I had to do WHERE (some_id = ? AND some_other_id = ?) OR (some_id = ? AND some_other_id = ?) over and over and over.

Similarly, it’s very hard to build subselects using composite primary keys. While I do think bringing it into core would make it easier to maintain, and help out people running legacy apps. I personally don’t think people should be choosing to use composite primary keys on new tables though, especially when (given your example), you could just have an auto-incrementing id and a unique index on project_id, customer_id.

1 Like

especially when (given your example), you could just have an auto-incrementing id and a unique index on project_id, customer_id .

That’s exactly what we have, actually (I updated the post to hopefully make this more clear). Perhaps I’m wrong that we need CPK to make the has_one :project_customer, foreign_key: [:project_id, :customer_id], primary_key: [:project_id, :customer_id] work as a first-class relation, which can be eager-loaded?

No, you’re right, there’s no way to do it as a has_one. Though realistically, it should be a belongs_to, right?

Is ProjectCustomer optional for a Job? It feels like you want Job belongs_to :project_customer, and then you put a project_customer_id on your Job, which you can ensure is set, instead of having a project_id and customer_id on the job.

A belongs_to association can be guaranteed to return 1 row. A has_one does not have that guarantee.

The added bonus to this approach is, there’s otherwise not a great way to ensure that the project_id and customer_id combination is a valid one. I guess you can do an EXISTS query on insert of a Job. But you also need to be mindful of deletes from the join table, that they don’t go orphaning jobs.

By using the join table id on your Job, you don’t have to worry about a Job that has a project_id that exists in the projects table and a customer_id exists in the customers table, but there’s no corresponding row in ProjectCustomer for that combination.

We have a lot of use-cases where composite primary keys are exceptionally useful – in ecommerce we have to import significant amounts of data from external systems-of-record, so we’re dealing with keys that are generated in external systems then imported, typically in massive batches.

As our platform is multi-tenant, these keys must be uniquely scoped to the tenant to avoid clashes, making our primary keys effectively a composite of tenant_reference, external_reference.

This isn’t so much a problem when it’s a single table, for example a list of products, because as @natematykiewicz suggested: you could just have a standard auto-generated primary key and leave tenant_reference, external_reference as two additional fields with a compound unique index.

But when you’re importing an association (of which we have a lot), for example product images, it’s far more performant and easier to simply be able to upsert based on the external keys (e.g. each row being tenant_reference, product_external_reference, image_url), rather than having to lookup what the internal product_id is with a join then inserting each row as product_id, image_url.

By default we’re isolating ourselves to only operating on a single account at a time, so our queries are more simple than Nate’s example:

WHERE tenant_reference = ? AND external_reference IN (?, ?, ...)

(not suggesting Nate’s example isn’t valid, just adding our use-case)

FWIW, with Postgres, you can also simplify to:

WHERE (some_id, some_other_id) IN ((?, ?), (?, ?), ...)
2 Likes

Hey folks :wave: I’m genuinely glad this discussion is happening I wanted to share that at Shopify we are working on proposing features that will help Rails natively support composite primary keys. Can’t speak on behalf the Rails core team, but we will do our best to make those proposals as solid as possible to Rails core team agrees to merge them.

I’ll shamelessly plug our reasoning and performance gains from using composite primary keys on the db-level - How to Introduce Composite Primary Keys in Rails But @ryantownsend provided great insight, essentially any multi-tenant application can potentially benefit from having a tenant_id, id composite primary key. Enforcing most of database queries to a tenant-related table makes it easier to prevent cross-tenant records access. Unless you are performing some kind of a maintenance work, you would most likely want to ensure that the records loaded belong to the “current tenant” within a tenant-scoped request.

We don’t use the composite_primary_keys gem as our use-cases are fairly simple - we maintain the uniqueness of the id column which allows us to select/update/delete records by id only, we don’t have complex associations and the ones we have just work because of the id uniqueness and we also perform most of our queries per single tenant so row-constructor is not needed. But we would still prefer updating our updates like order.touch to include composite primary key clause to avoid gap locks as with InnoDB locking, if a record is being updated at the end of an index (i.e. the largest value) and is not using the PRIMARY KEY or a UNIQUE index, it will block any attempt to insert a record after it for the duration of the transaction containing the UPDATE .

https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html#innodb-next-key-locks

Most of the features we are about to work on won’t even explicitly be related to the composite primary keys but they will provide most of the foundations for the feature. For example row constructor was already proposed to be supported but the only concern was the proposed API. Generally it doesn’t seem that the core team was against the feature. As was mentioned in the PR, presumably an API like Model.where([:tenant_id, :id] => [[1,1], [1,2], [2,3]]) would be a better option. Or, going further and being composite primary key specific, perhaps a helper like Model.primary_key_scope([[1,1], [1,2], [2,3]])) would be a decent higher-level addition to the feature

Also some parts of Rails are almost ready to support composite keys. For example _primary_key_constraints_hash provides solid foundation for making sure that single-object methods like save update update_columns delete destroy reload will use all parts of the composite primary keys in the sql clause

Just wanted to point this out but please don’t monkeypatch it as it’s a private method and Rails can’t guarantee it will always exist :upside_down_face:

To summarize, we are looking into adding composite primary keys support to Rails and hoping that the feature will get accepted. So we welcome all kinds of feedback/questions/collaborations. Thanks! :heart:

3 Likes

Well, once there’s a PR, I’d be happy to review and collaborate on it. :slight_smile:

We’re also delighted to see the framework going in this direction – we’ve got a health app and have chosen Postgres with multiple schemas for the multitenancy, one schema per tenant. This choice was made partially due to seeing the composite key story not being well-established in Rails. (And also requiring some pretty strict autonomy between tenants in order to be compliant with industry regulations.)

Will be great to explore other multitenant options as composite keys matures, so in the same way you indicate, we’re eager to help out. (And intrigued to see what snags might surface around polymorphic associations!)

I often come across this when I’m making utility data applications for ETL and data management that leverage data from separate, external data sources that aren’t explicitly dependent, but just happen to exist in an associative way. For example, product pricing records that include source_website and zip_code, and an external store location list that could include the same. It saves the need to wash records through an extra ETL step before loading it into an application for analysis. Rails usually makes these types of tasks very trivial, but sometimes the lack of being able to do more complex joins for relations can create surprisingly robust rabbit holes.

For what it’s worth, I hadn’t used the gem composite_primary_keys → I’ll have to check it out.

Another example where this can be useful is partitioned tables in Postgres. Basically, in order to have a primary key (or any unique constraint really) on a partitioned table, the partition key (eg tenant_id) must be a part of that key. Seems to me like adding composite keys in ActiveRecord would also cover this use case.

This is one of the reasons we’re prioritizing this project at Shopify. The idea was that we can add composite keys for “real” composite key use cases but that it will also be useful for “virtual” composite keys when we need to include a tenant_id in queries for sharding. My hope is that this will improve Rails integration for Vitess and any system that uses partition key based sharding.

3 Likes

Hi all,

Having developed/supported composite primary keys for over ten years, I have a pretty good idea of what is needed to natively support them in Rails. If you look through the CPK gem, you’ll see it has to patch quite a fair number of places in ActiveRecord, but most of the patches are only a few lines of code. The trickiest parts are in associations/relations.

It would be great to see the work done in the CPK gem merged into ActiveRecord. One fundamental change which would greatly simplify the code base is for ActiveRecord to always treat primary keys as an array even if there is only one of them.

Anyway, happy to walk anyone through the CPK code base as needed.

Charlie

5 Likes

Agree wholeheartedly!

Have a ton of appreciation for CPK – how seamlessly it works! Been using it for about 5 years now, and this past year dug into the innards quite a bit more while building The Brick. Thank you very much for bringing this to the Rails community.

Seems support is even closer now that “auto populated columns” on insert has landed Assign auto populated columns on Active Record object creation #48241