I am working on an application where I need to generate a unique identifier for each user. I need to generate this number based on an existing algorithm. The algorithm itself is not the problem; the problem is the approach I am taking to generate the number.
One of the requirements of the algorithm is that part of this number must be sequential. For example: 9000018, 9000028. The first and last digits are calculated somehow by the algorithm, and in the middle there is the sequential number. So I've been considering the following options:
a) Since all models use UUIDs for primary keys, my first option was to create another field of SERIAL type and, on the fly, apply the algorithm to get the number. But this does not feel right to me, since it can easily become an expensive operation, especially if I am returning a lot of records.
b) The second option is to generate the new number based on the last record. That is, before saving, I get the last record's number, extract the sequential part, increment it, and save the new number. But this easily introduces a race condition.
c) So I ended up with the third option, which also feels strange, but I couldn't find anything better: I have this extra SERIAL field, I save the record, and then update it with the generated number. It feels strange because creating a record requires two operations: a create and an update.
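To make the target format concrete, here is a pure-Ruby sketch of how such a number could be assembled from a sequential value. The check_digit function below is a hypothetical placeholder (digit sum mod 10); the real first and last digits come from the existing algorithm, which is not shown here.

```ruby
# Hypothetical check digit: digit sum mod 10 (the real algorithm is unspecified).
def check_digit(digits)
  digits.chars.sum(&:to_i) % 10
end

# "9" prefix + zero-padded sequential part + check digit suffix.
def build_identifier(seq, width: 5)
  body = "9" + seq.to_s.rjust(width, "0")
  body + check_digit(body).to_s
end

build_identifier(1)  # => "9000010"
build_identifier(2)  # => "9000021"
```

Whatever the exact algorithm, the hard part is the same: the sequential middle must be allocated without races, which is what the options above are wrestling with.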
What is the best way to handle this in Rails?
Sounds like this would work better as a database thing than a Rails thing, really. If this is Postgres, then what about:
Keep the SERIAL column. Let’s call it uniq_num.
Have Postgres apply an algorithm to that number and put the result into what is known as a "generated column". In this case, let's call it uniq_ident.
In Rails 7.0 and later there is support for these generated columns in Postgres. Here’s a sample migration to add this kind of new column to an existing table called users:
class AddUniqIdentToUsers < ActiveRecord::Migration[7.1]
  def change
    # Note that the number 5 here indicates the number of zero-padded digits
    # in the middle (sequential) part
    add_column :users, :uniq_ident, :virtual, type: :bigint,
               as: "('9' || LPAD(uniq_num::varchar, 5, '0') || '8')::bigint",
               stored: true
  end
end
When creating the column, the resulting SQL will include a GENERATED ALWAYS AS (...) STORED clause.
Thank you very much for your response. At some point I thought about solving this at the database level, but I didn't go deeper with that possibility because I am not an expert in PostgreSQL, and I was not sure how to apply the algorithm that generates the number 8 at the end, which is a checksum digit. The number 9 is also a generated number, but it's easy to get; the problem is the checksum digit.
I will investigate how easy it is to implement the algorithm I need in PostgreSQL and see the result.
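For reference, a common checksum scheme for identifiers like this is the Luhn algorithm (the one used for credit card numbers). The actual checksum algorithm here may well be different, but as an illustration, a Ruby version of Luhn's check digit looks like this:

```ruby
# Luhn check digit for a numeric payload string: double every second digit
# starting from the rightmost payload digit, subtract 9 from results over 9,
# sum everything, and take the digit that brings the sum to a multiple of 10.
def luhn_check_digit(payload)
  sum = payload.chars.reverse.each_with_index.sum do |ch, i|
    d = ch.to_i
    d *= 2 if i.even?  # rightmost payload digit gets doubled
    d > 9 ? d - 9 : d
  end
  (10 - sum % 10) % 10
end

luhn_check_digit("7992739871")  # => 3 (the canonical Luhn example: 79927398713)
```

Porting logic like this to a PL/pgSQL function is mostly mechanical, which is what the migration below does (with the actual algorithm's body elided).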
EDIT 1:
Hmm, look: PostgreSQL supports user-defined functions and languages. I can use the PL/pgSQL language to create this function and then use the function when defining the generated column.
First I created a function to calculate a check digit:
class CreateFunctionCalculateCheckDigit < ActiveRecord::Migration[7.1]
  def up # rubocop:disable Metrics/MethodLength
    safety_assured do
      execute <<~SQL
        CREATE OR REPLACE FUNCTION calculate_check_digit(string_num VARCHAR) RETURNS INTEGER AS $$
        -- ...
        $$ LANGUAGE plpgsql IMMUTABLE;
      SQL
    end
  end

  def down
    execute 'DROP FUNCTION calculate_check_digit(string_num VARCHAR)'
  end
end
And then the migration with the generated column (thanks @Lorin_Thwaits!):
class CreateUsers < ActiveRecord::Migration[7.1]
  def up
    create_table :users, id: :uuid do |t|
      t.serial :seq_number, null: false
      t.virtual :number, type: :string,
                as: '(seq_number::TEXT || calculate_check_digit(seq_number::TEXT))::TEXT',
                stored: true
      # ...
      t.timestamps
    end

    safety_assured do
      execute <<-SQL.squish
        ALTER SEQUENCE users_seq_number_seq RESTART WITH 10000001;
      SQL
    end

    add_index :users, :number, unique: true
    # ...
  end

  def down
    drop_table :users
  end
end
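As a sanity check on what the generated column will produce, the SQL expression can be mirrored in plain Ruby. The check-digit logic below is a hypothetical stand-in (digit sum mod 10), since the real PL/pgSQL body is elided above:

```ruby
# Hypothetical stand-in for the SQL calculate_check_digit function;
# the real PL/pgSQL body is not shown, so this uses digit sum mod 10.
def calculate_check_digit(string_num)
  string_num.chars.sum(&:to_i) % 10
end

# Mirrors: (seq_number::TEXT || calculate_check_digit(seq_number::TEXT))::TEXT
def generated_number(seq_number)
  seq_number.to_s + calculate_check_digit(seq_number.to_s).to_s
end

# With the sequence restarted at 10000001, as in the migration:
generated_number(10_000_001)  # => "100000012"
```

Since the column is STORED and the function is IMMUTABLE, Postgres computes the value once at insert time, so there is no per-read cost and no second UPDATE from the application.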
This is not about the implementation but the “algorithm” as I understand it.
It is important with IDs to understand the scope within which they are unique, or more precisely, how big the risk of collision is. In that respect, an auto-increment has a very low uniqueness factor.
You will get the first collision between environments very quickly. Whether that materializes as a problem remains to be seen. I would certainly recommend having an additional identifier for all environments except PROD; otherwise the same IDs will exist on DEV, TEST, and PROD, and that is usually a problem.
Or what about splitting the database for scalability? What about disaster recovery, when you need to install a fresh database? Yes, all of this can be handled, but it adds (usually avoidable) complexity.
Operations in general will be more difficult.
The next problem is data migration. Unless you can be 100% certain it will never happen, think about merging two systems where, e.g., customers have the same internal ID. There is a reason that RDBMSs use considerably more complex approaches for the internal IDs of rows. I once came across an application that used a substring of the Oracle row ID as the business-level UUID. The substring chosen was indeed unique for Oracle 7. Guess what happened when they had to migrate to Oracle 8 (yes, this was more than 20 years ago).
What I am trying to say, and I apologize for wasting your time if you know all this already, is that a UUID is a complex thing at the business level. Auto-increments have a number of limitations to their uniqueness and, in my opinion, should be avoided wherever possible.