Generate a unique identifier number

Hello,

I am working on an application where I need to generate a unique identifier for a specific user. I need to generate this number based on an existing algorithm. The algorithm by itself, is not the problem. The problem is the approach I am taking to generate this number.

One of the requirements of the algorithm is that part of this number must be sequencial. For example: 9000018, 9000028. First and last digit are calculated somehow using the algorithm, and in the middle, there is the sequencial number. So, I’ve been considering the following options:

a) Since all models uses UUID for primary keys, my first option was to create another field SERIAL type, and on fly, apply the algorithm to get the number. But this does not feel right to me, since it can easily become expensive operation, especially if I am returning a lot os records.

b) The second option is to generate the new number based on the last record. I mean, before save, I get the last record number, extract the sequencial part, increment it and save the new number. But this easily introduces the race condition problem.

c) So, I ended up with the third option, which also feels strange, but couldn’t find something better: Basically, I have this extra SERIAL field, I save the record, and update with the negated number. It feels strange because to create a record, I need 2 operations: Create and Update.

What is the better way to deal with this in Rails?

Sounds like this would be good as more of a database thing than a Rails thing really – if this is Postgres then what about:

  • Keep the SERIAL column. Let’s call it uniq_num.
  • Have Postgres perform an algorithm to combine the number, and put it into what is known as a “generated column”. In this case, let’s call it uniq_ident.

In Rails 7.0 and later there is support for these generated columns in Postgres. Here’s a sample migration to add this kind of new column to an existing table called users:

class AddUniqIdentToUsers < ActiveRecord::Migration[7.1]
  def change
    # Note that the number 5 here indicates the number of zero-padded digits in the middle number part
    add_column :users, :uniq_ident, :bigint, as: "('9' || LPAD(uniq_num::varchar, 5, '0') || '8')::bigint",
                                             stored: true
  end
end

When creating the column, the resulting SQL will include:

GENERATED ALWAYS AS (('9' || LPAD(uniq_num::varchar, 5, '0') || '8')::bigint) STORED.

And miraculously, existing rows will be properly updated based on the algorithm!

You can also apply an index to the column if you need to search based on this uniq_ident.

1 Like

Hi @Lorin_Thwaits

Thank you very much for your response. At some point I thought about solving this in the database level, I just didn’t go deep with that possibility.

Because I am not an expert on PostgreSQL and I am not sure how to apply the algorithm to generate the number 8 at the end, which is a checksum number. The number 9 is also a generated number, but it’s easy to get it. The problem is with the checksum number.

I will try to investigate how easy to apply the algorithm I need using postgresql and see the result.

EDIT 1:

Hum, Look: PostgreSQL support functions and languages. I can use PL/pgSQL language to create this function and use the function when creating the generated column.

1 Like

I made it.

First I created a function to calculate a check digit:

class CreateFunctionCalculateCheckDigit < ActiveRecord::Migration[7.1]
  def up # rubocop:disable Metrics/MethodLength
    safety_assured do
      execute <<-SQL.squish
        CREATE OR REPLACE FUNCTION calculate_check_digit(string_num VARCHAR) RETURNS INTEGER AS $$
          --- ...
     
        $$ LANGUAGE plpgsql IMMUTABLE;
      SQL
    end
  end

  def down
    execute 'DROP FUNCTION calculate_check_digit(string_num VARCHAR)'
  end
end

And then the migration with the generated column: Thanks @Lorin_Thwaits

class CreateUsers < ActiveRecord::Migration[7.1]
  def up
    create_table :users, id: :uuid do |t|
      t.serial :seq_number, null: false
      t.virtual :number, type: :string,
                         as: '(seq_number::TEXT || calculate_check_digit(seq_number::TEXT))::TEXT', stored: true
      # ...

      t.timestamps
    end

    safety_assured do
      execute <<-SQL.squish
        ALTER SEQUENCE users_seq_number_seq RESTART WITH 10000001;
      SQL
    end

    add_index :users, :number, unique: true
    # ...
  end

  def down
    drop_table :users
  end
end

What do you think?

1 Like

Basically, you want to implement a custom auto-increment.

This requirement comes up a lot for Enterprise applications. If you work on ones, you are expected to have this in the project.

The way I did it is to implement a counting row in another key-value-liked table.

You need this feature:

This is just to implement counting number generator that robust to race conditions.

Also grok retry if you never heard about it yet: exceptions - Documentation for Ruby 3.2

Wow, @abc I think that’s awesome!

This is not about the implementation but the “algorithm” as I understand it.

It is important with IDs to understand relative to what scope they are unique. Or more precisely how big the risk of collision is. In that respect using an auto-increment has a very low uniqueness-factor.

You will have the first collision between environments very quickly. Whether that is going to materialize as a problem remains to be seen. I would certainly recommend to have an additional identifier for all environments but PROD. Otherwise the same IDs will exist on DEV, TEST, and PROD and that is usually a problem.

Or what about splitting the database for scalability? What about disaster recovery, when you to install a fresh database? Yes, this is possible to handle, but it adds (usually avoidable) complexity.

Operations in general will be more difficult.

The next problem is with data migration. Unless you can be 100% certain that this will never happen, think about merging two systems where e.g. customers have the same internal ID. There is a reason that RDBMS use considerably more complex approaches to the internal IDs of the rows. I have once come across an application that used a substring of the Oracle row ID as the business-level UUID. The substring chosen was indeed unique for Oracle 7. Guess what happened when they had to migrate to Oracle 8 (yes, this was more than 20 years ago).

What I am trying to say, and I apologize for wasting your time if you know all this already, is that a UUID is a complex thing on the business level. Auto-increments have a number of limitations to their uniqueness and in my opinion should be avoided wherever possible.