[Proposal] SQLite3 full-text-search (and other virtual tables): Do not dump shadow/virtual tables

Hi, I’d like feedback on a patch, before I open a PR:

Using the builtin full-text-search module (FTS5) in SQLite messes up schema.rb, since neither the virtual table or the shadow tables can be properly handled by the schema dumper. Running CREATE VIRTUAL TABLE test USING fts5(content) in a migration, produces the following in schema.rb:

# Could not dump table "test" because of following StandardError
#   Unknown type '' for column 'content'

# Could not dump table "test_config" because of following StandardError
#   Unknown type '' for column 'k'

# Could not dump table "test_content" because of following StandardError
#   Unknown type '' for column 'c0'

  create_table "test_data", force: :cascade do |t|
    t.binary "block"
  end

  create_table "test_docsize", force: :cascade do |t|
    t.binary "sz"
  end

# Could not dump table "test_idx" because of following StandardError
#   Unknown type '' for column 'segid'

Would you be interested in a PR that ignores virtual/shadow tables during schema dump for SQLite? I’ve already got it working locally using:

# activerecord/lib/active_record/connection_adapters/sqlite3/schema_statements.rb
# L175
def data_source_sql(name = nil, type: nil)
  scope = quoted_scope(name, type: type)
  scope[:type] ||= "'table','view'"

  sql = +"SELECT name FROM pragma_table_list WHERE schema <> 'temp'"
  sql << " AND name NOT IN ('sqlite_sequence', 'sqlite_schema')"
  sql << " AND name = #{scope[:name]}" if scope[:name]
  sql << " AND type IN (#{scope[:type]})"
  sql
end

The only tests that do no pass, are two tests that specifically check that the old query was being logged.

Best regards, Zacharias

Hi Zacharias, thanks for bring this subject up.

What you’re suggesting is probably a small improvement over what currently exists, but to me it still leaves FTS support in a very weird state.

For casual readers, I’ll first give a bit of background explanation about SQLite3’s FTS5 tables. If you create a virtual fts table “foo_fts5” in a migration:

class CreateFts5Table < ActiveRecord::Migration[7.1]
  def change
    execute 'create virtual table foo_fts5 using fts5(content)'
  end
end

and run rails db:migrate, then from the Rails console you can see the related tables of type “shadow” that are created:

> db = SQLite3::Database.open("storage/development.sqlite3")

> db.execute("select * from pragma_table_list order by name")
=>
[["main", "ar_internal_metadata", "table", 4, 0, 0],
 ["main", "foo_fts5", "virtual", 3, 0, 0],
 ["main", "foo_fts5_config", "shadow", 2, 1, 0],
 ["main", "foo_fts5_content", "shadow", 2, 0, 0],
 ["main", "foo_fts5_data", "shadow", 2, 0, 0],
 ["main", "foo_fts5_docsize", "shadow", 2, 0, 0],
 ["main", "foo_fts5_idx", "shadow", 3, 1, 0],
 ["main", "posts", "table", 6, 0, 0],
 ["main", "schema_migrations", "table", 1, 0, 0],
 ["main", "sqlite_schema", "table", 5, 0, 0],
 ["main", "sqlite_sequence", "table", 2, 0, 0],
 ["temp", "sqlite_temp_schema", "table", 5, 0, 0]]

and then schema.rb is a bit of a mess, with error messages for some of those tables (including the virtual table itself) but keeping two of the shadow tables:

ActiveRecord::Schema[7.1].define(version: 2024_07_15_205331) do
# Could not dump table "foo_fts5" because of following StandardError
#   Unknown type '' for column 'content'

# Could not dump table "foo_fts5_config" because of following StandardError
#   Unknown type '' for column 'k'

# Could not dump table "foo_fts5_content" because of following StandardError
#   Unknown type '' for column 'c0'

  create_table "foo_fts5_data", force: :cascade do |t|
    t.binary "block"
  end

  create_table "foo_fts5_docsize", force: :cascade do |t|
    t.binary "sz"
  end

# Could not dump table "foo_fts5_idx" because of following StandardError
#   Unknown type '' for column 'segid'

  create_table "posts", force: :cascade do |t|
    t.string "title"
    t.text "body"
    t.boolean "published"
    t.datetime "created_at", null: false
    t.datetime "updated_at", null: false
  end

end

And of course this puts the app into a bad state if you do rails db:drop ; rails db:schema:load because then you’ll see that those shadow tables are no longer shadow tables:

> db.execute("select * from pragma_table_list order by name")
=>
[["main", "ar_internal_metadata", "table", 4, 0, 0],
 ["main", "foo_fts5_data", "table", 2, 0, 0],
 ["main", "foo_fts5_docsize", "table", 2, 0, 0],
 ["main", "posts", "table", 6, 0, 0],
 ["main", "schema_migrations", "table", 1, 0, 0],
 ["main", "sqlite_schema", "table", 5, 0, 0],
 ["main", "sqlite_sequence", "table", 2, 0, 0],
 ["temp", "sqlite_temp_schema", "table", 5, 0, 0]]

Zacharias, what you’re suggesting would (if I’m understanding correctly) omit all of the tables from schema.rb … but in this case, the schema is still unusable because that FTS table won’t be recreated if someone runs db:schema:load. What do you think about the particular use case I’m describing? Can you say more about your workflow and how you’re handling database recreation (I assume you’re always using the migrations?)?

Or, put another way, is this change worth making even if Rails still doesn’t have first-class FTS5 support?

Hi Mike, you bring up a good point, it is a half-baked solution.

It’s only a slight improvement, but it allows me to create virtual tables in migrations without filling my schema.rb with error messages.

The proper solution would be to add support for virtual tables in the schema dumper, similarly to how enum support was added for PostgreSQL.

I have a mostly-ready patch for that as well. My initial idea was to do a follow-up PR for that, do you think it would be better to do a combined PR?

I’m curious about the patch for virtual table support … does it also support the shadow tables? Can you share what schema.rb would look like after creating a FTS5 virtual table? I’d love to understand more deeply.

It ignores the shadow tables, and only dumps the virtual table to schema.rb. This is what the patch looks like: Comparing main...sqlite-virtual-tables · zachasme/rails · GitHub

For example

CREATE VIRTUAL TABLE searchables USING fts5(content, meta UNINDEXED, tokenize = 'porter ascii');

will append to schema.rb

create_virtual_table :searchables, :fts5, ["content", "meta UNINDEXED", "tokenize='porter ascii'"]

What are your thoughts on ignoring shadow tables in schema.rb?

That patch is great! :heart: I think you should open a PR.

Do you need to specialize drop_virtual_table? In other words, is a normal drop_table sufficient for virtual tables?

When you open the PR, please make sure to explain a couple of points for the Rails maintainers who are probably not familiar with how virtual tables work:

  • explain the difference between sqlite_master and pragma_table_list so the reviewer understands why that’s being changed
  • maybe briefly explain the lifecycle of the virtual table so the reviewer understands that create_virtual_table will re-create all the shadow tables correctly in the rails db:schema:load scenario, and that drop_table or drop_virtual_table will delete the shadow tables.

I expect there will be conversation about adding a new method to the abstract adapter that’s only used by the sqlite3 adapter, but that’s exactly the right thing to engage with the core team about!

Thank you for the feedback, Mike!

I’ll look a bit more into reversing create_virtual_table, it should be possible to reverse it in a more elegant manner. Indeed, a normal drop_table is sufficient, but I was unsure how drop_table handles being fed the same arguments as create_virtual_table.

Then I’ll start writing up a PR, based on your feedback :slight_smile:

The merged PR for anyone who finds themselves here!

Super cool to see this!

@zachasme what do your model/controller interactions with a sqlite FTS column look like? Do they require injected SQL or is there an abstraction analog to pg_search you are familiar with?

1 Like

Yes it’s great to have this land in Rails!

I’ve written a blog post detailing how to add full-text search to an existing table. This is what it looks like (inspired by the Campfire source-code):

# usage
Message.search("foo OR bar").with_snippets.ranked

# model
class Message < ApplicationRecord
  scope :search, ->(query) do
    joins("join messages_search_index idx on messages.id = idx.rowid")
    .where("messages_search_index match ?", query)
  end

  scope :with_snippets, ->(**options) do
    select("messages.*")
    .select_snippet("title", 0, **options)
    .select_snippet("body", 1, **options)
  end

  scope :ranked, -> { order(:rank) }

  after_create_commit  :create_in_search_index
  after_update_commit  :update_in_search_index
  after_destroy_commit :remove_from_search_index

  def self.rebuild_search_index
    connection.execute "INSERT INTO messages_search_index(messages_search_index) VALUES('rebuild')"
  end

  private
    def self.select_snippet(column, offset, tag: "mark", omission: "…", limit: 32)
      select("snippet(messages_search_index, #{offset}, '<#{tag}>', '</#{tag}>', '#{omission}', #{limit}) AS #{column}_snippet")
    end

    def create_in_search_index
      execute_sql_with_binds "insert into messages_search_index (rowid, title, body) values (?, ?, ?)", id, title, body
    end

    def update_in_search_index
      transaction do
        remove_from_search_index
        create_in_search_index
      end
    end

    def remove_from_search_index
      execute_sql_with_binds "insert into messages_search_index (messages_search_index, rowid, title, body) values ('delete', ?, ?, ?)", id_previously_was, title_previously_was, body_previously_was
    end

    def execute_sql_with_binds(*statement)
      self.class.connection.execute self.class.sanitize_sql(statement)
    end
end