Announcing data_checks - a gem that helps you run regression tests on your data

Hello everyone :wave:

I’m publishing a new gem today - https://github.com/fatkodima/data_checks.

Motivation

Making sure that data stays valid is not a trivial task. For simple requirements, like “this column is not null” or “this column is unique”, you can rely on database constraints and be done with it. The same goes for type validation and referential integrity.

However, once you want to check something more complex, constraints are no longer enough. Depending on your DBMS, you can use stored procedures, but these are often harder to write, version, and maintain.

You could also assume that your data will never get corrupted, and validations directly in the code can do the trick … but that’d be way too optimistic. Bugs happen all the time, and it’s best to plan for the worst.

This gem doesn’t aim to replace those tools, but provides something else that serves a similar purpose: ensuring that you work with the data you expect.

It helps you schedule checks on your data and alerts you when something unexpected turns up.

Usage

The gem provides a small DSL for expressing predicates and an easy way to configure notifications.

You will be notified when a check starts failing, and when it starts passing again.

For example, say we expect every image attachment to have previews in 3 sizes. When a new image is attached, some previews may not be generated because of a failure. What we would like to ensure is that no image ends up without a full set of previews. We could write something like:

DataChecks.configure do
  ensure_no :images_without_previews, tag: "hourly" do
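    # LEFT JOIN so that images with zero previews are also caught
    # (an inner join would silently skip them). Table names "attachments"
    # and "previews" are assumed from Rails naming conventions.
    Attachment.images.left_joins(:previews).group("attachments.id").having("COUNT(previews.id) < 3")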
  end

  notifier :email,
    from: "production@company.com",
    to: "developer@company.com"
end
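
To make the notification behaviour more concrete (an alert when a check starts failing, and another when it starts passing again), here is a minimal plain-Ruby sketch of the idea. This is not the gem's actual implementation: the `TransitionTracker` class and its `run` method are hypothetical names made up for illustration, and the sketch assumes that an "ensure no" style check passes when its block returns an empty collection.

```ruby
# Hypothetical stand-in for illustration only; not part of data_checks.
class TransitionTracker
  attr_reader :events

  def initialize
    @last_status = {} # check name => :passing or :failing
    @events = []      # notifications that would have been sent
  end

  # Runs one check. The block returns the offending records;
  # an empty result means the check passes (assumed semantics).
  def run(name)
    offending = Array(yield)
    status = offending.empty? ? :passing : :failing
    previous = @last_status.fetch(name, :passing)

    # Notify only when the status changes, not on every run.
    if status != previous
      @events << [name, status == :failing ? :started_failing : :started_passing]
    end

    @last_status[name] = status
    status
  end
end

tracker = TransitionTracker.new
tracker.run(:images_without_previews) { [] }       # passing, nothing to report
tracker.run(:images_without_previews) { [42] }     # starts failing
tracker.run(:images_without_previews) { [42, 43] } # still failing, no new alert
tracker.run(:images_without_previews) { [] }       # starts passing again

tracker.events.each { |name, event| puts "#{name}: #{event}" }
```

Tracking the previous status per check is what keeps an hourly check from sending the same alert every hour while the problem persists.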