Database fields are polluted with both nil and empty '' values when submitting forms

This has drove some newcomers to our codebase nuts.

When a user submits a form and leaves certain fields blank, they get saved as empty '' strings in the DB. If the default DB value of those fields is nil, then you end up with a database that is polluted with both nil and '' values.

Later on this leads developers to write queries like where.not(field: [nil, '']) when what they really mean from a conceptual point of view is where.not(field: nil). Conversely, they have to write where(field: [nil, '']) instead of a cleaner where(field: nil).

This feels all kinds of wrong. It feels like Rails should help us avoid that problem by default.

This problem has been mentioned before and there are gems to help with this (link 1, link 2). Or maybe this is not a code problem, maybe DBs (we use Postgres) have settings to accept '' as NULL and no-op. In any case, I think Rails should help with this.

By the way, if you are using an auditing gem like paper_trail to track a history of changes, history tables also get polluted, with records looking like this:

{"title"=>[nil, ""], "company_name"=>[nil, ""]}
8 Likes

Interesting! Iā€™m not sure Iā€™ve ever seen this. I wonder whatā€™s causing the parameters to be set.

Just to make sure this wasnā€™t just our app, I made a dummy new Rails 6.0.3 app here with:

rails generate scaffold Coconut name:string diameter:integer weight:integer

Then I created a new rew record by leaving the name field emtpy:

And what I get in the DB is an empty string:

I was able to repro both for the default SQLite DB, as well as Postgres.

For a brand new app like this with scaffolded forms that cover all fields, all new Coconut records will have an empty string, so thatā€™s nice because all your records are consistently created with an empty '' field.

But for any app that goes beyond rails scaffolds where forms donā€™t cover all fields, for example if a record is updated across several forms that a user may not necessarily fill out completely, or with an app with different ways of creating a record (form 1, form 2, background jobs, etc), you will often end up with a mix of empty '' and nil fields.

Thus leading to a ā€œWTFā€ situation where you have to do where(name: [nil, '']) whenever you want to get records whose name is empty, instead of just where(name: nil).

This bugs us too. We handle it with the Attributes API by overriding ActiveModel::Type::String for each app. Our custom String type behaves like the base by default but accepts a few extra options: strip: false, squish: false, nilify_blank: false.

So we explicitly declare a lot of attributes in our models to exercise these options:

class User < ApplicationRecord
  attribute :first_name, :string, squish: true
  attribute :last_name, :string, squish: true
  attribute :email, :string, squish: true
end

We mostly use squish: true to help folks copy-pasting, like an email into the login form. Sometimes we use nillify_blank: true to keep ā€œā€ values from complicating a query.

5 Likes

So, on the one hand, I think this is a really important issue.

On the other hand, I think itā€™s a potentially difficult one. From the perspective of the form itself, itā€™s hard to distinguish between the nil state and the empty-string state. In particular thereā€™s a huge semantic difference between omitting a field and blanking a field out.

I think my ideal solution would be to advocate for blank strings to be truncated to nil by default, but that seems aggro enough that I worry about it breaking someoneā€™s use case.

I like @ansonhoytā€™s squish: solution. Anson, how comfortable would you feel about upstreaming that or extracting it as a gem?

2 Likes

@ansonhoyt Really love your solution! If you donā€™t mind can you share you monkey patch?

Happy to share. Not in a gem ATM, but this type file buried in our humble Rails Application Template.

If youā€™re familiar with ActiveModel Attributes, we simply added our options on top of the original.

  • Start with the built-in :string type, ActiveModel::Type::String
  • Extend this with our Type::String which just accepts the extra options and applies them in serialize and deserialize (via cast_value).
  • Register our type as an override in your config/initializers/types.rb. Notice we override twice to use it in both ActiveRecord models and ActiveModel models.

Iā€™d suggest a different implementation for upstreaming. I donā€™t mind munging whitespace during deserialize, but that might not sit well with others. Even lightly touching DB values on load is scary :scream:

Iā€™m a big fan of @Sean_Griffinā€™s work giving Railsā€™ a single source of truth for how each type of attribute is serialized and deserialized. Custom types work great for handling localized dates in form inputs too!

4 Likes

I wonder if this could also be a general setting for all ApplicationRecord models in an initializer or in the model itself:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  all_attributes :string, squish: true
end
3 Likes

Oh man! This exact thing has driven me crazy for the past few months. Iā€™ve coded a couple of new rails apps and had the same issue in both. I thought it was me doing something wrong because Iā€™m using some models that include ActiveModel::Model And the Attributes API.

Iā€™m so glad this is flagged as a WTF!

IMO, empty fields should be converted to nil. I havenā€™t been able to imagine a case where I would want that distinction between nil and ā€œā€. Can we somehow ask the community? Maybe a poll somewhere?

2 Likes

not sure if this applies, but i have had the need to distinguish this: a rails API connecting to a database that was once used for a legacy PHP app, and there was a column forā€¦ some user configuration that I canā€™t remember right now, wherein NULL meant ā€œi have not touched this, i want the defaultā€ (yeah, they didnā€™t use in-database default values) and an empty string meant that the user wanted to use, well, an empty string for said configuration.

so thereā€™s One use case, but i think itā€™s weird and small enough that it can be mostly ignored and it think itā€™s a total non-issue if we use @ansonhoytā€™s solution which i do like.

1 Like

In cases where I donā€™t need to support having nil in the field, I just add default: '' in the migration when creating the database column.

2 Likes

Multi step forms where nil means the user hasnā€™t reached this step yet, '' means the user has reached this step but chose to leave the field blank.

2 Likes

these kind of multi step forms always seemed weird to me. multi-step forms in my experience are a front-end concern entirely, the step separation is just better UX.

is there any example of a multi-step form that requires putting things into the database after each step instead of sending the aggregated data from all steps at the end, in bulk? because the latter seems a LOT more reasonable to me.

Sending the data all as a chunk seems like it would lessen the data integrity applications, but past a certain point of complexity it can wildly complicates them. Any chunk of a material size is going to have different people viewing different subsets of it as ā€œrequired.ā€ Leaning into that by having later processes defensively code against the possibility of incomplete data tends to work better IME.

Hereā€™s a talk I gave on the subject:

https://av.tib.eu/media/31238

and hereā€™s a paper that really influenced the talk:

https://c2.com/ppr/checks.html

Hereā€™s another way to think about this problem, one that I have built many many times. Imagine you have a large and complex form, one which might take a few tries to complete. Say, you allow this form to either save initially or to update many times with incomplete data, but then there is a different submit button to ā€œfinishā€ the form. You allow someone entering data to save and go away for a while, come back and enter more data, maybe they see a checklist of things they need to add before the record is complete from some point of view, and then once they have filled everything in, they can ā€œpublishā€ the resulting record.

This is easy to model with validations, where you have some (most) of the validations scoped on the :publish condition, so they donā€™t apply for a regular save or update.

But if there is a functional difference between saving a form with nil (meaning you havenā€™t touched it) and ā€˜ā€™ empty string (meaning thatā€™s what you meant to enter) in an attribute, now you have made it impossible for the object to figure out the authorā€™s intent. This applies particularly to picking lists, where the value being picked is optional.

Walter

2 Likes

i see, thank you! i donā€™t think iā€™ve ever seen a form as complex as thatā€”lucky me, i suppose :innocent:

I build software for construction contractors to manage building projects. Every project is governed by a contract. For simplicity, we model contracts as a single contracts table with many fields.

My clients fill in their own contract details for their projects. The contract details they need to fill in depend on the application modules they subscribe to. Eg. the Purchasing module doesnā€™t need to know the deadline for submitting progress claims.

Hence, I have multiple Contract form flows for different modules, backed by the same contracts table. Itā€™s quite important for me to be able to differentiate between

  • ā€œThis contract field has not been set up for claim administrationā€
  • ā€œThis contract field has been set up for claim administration as blankā€
1 Like

Given your description, it sounds like the nil values are from your defaulting. As for the form submitting blank values, could this be mitigated by adding a presence validation to your model/form?


Iā€™ve noticed this as well over the years when I either chose not to (or forgot) to add a presence validation.

My general strategy is to employ a form object. Iā€™m sure these go by plenty of names, but ultimately itā€™s similar to a command pattern.

In this case, itā€™s a ruby class that is initialized with parameters to render the initial form and hydrate the re-rendered form upon invalid submission. It typically implements an execute method. In most cases I have this method translate form representation to database representation ā€“ and would handle the squish there.

That said, once in a blue moon, the form you want to display maps 1-1 with your ActiveRecord model. In this case, a form object ends up being an arguably needless proxy to ActiveRecord. Depending on how you value consistency, you might want to do it anyway, but I tend to skip the ceremony and refactor if/when it gains more responsibility.

I dig the extension to the attribute API to handle the squish since itā€™s not limited to ActiveRecord. It could happily apply to form objects as well.

I agree with the @jeromedalbert that this is confusing for newcomers and a distraction for experienced programmers. It makes me happy to see it discussed.

The auto_strip_attributes gem is in my Gemfile for every Rails project I work on. It adds a a before_validation hook to sanitize data with a pretty simple api:

  # Normal usage where " aaa  bbb\t " changes to "aaa  bbb"
  auto_strip_attributes :nick, :comment

  # Squeezes spaces inside the string: "James     Bond  " => "James Bond"
  auto_strip_attributes :name, squish: true

  # Won't set to null even if string is blank. "   " => ""
  auto_strip_attributes :email, nullify: false

  # Won't set to null even if array is blank. ["   "] => []
  auto_strip_attributes :tags, nullify_array: false

  # Use with attributes that are not mapped to a column
  auto_strip_attributes :password, virtual: true

I do like the idea of getting this behavior as close to the database as possible (serialize/deserialize). before_validation can have some odd order-dependent behavior. Iā€™ve also occasionally overridden the accessors themselves (e.g. def attribute=(value)) and put the stripping in there. This is a complicated problem because of the interaction of browser/form behavior and databases at opposite ends of the stack.

I could put together a PR to upstream our extra options, though it will be several weeks before I can get to it. Probably a few changes, like not touching values during deserialize.

I donā€™t mind if someone beats me to it, and there were more good ideas here beyond that.

P.S. @bensheldon the existence of these gems is a pretty good signal defining what could be improved. We used a gem like that too. One drawback of a before_validation approach is that assignment doesnā€™t trigger it or a save without validation (e.g. update_columns). That is what pushed us to a Type since that ā€œjust worksā€ in every case.

1 Like