This has drove some newcomers to our codebase nuts.
When a user submits a form and leaves certain fields blank, they get saved as empty '' strings in the DB. If the default DB value of those fields is nil, then you end up with a database that is polluted with both nil and '' values.
Later on this leads developers to write queries like where.not(field: [nil, '']) when what they really mean from a conceptual point of view is where.not(field: nil). Conversely, they have to write where(field: [nil, '']) instead of a cleaner where(field: nil).
This feels all kinds of wrong. It feels like Rails should help us avoid that problem by default.
This problem has been mentioned before and there are gems to help with this (link 1, link 2). Or maybe this is not a code problem, maybe DBs (we use Postgres) have settings to accept '' as NULL and no-op. In any case, I think Rails should help with this.
By the way, if you are using an auditing gem like paper_trail to track a history of changes, history tables also get polluted, with records looking like this:
I was able to repro both for the default SQLite DB, as well as Postgres.
For a brand new app like this with scaffolded forms that cover all fields, all new Coconut records will have an empty string, so thatās nice because all your records are consistently created with an empty '' field.
But for any app that goes beyond rails scaffolds where forms donāt cover all fields, for example if a record is updated across several forms that a user may not necessarily fill out completely, or with an app with different ways of creating a record (form 1, form 2, background jobs, etc), you will often end up with a mix of empty '' and nil fields.
Thus leading to a āWTFā situation where you have to do where(name: [nil, '']) whenever you want to get records whose name is empty, instead of just where(name: nil).
This bugs us too. We handle it with the Attributes API by overriding ActiveModel::Type::String for each app. Our custom String type behaves like the base by default but accepts a few extra options: strip: false, squish: false, nilify_blank: false.
So we explicitly declare a lot of attributes in our models to exercise these options:
class User < ApplicationRecord
attribute :first_name, :string, squish: true
attribute :last_name, :string, squish: true
attribute :email, :string, squish: true
end
We mostly use squish: true to help folks copy-pasting, like an email into the login form. Sometimes we use nillify_blank: true to keep āā values from complicating a query.
So, on the one hand, I think this is a really important issue.
On the other hand, I think itās a potentially difficult one. From the perspective of the form itself, itās hard to distinguish between the nil state and the empty-string state. In particular thereās a huge semantic difference between omitting a field and blanking a field out.
I think my ideal solution would be to advocate for blank strings to be truncated to nil by default, but that seems aggro enough that I worry about it breaking someoneās use case.
I like @ansonhoytās squish: solution. Anson, how comfortable would you feel about upstreaming that or extracting it as a gem?
Extend this with our Type::String which just accepts the extra options and applies them in serialize and deserialize (via cast_value).
Register our type as an override in your config/initializers/types.rb. Notice we override twice to use it in both ActiveRecord models and ActiveModel models.
Iād suggest a different implementation for upstreaming. I donāt mind munging whitespace during deserialize, but that might not sit well with others. Even lightly touching DB values on load is scary
Iām a big fan of @Sean_Griffinās work giving Railsā a single source of truth for how each type of attribute is serialized and deserialized. Custom types work great for handling localized dates in form inputs too!
Oh man! This exact thing has driven me crazy for the past few months. Iāve coded a couple of new rails apps and had the same issue in both. I thought it was me doing something wrong because Iām using some models that include ActiveModel::Model And the Attributes API.
Iām so glad this is flagged as a WTF!
IMO, empty fields should be converted to nil. I havenāt been able to imagine a case where I would want that distinction between nil and āā. Can we somehow ask the community? Maybe a poll somewhere?
not sure if this applies, but i have had the need to distinguish this: a rails API connecting to a database that was once used for a legacy PHP app, and there was a column forā¦ some user configuration that I canāt remember right now, wherein NULL meant āi have not touched this, i want the defaultā (yeah, they didnāt use in-database default values) and an empty string meant that the user wanted to use, well, an empty string for said configuration.
so thereās One use case, but i think itās weird and small enough that it can be mostly ignored and it think itās a total non-issue if we use @ansonhoytās solution which i do like.
these kind of multi step forms always seemed weird to me. multi-step forms in my experience are a front-end concern entirely, the step separation is just better UX.
is there any example of a multi-step form that requires putting things into the database after each step instead of sending the aggregated data from all steps at the end, in bulk? because the latter seems a LOT more reasonable to me.
Sending the data all as a chunk seems like it would lessen the data integrity applications, but past a certain point of complexity it can wildly complicates them. Any chunk of a material size is going to have different people viewing different subsets of it as ārequired.ā Leaning into that by having later processes defensively code against the possibility of incomplete data tends to work better IME.
Hereās another way to think about this problem, one that I have built many many times. Imagine you have a large and complex form, one which might take a few tries to complete. Say, you allow this form to either save initially or to update many times with incomplete data, but then there is a different submit button to āfinishā the form. You allow someone entering data to save and go away for a while, come back and enter more data, maybe they see a checklist of things they need to add before the record is complete from some point of view, and then once they have filled everything in, they can āpublishā the resulting record.
This is easy to model with validations, where you have some (most) of the validations scoped on the :publish condition, so they donāt apply for a regular save or update.
But if there is a functional difference between saving a form with nil (meaning you havenāt touched it) and āā empty string (meaning thatās what you meant to enter) in an attribute, now you have made it impossible for the object to figure out the authorās intent. This applies particularly to picking lists, where the value being picked is optional.
I build software for construction contractors to manage building projects. Every project is governed by a contract. For simplicity, we model contracts as a single contracts table with many fields.
My clients fill in their own contract details for their projects. The contract details they need to fill in depend on the application modules they subscribe to. Eg. the Purchasing module doesnāt need to know the deadline for submitting progress claims.
Hence, I have multiple Contract form flows for different modules, backed by the same contracts table. Itās quite important for me to be able to differentiate between
āThis contract field has not been set up for claim administrationā
āThis contract field has been set up for claim administration as blankā
Given your description, it sounds like the nil values are from your defaulting. As for the form submitting blank values, could this be mitigated by adding a presence validation to your model/form?
Iāve noticed this as well over the years when I either chose not to (or forgot) to add a presence validation.
My general strategy is to employ a form object. Iām sure these go by plenty of names, but ultimately itās similar to a command pattern.
In this case, itās a ruby class that is initialized with parameters to render the initial form and hydrate the re-rendered form upon invalid submission. It typically implements an execute method. In most cases I have this method translate form representation to database representation ā and would handle the squish there.
That said, once in a blue moon, the form you want to display maps 1-1 with your ActiveRecord model. In this case, a form object ends up being an arguably needless proxy to ActiveRecord. Depending on how you value consistency, you might want to do it anyway, but I tend to skip the ceremony and refactor if/when it gains more responsibility.
I dig the extension to the attribute API to handle the squish since itās not limited to ActiveRecord. It could happily apply to form objects as well.
I agree with the @jeromedalbert that this is confusing for newcomers and a distraction for experienced programmers. It makes me happy to see it discussed.
The auto_strip_attributes gem is in my Gemfile for every Rails project I work on. It adds a a before_validation hook to sanitize data with a pretty simple api:
# Normal usage where " aaa bbb\t " changes to "aaa bbb"
auto_strip_attributes :nick, :comment
# Squeezes spaces inside the string: "James Bond " => "James Bond"
auto_strip_attributes :name, squish: true
# Won't set to null even if string is blank. " " => ""
auto_strip_attributes :email, nullify: false
# Won't set to null even if array is blank. [" "] => []
auto_strip_attributes :tags, nullify_array: false
# Use with attributes that are not mapped to a column
auto_strip_attributes :password, virtual: true
I do like the idea of getting this behavior as close to the database as possible (serialize/deserialize). before_validation can have some odd order-dependent behavior. Iāve also occasionally overridden the accessors themselves (e.g. def attribute=(value)) and put the stripping in there. This is a complicated problem because of the interaction of browser/form behavior and databases at opposite ends of the stack.
I could put together a PR to upstream our extra options, though it will be several weeks before I can get to it. Probably a few changes, like not touching values during deserialize.
I donāt mind if someone beats me to it, and there were more good ideas here beyond that.
P.S. @bensheldon the existence of these gems is a pretty good signal defining what could be improved. We used a gem like that too. One drawback of a before_validation approach is that assignment doesnāt trigger it or a save without validation (e.g. update_columns). That is what pushed us to a Type since that ājust worksā in every case.