coercions of blank strings to boolean and integer values

This may seem a minor issue, but I think it's worth a bit of discussion to get it right. Let me start with the use case that led me smack into the issue.

Start with some model that has a boolean field that can be nil/null. Create a view with a form that uses a select box to set the field's value, and set :include_blank => true. The obvious assumption is that if you select blank (as opposed to Yes/1/true or No/0/false), the field value will be saved as nil/null. Not so - it is saved as 0/false. This has the weird behavior of the select box showing the false choice when you select blank and get bounced back for a second try at the form.

    create_table :widgets do |t|
      t.string :name
      t.boolean :fancy
    end

    w = Widget.create(:name => "weather", :fancy => true)
    w.fancy # => true
    w.update_attributes(:fancy => nil)
    w.fancy # => nil
    w.update_attributes(:fancy => "")
    w.fancy # => false

It's not too much trouble to patch ActiveRecord to change it so that setting a boolean field to a blank string gets saved as NULL in the database instead of 0. The question is: what is the desired behavior? More specifically, when should the value be coerced, with special consideration to the case when the field doesn't allow a nil/NULL value?

I think the answer is that blank values should always be coerced to nil in memory, even when the field is created with :null => false. This is consistent with how integer fields are handled. You can have a nil value in memory, and the db will barf when you try to save it as NULL. AR expects you to use a validation to catch that and give the user another shot at the form. However... there is what looks to me like a bug in AR where assigning a blank-but-not-empty string to an integer field has a different result from assigning an empty string or nil. An empty string is coerced to nil, but a string of one or more whitespace characters is coerced to 0. It's also easy to fix it so that all blank strings are coerced to nil, and I think that would be the best thing to do. But I wonder if anyone out there might be relying on the current behavior. There are only 2 tests that break when this change is made (in validations_test.rb).

By the way, strings representing integers are coerced to their correct values, but non-numeric objects like arrays and hashes are coerced to 1. WTF? Why is that useful?

I was going to create a ticket with a patch for this stuff to refer to, but the code is sitting on a computer at work so that won't happen until the morning (sorry, Pratik).

To summarize:

I propose that all blank strings should be coerced to nil, for both boolean and integer fields. Any issues with that? Anyone know if they are relying on that behavior?
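
To make the proposal concrete, here is a minimal plain-Ruby sketch of the rule, independent of ActiveRecord. The helper names `blank_string?` and `coerce_blank` are made up for illustration and are not part of any Rails API, and the list of "true" values is an assumption as well.

```ruby
# Hypothetical helpers, not ActiveRecord API: a sketch of the proposed rule.

# True for strings that are empty or contain only whitespace.
def blank_string?(value)
  value.is_a?(String) && value.strip.empty?
end

# Coerce a raw form value for a boolean or integer column.
# nil and blank strings become nil for both types; everything else goes
# through a deliberately simple conversion.
def coerce_blank(value, type)
  return nil if value.nil? || blank_string?(value)

  case type
  when :boolean then [true, "true", "t", "1", 1].include?(value)
  when :integer then value.to_i
  else value
  end
end

coerce_blank("", :boolean)    # => nil
coerce_blank("   ", :integer) # => nil
coerce_blank("1", :boolean)   # => true
coerce_blank("0", :boolean)   # => false
coerce_blank("42", :integer)  # => 42
```

With a rule like this, a column declared with :null => false would still need a validation to turn the in-memory nil into a form error.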

The scenario that you mention is a classic one of empty strings being
returned from forms. And the problem is bigger than just booleans. It
also applies to strings, numbers and lists.

In the scenario Josh mentions, a boolean field should have the empty
string or a blank string coerced to false (or nil) long before it gets
saved by ActiveRecord.

Putting the burden on ActiveRecord to massage the crap it is handed
into something meaningful seems out of place. Why not fix the problem
at the source and get ActionController to return meaningful values
from empty form fields?

-1 for getting ActiveRecord to bail out ActionController with coercion
of empty strings and blanks.

As for the coercion of non-numeric objects, I agree that "1" seems
totally outrageous. I would hope for the coercion to use to_i/to_f
conventions and raise an exception when they fail.

+1 for fixing/removing the coercion of non-numeric objects beyond a
simple to_i/to_f depending on the field type.

-Chris

PS - Here is some code I have in ApplicationController that attacks
the empty string problem at the source. It could be cleaned up with
"returning" and other recent goodness:

  # Coerce empty string values in hash - particularly useful for HTTP
  # POSTed forms. Values are coerced based on the attrs mapping of
  # params keys (attrs keys) to coerced values (attrs values).
  # See http://dev.rubyonrails.org/ticket/5694 for background
  def coerce_empty_strings(params_hash, attrs = {})
    return unless params_hash
    params_hash.inject(HashWithIndifferentAccess.new) do |h,(k,v)|
      h[k] = (v.is_a?(String) && v.empty?) ? attrs[k.to_sym] : v
      h
    end
  end

I don't think the fault is in ActionController. The params hash should
echo whatever parameters are being passed from the browser. The
browser sends an empty string, it's ActiveRecord's job to coerce the
strings into more suitable objects depending on the attribute you're
assigning to.

I'm okay with an empty string being coerced into nil for boolean
attributes (for consistency). But I do think the :null => false
setting is much more common in boolean attributes than any other type.
Otherwise it's not a true boolean attribute (it can hold 3 states).
Too often I've been bitten by a bug where null slipped into a boolean
field and it didn't match a "false" search.

Regards,

Ryan

I agree. This follows the guideline of "validate early", which is
generally a good one.

Also, it makes sense to handle it early if you were to use another
persistence layer. This is a UI-layer problem, solve it close to the
UI.

-- Chad

I also agree - ActiveRecord should not need to know about how browsers handle params hashes. For example, if you're talking to ActiveRecord from somewhere else (the console, a drb script, or whatever) you shouldn't have to pretend you're an HTML client.

Ian

Interesting, to me it is just the opposite. This is not specific to
HTML clients and Action Controller. We're talking about how Active
Record handles empty strings. This applies to console and other
scripts too. If the client is passing empty strings in any of those
environments, Active Record should be able to coerce appropriately.

Perhaps both Action Controller and Active Record need fixing? But I
look at it as two different problems because they are completely
separate modules.

Regards,

Ryan

> The scenario that you mention is a classic one of empty strings being
> returned from forms. And the problem is bigger than just booleans. It
> also applies to strings, numbers and lists.
>
> In the scenario Josh mentions, a boolean field should have the empty
> string or a blank string coerced to false (or nil) long before it gets
> saved by ActiveRecord.
>
> Putting the burden on ActiveRecord to massage the crap it is handed
> into something meaningful seems out of place. Why not fix the problem
> at the source and get ActionController to return meaningful values
> from empty form fields?

I think the problem with that approach stems from the fact that form data is submitted as untyped strings. There's no way to look at the string "1,100" and guess if that means the string itself, the integer 1100, or the list [1, 100]. Currently ActiveRecord does the best it can to convert string data from forms into appropriate values for fields, and sometimes it falls over (bug or flawed design?).

I see three paths we can take to improve things:

1) incrementally improve ActiveRecord to more sensibly process string inputs and convert to the correct data type for fields, e.g. blank string handling

2) significantly alter ActiveRecord for more flexible and targeted processing of string inputs

3) create some kind of middle-man object to assist in converting form input strings to correct data types

Path 1 seems like a good approach in the short term, and there seems to be little reason not to fix obvious errors in how ActiveRecord operates. Even if we do something else, it doesn't seem like a good idea to remove this functionality from AR, since that would break virtually *every* Rails app in existence.

Path 2 could be interesting as a generic approach. I've done exactly this in specific situations often before - e.g. I fake up a tags_list accessor on the model to allow user input of a list of tags like "rails, ruby, sighting". You can of course do this without any special support, but perhaps a bit of syntactic sugar could improve things.
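
For anyone who hasn't seen the tags_list trick, a minimal sketch of the idea follows. It uses a plain Ruby class as a stand-in for a model; `Article` and its `tags` attribute are hypothetical names chosen for illustration, and a real model would persist the tags rather than keep them in an array.

```ruby
# Plain-Ruby stand-in for a model; Article and its attributes are hypothetical.
class Article
  attr_accessor :tags

  def initialize
    @tags = []
  end

  # Virtual attribute reader: present the tags as one comma-separated string.
  def tags_list
    tags.join(", ")
  end

  # Virtual attribute writer: parse the user's string into a clean array,
  # trimming whitespace and dropping empty entries.
  def tags_list=(string)
    self.tags = string.to_s.split(",").map(&:strip).reject(&:empty?)
  end
end

article = Article.new
article.tags_list = "rails, ruby, sighting"
article.tags      # => ["rails", "ruby", "sighting"]
article.tags_list # => "rails, ruby, sighting"
```

The form only ever sees the string-valued tags_list, so the conversion lives in one place on the model.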

Path 3 sounds great in theory. It's like a presenter that runs in reverse too. But I wonder if separating the processing of form input data into a separate object is going to be worth the effort. I'd be interested in seeing someone's proposal for what that might look like (unfortunately I have a few other science experiments higher in my own priority queue right now).

In the meantime, I propose Path 1 as the simplest thing that could possibly work to fix the use case where submitting forms with blank values gives a non-nil value in the field.

> -1 for getting ActiveRecord to bail out ActionController with coercion
> of empty strings and blanks.
>
> As for the coercion of non-numeric objects, I agree that "1" seems
> totally outrageous. I would hope for the coercion to use to_i/to_f
> conventions and raise an exception when they fail.

That's just what it does (schema_definitions.rb:65):

     when :integer then value.to_i rescue value ? 1 : 0

I'm still puzzling over that one, especially since `value` will never be nil (thanks to a test a few lines above).
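
To see what that expression does in isolation, here is a small sketch that wraps it in a standalone method. The method name `integer_cast` is made up for illustration; only the body is taken from the line quoted above. The modifier `rescue` binds more loosely than the ternary, so the line reads as `value.to_i rescue (value ? 1 : 0)`.

```ruby
# Standalone re-creation of the quoted expression.
# integer_cast is a hypothetical name, not a Rails method.
def integer_cast(value)
  value.to_i rescue value ? 1 : 0
end

integer_cast("42")   # => 42
integer_cast("   ")  # => 0  (String#to_i returns 0 for non-numeric text)
integer_cast([])     # => 1  (Array has no to_i, so the rescue branch runs)
integer_cast({})     # => 1  (same for Hash)
integer_cast(true)   # => 1
integer_cast(false)  # => 0
```

The last two lines suggest one possible reading: true and false have no to_i either, so the rescue branch may have been meant to map booleans to 1 and 0, with arrays and hashes falling into it by accident. That is a guess, not something the source confirms.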

+1 for "when :integer then value.to_i"

And if I get up on the wrong side of the bed, then I might propose
"when :integer then value"

Honestly, having the persistence layer guess at what is "intended" to
be stored seems like a losing proposition. What's next? Guessing
that for a boolean field "Nein" means false and "Oui" means true and
that for an integer field that "Two" means 2 and "a lot" means 3?

ActiveRecord is doing two jobs: modelling and persisting. You could
make a case for AR modelling being a good place to address the
coercion problem... but not generically. If you want to solve the
problem in AR, then let's give the user the ability to model coercion
rules as well.

But ultimately, WHEREVER the modelling of coercion is performed, it
needs to be available right from the start of processing in the
controller action. And coercion rules/modelling need to be available
for input data that is not persisted (or at least not by an
ActiveRecord model).

A good solution might be a reverse presenter (to use Josh's term) that
can be modelled independently, with AR automatically generating a
default reverse presenter bound to each model class. More complicated
Reverse Presenters could accept multiple AR models and their
associations as well as manually managed attributes. A
ReversePresenter would be instantiated with the request's params at
the start of a controller action. It's all downhill from here...

Of course it's worth considering combining the Reverse Presenter and
the classic Presenter into one class. That might help with building
some enthusiasm for the idea and it would probably reduce the amount
of modeling required.
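
As a rough illustration of the idea, here is a minimal sketch of what such an object might look like. Everything in it is an assumption: `ReversePresenter`, its constructor, and the rules format are hypothetical and do not exist in Rails, and the per-type conversions are deliberately naive.

```ruby
# Hypothetical sketch: ReversePresenter and its rule format are illustrative
# assumptions, not an existing Rails class.
class ReversePresenter
  # Default conversion per declared type. Blank input is handled separately.
  COERCIONS = {
    :integer => lambda { |v| v.to_i },
    :float   => lambda { |v| v.to_f },
    :boolean => lambda { |v| ["1", "true", "t"].include?(v.to_s.downcase) },
    :string  => lambda { |v| v.to_s }
  }

  # rules maps attribute names to a type, e.g. { :fancy => :boolean }.
  # params is the raw string-keyed hash from the request.
  def initialize(rules, params)
    @rules = rules
    @params = params
  end

  # Returns a new hash with every declared attribute coerced to its type.
  # Missing keys and blank strings become nil.
  def attributes
    @rules.each_with_object({}) do |(name, type), result|
      raw = @params[name.to_s]
      blank = raw.nil? || (raw.is_a?(String) && raw.strip.empty?)
      result[name] = blank ? nil : COERCIONS.fetch(type).call(raw)
    end
  end
end

params = { "name" => "weather", "fancy" => "", "count" => "3" }
rules  = { :name => :string, :fancy => :boolean, :count => :integer }
ReversePresenter.new(rules, params).attributes
# => { :name => "weather", :fancy => nil, :count => 3 }
```

A default instance could in principle be generated from a model's column types, but that part is left out here.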

Quick thought: I don't think calling #to_i is guessing. I see it as "I
expect an integer, please provide me with whatever you can to satisfy
that". I think most Ruby APIs are like that. So I always call #to_s or
#to_i in methods that expect such types.

Just like AR will call #to_i on #find when passed a string. It's ok,
and it lets you do some more magic and makes your code look a lot
better.

Just my $0.02

Damian,
I think we are saying the same thing: using the object's own methods
(to_i, to_a, to_f, etc.) to typecast/coerce is absolutely the right
thing to do. Unfortunately, Rails currently goes beyond that level of
coercion and Josh's original proposal on this post was to go even
further to coerce booleans using external logic. It's the use of
external logic that I characterized as "guessing."

-Chris

Agreed :)

I'd definitely love to see some work done on tidying up the conversion
code and making it more consistent and testable. In the meantime the
patch is a nice pragmatic solution until someone has the time and the
inclination to spend time doing that rework.

If you're that person, drop us a line :)