Thanks so much for this detailed email. I think it brings the issues people have been having into sharp relief. For some background, Ruby 1.9 has historically had a number of issues that were troublesome to Rails. In at least one case (Ruby 1.9 changed constant lookup in a very confusing and backward incompatible way), we were able to lobby ruby-core to backtrack on their decision.
A very useful thing for us to determine here would be what we could do in ruby-core to simplify this issue. One possible solution we’ve discussed has been to enable (and use) a default_source_encoding, which would (at the very least) pick up the default language from the environment.
I have some more comments inline.
Developer | Engine Yard
- since no one seriously considers ruby 1.9 ready for production,
nobody is going to spend time merging patches for 1.9 encoding
support, so sending patches is a waste of time
All the “points” you listed basically just repeat what you stated here in
your last observation.
Sending (quality) patches is never a waste of time. Patience is a virtue.
I completely agree with both statements.
And I also think that expecting people to prepare wonderful patches for
rails 2.3.5 (released almost 5 months ago AFAIR) without some
encouragement or directions would be a little too much. The reasons:
We are still maintaining Rails 2.3.5, and will continue to do so for the near future. Patches that add features to 2.3.x will probably be met with serious scrutiny after 3.0 is final, but patches which fix bugs in any supported version of Ruby (including 1.9.2, once it’s released) will continue to be considered.
- Debugging which source file or line of code (part of rails or
not) emits an ASCII-8BIT string is very time consuming (since the
point of failure is very far from the cause). Without this, it is
difficult to determine if it already has a LH ticket or not.
Yes. This blows. Again, I think this comes down to a poor choice for default source encoding (ASCII-8BIT). In my opinion, ruby-core should make the default source encoding UTF-8. If this causes backward compatibility issues, they should be handled in the Ruby code that introduces the issues, and allowing the user to change the default source encoding would probably be helpful as well.
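To make the failure mode concrete, here is a minimal sketch of why the error surfaces so far from its cause. The strings are illustrative stand-ins (not a specific Rails code path): one is tagged UTF-8, the others are tagged ASCII-8BIT, as might happen with data from a driver or socket that does not label its output.

```ruby
# encoding: utf-8

# A UTF-8 string, as a template or string literal would produce.
greeting = "résumé: "

# Text tagged ASCII-8BIT (binary). Illustrative only; in a real app the
# tagging happens somewhere else, long before the strings meet.
ascii_only = "cafe".dup.force_encoding(Encoding::BINARY)
with_utf8  = "café".dup.force_encoding(Encoding::BINARY)

# ASCII-only binary data concatenates without complaint...
works = greeting + ascii_only

# ...but the same code path raises once non-ASCII bytes show up.
begin
  greeting + with_utf8
  error = nil
rescue Encoding::CompatibilityError => e
  error = e
end

puts "ASCII-only concat gave a #{works.encoding} string"
puts "Non-ASCII concat raised #{error.class}"
```

The exception is raised at the point of concatenation, not where the binary-tagged string was created, which is what makes these reports so hard to trace.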
- There are already many 1.9 tickets present in 2.3.5 with no
applicable ‘solutions’- to list just some I have been bitten by
already, or stumbled upon when searching for existing
As Jeremy said, this entire process is far too error-prone. We need to work with ruby-core, before they release 1.9.2, to create a solution that doesn’t introduce this sort of problem. In my opinion (and you can quote me on this), 1.9.x is DOA until this problem is addressed in a way that does not lead to the sorts of tickets you showed above.
- 1.8.7 is recommended for Rails. That is ok. But although the
2.3.5 release notes mention 1.9, they don’t state anything about
potential UTF-8 problems with Ruby 1.9 (except for people’s
comments), nor do they suggest what to do with such problems (e.g.
‘wait until X’, ‘we are waiting for patches’, ‘send test cases’,
‘use 1.8.7’, ‘try -KU option’, 'you are on your own unless you only
use en_us’). And there is also no mention of how to report issues
effectively or which commit to use to avoid reporting something
already on LH.
I agree. I’d also point out that in the past year, attempting to maintain compatibility with 1.9.x has been extremely frustrating for Rails. In addition to feature problems (encodings, constant lookup), we’ve been met with repeated segfaults in both 1.9.1 and 1.9.2-*. Tracking down segfaults is tricky, and while rails-core needs to attempt to keep up with 1.9.2-head, you as a user should not be using a version of Ruby that is known to segfault in pure-ruby code. To be clear, you may have never encountered any segfaults, but we encounter them often when running the Rails test suite. Note that Rails itself is pure Ruby, and the problems we have had are invariably reproducible without any C extensions.
- When using a combination of software (cucumber, webrat, rspec) it
may be very time consuming to even determine which gem is the
cause of the problem and which ones just send the problem further
down the call stack.
Indeed. This is why the whack-a-mole solution is unacceptable. At this point, we’ve clearly demonstrated that the basic strategy of making String literals in Ruby source files ASCII-8BIT and providing no mechanism to change that (except file-by-file magic comments) is too unwieldy.
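For reference, the per-file magic comment looks like this. This is only a small sketch; the literal is purely illustrative, and the comment has to be repeated in every source file that contains non-ASCII text.

```ruby
# encoding: utf-8
# The line above is the magic comment. It must be the first line of the file
# (or the second, after a shebang) and applies to this file only.

literal = "café"

puts __ENCODING__       # source encoding of this file
puts literal.encoding   # encoding given to the literal
puts literal.length     # counted in characters
puts literal.bytesize   # counted in bytes
```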
- It is unreasonable to expect people to not try Rails with Ruby
1.9 (even if by accident) and the worst thing is that it seems to
work, until UTF8 characters are used somewhere (template, db, etc).
No warning is given if Ruby 1.9 is used. So the natural thing to
assume when something breaks is that one’s setup is wrong. Which is true -
it’s using Ruby 1.9 in the first place.
I agree. That said, I would personally not run a production Rails application on Ruby 1.9.x until 1.9.2 is released and all known issues (especially the segfaults I mentioned above) are resolved. One thing that would make me feel more comfortable would be if ruby-core ran the Rails test suite against 1.9.2-head. I know they’re not obligated to do so, but it would make the process significantly more robust. Rails core (and specifically Carl and I) would happily invest whatever time is needed to help the Ruby core team get (and stay) up and running with the Rails suite.
- Although I don’t want absolute morons to use Rails, having no
‘fail-safe’ or warning will just scare good developers from Rails
just wanting to try out the framework, even if the issues are not
Rails bugs. There is no ‘recommended’ set of patches to apply and
test before reporting bugs with Ruby 1.9.
Agreed. And to be clear, I don’t see any reason that someone who’s using PHP today shouldn’t be able to use Rails tomorrow.
- Most of the solutions you find for encoding problems with ROR and
Ruby 1.9 do not suggest the following: stick with 1.8, because
1.9 with Rails is a can of worms in this regard.
That is the recommended solution.
I was wondering if this isn’t really something more suitable for
ruby-core: it would be nice to know where the string causing the error
was created and why a given encoding was selected. This could at least
provide bug reports with better details regarding the root cause.
Tracking the origin of every String might be expensive. Perhaps a debug mode that did this would be helpful. That said, as I said above, I don’t believe that ASCII-8BIT is a good default for source files.
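As a rough illustration of what such a debug mode might record, here is a pure-Ruby sketch. `StringOrigin`, `tag`, and `describe` are hypothetical names invented for this example; nothing like them exists in Ruby or Rails, and a real solution would need interpreter support.

```ruby
# encoding: utf-8

# Hypothetical opt-in debug helper (not part of Ruby or Rails).
module StringOrigin
  # Remember where a string was tagged and why its encoding was chosen.
  def self.tag(string, reason)
    location = caller(1).first # file and line of the code that called tag
    string.instance_variable_set(:@string_origin,
                                 { :location => location, :reason => reason })
    string
  end

  # Build a one-line report suitable for pasting into a ticket.
  def self.describe(string)
    origin = string.instance_variable_get(:@string_origin)
    return "#{string.encoding}: origin unknown" unless origin
    "#{string.encoding}: #{origin[:reason]} at #{origin[:location]}"
  end
end

body = "café".dup.force_encoding(Encoding::BINARY)
StringOrigin.tag(body, "raw socket read")

report = StringOrigin.describe(body)
puts report
```

The tag lives on a single object, so it is lost by operations that build new strings, such as `+` or interpolation. That limitation is one reason this would be better handled inside the interpreter than by library code.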
I am really not the brightest developer out there and I apologize for
not being able to propose something more useful than just stating
Your ability to clearly articulate the problems puts you head and shoulders above most developers. Thank you very much for your efforts in clearly outlining the issues.
My question is: how can I help in a meaningful way that isn’t a
complete waste of my time and that isn’t a duplication of other
Since patches are never a waste of time, I propose the following:
My first patch would be a warning about using Ruby 1.9 with Rails. To
save people grief when they install Ruby 1.9 as their default.
That seems good. Would it be a warning in the initial Rails boot check (the one that blocks running Rails with 1.8.6 and below)? That seems like the right place to me. We should perhaps have a more expansive explanation of the issues with 1.9 and encodings (possibly a guide) that we could link to.
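A minimal sketch of what such a check might look like, assuming a standalone helper. `ruby19_encoding_warning` is a hypothetical name, the message wording is a placeholder, and this is not the actual Rails boot code.

```ruby
# Hypothetical helper: returns a warning message on Ruby 1.9.x, nil otherwise.
def ruby19_encoding_warning(ruby_version = RUBY_VERSION)
  major, minor = ruby_version.split(".").map { |part| part.to_i }
  return nil unless major == 1 && minor == 9

  "WARNING: Ruby #{ruby_version} has known string encoding issues with " \
    "this version of Rails. Ruby 1.8.7 is the recommended version."
end

# At boot: warn, but do not block startup.
message = ruby19_encoding_warning
$stderr.puts(message) if message
```

Taking the version as a parameter keeps the check easy to test without switching interpreters.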
My second patch is to rescue an exception in concat (output_safety),
work around it with force_encoding if it is sane and issue a warning.
Just to try help solve other issues that just seem related.
I’d want to see a log warning, in red, not just a Ruby warning that could be hidden. I’d like to discuss applying this solution to master as well. Would you mind hitting me up on GTalk (email@example.com)?
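Roughly, the idea could be sketched as follows. `lenient_concat` is a hypothetical standalone method, not the real `concat` in output_safety, and the log target is assumed to be anything that responds to `puts`.

```ruby
# encoding: utf-8
require "stringio"

RED   = "\e[31m" # ANSI escape: red text
RESET = "\e[0m"  # ANSI escape: back to the default color

# Hypothetical sketch: append fragment to buffer. On an encoding mismatch,
# re-tag a copy of the fragment with the buffer's encoding, but only if the
# bytes are valid in that encoding. Otherwise re-raise the original error.
def lenient_concat(buffer, fragment, log = $stderr)
  buffer << fragment
rescue Encoding::CompatibilityError
  repaired = fragment.dup.force_encoding(buffer.encoding)
  raise unless repaired.valid_encoding?

  log.puts "#{RED}WARNING: re-tagged #{fragment.encoding} string " \
           "as #{buffer.encoding}#{RESET}"
  buffer << repaired
end

log    = StringIO.new
buffer = "résumé: ".dup
lenient_concat(buffer, "café".dup.force_encoding(Encoding::BINARY), log)

puts buffer
puts log.string
```

Checking `valid_encoding?` before re-tagging is the "if it is sane" part: bytes that are not valid in the buffer's encoding still raise, so genuinely binary data is not silently mixed into the output.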
Then I would put my efforts into discussing the issue on ruby-core
if it would be possible to add location info (and reason for selected
encoding: env, locale, magic, param, etc) for string creation on a
test version of Rails - this may save many tens of thousands of man
hours that would be wasted on debugging and help in the adoption of
not only Ruby 1.9, but in good practices regarding supporting non-US
languages in other gems.
I agree entirely. I will be happy to help lobby for these (or related) changes. Do you think it makes sense to change the default source encoding?
Then I would build a special version of Ruby that warns whenever a
string is not created as UTF-8 and isn’t explicitly created as ASCII,
fork Rails and start adding test cases.
I’d love to help you with whichever of these efforts you think my assistance would be valuable in. Again, please ping me.
Would this really be the best approach?
It sounds like you’re on the right track.
Thanks in advance.
Again, thanks for your efforts here. It’s too easy to get angry, post a rant, and just leave entirely (or privately seethe). Your post here is a model example of how I would personally like people to express their concerns about serious problems that seem to remain unaddressed (or underaddressed).