If I have the following markup:
<% form_tag do %>
  <%= text_field_tag 'html', params[:html] %>
  <%= submit_tag 'Update' %>
<% end %>
and enter "&copy;" and submit, the form, when it comes back up, will have the actual copyright symbol in the text field instead of the characters "&copy;" that were submitted.
This is because text_field_tag[1] relies on tag_options[2], which calls escape_once[3], which escapes the input but leaves anything that already looks like an entity reference untouched.
I am guessing Rails is trying to be helpful in the case where content gets double-encoded, but what about when I want to keep the entity reference the user typed in?
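To make the behavior concrete, here is a rough plain-Ruby sketch of what escape_once appears to do. The method name escape_once_sketch, the ONCE_TABLE constant and the regular expression are my own approximation for illustration, not the Rails source.

```ruby
# Approximation only: mimics the "escape, but leave existing entities alone"
# behavior described above. ONCE_TABLE and escape_once_sketch are made-up names.
ONCE_TABLE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escapes <, > and ", plus any & that does not already start something
# shaped like an entity reference (&name; or &#123;).
def escape_once_sketch(text)
  text.to_s.gsub(/[<>"]|&(?![a-zA-Z]+;|#\d+;)/) { |ch| ONCE_TABLE[ch] }
end

puts escape_once_sketch('1 < 2 & 3') # the bare ampersand is escaped
puts escape_once_sketch('&copy;')    # the existing entity passes through unchanged
```

Because "&copy;" passes through unchanged, it lands in the value attribute as a real entity reference, and the browser decodes it to the copyright symbol when it renders the field.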
A use case might be a WYSIWYG field that automatically inserts the entity references for the user (such as TinyMCE[4]). The WYSIWYG editor inserts &copy;, but when the same content is redisplayed it will have the literal copyright symbol instead of the entity reference. TinyMCE seems to deal with this fine, but others might not. Also, some of these character references (such as &nbsp;, which might be generated from copy/paste data from Word) get messed up as they are saved and pulled from the database (I am guessing because something in the stack isn't Unicode compatible). So the end result is that the user gets all sorts of strange characters appearing on the screen.
So my question is: what are the solutions? It seems the problem is this "help" Rails is trying to provide. If something is encoding twice, it should be fixed. Rails should not try to help out. But there is probably lots of code that now depends on this "help".
I could try to find what in the stack is not Unicode compatible but that could be difficult and/or impractical as it may be the client browser (or maybe the database server)!
I have created the tags by hand (i.e. didn't use the helpers) but I sure don't want to do this for all text fields that might accept entity references.
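For reference, the hand-built approach amounts to something like the following plain-Ruby sketch. The names escape_always and plain_text_field are hypothetical and are not Rails helpers; the point is only that every ampersand gets escaped, so a typed entity reference survives the round trip.

```ruby
# Hypothetical stand-ins, not part of Rails: build the <input> by hand and
# escape every special character, including & in existing entity references.
HAND_TABLE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def escape_always(text)
  text.to_s.gsub(/[&<>"]/) { |ch| HAND_TABLE[ch] }
end

def plain_text_field(name, value)
  safe_name = escape_always(name)
  %(<input type="text" name="#{safe_name}" id="#{safe_name}" value="#{escape_always(value)}" />)
end

# The value attribute carries &amp;copy;, which the browser shows as the
# literal text the user typed rather than as the copyright symbol.
puts plain_text_field('html', '&copy;')
```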
Also triple encoding the data seems to work. So:
<% form_tag do %>
  <%= text_field_tag 'html', h(params[:html]) %>
  <%= submit_tag 'Update' %>
<% end %>
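My understanding of why the extra h works can be sketched in plain Ruby as below. Here h_sketch stands in for h and escape_once_again stands in for escape_once; both are approximations I wrote for illustration, not the Rails implementations.

```ruby
# Approximations for illustration only; neither method is the Rails source.
PRE_TABLE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Stand-in for h: escapes every special character, including every &.
def h_sketch(text)
  text.to_s.gsub(/[&<>"]/) { |ch| PRE_TABLE[ch] }
end

# Stand-in for escape_once: skips any & that already starts an entity reference.
def escape_once_again(text)
  text.to_s.gsub(/[<>"]|&(?![a-zA-Z]+;|#\d+;)/) { |ch| PRE_TABLE[ch] }
end

typed = '&copy;'

# Without h: the entity passes through, so the browser decodes it to the symbol.
puts escape_once_again(typed)

# With h first: & becomes &amp;, which escape_once_again then leaves alone,
# so the browser decodes the attribute back to the text the user typed.
puts escape_once_again(h_sketch(typed))
```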
But that seems tedious and error-prone to do on every field that might get an entity reference. I thought about overriding the behavior of tag_options to always just escape, or to always triple-encode before calling escape_once, but what will that break?
Am I missing something? This seems like a big deal, but I couldn't find much on the web about it. Should I submit a ticket in Trac?
Any insight would be appreciated.
Eric
[1] http://www.railsmanual.org/module/ActionView%3A%3AHelpers%3A%3AFormTagHelper/text_field_tag
[2] http://www.railsmanual.org/module/ActionView%3A%3AHelpers%3A%3ATagHelper/tag_options
[3] http://www.railsmanual.org/module/ActionView%3A%3AHelpers%3A%3ATagHelper/escape_once
[4] http://tinymce.moxiecode.com