Deprecate ActiveSupport::SafeBuffer modifications

In this proposal every method which is considered as “unsafe” for ActiveSupport::SafeBuffer is going to be deprecated, both bang and non-destructive versions.

At first glance, it seems that this would violate Liskov substitution principle, because ActiveSupport::SafeBuffer is a subclass of String. However, let’s step back and remember what’s the purpose of safe strings.

Safebuffer strings were introduced in Rails 3 in order to distinguish between strings that are already sanitized, that don’t need to be sanitized and non-safe strings. Consider this ERB template:

Hello!

<%= params[:evil_input] %>

<%= link_to params[:evil_input], users_path %>

The first string is static therefore it is completely safe and shouldn’t be sanitized at all.

The second one is definitely unsafe and should be sanitized.

The third one is made by Rails helper which returns safe strings and shouldn’t be sanitized. However, parameter itself is unsafe and should be sanitized by Rails before wrapping it into .

When ERB is compiled into a Ruby method, it wraps each dynamic string into a sanitizing method call, which in turn must have a way to distinguish between second and third row. This is when Safebuffers are really used. They respond with +true+ to +html_safe?+, while other strings and objects (except numbers) respond with +false+.

So, this is completely a view concept and even could be considered as a part of ActionView private API, because nobody uses safebuffer to pass data throughout whole application. SafeBuffers were not designed to enhance Ruby String in some way that other parsing/converter libraries could benefit from. As for subclassing String, here explains that this was performance only decision (“Because SafeBuffer inherits from String, Ruby creates this wrapper extremely efficiently (just sharing the internal char * storage).”) Should we bother if some methods from String are missing then?

Back to modifications and “unsafe” methods. It turned out that if we would perform, for example, +gsub+ on already sanitized string that was wrapped into a SafeBuffer, we couldn’t call it safe anymore. This is because one could revert any changes made by a sanitizer by another series of +gsub("<", “<”)+ etc. So the decision was made to return raw string for non-destructive versions of such unsafe methods and mark SafeBuffer as non-safe for destructive (bang) ones.

As for +gsub!+ and its friends. Not only “unsafe SafeBuffer” sounds ridiculous, but this highly complicated the class implementation! Let’s look how it could look like:

module ActiveSupport #:nodoc:

class SafeBuffer < String

UNSAFE_STRING_METHODS = %w(

capitalize chomp chop delete downcase gsub lstrip next reverse rstrip

slice squeeze strip sub succ swapcase tr tr_s upcase

)

alias_method :safe_concat, :concat

class SafeConcatError < StandardError

def initialize

super ‘Could not concatenate to the buffer because it is not html safe.’

end

end

def initialize_copy(other)

raise SafeConcatError unless other.html_safe?

super

end

def clone_empty

self[0, 0]

end

def concat(value)

super(ERB.unwrapped_html_escape(value))

end

alias << concat

def prepend(value)

super(ERB.unwrapped_html_escape(value))

end

def +(other)

dup.concat(other)

end

def %(args)

case args

when Hash

escaped_args = Hash[args.map { |k,arg| [k, ERB.unwrapped_html_escape(arg)] }]

else

escaped_args = Array(args).map { |arg| ERB.unwrapped_html_escape(arg) }

end

self.class.new(super(escaped_args))

end

def html_safe?

true

end

def to_s

self

end

alias_method :html_safe, :to_s

def to_param

to_str

end

def encode_with(coder)

coder.represent_scalar nil, to_str

end

UNSAFE_STRING_METHODS.each do |unsafe_method|

if unsafe_method.respond_to?(unsafe_method)

class_eval <<-EOT, FILE, LINE + 1

def #{unsafe_method}(*args, &block)

raise NoMethodError

end

def #{unsafe_method}!(*args)

raise NoMethodError

end

EOT

end

end

end

end

Now, aint’ that nice? Here you can compare it with the current implementation. By declaring safebuffers always safe we make it easier to understand, and we have less code to perform! (see for more information about +#safe_concat+). We also always return +self+ for +html_safe+ (there were a couple rejected approaches to enhance that).

One would agree that destructive modifications is true evil, but what’s wrong with +gsub+? Well, first of all, it doesn’t work as expected. Here is a proposal to fix that, but we see that this doesn’t solve the problem entirely. Moreover, it leads to writing less performant and (possibly) vulnerable code. Consider this example code:

user_input = ERB.unwrapped_html_escape(params[:evil_input]

string_that_is_going_to_view = “

#{user_input}
”.html_safe

This code is 100% safe and efficient. Suddenly, a programmer decides to change something

patched_string = string_that_is_going_to_view.gsub(",", “.”)

Now, his “<div”> is escaped by ERB, this is no good. What’s the solution? Throw another +html_safe+ !

patched_string = string_that_is_going_to_view.gsub(",", “.”).html_safe

Wha’ts wrong with this code? First of all, it creates two SafeBuffer objects while first is disposed immediately. Here @tenderlove clearly states that this is wrong approach and every object counts! Next, it can possibly lead to XSS attack. Consider this variant of code above:

patched_string = string_that_is_going_to_view.gsub("[", “<”).gsub("]", “>”).html_safe

Now, even if +user_input+ was previously sanitized it can be turned into unsafe again.

One could ask “isn’t it a user decision to take care of such situations?” I would answer that Rails is opinionated and should force a user to write better code. When I turned deprecation warnings on all unsafe methods and run Rails test suite, I saw a lot of deprecations on my screen. There’re even specially written tests that checks if SafeBuffer works correctly with +#titleize+ and its family (which obviously modies the passed string). Not to mention that Rails views itself use +gsub!+ as well.

So, if those methods are being deprecated what should programmer do? He/she must realise that any change should be made before wrapping a string into Safebuffer. This would be fast and efficient. If it’s impossible, then call +to_str+, do a modification and instantiate a new safebuffer. No changes with current approach, except it is explicit now (so a user would feel the pain and maybe decide to rewrite the code). For a library writer (who doesn’t know beforehand what kind of data will be passed into a method) I would propose ActiveSupport to have monkeypatched +Kernel.String+ method, so it would return string from a safebuffer, otherwise call +to_s_ as usual.