Hi

I have found this Rails plugin, which automatically removes XSS from models upon saving. This is great. My concern is which is the better choice: 1) use a plugin like this, or 2) allow the content to be entered into the DB as it is and later escape it in the view using the h method or sanitize. The reason I am asking is that the latest Railscast (204) says Rails 3 automatically escapes HTML output. But why can't we use this type of plugin so that such malicious user input never enters the database at all? Please share your thoughts.

Thanks Tom

Hi


I agree; it has never made sense to me to have to sanitize the output.

Escaping everything as you display it does have the benefit of letting you see exactly what information is in the DB. Also, you can change which tags are allowed after the fact by using sanitize() instead of h().

The downside is that you have to escape it every time you display the page. Granted, this isn't a heavy operation, but it does happen repeatedly. It seems to me that if you are always going to have to use h() anyway, things should just be sanitized before insertion into the DB, and then you could forgo the h().

Just my opinion. I still use h() and sanitize()

Two problems with that:

The first and smallest is an annoyance. If I want to save my blog in a DB, and I write a post that has the content:

   "Never use '<' in your HTML; use '&lt;' instead"

...this will get written to the DB as:

   "Never use '&lt;' in your HTML; use '&lt;' instead"

...which then gets encoded with h() in a view as:

   "Never use '&amp;lt;' in your HTML; use '&amp;lt;' instead"

...or, if just output straight to the view "because it was sanitized before putting it in the DB", shows up as:

   "Never use '<' in your HTML; use '<' instead"

Either way, the distinction between the two things I typed is gone.
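To make the loss of information concrete, here is a minimal Ruby sketch using only the standard library. The `sanitize_on_save` method is a hypothetical stand-in for a save-time filter (it is not the plugin's actual code); it assumes the filter encodes a bare '<' but leaves anything that already looks like an entity alone. The quotes around the characters are dropped from the sample text to keep the output easy to read.

```ruby
require "cgi"

# Hypothetical save-time filter (an assumption, not the plugin's real code):
# encodes a bare '<' and leaves existing entities untouched.
def sanitize_on_save(text)
  text.gsub("<", "&lt;")
end

typed  = "Never use < in your HTML; use &lt; instead"
stored = sanitize_on_save(typed)
# Both occurrences are now "&lt;", so the original can't be told apart.

escaped_again = CGI.escapeHTML(stored)    # what h() would do in the view
decoded       = CGI.unescapeHTML(stored)  # an attempt to "undo" the filter

puts stored         # Never use &lt; in your HTML; use &lt; instead
puts escaped_again  # Never use &amp;lt; in your HTML; use &amp;lt; instead
puts decoded        # Never use < in your HTML; use < instead

# Storing the raw text and escaping only on output round-trips cleanly.
round_trip = CGI.unescapeHTML(CGI.escapeHTML(typed))
puts round_trip == typed  # true
```

Neither `escaped_again` nor `decoded` matches what was typed, whereas escaping the raw text at output time loses nothing.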

You'll have seen this happen on *loads* of bulletin boards and feedback comments all over the web.

Adjusting the user's input before storing it in the db is "bad", because you can never reverse it without all sorts of unreliable hoops. Just store what they typed, and whenever you deal with it assume it's highly-toxic.

The second problem is an arrogant presumption that the only place that will ever use this user-supplied data is the rendering of an HTML page. But what happens when you're storing details, say of an order placed, and the user enters their special delivery comments:

   "Please knock & wait for >5mins"

You store this as:

   "Please knock &amp; wait for &gt;5mins"

...because you *know* you're going to have to display it in a confirmation page on the web site and you don't want to worry about encoding it there every time. But you forget that you might want it put into a PDF that's generated for the delivery driver, or use it in a JS function on the web page, or include it in a field of a CSV export. In each of these instances, you're going to have to decode it back from the "safe" HTML-encoded version to the user input (I refer you to my first point: that you cannot reliably do this :slight_smile:) before encoding it however you need for your new use.

Life is much easier if you just store what they typed and deal with it when you use it...
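As a rough illustration of "deal with it when you use it", the sketch below keeps the delivery comment exactly as typed and encodes it separately for each destination, using only Ruby's standard `cgi`, `json`, and `csv` libraries. The order number 42 is made up for the example, and PDF output is left out because it would need a third-party library.

```ruby
require "cgi"
require "json"
require "csv"

# Stored in the DB exactly as the user typed it.
comment = "Please knock & wait for >5mins"

# HTML page: escape at render time (roughly what h() does in a view).
html = CGI.escapeHTML(comment)

# JavaScript: emit a JSON string literal rather than HTML entities.
js = comment.to_json

# CSV export: let the CSV library handle quoting; no HTML entities involved.
# The order number 42 is a hypothetical value for illustration.
csv = [42, comment].to_csv

puts html  # Please knock &amp; wait for &gt;5mins
puts js    # "Please knock & wait for >5mins"
puts csv   # 42,Please knock & wait for >5mins
```

Each output gets the encoding its context needs, and none of them has to undo an HTML encoding that was baked in at save time. Embedding the JSON literal in an inline script tag would need further care, which this sketch does not cover.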


Going through the plugin docs again, I found an example like this:

   class Message < ActiveRecord::Base
     xss_terminate :except => [ :body ]
   end

This means we can exempt some fields from sanitization. So isn't that sufficient? Any other thoughts?


So instead of messing with *all* of the user-supplied input, you only mess with *some* of it? Won't that just end up confusing the developers trying to re-render the DB content to PDF, etc., when some of the data renders fine and some has to be "decoded" back to plain text (but doesn't go back to *exactly* what the user typed)?

I didn't think I was ambiguous: fiddling with users' data before you store it is going to end up in confusion and pain somewhere [1]. It's perfectly easy to assume that all DB content is tainted, and treat it appropriately for whatever purpose you want to put it to.

My 2p... YMMV :slight_smile:

[1] Of course, you need to "fiddle" with it to prevent SQL injection, but the end result should be that the content in the DB is exactly what the user typed, even if they typed "Robert'); DROP TABLE students;--"


Excellent points.