Regarding CSRF mitigation documentation in Rails

Hi, hope to find someone here with a good understanding of CSRF handling in Rails and willing to answer my questions :slight_smile:

I've read the security guide and couldn't find this kind of information, so maybe this is something we could improve in the guides. I could send a PR once I confirm I've understood the reasoning behind each implementation detail.

So, what are my problems with CSRF tokens? They add complexity, prevent caching and seem unnecessary for my application. I know Rails is not designed for serving only my application but if we could improve the security guides I would be able to opt out from having a token if I could be sure my application would remain safe. Or, if I should keep using CSRF to be safe, then it would help if Rails wouldn't mask the token generating a new one on each request because that would allow the application to provide better caching.

So, while researching CSRF, it seems that simply verifying the Origin or Referer headers is enough protection unless there's a security breach in the browser itself or in some browser plugin, like Flash or Java. Sometimes those headers may not be present for legitimate requests, but this does not seem to be the case with the main browsers if your application runs over HTTPS in a single domain. I mean, Rails won't try to protect GET requests to HTML pages as it doesn't make sense anyway, so in that case it would be fine if those headers were missing or came from a separate domain.

So, as far as I can tell, checking the headers would be enough if the following conditions can be met:

- the application doesn't change any data on GET requests, following the web semantics;

- it's served over HTTPS for all requests (so that a request wouldn't be made to HTTP in the same domain which could hide the referer in some browser implementations);

- it forbids non-GET requests where those headers are both missing or when they don't match a whitelist;

- the application restricts access to supported browsers;

- the supported browsers and their plugins do not allow changing those headers for cross-domain requests;

I understand the last two conditions are out of our control but the company would be able to decide whether or not they are responsible for buggy browsers or plugins. For example, if such an exploit is possible, then maybe it would also be possible that it could be used to issue a regular GET request from the other domain and extract the token from it anyway, right?
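To make the header check from the conditions above concrete, here is a minimal sketch in plain Ruby. It is not how Rails implements anything; the names `request_allowed?`, `origin_of` and `ALLOWED_ORIGINS`, as well as the `app.example.com` host, are made up for illustration. It lets safe methods through and rejects any other request whose Origin (or, when Origin is absent, Referer) does not reduce to a whitelisted origin.

```ruby
require "uri"

SAFE_METHODS = %w[GET HEAD OPTIONS].freeze

# Hypothetical whitelist of origins the application is served from.
ALLOWED_ORIGINS = ["https://app.example.com"].freeze

# Reduces a header value (an Origin or a full Referer URL) to
# "scheme://host[:port]", or nil when it cannot be parsed as an origin.
def origin_of(value)
  return nil if value.nil? || value.empty?

  uri = URI.parse(value)
  return nil unless uri.scheme && uri.host

  base = "#{uri.scheme}://#{uri.host}"
  uri.port == uri.default_port ? base : "#{base}:#{uri.port}"
rescue URI::InvalidURIError
  nil
end

# Safe methods are always allowed. Any other method needs an Origin header
# (or, only when Origin is absent, a Referer header) that is whitelisted.
def request_allowed?(method, origin: nil, referer: nil, allowed: ALLOWED_ORIGINS)
  return true if SAFE_METHODS.include?(method.upcase)

  header = origin.nil? || origin.empty? ? referer : origin
  source = origin_of(header)
  !source.nil? && allowed.include?(source)
end
```

With this sketch, `request_allowed?("POST", origin: "https://app.example.com")` passes, while a POST with both headers missing, or with an opaque `Origin: null`, is rejected. The Referer is only consulted when no Origin header was sent at all.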

Now, even if the company decides that they want to add extra protection anyway to make it harder for such buggy browser/plugin exploits to succeed, maybe it would be enough to provide the unmasked CSRF token in the original GET request. That used to be the case in Rails but it was changed to mitigate BREACH attacks. According to http://breachattack.com/ the following conditions must be met for the application to be vulnerable:

"If you have an HTTP response body that meets all the following conditions, you might be vulnerable:

1 - Your page is served with HTTP compression enabled (GZIP / DEFLATE)

2 - Your page reflects user data via query string parameters, POST...

3 - Your application page serves PII, a CSRF token, sensitive data..."

While 1 and 3 would hold true for our SPA, we serve a single page which I'd want to be cacheable with ETag. We don't reflect query string parameters in that page, so I guess we would be safe from the BREACH attack. In that case we would be able to use the unmasked CSRF token stored in session, while remaining safe, right?
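For readers unfamiliar with what "masking" means here, the following is a rough sketch of the general idea in plain Ruby, not a copy of Rails' code: the raw session token is XORed with a fresh random pad on every render, and the pad is sent along with the result so the server can undo it. The names `mask_token`, `unmask_token` and `xor_bytes` are chosen for this example only.

```ruby
require "securerandom"

# XORs two equally long binary strings byte by byte.
def xor_bytes(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

# Produces a different Base64 string on every call for the same raw token:
# a fresh random pad followed by (pad XOR raw_token).
def mask_token(raw_token)
  pad = SecureRandom.random_bytes(raw_token.bytesize)
  [pad + xor_bytes(pad, raw_token)].pack("m0")
end

# Recovers the raw token by splitting the pad from the XORed half.
def unmask_token(masked)
  bytes = masked.unpack1("m0")
  half = bytes.bytesize / 2
  xor_bytes(bytes.byteslice(0, half), bytes.byteslice(half, half))
end

raw = SecureRandom.random_bytes(32) # stands in for the token kept in the session
first = mask_token(raw)
second = mask_token(raw)
```

Here `first` and `second` differ even though both unmask to the same `raw` value. That is what keeps the compressed response from leaking the token byte by byte, and it is also why the page body, and therefore any ETag derived from it, changes on each request.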

I know Rails will try to add default protection that is suited to most applications but it would be helpful if it could explain better the measures it takes against the attacks in the security guide as well as explaining how to allow safe caching by telling in which conditions the CSRF token wouldn't have to be masked. I wouldn't be bothering you if the token masking didn't come with some caveat, but we actually have a trade-off here, as masking the token could affect the client-side performance by preventing proper caching (or by forcing the client to perform another XHR request to get the masked token).

I'd love to understand if there are any other reasons why the CSRF token is being masked as well as confirming it would be okay to not mask it provided the page doesn't reflect any user input. If you know those answers I'll be pretty thankful for your feedback.

Thanks,

Rodrigo.

A few extra comments to better explain it.

Using HTTPS also means it wouldn't be possible for a proxy server to change such headers.

Actually, it would be possible to cache using other techniques, such as If-Modified-Since or computing a hash of the content with the token stripped out, but I'd prefer to keep it simple and just use the regular ETag middleware.
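As an illustration of the "hash of the content with the token extracted" idea, here is a small sketch in plain Ruby. It assumes the token is embedded in a `csrf-token` meta tag in the page; the `CSRF_META` pattern and the `stable_etag` helper are hypothetical and not part of any middleware.

```ruby
require "digest"

# Hypothetical pattern matching the content attribute of the csrf-token meta tag.
CSRF_META = /(<meta name="csrf-token" content=")[^"]*(")/

# Computes a quoted ETag value from the body with the token blanked out,
# so two renders that differ only in the masked token get the same ETag.
def stable_etag(body)
  normalized = body.sub(CSRF_META, '\1\2')
  %("#{Digest::SHA256.hexdigest(normalized)}")
end

page_a = '<head><meta name="csrf-token" content="abc123" /></head><body>app</body>'
page_b = '<head><meta name="csrf-token" content="zzz999" /></head><body>app</body>'
page_c = '<head><meta name="csrf-token" content="abc123" /></head><body>new</body>'
```

In this sketch `stable_etag(page_a)` equals `stable_etag(page_b)` but differs from `stable_etag(page_c)`. Note that a client revalidating against such an ETag keeps using the token embedded in its cached copy, which presumably only works while the session's underlying token stays the same.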

Hi Rodrigo,

CSRF tokens prevent malicious POST or PUT requests that the user did not knowingly make. CSRF protection does not apply to GET requests at all. You can turn off CSRF protection if there aren’t any POST requests, or if those requests won’t do any damage to your application or database.

Yes, I understand, but that doesn’t answer my 2 questions. To make
it short, given the scenario I described, I’d like to know:
1 - Why wouldn’t checking Origin and Referer headers be enough to
prevent CSRF?
2 - If the page containing the CSRF token does not reflect user
input, would sending the CSRF token unmodified from the session be
enough protection?
Thanks,
Rodrigo.
P.S.: Rails doesn’t protect GET requests because such requests
shouldn’t perform any state change on the server side, not because
they are not vulnerable to CSRF. If for some reason someone
decides to take such actions on GET requests, there’s nothing
preventing them from using CSRF tokens on such requests as well. But
CSRF checking must be disabled for at least some GET requests so that
one is able to get the token before issuing the next requests… Also,
even if Rails checked for CSRF in GET requests, it wouldn’t be able
to know automatically which ones should be protected…

Hi Rodrigo,

It sounds like you’re fairly across this problem domain. I wouldn’t consider myself an expert at this stuff, but everything you’ve said matches my understanding of when and why CSRF protection is required.

It’s worth noting that browser extensions, malicious or not, can modify the Origin and Referer headers (and CORS related headers). We do this at Paydirt to allow us to insert our time-tracker iframe into sites that normally disallow that in their CSP header.

I think there’s probably enough that’s outside your control (browser versions, Flash/Java vulnerabilities, browser extensions) that it would be unwise to turn off CSRF protection and just use the Origin header.

Would it be possible to cache the SPA entry point page, and return CSRF tokens when the client makes a GET request for the data payload? Or can they send POST/PUT requests before they necessarily fetch data from the server?

Cheers,
Nick

Hi Nick, thanks for your response.

Indeed the point that this is allowed to browser extensions is already enough reason to not rely only on the Referer or Origin headers.

"it would be unwise to turn off CSRF protection" - I never suggested to turn CSRF protection off but I guess you meant to turn off the CSRF token and I agree with you :slight_smile:

Our SPA is a bit atypical. Most SPAs will probably perform some XHR requests just after being loaded and may perform pretty well except for the first uncached request. However, our application is often updated and not accessed much (a few hundred views per week) since it has a very specific target (mostly attorneys). That means many of the requests do not benefit from caching (we have a vendors JS that rarely changes, but the other JS bundle is 65.8 KB after being minified and gzipped and will probably change after each new deploy). So, we try to make it perform well even for the first uncached access.

The main document is currently 10.6KB gzipped (more than a half of it is JS injected by NewRelic middleware, which I'll probably try to move elsewhere at some point since it would also invalidate the cache besides the overhead). Or maybe I'll simply disable RUM as I already created another tool which is the one I actually use to measure client-side performance. Or maybe the NewRelic middleware is behind the ETag one and it wouldn't affect caching, I'd have to check this.

The trick to speed up the first uncached request is to load all data that would be fetched with XHR directly in the document. We did that once, and the result was that the main document became too big, wasn't cacheable and slowed down the download of the main page. Most of this script wouldn't change often but it would add time even to cached requests. So, I extracted those scripts into 3 separate requests; 2 of them are usually in cache and the other one is quite small. And I load them by using async scripts in the header. This way, as soon as the header is read by the browser it can start downloading the other scripts (which usually happens over HTTP/2 for most of our clients), which means they are downloaded in parallel with the main document. But one of the scripts has sensitive information (mostly the user's e-mail). I could get the e-mail using a separate XHR request, but I'd prefer to keep it simple.

So, how would I serve dynamic JS from the Rails app (since they depend on the user, they don't go to the CDN either) through GET requests (since they are sourced in script tags) without making them vulnerable to CSRF? I have to include the CSRF token in the query string. But if it changes on every request to the main page it means the ETag generated by the middleware would be always different, making it uncacheable (from the ETag middleware).

That's why I'm asking if using the raw CSRF token stored in the session would be enough protection given that our document doesn't reflect user input in the query string. This way, since my understanding is that it changes to avoid BREACH attacks and since the application is not vulnerable to it if it doesn't reflect user's input, then I would be able to generate a document which could be cached to improve the performance of cacheable requests.

After seeing that only a small amount of data is sent in the main document you might think I'm not going to improve much more by making it cacheable, since mobile devices are not supported by our application and most people have enough bandwidth to download it pretty fast. However, this book explains well that the main problem for the initial load is actually latency:

I can't make the initial request closer to the users by serving it over CDN (maybe it could help in some cases if the CDN has a faster connection to our server than the user) because it can't be cached since it contains a CSRF token that depends on the user's session. This means one way to reduce latency would be to make the download size really small, reducing the number of round trips (after some round trips, a large bandwidth will make it fast after the latency). If the main page was cacheable it would have a low latency. If I have to provide the CSRF token in that request then it will never be cacheable for the first access after a new session is created (our sessions time out after 2 hours of inactivity, so this is often the case). So, now that I'm thinking about it, I guess that it won't help me much to use the raw token anyway. I guess I'll experiment with making the initial request cacheable, not providing the CSRF token in it but requesting it through an XHR request from some script tag in the header. In that case, it won't matter whether the CSRF token is masked or not, so I don't have to worry about this.
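The server side of that XHR approach could look roughly like the Rack-style sketch below, written in plain Ruby so it runs without Rails. Everything in it is an assumption made for illustration: the `TOKEN_ENDPOINT` name, the `csrf_token` session key, and the idea of returning the raw session value (in a real Rails controller one would presumably return `form_authenticity_token` instead). The response is marked `no-store` so that only the main document gets cached, never the token.

```ruby
require "json"
require "securerandom"

# Hypothetical Rack-style endpoint: returns the session's CSRF token as JSON.
# It relies on the same-origin policy (no permissive CORS headers) to keep
# other sites from reading the response.
TOKEN_ENDPOINT = lambda do |env|
  session = env.fetch("rack.session")
  # Create the token on first use and keep it for the rest of the session.
  session["csrf_token"] ||= SecureRandom.urlsafe_base64(32)

  headers = {
    "content-type" => "application/json",
    "cache-control" => "no-store" # the token must never be cached or shared
  }
  [200, headers, [JSON.generate(token: session["csrf_token"])]]
end

env = { "rack.session" => {} }
status, headers, body = TOKEN_ENDPOINT.call(env)
```

The main document then carries no per-session data, so it can be served with a stable ETag or from a CDN, and the client attaches the fetched token to its later non-GET requests.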

In case you are curious, for the past week, the user who loaded our application the fastest took 274ms to load it. For that request most things were cached and authentication was not involved. Usually, the user would access a link containing an encrypted token which our application uses to sign in the user. This is fast but the redirect to the main page sometimes takes a very long time and I can't explain the reason. The fastest load time including this authentication phase and the redirect took 672ms. Both users were from the US. "That's excellent, why do you care about performance? Are you crazy?" Well, those were the fastest accesses. We signed an SLA (against my will) to serve users in under 5s from the moment the link is followed. So, I'm pretty happy if I can serve all our clients within 5s. But 10% of our clients take longer than 5s. The worst case this week was a user from the US who took 77s to load the application. Another user took 43s, another 19s, 13s, 11s, 9s and so on. For the first case, redirecting to the root path after the authentication took 73s (20ms in the Rails app as measured by nginx). I have no idea why this happens to some users and how to fix it, but let's look at other cases.

A user from India took 43s to load the application: 2s in the redirect part and, surprise, 40s for loading the main document (it always takes around 20ms on the server side according to NewRelic and to a manual inspection of the nginx logs as well). I have no idea why it happened since the other resources were not cached and loaded very quickly while being bigger than the main document.

There's an even stranger case in which a US user took 25s to load one of our JS bundles from the CDN, 24s to load another one and 9s to load the CSS bundle (they are loaded in parallel). In that case the client took 29s to load the full application. In the same request the main document was loaded in 66ms but the user scripts, loaded from the same app, took around 2s each to load (also in parallel). This was on IE11.

It's interesting that every week I get about the same distribution and about 10% of our clients take more than 5s to load the page. 60% of them load it in under 2s.

Since you work with browser extensions, maybe you'd be able to answer two questions I have about them. Are they able to read data from any domain using the user's cookies and GET requests? In that case, those extensions would be able to get the CSRF token anyway, right? The other question regards the possibility of extensions delaying requests on purpose (maybe some antivirus extension or something like that). Would that be possible for browser extensions? Maybe those 10% of the users could be using such extensions?

I'm asking that because sometimes I see a weird behavior. For example, in a request that took 10s to load the application, I noticed that all resources declared in the header only started downloading after the main document had been fully loaded, which took about 2.3s. Only after that did the other requests start. And this happened in Chrome. This is not the regular behavior I see in other requests, as they start before the main document is fully loaded, as designed. So, is it possible that this would happen with this user because he would be using some extension or antivirus solution that could prevent those requests from happening before the main document download completed?

Here's a screenshot of our custom tool:

Green is the latency and the black bar is the download time.

If some user ever complains they are taking over 5s to load the page, would I be right in telling them that this could be caused by some browser extension? At least in cases like this. I don't know how to better investigate the long redirect time issue... But since I reduced the size of the main document I've been able to see much better performance results in our tool. I guess that making it cacheable across separate user sessions could improve it further.

Anyway, thanks a lot for your valuable insight regarding browser extensions! It also forced me to think more about the problem, which made me realize that using a fixed CSRF token to help with caching wouldn't really help as much as I initially thought. When I'm back to performance improvement work I'll try to change the CSRF approach by requesting the token in an XHR request rather than embedding it in the main document, so that the document could be served by our CDN more effectively (since it currently depends on the user cookies/session it wouldn't be effective if enabled).

Again, thank you very much! Good night :slight_smile:

Hi Rodrigo,

No worries, happy to have been able to help.

It sounds like the performance discrepancies you are seeing are quite strange! The SLA situation sounds tricky too, as it sounds like your server is responding in a reasonable amount of time (tens to hundreds of milliseconds), but you don’t have any control over the network used to access the app, and its latency.

What kind of concurrency do you have for your app servers? The only explanation I can think of would be that slow requests are hitting the CDN, which doesn’t have a cached copy of the assets, and the CDN then has to fetch them from your app server, then serve them to the clients. This could be slow if your app server only has one process or thread, and there are several assets to be fetched in serial. It could also be slow if returning assets to the CDN is slow for some reason, like if you were accidentally compiling them on demand, rather than precompiling them. Another possibility would be if requests were being queued up at the routing layer (if you’re using Heroku or similar), and that’s where the time was being spent (rather than within your app itself).

If your assets are changing a lot, and the number of users accessing the app is quite small, adding a CDN could feasibly slow things down - if most requests to each CDN edge location aren’t able to return a cached asset, and have to fetch it from your server, you could be adding an extra DNS lookup and SSL handshake, just to have the CDN retrieve it from your server anyway. It might be worth looking at the cache hit rate for your CDN to see how often you’re having to populate the CDN.

I’m clutching at straws here though and just throwing things out there! :slight_smile:

In answer to your questions about browser extensions, at least for Chrome they can request permission to read and modify data on all sites you visit. They can also restrict themselves to being able to do this on a particular subset of domains. There’s some more info about that here: https://developer.chrome.com/extensions/permission_warnings#warnings.

It would also be possible for an extension to be interfering with requests in a way that slowed them down. In theory, adblocking or anti-virus extensions could exhibit this kind of behaviour. I haven’t experienced it or seen it myself though.

Anyway, it sounds like a tough problem to crack - things don’t seem to be behaving as they should be. Good luck!

Nick

"it sounds like your server is responding in a reasonable amount of time (tens to hundreds of milliseconds)"

Actually, according to NewRelic reports all requests to the sign-in route take less than 10ms to complete while all requests to the main document take less than 20ms, so in the worst case only 30ms would be spent on the server side. In the fastest request this month, JavaScript processing was responsible for 223ms of the 274ms total (loading the main document took only 2ms).

"What kind of concurrency do you have for your app servers?"

For the main document almost none, as most of the requests are XHR ones. The general throughput reported by NewRelic is 51 rpm (it's a bit more but I disable NewRelic for monitoring requests, since any NewRelic numbers would basically only report what happens in our custom Sensu-based monitoring system), which means our servers are never under high load. Also, they run on dedicated servers in colocation at Cogent in NY.

There are not too many clients using our product in a given week. It's a very specialized application, aimed mostly at attorneys' needs. They don't need such a tool all the time, only for some specific tasks that don't happen that often, but when those tasks do happen, not using the tool would cost them a lot of time. Usually there are about 200 unique users using it in a given month. That's why I don't have to worry about scaling too much and can focus on client-side optimizations as well as on the network layer itself. As you can imagine, there are lots of misses in the CDN since it's not used that much and we often deploy new features (usually almost every week, sometimes more than once). However, our CDN stats (CloudFront) are mostly useless for US clients (which are most of them) since there are some monitoring applications (from our client) accessing the site every few minutes which are not using cache (we have set up our own monitoring scripts to use cache with PhantomJS). For other regions CloudFront stats report that there are more hits than misses most of the time, but I agree with you that sometimes it could be slower for some users (in case their connection to Cogent would be faster than connecting to the closest CDN plus the latency between the CDN server and our server, which is not always the case I guess).

Also, usually only one of the assets changes after each deploy while the vendors asset rarely changes, so most requests to it would be a hit, and that's the biggest asset (about 60% of the total assets loaded in the initial page if I remember correctly).

Our assets are served directly by Nginx in our servers and compilation is disabled in production.

It's interesting to know that Chrome will ask for specific permissions to allow an extension to read and modify data in sites. I guess some users using some sort of anti-virus or adblocking extensions could explain some of those odd behaviors in which assets are only loaded after the full document is loaded. Since about 10% would take longer than 5s to load the application maybe it could make sense if around 10% of them would use some sort of extension like this. Even if this is not the case maybe our client could buy this argument if they ever complain about the SLA not being met for some clients, assuming they would be able to verify that somehow. :slight_smile:

Thanks again for the insights :slight_smile:

Best,
Rodrigo.