Hi Nick, thanks for your response.
Indeed, the fact that browser extensions are allowed to do this is already reason enough not to rely solely on the Referer or Origin headers.
"it would be unwise to turn off CSRF protection" - I never suggested to turn CSRF protection off but I guess you meant to turn off the CSRF token and I agree with you
Our SPA is a bit atypical. Most SPAs will probably perform some XHR requests just after being loaded, and that might perform pretty well except for the first uncached request. Our application, however, is updated often and lightly accessed (a few hundred views per week), since it has a very specific audience (mostly attorneys). That means many requests do not benefit from caching (we have a vendors bundle that rarely changes, but the other JS bundle is 65.8 KB minified and gzipped and will probably change after each new deploy). So we try to make the application perform well even on the first uncached access.
The main document is currently 10.6 KB gzipped (more than half of it is JS injected by the NewRelic middleware, which I'll probably try to move elsewhere at some point, since besides the overhead it would also invalidate the cache). Or maybe I'll simply disable RUM, as I've already created another tool, which is the one I actually use to measure client-side performance. Or maybe the NewRelic middleware sits behind the ETag one, in which case it wouldn't affect caching; I'd have to check this.
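By the way, checking that order is easy (assuming a standard Rails stack; the NewRelic middleware name below is from memory and may differ):

```console
$ bin/rails middleware   # or `rake middleware` on older Rails
...
use Rack::ETag
use NewRelic::Rack::BrowserMonitoring
...
run MyApp::Application.routes
```

In Rack, middleware listed closer to the app processes the response first, so if the injection happens below Rack::ETag, the digest is computed over the final body and caching should be unaffected.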
The trick to speed up the first uncached request is to load all the data that would otherwise be fetched with XHR directly in the document. We did that once, and the result was that the main document became too big, wasn't cacheable, and slowed down the download of the main page. Most of that script wouldn't change often, but it added time even to cached requests. So I extracted those scripts into 3 separate requests: 2 of them are usually in cache and the other one is quite small. And I load them using async scripts in the document head (see the sketch below). This way, as soon as the head is parsed, the browser can start downloading the other scripts (which happens over HTTP/2 for most of our clients), meaning they are downloaded in parallel with the main document. But one of the scripts contains sensitive information (mostly the user's e-mail). I could get the e-mail with a separate XHR request, but I'd prefer to keep it simple.
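For illustration, the head looks roughly like this (a minimal sketch, not our actual view; the bundle names are made up):

```erb
<%# All three start downloading as soon as the head is parsed,
    in parallel with the rest of the main document. %>
<%= javascript_include_tag "vendors", async: true %>       <%# rarely changes, usually cached %>
<%= javascript_include_tag "application", async: true %>   <%# changes on each deploy %>
<%= javascript_include_tag "/user_data.js", async: true %> <%# small, user-specific, served by Rails %>
```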
So, how would I serve dynamic JS from the Rails app (since those scripts depend on the user, they don't go through the CDN either) over GET requests (they are loaded from script tags) without making them vulnerable to CSRF? I have to include the CSRF token in the query string. But if the token changes on every request to the main page, the ETag generated by the middleware will always be different, making the page uncacheable (from the ETag middleware's point of view).
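To be concrete, the idea is something like this (a sketch under my assumptions, not our actual code; the controller, route, and current_user helper are hypothetical):

```ruby
# Responds to GET /user_data.js?token=..., where the token was embedded
# in the script tag's src when the main document was rendered.
class UserDataController < ApplicationController
  def show
    # Rails doesn't run CSRF verification for GET requests, so we check the
    # query-string token ourselves with the same helper Rails uses internally.
    return head :forbidden unless valid_authenticity_token?(session, params[:token].to_s)

    # Expose the user-specific data as plain JS, e.g. the user's e-mail.
    data = { email: current_user.email }.to_json
    render js: "window.currentUser = #{data};"
  end
end
```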
That's why I'm asking whether using the raw CSRF token stored in the session would be enough protection, given that our document doesn't reflect user input from the query string. My understanding is that the per-request masking exists to mitigate BREACH attacks, and since the application isn't vulnerable to BREACH if it doesn't reflect user input, I'd be able to generate a document that could be cached, improving the performance of cacheable requests.
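For context, here's my mental model of the masking, from reading Rails' source, so treat it as an assumption rather than a description of the real implementation (which lives in ActionController::RequestForgeryProtection):

```ruby
require "securerandom"
require "base64"

# The raw token is stable for the whole session (Rails keeps it, encoded,
# in session[:_csrf_token]); here we just generate one for illustration.
raw_token = SecureRandom.random_bytes(32)

# Each rendered page gets a fresh one-time pad, so the bytes sent to the
# client differ on every request even though the underlying secret doesn't.
# That's the BREACH mitigation: the secret never appears verbatim in the body.
pad    = SecureRandom.random_bytes(raw_token.bytesize)
xored  = raw_token.bytes.zip(pad.bytes).map { |a, b| a ^ b }.pack("C*")
masked = Base64.strict_encode64(pad + xored)

# On submission, the server reverses the XOR and compares against the session.
decoded       = Base64.strict_decode64(masked)
pad2, crypted = decoded[0, 32], decoded[32, 32]
unmasked      = crypted.bytes.zip(pad2.bytes).map { |a, b| a ^ b }.pack("C*")
puts unmasked == raw_token  # => true
```

If that model is right, the masking only changes what travels in the response body; the server-side check always ends up comparing against the same session token.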
Seeing how little data is sent in the main document, you might think I'm not going to gain much by making it cacheable, since our application doesn't support mobile devices and most people have enough bandwidth to download it pretty fast. However, this book explains well that the main problem for the initial load is actually latency:
I can't bring the initial request closer to the users by serving it from the CDN (though maybe it could help in some cases, if the CDN has a faster connection to our server than the user does), because it can't be cached: it contains a CSRF token that depends on the user's session. So one remaining way to reduce latency is to make the download size really low, reducing the number of round trips (after a few round trips, a large bandwidth makes the rest fast; it's the latency that dominates). If the main page were cacheable, it would have low latency. But if I have to embed the CSRF token in that response, it will never be cacheable for the first access after a new session is created (our sessions time out after 2 hours of inactivity, so this is often the case).

So, now that I'm thinking about it, I guess the raw token wouldn't help me much anyway. I'll probably experiment with making the initial request cacheable by not embedding the CSRF token in it, and instead requesting the token with an XHR issued from a script in the head (see the sketch below). In that case, it won't matter whether the CSRF token is masked or not, so I don't have to worry about it.
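On the Rails side, that approach would look roughly like this (again a sketch; the endpoint name is made up):

```ruby
# config/routes.rb would map something like: get "/csrf_token" => "csrf_tokens#show"
class CsrfTokensController < ApplicationController
  def show
    # The main document no longer embeds the token, so it can be cached
    # (even by the CDN). This tiny response is the only uncacheable part;
    # a small script in the head fetches it with an XHR and stashes the
    # token for later non-GET requests.
    response.headers["Cache-Control"] = "no-store"
    render json: { token: form_authenticity_token }
  end
end
```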
In case you are curious: over the past week, the user who loaded our application the fastest took 274ms. For that request most things were cached and authentication was not involved. Usually, a user accesses a link containing an encrypted token, which our application uses to sign them in. That part is fast, but the redirect to the main page sometimes takes a very long time, and I can't explain why. The fastest load time including this authentication phase and the redirect was 672ms. Both users were from the US.

"That's excellent, why do you care about performance? Are you crazy?" Well, those were the fastest accesses. We signed an SLA (against my will) to serve users within 5s from the moment the link is followed, so I'm pretty happy if I can serve all our clients within 5s. But 10% of our clients take longer than that. The worst case this week was a user from the US who took 77s to load the application. Another took 43s, then 19s, 13s, 11s, 9s and so on. In the first case, redirecting to the root path after authentication took 73s (20ms in the Rails app, as measured by nginx). I have no idea why this happens to some users or how to fix it, but let's look at the other cases.
A user from India took 43s to load the application: 2s in the redirect part and, surprise, 40s loading the main document (the server side always takes around 20ms, according to NewRelic and to manual inspection of the nginx logs). I have no idea why this happened, since the other resources were not cached, are bigger than the main document, and still loaded very quickly.
There's an even stranger case, in which a US user took 25s to load one of our JS bundles from the CDN, 24s to load another one, and 9s to load the CSS bundle (they are loaded in parallel). That client took 29s to load the full application. In the same request, the main document loaded in 66ms, but the user scripts, served by the same app, took around 2s each (also in parallel). This was on IE11.
Interestingly, every week I get about the same distribution: about 10% of our clients take more than 5s to load the page, and 60% load it in under 2s.
Since you work with browser extensions, maybe you can answer two questions I have about them. Are extensions able to read data from any domain using the user's cookies and GET requests? If so, they would be able to get the CSRF token anyway, right? The other question regards the possibility of extensions delaying requests on purpose (maybe some antivirus extension or something like that). Would that be possible for a browser extension? Maybe those 10% of users are running such extensions?
I'm asking because I sometimes see weird behavior. For example, in a request where the application took 10s to load, I noticed that the resources declared in the head only started downloading after the main document had fully loaded, which took about 2.3s; only then did the other requests start. And this happened in Chrome. It's not the behavior I see in other requests, where the downloads start before the main document finishes loading, as designed. So, is it possible that this happened because that user was running some extension or antivirus solution that prevented those requests from starting before the main document download completed?
Here's a screenshot of our custom tool:
Green is the latency and the black bar is the download time.
If a user ever complains about taking over 5s to load the page, would I be right to tell them it could be caused by some browser extension? At least in cases like this one. I don't know how to investigate the long redirect times any better... But since I reduced the main document size, I've been seeing much better performance results in our tool. I guess making the document cacheable across user sessions could improve it further.
Anyway, thanks a lot for your valuable insight regarding browser extensions! It also forced me to think more about the problem, which made me realize that using a fixed CSRF token to help with caching wouldn't help as much as I initially thought. When I'm back on performance work, I'll try changing the CSRF approach to request the token with an XHR rather than embedding it in the main document, so that the document can be served by our CDN effectively (it currently depends on the user's cookies/session, so the CDN wouldn't be effective even if enabled).
Again, thank you very much! Good night