I’m getting a 500 error on my website that obviously comes from a bot. I’d like to reproduce that error so that I can try to suppress the notification email that gets sent to me.
The error contains:
(ArgumentError) "invalid %-encoding
It’s in a “show” action, so it’s a GET request. I can see the URL, and it doesn’t contain any strange characters. When I put that URL in a browser, everything works.
I notice that the error message I receive contains a block of non-ASCII text, and embedded in it is the string “Network Solutions Certificate Authority”.
I can’t see any indication of how that data is being sent. Is it in a cookie? Is there any other mechanism a client can use to send data to the server?
NOTE: This is NOT an https site.
As a last resort, I could suppress all “invalid %-encoding” errors, but I would like to see that error if it really came from a real person.
I guess another approach would be to suppress all errors from non-humans, but I’m not sure how to do that.
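For the “suppress errors from non-humans” idea, one rough test is the User-Agent header, which self-identifying crawlers fill with words like “Spider” or “bot”. Below is a minimal sketch, assuming your notifier lets you look at the Rack env before it sends mail; the method name `likely_bot?` and the keyword list are made up for illustration. The header is set by the client, so this is only a heuristic, and the sketch treats a missing User-Agent as a bot, which is a judgment call.

```ruby
# Hypothetical keyword list; extend it with whatever shows up in your logs.
BOT_PATTERN = /bot|spider|crawler|slurp/i

# Hypothetical helper. Heuristic only: the User-Agent header is chosen by
# the client and can be faked or omitted, so a match means "probably not
# a person", nothing more. An empty or missing User-Agent counts as a bot.
def likely_bot?(env)
  user_agent = env["HTTP_USER_AGENT"].to_s
  user_agent.empty? || BOT_PATTERN.match?(user_agent)
end
```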
Ultimately, I’m curious about exactly what is being sent to the server, and I want to understand it.
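An HTTP client can send data in essentially three places: the request line (method, path, query string), the headers (cookies travel in the `Cookie` header), and an optional body. In a Rack env, headers show up as `HTTP_*` keys, except `CONTENT_TYPE` and `CONTENT_LENGTH`, and the body is readable from `rack.input`. To see all three for a suspicious request, a logging middleware along these lines could help. It is a sketch: `RequestDumper` is a hypothetical name, it reads the whole body into memory, and it writes to `$stderr` unless you pass another IO.

```ruby
require "stringio"

# Hypothetical middleware that logs what a client sent: the request line,
# the content headers, every HTTP_* header (cookies arrive as HTTP_COOKIE),
# and the first 200 bytes of the body.
class RequestDumper
  def initialize(app, io = $stderr)
    @app = app
    @io = io
  end

  def call(env)
    @io.puts summarize(env)
    @app.call(env)
  end

  def summarize(env)
    lines = ["#{env['REQUEST_METHOD']} #{env['PATH_INFO']}?#{env['QUERY_STRING']}"]
    %w[CONTENT_TYPE CONTENT_LENGTH].each do |key|
      lines << "#{key}: #{env[key]}" if env[key]
    end
    env.each do |key, value|
      lines << "#{key}: #{value}" if key.start_with?("HTTP_")
    end
    input = env["rack.input"]
    if input
      body = input.read.to_s
      # Replace the consumed stream with a fresh copy so the app still sees the body.
      env["rack.input"] = StringIO.new(body)
      lines << "BODY (#{body.bytesize} bytes): #{body.byteslice(0, 200).inspect}"
    end
    lines.join("\n")
  end
end
```

Because binary data is printed with `inspect`, a certificate in the body would appear as escaped bytes rather than garbage in the log.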
I've been seeing a lot of these lately, all from this user-agent:
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
from the following IP: 183.60.214.126 (a China Telecom block)
The problem is that it's a GET request with a message body. The RFCs
don't strictly prohibit that, but they give it no defined semantics.
If your exception notifier provides it, look at the value of
'rack.request.form_vars',
where you'll see what appears to be a binary cert file's contents.
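The “invalid %-encoding” message matches the ArgumentError raised by Ruby’s `URI.decode_www_form_component` when a `%` is not followed by two hex digits, and as far as I know Rack’s `Rack::Utils.unescape` delegates to that method when decoding form data. Binary certificate data is very likely to contain a stray 0x25 (`%`) byte somewhere. The sketch below approximates that decoding step; `form_decodable?` is a hypothetical helper, it is not Rack’s actual parser, and the byte string is invented rather than the bot’s real payload.

```ruby
require "uri"

# Hypothetical helper that roughly mimics form-urlencoded decoding:
# split into key=value pairs, then percent-decode each side.
# An approximation of what Rack does, not its actual code.
def form_decodable?(raw)
  raw.split(/[&;]/).each do |pair|
    key, value = pair.split("=", 2)
    URI.decode_www_form_component(key.to_s)
    URI.decode_www_form_component(value.to_s)
  end
  true
rescue ArgumentError => e
  warn e.message.inspect # e.g. "invalid %-encoding (...)"
  false
end

# Made-up bytes in the spirit of a DER certificate: the "%" followed by
# the non-hex byte 0xA1 is enough to make decoding fail.
cert_like = "0\x82\x05%\xA1Network Solutions Certificate Authority".b
form_decodable?("id=42&q=a%20b") # => true
form_decodable?(cert_like)       # => false
```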
Regardless, it seems like this spider is either seriously broken, or
actively hostile. I'm thinking about a Rack filter to drop any GET
request with a content-length header or a non-empty body, but the
quickest fix is to use iptables to block this thing altogether.
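A Rack filter along the lines described might look like the following. It is a sketch under a few assumptions: the class name `RejectGetWithBody` is hypothetical, it checks only headers (a positive `Content-Length` or any `Transfer-Encoding`) rather than reading the body, and it answers with a 400 instead of silently dropping the request.

```ruby
# Hypothetical middleware: refuse GET/HEAD requests that declare a body
# before the framework ever tries to parse it. Only headers are checked,
# so rack.input is never read.
class RejectGetWithBody
  BODYLESS_METHODS = %w[GET HEAD].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if BODYLESS_METHODS.include?(env["REQUEST_METHOD"]) && declares_body?(env)
      return [400, { "content-type" => "text/plain" }, ["Bad Request\n"]]
    end
    @app.call(env)
  end

  private

  # A positive Content-Length, or any Transfer-Encoding (e.g. chunked),
  # means the client is sending a body. Content-Length: 0 is let through.
  def declares_body?(env)
    env["CONTENT_LENGTH"].to_i.positive? || env.key?("HTTP_TRANSFER_ENCODING")
  end
end
```

In a Rails app it could be registered ahead of everything else with `config.middleware.insert_before 0, RejectGetWithBody` in `config/application.rb`, so the body is never parsed and no exception (or email) is produced. A stricter variant could reject any `Content-Length` header on a GET, including `0`.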
Thanks. I see that the sender’s IP always has the form 183.60.x.x, with the third octet between 213 and 216.
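The range 183.60.213.0 to 183.60.216.255 doesn’t fit in a single CIDR block: 213 needs its own /24, 214 and 215 form a /23, and 216 is another /24. Here is a minimal sketch of a membership check using Ruby’s standard `ipaddr` library, which could be used to skip the notification email for those addresses; `BOT_NETS` and `from_bot_net?` are hypothetical names, and the ranges are only the ones observed so far.

```ruby
require "ipaddr"

# Hypothetical list: 183.60.213.0 - 183.60.216.255 expressed as CIDR blocks.
BOT_NETS = %w[183.60.213.0/24 183.60.214.0/23 183.60.216.0/24]
  .map { |cidr| IPAddr.new(cidr) }
  .freeze

# Hypothetical helper: true if the address falls in any observed block.
# Unparseable input is treated as "not from the bot".
def from_bot_net?(ip_string)
  ip = IPAddr.new(ip_string)
  BOT_NETS.any? { |net| net.include?(ip) }
rescue IPAddr::InvalidAddressError
  false
end
```

In a Rack env the client address is usually `env["REMOTE_ADDR"]`, but behind a proxy or load balancer that may be the proxy’s address, so check which address your app actually sees.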
I could just block those addresses and kick the can down the road.
If I could reproduce what the bot is sending, I could take a stab at the Rack filter. It seems like I should be able to do that with curl. I’ll post if my experiments turn up anything useful, but if anyone has already figured it out, please post.
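One way to reproduce the request from Ruby is `Net::HTTP`, which sends a body on a GET if you set one. This is a sketch: the URL, the `cert.der` file name, and the form content type are assumptions. Judging from Rack’s `form_data?` check, a GET body is only parsed as form data when the request declares a form or multipart content type, so the bot’s request presumably did something similar. The helper names are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a GET that carries a body, mimicking the spider.
# The form content type is an assumption about what the bot sent.
def bot_like_request(url, body)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request["Content-Type"] = "application/x-www-form-urlencoded"
  request["User-Agent"] = "Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)"
  request.body = body
  request
end

# Hypothetical helper: send it. net/http adds the Content-Length header itself.
def send_bot_like_request(url, body)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(bot_like_request(url, body))
  end
end
```

For example, `send_bot_like_request("http://localhost:3000/items/1", File.binread("cert.der"))` against a development server should trigger the same code path. With curl, `curl -X GET --data-binary @cert.der http://localhost:3000/items/1` should produce a similar request, since `--data-binary` defaults the content type to `application/x-www-form-urlencoded`.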