Duplicating strange bot error

I’m getting a 500 error on my website that obviously comes from a bot. I’d like to duplicate that error so that I can try to suppress the email message that gets sent to me.

The error contains:

(ArgumentError) "invalid %-encoding

It’s in a “show” action, so it’s a GET request. I can see the URL, and it doesn’t contain any strange characters. When I put that URL in a browser, everything works.

I notice that the error message I receive contains a bunch of non-ASCII text, and embedded in it is “Network Solutions Certificate Authority”.

There is no indication that I can see of how that info is being sent. Is it in a cookie? Is there any other mechanism by which a client can send info to the server?

NOTE: This is NOT an https site.

As a last resort, I could suppress all “invalid %-encoding” errors, but I would like to see that error if it really came from a real person.

I guess another approach would be to suppress all errors from non-humans, but I’m not sure how to do that.

And ultimately, I’m curious about exactly what is being sent to the server. I want to understand that.
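
For what it's worth, the error text itself can be reproduced with nothing but Ruby's standard library. The sketch below assumes the message originates in `URI.decode_www_form_component` (which, as far as I know, is what Rack uses to unescape form data); `decode_error` is a made-up helper name, and the binary sample is invented, not the bot's actual payload.

```ruby
require "uri"

# Hypothetical helper: returns nil if the string decodes cleanly,
# otherwise the ArgumentError message raised by the decoder.
def decode_error(raw)
  URI.decode_www_form_component(raw)
  nil
rescue ArgumentError => e
  e.message
end

# Well-formed percent-encoding decodes without complaint.
p decode_error("caf%C3%A9")

# Binary junk containing a "%" that is not followed by two hex digits
# is rejected. This sample is invented for illustration.
p decode_error("0\x82%\xFFNetwork Solutions".b)
```

If that assumption holds, any raw binary data that gets parsed as form parameters and happens to contain a stray "%" would produce this error, regardless of what the URL looks like.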

I've been seeing a lot of these lately, all from this user-agent:
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
from the following IP: (China Telecom block)

The problem is it's a GET request with a content-body, which is not
strictly prohibited by the RFCs, but not technically supported either.

If your exception notifier provides it, look at the value of
where you'll see what appears to be a binary cert file's contents.

Regardless, it seems like this spider is either seriously broken, or
actively hostile. I'm thinking about a Rack filter to drop any GET
request with a content-length header or a non-empty body, but the
quickest fix is to use iptables to block this thing altogether :)
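
A rough sketch of the Rack filter idea, in case it's useful. The class name `RejectGetWithBody` is made up, and I'm assuming that a positive Content-Length or any Transfer-Encoding header is a good enough signal for "this GET has a body"; untested against the actual bot.

```ruby
# Hypothetical middleware (not part of Rack or Rails): answers 400 to any
# GET request that declares a body, before the app tries to parse params.
class RejectGetWithBody
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["REQUEST_METHOD"] == "GET" && declares_body?(env)
      return [400, { "content-type" => "text/plain" }, ["Bad Request\n"]]
    end

    @app.call(env)
  end

  private

  # Assumption: a body is announced either by a positive Content-Length
  # or by a Transfer-Encoding header (e.g. chunked).
  def declares_body?(env)
    env["CONTENT_LENGTH"].to_i > 0 || env.key?("HTTP_TRANSFER_ENCODING")
  end
end

# In a Rails app this could probably be registered near the top of the
# stack, e.g. in config/application.rb:
#   config.middleware.insert_before 0, RejectGetWithBody
```

Returning early means the request never reaches parameter parsing, so the exception (and the notification email) should not be triggered. The trade-off is that any legitimate client sending a body with GET would also be rejected.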


Thanks. I see that the sender’s IP always starts with 183.60.x.x with the third number between 213 and 216.

I could just block those addresses and kick the can down the road.

If I could duplicate what the bot is sending then I could take a stab at the rack filter. It seems like I should be able to do that with curl. I’ll post if my experiments look useful, but if anyone has already figured it out, please post.
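
Here is one way the bot's request might be imitated from Ruby rather than curl: write a raw GET with a body straight to a socket. Everything specific here is an assumption: the helper names, the form-urlencoded content type, the path, the port, and the payload (a few invented bytes, not the real cert data).

```ruby
require "socket"

# Hypothetical helper: builds a raw HTTP/1.1 GET request that carries a body.
# The default content type is a guess at what makes the server parse the
# body as form parameters.
def build_get_with_body(host, path, body,
                        content_type: "application/x-www-form-urlencoded")
  body = body.b # treat the payload as raw bytes
  head = [
    "GET #{path} HTTP/1.1",
    "Host: #{host}",
    "Content-Type: #{content_type}",
    "Content-Length: #{body.bytesize}",
    "Connection: close"
  ].join("\r\n")
  (head + "\r\n\r\n").b + body
end

# Hypothetical helper: sends the raw request and returns the raw response.
def send_raw(host, port, request)
  TCPSocket.open(host, port) do |sock|
    sock.write(request)
    sock.read
  end
end

# Example against a local development server (path and port are placeholders):
#   payload = "0\x82%\xFFNetwork Solutions Certificate Authority".b
#   puts send_raw("localhost", 3000,
#                 build_get_with_body("localhost", "/items/1", payload))
```

If the server answers with a 500 and the same "invalid %-encoding" message, that would suggest the body is what triggers the error; if not, the content type or the payload bytes probably need adjusting.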