Site Instability Update

Published at 14:27 on 15 June 2025

This site has been stable since I implemented a stopgap measure to prevent abusive crawling from taking it down.

Alas, the abusive crawling persists. It is happening in spite of robots.txt directing robots to avoid the troublesome URL. The robots in question also, contrary to best practice, send no User-Agent information identifying themselves. Pure sleaze. I suspect one or more AI firms are behind it, trawling the Internet for code to train their models on.
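For reference, the kind of robots.txt rule being ignored is along these lines; the path shown here is a placeholder, not the actual troublesome URL:

    # Ask all crawlers to stay away from the resource-heavy URL.
    # "/cgi-bin/service" stands in for the real path on this site.
    User-agent: *
    Disallow: /cgi-bin/service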

So a longer-term fix has now been implemented. The offending URL now always returns an HTTP 403 (Forbidden) response, about which RFC 2616 says “The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.”

The returned response contains instructions for how to edit the URL in the browser’s address bar to reach the desired service, so humans once more have access to it. Effectively, this functions as a form of CAPTCHA.

It’s done via a CGI script. It doesn’t have to be, but it was easy that way: I renamed the old script to the name that now provides the service, and installed the new one as a drop-in replacement for the old one at the original address. Plus, it keeps the two URLs similar, so it’s relatively easy for humans to edit the blocked one into the form needed to access the service.

The “script” is actually a compiled C program and not a script, because it has to be as cheap to run as possible. Anything more resource-intensive might let the abusive crawlers take my site down again.
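For the curious, below is a minimal sketch of what such a program can look like, assuming a plain CGI environment. It is an illustration only; the wording of the instructions and the path it alludes to are placeholders, not what this site actually serves.

    /* Minimal illustrative sketch of a CGI program that refuses the
     * abused URL with HTTP 403 and tells human visitors how to reach
     * the real service. The path hinted at in the message is a
     * placeholder, not the name actually used on this site. */
    #include <stdio.h>

    int main(void)
    {
        /* A "Status:" header is how a CGI program asks the web server
         * to send a non-200 status line (here, 403 Forbidden). */
        printf("Status: 403 Forbidden\r\n");
        printf("Content-Type: text/html; charset=utf-8\r\n");
        printf("\r\n");

        /* Short static body: cheap to produce, readable by humans. */
        printf("<!DOCTYPE html>\n"
               "<html><head><title>403 Forbidden</title></head><body>\n"
               "<h1>Forbidden</h1>\n"
               "<p>Automated requests to this URL are blocked.</p>\n"
               "<p>If you are a human: edit the path in your browser's\n"
               "address bar as described here to reach the service.</p>\n"
               "</body></html>\n");

        return 0;
    }

Emitting a “Status:” header is the standard CGI way of having the web server send a non-200 status line, and a small static body like this costs next to nothing to produce.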

Anyhow, problem (hopefully) solved.
