Lunarpages Web Hosting Forum

Author Topic: Booting nuisance bots  (Read 1415 times)

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6425
Booting nuisance bots
« on: December 29, 2018, 12:35:06 PM »
I have a bot (from SemRush.com) pounding on my site every few seconds, around the clock. All they do is request my forum "stats" page, which rarely changes. I think they are causing clusters of PHP errors (in error_log) one or more times a day, simply because they're here so much.

Anyway, I tried blocking what I think is their bot, SemrushBot/3~bl, but it doesn't seem to have done any good:
Code: [Select]
RewriteCond %{HTTP_USER_AGENT}  ^SemrushBot.*
RewriteRule .  - [F,NC]
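
A likely reason this rule has no effect, assuming the bot sends the usual compound User-Agent string of the form "Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)" (an assumption, not confirmed from this site's logs): the ^ anchor requires the header to begin with "SemrushBot", so the condition never matches. The NC flag also sits on the RewriteRule, where it affects the URL pattern rather than the User-Agent test. A minimal sketch of a variant without those two problems:

```apache
# Sketch only: assumes mod_rewrite is enabled and allowed in .htaccess.
RewriteEngine On
# Unanchored, case-insensitive match anywhere in the User-Agent header.
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
# Answer every matching request, including the site root, with 403 Forbidden.
RewriteRule ^ - [F]
```

The pattern ^ is used instead of . so that a request for the bare directory (an empty path in .htaccess context) is also caught.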

All the visitors are logged as crawl24.bl.semrush.com (and similarly numbered hosts), which whois reports as 104.20.124.70. I tried adding that IP address to the deny list, but it seems to be ineffective. whois says that CloudFlare is involved, so maybe they're just jumping to another IP address?

What else can I do to keep this misbehaving bot off my site? I haven't tried asking them to tone it down and visit me less often -- is that known to work? If I open a ticket, can support block them?
-= From the ashes shall rise a sooty tern =-

Offline Pete

  • Alien Anomaly
  • Senior Moderator
  • Professor in Nanotechnology
  • *****
  • Posts: 4246
    • X-Visions Website Design
Re: Booting nuisance bots
« Reply #1 on: December 31, 2018, 02:29:42 AM »
Did you check out the advice 'How To Block SEMrushBot From Crawling Your Site' on the Semrush website, MrPhil?
x-visions.com


As I'm always saying.. (But nobody listens)
"Take a step back.. Take a deep breath and see if there's a simple solution there, that's hiding" lol  :D

Offline MrPhil

Re: Booting nuisance bots
« Reply #2 on: December 31, 2018, 05:31:50 AM »
Hi Pete, glad to see you're still around!

I didn't see that article when poking around their site, but was able to get to it via Google search. I have added the lines to robots.txt, which they claim they obey, so let's see what happens (they say it may take up to two weeks). Thanks for the pointer!

According to my cPanel Error Log, no one seems to be blocked on the one IP address I added. However, tons of accesses from 46.229.168.* (a Dutch site belonging to SemRush) are being blocked, although my Visitor Log still shows me being pounded on every second or two by SemRush. In addition to the check for User Agent, I added a check for Referer "semrush.com", which seems to have triggered the Dutch site denials. Also, (fingers crossed) I am not getting clusters of PHP errors since the Dutch site was denied.
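
The single address added to the deny list may simply not be one the crawler connects from. If the 46.229.168.* range seen in the error log really is the SemRush crawler (an assumption based only on the log entries described above), the whole range could be denied by address instead of by header. A minimal sketch that covers both the older and the newer Apache access-control syntax:

```apache
# Sketch only: 46.229.168.0/24 is assumed from the 46.229.168.* log entries.
<IfModule mod_authz_core.c>
    # Apache 2.4 and later
    <RequireAll>
        Require all granted
        Require not ip 46.229.168.0/24
    </RequireAll>
</IfModule>
<IfModule !mod_authz_core.c>
    # Apache 2.2
    Order Allow,Deny
    Allow from all
    Deny from 46.229.168.0/24
</IfModule>
```

Unlike a header-based rewrite rule, an address block like this also refuses requests for /robots.txt, so the bot would never see a Disallow entry placed there.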

Once the robots.txt entry shows its effectiveness, I'll try removing the other .htaccess bans one at a time.

I wouldn't mind SemRush visiting me once in a while to collect stats for whatever they do, but hitting me every second or two is excessive. To rub salt into the wound, it's the same forum/stats page over and over, and it doesn't change all that much! If they had any brains, they'd notice the lack of change, and cut back on the visit rate.

Offline MrPhil

Re: Booting nuisance bots
« Reply #3 on: January 04, 2019, 03:45:46 PM »
Per semrush.com's instructions, I added them to robots.txt. Well, semrush.com is still getting my /robots.txt every minute or so, but at least they aren't clogging up the place like they were (I can see other visitors now). I'm going to ask LP to block them entirely, as they're still a nuisance.

I'm still having brief bursts of PHP errors in my forum, so someone is still pounding on me. I'll have to try to tease out who it is from the visitor logs, when I get some time.

Offline MrPhil

Re: Booting nuisance bots
« Reply #4 on: January 08, 2019, 06:44:55 AM »
I looked through yesterday's raw access log, and saw that AlphaBot (from alphaseobot.com) was pounding on my forum pretty hard at the same time I got a cluster of errors in error_log. They just got the robots.txt + agent/referer deny treatment -- let's see if that helps. Note that the .htaccess agent+referer deny still permits /robots.txt to be requested, but blocks anything else. I see that despite only being allowed to request /robots.txt, SemRush.com is still requesting it every few minutes! Assholes. I'm going to open a support ticket to get them banned at a higher level (before they hit my server), if I can.
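
The exact lines used for the agent+referer deny are not given in this thread. As a hedged sketch of how such a rule could look, assuming mod_rewrite and assuming that both bots identify themselves with "SemrushBot" or "AlphaBot" somewhere in the User-Agent header:

```apache
# Sketch only: the bot names and domains are taken from the posts above;
# the rule layout itself is an assumption, not the poster's actual file.
RewriteEngine On
# Leave robots.txt reachable so the bots can still read their Disallow entry.
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
# Match either bot by User-Agent ...
RewriteCond %{HTTP_USER_AGENT} (SemrushBot|AlphaBot) [NC,OR]
# ... or any request whose Referer mentions one of their domains.
RewriteCond %{HTTP_REFERER} (semrush\.com|alphaseobot\.com) [NC]
# Everything else requested by those visitors gets 403 Forbidden.
RewriteRule ^ - [F]
```

The first condition has no OR flag, so it must hold in every case; the two conditions after it are joined by OR, which gives "not robots.txt AND (agent matches OR referer matches)".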

Offline MrPhil

Re: Booting nuisance bots
« Reply #5 on: January 09, 2019, 08:11:26 AM »
I haven't had any errors logged since I kicked off AlphaBot yesterday. Fingers crossed that the ban keeps working. I see that my forum's daily maximum visitor count has dropped by about two-thirds.

I went ahead and opened a ticket suggesting that LP block ill-mannered robots like SemrushBot and AlphaBot at a high level, as handling them causes unnecessary load on servers and internal networks, and may even be causing accounts to be suspended/sandboxed through no fault of the site owners. If someone reported a bot pounding on them, LP could check the logs to see if it's a widespread problem and, if so, block the bot at a high level. robots.txt requests could return an auto-generated file that just disallows that bot.

Their response was "not interested, leave it to the site owner". So, you're on your own, kiddies. They did include some helpful hints which I'll copy here:

Quote
Please note that in general on the server there are several protections against bots in place. But on a shared environment, this setup has his limitations and there is no possibility to block all the ill-intended bots that there are. Thus, on a shared server, the owner takes the responsibility to block certain bots, by adding corresponding lines in .htaccess, or the owner can activate the MSH - Managed Shared Hosting service and this will be done for him, by us.

For a more general and to the point approach we recommend that you block access altogether to sensitive website files (such as xmlrpc.php files in the case of Wordpress website). This may sometimes be possible to do using a caching plugin but would best be implemented through custom .htaccess settings.

Details : First, add the following lines :
Code: [Select]
<Files xmlrpc.php>
Order Deny,Allow
Deny from all
</Files>

<Files admin-ajax.php>
Order allow,deny
Allow from all
Satisfy any
</Files>

and also: Add password protection to some of your website key files or folders, particularly login pages for the website administration area (such as the wp-login.php file in the case of Wordpress websites or the 'administrator' subfolder for Joomla websites).
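
The quoted lines use the Apache 2.2 access directives (Order, Allow, Deny, Satisfy), which on Apache 2.4 only work while mod_access_compat is loaded. A hedged sketch of the 2.4-style equivalent, together with one way the suggested password protection for wp-login.php might look; the AuthUserFile path and realm name are hypothetical placeholders, not values from the support reply:

```apache
# Sketch only: Apache 2.4+ (mod_authz_core) form of the two blocks above.
<Files "xmlrpc.php">
    Require all denied
</Files>

<Files "admin-ajax.php">
    Require all granted
</Files>

# Basic authentication in front of the WordPress login page.
<Files "wp-login.php">
    AuthType Basic
    AuthName "Restricted"
    # Hypothetical path: create the file with the htpasswd utility and
    # keep it outside the web root.
    AuthUserFile /home/USERNAME/.htpasswds/wp-login.passwd
    Require valid-user
</Files>
```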

Offline MrPhil

Re: Booting nuisance bots
« Reply #6 on: January 14, 2019, 12:12:34 PM »
LP says (through my ticket) that they will not globally block any bots; that the responsibility for dealing with ill-mannered bots lies with the site owner. My last reply in the ticket:
Quote
Well, I suspect that many of your customers who are getting dinged for excessive CPU usage (and being told they must upgrade their hosting plan) may actually be suffering from attacks by ill-mannered robots. These are not necessarily malicious bots, just poorly designed or implemented ones that hit a site too hard (too frequently). If LP is not willing to block them globally, at least you should send out a reminder to customers about the problem, and how to best deal with it. Otherwise, you're losing customers who can't afford to upgrade to a dedicated server (as advised), when all they need to do is update their .htaccess and robots.txt files. A few weeks ago, I was running 40 to 50 peak concurrent visitors on my forum, and suffering almost daily bursts of database errors. I banned 4 bots, and right now I'm down to about 3 or 4 peak concurrent visitors, and no errors. Please make a reminder about cutting the bot load part of your standard response to customers with high CPU loads. A detailed wiki article would be a good start.

I hope to see a wiki article on how to determine which bot(s) are hammering you, and how to block them in robots.txt and .htaccess, and that links to this article will be given in high CPU usage tickets and as a routine notification to customers.