Lunarpages Web Hosting Forum

Author Topic: Throttling  (Read 196 times)

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Throttling
« on: February 09, 2019, 02:28:37 PM »
A couple of years ago, hackers managed to inject some PHP scripts into my site and sent thousands of spam messages from my account. Lunarpages customer support was nothing short of appalling during that crisis, which I was eventually able to resolve on my own after countless hours of effort.

Now, I have another problem that Lunarpages customer support has failed to help with (*see exception below), and as a last resort before switching webhosts, I am reaching out to the community to see if there is a solution.

Since December, my account has exceeded its peak memory usage limits and has been throttled. Users are getting intermittent 500 errors because the server fails to serve the pages requested, and in the logs, "cannot allocate memory" errors abound. My pages haven't changed in any meaningful way, but this problem simply didn't occur before December. My genuine traffic is minimal: a handful of people read my WordPress blog, a few more browse my OpenCart installation, and that's about it.

Indeed, reviewing the logs, the majority of activity is coming from web crawlers. But under normal circumstances, those should not be causing this problem. They never did before.

In the Visitors log, I see constant hacker attacks trying to gain access to my administrative pages. I have denied access to literally hundreds of IPs, but the attacks just keep coming from new IPs.

In fact, someone was somehow able to delete whole directories from my sites, although I am the only one with access. Lunarpages says they have no record of this activity. It took me weeks to restore and update the installations.

Lunarpages moved my account from eros to rigel, which had absolutely no effect. They say they have increased my php memory, but that has also produced no effect.

Which leads to my question: if I could find out which process(es) are using so much memory, perhaps I could shut these hackers down. Does anyone know if there is a way to get that information? Lunarpages does not seem to want to answer my questions, instead pointing me to the cPanel statistics that they themselves have admitted, in these forums, are worthless.

(*I will single out Laurențiu Victor Vișan for providing a very detailed response when other customer service representatives did not. Although his suggestions did not resolve all of the issues, they resolved some. Unfortunately, there is no continuity: when you respond to one representative's questions, you are likely to get a different representative, who is typically not willing to review the thread of messages and will only respond with useless, generic information.)

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6398
Re: Throttling
« Reply #1 on: February 09, 2019, 04:54:35 PM »
Maybe my experience can help here. In the last few months of 2018, I started getting several ill-mannered bots really pounding on me. Not an apparent DoS attack, just doing things like requesting my forum statistics pages every second or two. I would get bursts of database errors on my forum once or twice a day. "Support" was of no real help, telling me it was up to me to figure out what was happening and deal with it.

By looking at the raw access logs and correlating them with the error_log listings (be careful about which time zone each one uses), I was able to see certain bots that needed disciplining. I disallowed them in robots.txt, and used .htaccess to deny certain HTTP_REFERERs and USER_AGENTs access to everything but robots.txt. Cutting out only four bots brought concurrent users on my forum down from a high of 69 to 3 or 4 and almost eliminated the bursts of errors. Fingers crossed, so far it's working.
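
In case it helps, the .htaccess side looked roughly like this. It's only a sketch, assuming the server still accepts the older Apache Order/Deny syntax, and "BadBot" and "badsite.example" are made-up placeholders; substitute whatever user agents and referrers actually show up in your own raw access logs.

Code:
# Flag requests whose user agent or referrer matches a known pest
# ("BadBot" and "badsite.example" are placeholders, not real bot names)
SetEnvIfNoCase User-Agent "BadBot" block_me
SetEnvIfNoCase Referer "badsite\.example" block_me

# Deny flagged requests by default...
Order Deny,Allow
Deny from env=block_me

# ...but leave robots.txt readable, so the bot can still see it's disallowed
<Files "robots.txt">
    Allow from all
</Files>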

I asked LP to block these bots at a higher level, as no doubt they were causing high CPU usage issues for a lot of users, but they refused to consider it. They did say that they would consider making a Wiki entry and point customers to it.
-= From the ashes shall rise a sooty tern =-

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Re: Throttling
« Reply #2 on: February 09, 2019, 09:30:06 PM »
Quote
I was able to see certain bots that needed disciplining.

Thank you for replying, MrPhil. I have been adding IPs to my .htaccess deny list at a breakneck pace, but with no improvement. May I ask what criteria you use to distinguish the bad bots from the good ones?

If you or anyone else happens to know: is denying them in robots.txt or in .htaccess more effective? My approach has generated a huge number of "denied by server configuration" errors. Are those wasting CPU resources?

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6398
Re: Throttling
« Reply #3 on: February 10, 2019, 07:12:10 AM »
My criterion for a "bad bot" is that it was very active around the time a forum database error burst occurred. Block it, and if the errors seem to be reduced, it was a good one to block.

I've found that it takes both robots.txt and .htaccess blocks to work: .htaccess blocks everything but /robots.txt, and robots.txt tells the reader that it's disallowed. An honest bot that reads robots.txt first should obey it and stop there, so for some a robots.txt disallow alone is sufficient; but many bots ignore robots.txt and need the sterner medicine of an .htaccess deny. It's hard to tell a malicious bot trying to deny service to you from a poorly configured bot that is simply hitting you more often than your server can keep up with. You can try just a robots.txt block and see if that does the job before going to .htaccess.
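
The robots.txt side is just a Disallow group for each bot you want to turn away. Here is a minimal sketch, with placeholder names standing in for whatever you actually find in your logs:

Code:
# Ask specific ill-mannered bots (placeholder names) to stay out entirely
User-agent: BadBot
User-agent: AnotherBadBot
Disallow: /

# Everyone else may crawl normally
User-agent: *
Disallow: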

'denied by server configuration' is not a waste of CPU. It means that the bot (IP address) was stopped at your front door and didn't get inside to run amok and break all your furniture (waste CPU).
-= From the ashes shall rise a sooty tern =-

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Re: Throttling
« Reply #4 on: February 10, 2019, 08:27:59 AM »
MrPhil, I don't see any database errors, unless I am looking in the wrong place. I assume from your response that you are not looking in the cPanel Errors log, or you would not have to guess which bot is causing the problem.

I would imagine bad bots ignore robots.txt as a rule, so we are only using it to block bots that consume excessive resources. I should be able to see in the Visitors log how often a bot is crawling my pages. Do you have a sense of how long a well-configured bot should wait between crawls of the site? My pages change maybe once a month.

I am aware of the existence of the "crawl-delay" directive, but I wonder if it is effective. According to Google's documentation, to cite a significant example, "The non-standard "crawl-delay" robots.txt directive is not processed by Googlebot."

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6398
Re: Throttling
« Reply #5 on: February 10, 2019, 09:24:47 AM »
I do a daily cron scan for "error_log" files produced by PHP errors (they turn up in multiple directories, depending on which .php file reported the error). I also check the cPanel "Errors" listing, which covers things like 404 and 500 errors. Database errors (too many accesses in too short a time) put an error message on screen, which in turn trips a PHP error because headers have already been sent (the error message went to the browser too early). The database overload thus indirectly causes a PHP error, which is good enough for my purposes, although I do have to do a bit of looking around to find the bot at fault.
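
The cron scan itself doesn't have to be fancy. A minimal sketch, assuming your sites live under public_html, is a daily crontab entry along these lines (cron mails the output to the account's e-mail address):

Code:
# Hypothetical crontab entry: at 4 AM, list any PHP error_log files
# modified within the last day, anywhere under the web root
0 4 * * * find $HOME/public_html -name error_log -mtime -1 -exec ls -l {} \;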

If LP had a more powerful database engine that didn't overload so easily, I wouldn't have a problem with ill-mannered bots overloading my site. I suspect that many site owners who have been told they need to upgrade their hosting plans due to high CPU usage are actually suffering from bot "attacks", but LP is not interested in dealing with this.

Truly malicious bots probably ignore robots.txt as a rule, but it doesn't hurt to try that route first, with .htaccess blocking as your second line of defense. Regarding "how much is too much?": if a bot is found at the scene of the crime with a smoking gun (it was the only one active when the DB error burst occurred), I consider it guilty. Usually it's one bot at a time hitting my forum; if it were two or three at once (interleaved accesses), it would be a judgment call which one(s) to ban. I don't ban Google, as they tend to be well-behaved, but anyone else is fair game. I can always unban them later if I want to see whether they're still ill-behaved.

'crawl-delay' would be like any other robots.txt directive -- obeyed or not at the whim of the bot's creator. In other words, it might help, but don't count on it.
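
If you want to try it anyway, it's just one more line in a robots.txt group; the value is the number of seconds you're asking a compliant crawler (Bing and Yandex, for example) to wait between requests:

Code:
# Ask compliant crawlers to wait 10 seconds between requests (Googlebot ignores this)
User-agent: *
Crawl-delay: 10

Note that Crawl-delay is a pause between individual requests, not an interval between full crawls of the site.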
-= From the ashes shall rise a sooty tern =-

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Re: Throttling
« Reply #6 on: February 10, 2019, 12:54:39 PM »
Thank you for engaging with me on this topic, MrPhil. As I am just an English Major with above-average IT skills and not an IT professional, it's getting a bit too technical for me. But if I understand you correctly, the error_logs you mention would be an artifact of bots crawling (running?) too quickly?

With that in mind, what do you think of this more generalized approach: (1) Direct all bots to wait a certain number of days between crawls. (2) Assume any that disregard the directive (excepting Googlebot) are poorly configured or malicious, and (3) ban those in robots.txt.

It occurs to me that some bots may be willing to observe the robots.txt file but cannot access it if banned in .htaccess.

I also noted your response to another user regarding LP's allowing hackers to access files directly and bypass the .htaccess file altogether simply by specifying a direct path. I see this happening in my logs, where "file not found" errors are reported for IPs already banned in the root .htaccess. It appears that even fake paths would not spare the resources wasted by such attacks.

It is perplexing to me that LP takes a "blame the victim" approach and doesn't provide adequate support in this area, considering most if not all of their users will be affected at some point. Do they have so many new customers that they don't have to worry about losing the old ones?

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6398
Re: Throttling
« Reply #7 on: February 10, 2019, 01:23:39 PM »
Quote
As I am just an English Major with above-average IT skills and not an IT professional, it's getting a bit too technical for me.
And I'm just an IT major with above-average English skills... so we're even! :)

Quote
But if I understand you correctly, the error_logs you mention would be an artifact of bots crawling (running?) too quickly?
They are hitting the database in my forum too frequently (by requesting pages too often), causing errors in the DB, which outputs a message to the browser (the bot, in this case), which causes PHP to complain that headers have already been sent. So I happen to see the bots' activity as error_log entries (files). You might see their effects in some different way, such as LP telling you that your CPU usage is too high. If you run an ecommerce site, it might manifest differently than it does on a forum or blog.

Quote
With that in mind, what do you think of this more generalized approach: (1) Direct all bots to wait a certain number of days between crawls. (2) Assume any that disregard the directive (excepting Googlebot) are poorly configured or malicious, and (3) ban those in robots.txt.
That might work; there's no harm in giving it a try. Note that a number of the bots I've banned are "business intelligence" services that gather data on sites to sell for competitive analysis, rather than for search engine indexing (like Google). You might want to check what the business of a given bot's home site is before banning it ("Disallow") in robots.txt. You probably don't want to block search engine bots unless they're simply causing you too much pain, but business intelligence / competitive analysis bots aren't doing you much good, so there's no harm in hamstringing them.

Quote
It occurs to me that some bots may be willing to observe the robots.txt file but cannot access it if banned in .htaccess.
Absolutely. If you feel you need to ban (block) in .htaccess, you still make an exception for /robots.txt, so the bot can read it (if it cares to) and see you don't like it and don't want to be Best Friends Forever.
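
For an IP-based ban, that exception looks something like this (the addresses are placeholders from the documentation ranges, and again this assumes the older Order/Deny syntax):

Code:
# Shut out the offending addresses everywhere...
Order Deny,Allow
Deny from 192.0.2.10
Deny from 198.51.100.0/24

# ...except robots.txt, so a bot that cares to look can still read that it's unwelcome
<Files "robots.txt">
    Allow from all
</Files>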

Quote
I also noted your response to another user regarding LP's allowing hackers to access files directly and bypass the .htaccess file altogether simply by specifying a direct path. I see this happening in my logs, where "file not found" errors are reported for IPs already banned in the root .htaccess. It appears that even fake paths would not spare the resources wasted by such attacks.
My understanding of .htaccess processing is that the server is supposed to start by always processing the root (/) .htaccess, and then the next one down the chain of directories, etc. until the last one in the target directory (where the .php file is running). However, I have seen behavior on my server (/.htaccess apparently being skipped) that suggests that for some reason the server is configured to jump directly to the target directory's .htaccess, skipping anything earlier. I can't find any documentation on whether this is permitted or encouraged, but it certainly breaks a lot of sites. If someone is giving a real path (but fake or missing file), that could well explain why your /.htaccess IP blocks are being bypassed.

Quote
It is perplexing to me that LP takes a "blame the victim" approach and doesn't provide adequate support in this area, considering most if not all of their users will be affected at some point. Do they have so many new customers that they don't have to worry about losing the old ones?
Same here. All I can figure is that most of their support staff is in India now, just reading off of canned scripts and not having the ability to adequately support their users. They certainly don't have enough new customers (who require much more hand-holding) to let them drive away their old ones!
-= From the ashes shall rise a sooty tern =-

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Re: Throttling
« Reply #8 on: February 10, 2019, 06:58:48 PM »
I am happy to ban bots that want something of value from me for free and welcome bots that want to give me something of value for free. ;-)

After further research, I think I'll first try blocking a list of known bad bots and a list of known bad IPs before trying to identify them one by one.

BTW, does the "Notify me of replies" checkbox have any particular function on your end? It doesn't on mine.

Offline MrPhil

  • Senior Moderator
  • Berserker Poster
  • *****
  • Posts: 6398
Re: Throttling
« Reply #9 on: February 11, 2019, 06:02:55 AM »
Just remember that new bots are appearing all the time, and such lists quickly become obsolete. Other than that, go for it.

"Notify me of replies", when checked, is supposed to send you an email (to your registered email address) whenever someone adds to the thread (topic). I don't use it myself, so I don't know if it's working or not.
-= From the ashes shall rise a sooty tern =-

Offline EnglishMajor

  • Trekkie
  • **
  • Posts: 17
Re: Throttling
« Reply #10 on: February 11, 2019, 01:16:09 PM »
Indeed, the fact that new bots are appearing all the time makes one wish for a more robust solution. Since this affects all LP users, why doesn't LP offer some basic protections? Surely they have some expertise in this area.

Turns out my e-mail address in the forums was wrong, so I can't blame LP for my not receiving messages from here.
