Lunarpages Web Hosting Forum

Topic: Help!!! osCommerce + SEF URL = googlebot 404

jwarea80

  • Newbie
  • Posts: 4
Help!!! osCommerce + SEF URL = googlebot 404
« on: August 22, 2009, 07:26:59 AM »
Hi,

This is an urgent cry for help: I've been stumped for 5 days straight, reading up on .htaccess, php.ini, Apache, mod_rewrite, and much more on this subject. I'm on a shared plan running osCommerce 2.2RC1 with the Ultimate SEO URLs contribution plus numerous others. I have narrowed the problem down to Ultimate SEO URLs, since disabling it corrects the issue. Ultimate SEO URLs changes the URIs mainly by adding rewrite conditions and rules to the .htaccess file in the site root.

My issue:
The site looks fine when viewed in a browser, but the server header returns a 404 error for all category and product pages.
If I go to www.mysite.com, it returns 200 OK.
If I go to www.mysite.com/index.php?cPath=25 (a normal osC category link), it returns 200 OK.
If I go to www.mysite.com/cell-phone-c-25.html (the same category, as linked by Ultimate SEO URLs), it returns a 404 error.
The weird thing is that everything loads fine in the browser, but the server header returns 404.
Now Google is listing the bulk of my site as 404 Not Found, because it got a 404 error and stopped looking any deeper.
My robots.txt returns 200, so Googlebot is OK with that.

Let me know if anyone needs further info to help me resolve this. My whole site has dropped out of Google search.
Please help!!! Thank you.
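
A browser renders whatever body the server sends and never shows the status line, which is how these pages can look fine while Googlebot records a 404. Below is a minimal Python 3 sketch for checking the status the way a crawler would. The helper name `http_status`, the HEAD default, and the Googlebot user-agent default are illustrative assumptions, not part of osCommerce or any tool mentioned in this thread; if a script treats HEAD differently from GET, pass method="GET" instead.

```python
import http.client
from urllib.parse import urlsplit

# Googlebot's long-standing desktop user-agent string (assumed default; any UA works).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def http_status(url: str, method: str = "HEAD",
                user_agent: str = GOOGLEBOT_UA, timeout: float = 10.0) -> int:
    """Send one request and return the raw HTTP status code.

    Redirects are not followed and no page is rendered, so the number
    returned is exactly what the server puts in the status line.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    # Request target = path plus query string, e.g. /index.php?cPath=25
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request(method, target, headers={"User-Agent": user_agent})
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, once the rewritten URLs are served correctly, `http_status("http://www.mysite.com/cell-phone-c-25.html")` should return 200, just like `http_status("http://www.mysite.com/index.php?cPath=25")`. Because redirects are not followed, a temporary [R=301] added while debugging shows up as 301 rather than as the status of the redirect target.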

wektech

  • Master Jedi
  • Posts: 1038
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #1 on: August 22, 2009, 07:45:09 AM »
Can you post your .htaccess file? That may give a clue.

jwarea80

  • Newbie
  • Posts: 4
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #2 on: August 22, 2009, 07:51:45 AM »
Below is the content of the .htaccess file...

suPHP_ConfigPath /home/china276/public_html

# $Id: .htaccess 1739 2007-12-20 00:52:16Z hpdl $
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

#<IfModule mod_setenvif.c>
#  <IfDefine SSL>
#    SetEnvIf User-Agent ".*MSIE.*" \
#             nokeepalive ssl-unclean-shutdown \
#             downgrade-1.0 force-response-1.0
#  </IfDefine>
#</IfModule>

# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter

# AcceptPathInfo On

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)

# php_value session.use_trans_sid 0
# php_value register_globals 1

# BEGIN 404 Redirect
ErrorDocument 404 /404.php
# END 404 Redirect

# Ultimate SEO URLs BEGIN
Options +FollowSymLinks
Options All -Indexes
RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^options\=(.*)$
RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2%1
RewriteRule ^(.*)-p-(.*).html$ product_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-m-(.*).html$ index.php?manufacturers_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pi-(.*).html$ popup_image.php?pID=$2&%{QUERY_STRING}
RewriteRule ^(.*)-t-(.*).html$ articles.php?tPath=$2&%{QUERY_STRING}
RewriteRule ^(.*)-au-(.*).html$ articles.php?authors_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-a-(.*).html$ article_info.php?articles_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pr-(.*).html$ product_reviews.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-pri-(.*).html$ product_reviews_info.php?products_id=$2&%{QUERY_STRING}
RewriteRule ^(.*)-i-(.*).html$ information.php?info_id=$2&%{QUERY_STRING}
# BOF: "Extra pages-info box w/ admin" support added by faaliyet
RewriteRule ^(.*)-pm-([0-9]+).html$ info_pages.php?pages_id=$2&%{QUERY_STRING}
# EOF: "Extra pages-info box w/ admin" support added by faaliyet
RewriteRule ^(.*)-links-(.*).html$ links.php?lPath=$2&%{QUERY_STRING}
# Added polls and newsdesk
#RewriteRule ^(.*)-po-([0-9]+).html$ pollbooth.php?pollid=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-n-(.*).html$ newsdesk_info.php?newsdesk_id=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-nc-(.*).html$ newsdesk_index.php?newsPath=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-nri-(.*).html$ newsdesk_reviews_info.php?newsdesk_id=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-nra-(.*).html$ newsdesk_reviews_article.php?newsdesk_id=$2&%{QUERY_STRING}
# BOF: Faqdesk support added by faaliyet
  RewriteRule ^(.*)-f-(.*).html$ faqdesk_info.php?faqdesk_id=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-fc-(.*).html$ faqdesk_index.php?faqPath=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-fri-(.*).html$ faqdesk_reviews_info.php?faqdesk_id=$2&%{QUERY_STRING}
  RewriteRule ^(.*)-fra-(.*).html$ faqdesk_reviews_article.php?faqdesk_id=$2&%{QUERY_STRING}
# EOF: Faqdesk support added by faaliyet
# Ultimate SEO URLs END

# Block Bad Bots
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule .* - [F]

MrPhil

  • Senior Moderator
  • Berserker Poster
  • Posts: 6429
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #3 on: August 22, 2009, 10:25:23 AM »
Well, all of your RewriteRules in the SEO section are under one RewriteCond statement. If your URL doesn't include ?options=, none of those rewrites will ever fire. Are you ever expecting to see a URL with options=? If not, try commenting out (#) that RewriteCond. After that, I'm not sure what %{QUERY_STRING} it's expecting to find on these things. You may have to remove &%{QUERY_STRING} from each RewriteRule, if there will never be a query string. It may work OK ending with that superfluous &, so try it as-is first.
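
For reference, Apache's mod_rewrite documentation states that a RewriteCond applies only to the single RewriteRule that immediately follows it, not to every rule after it. On that reading, the ^options\=(.*)$ condition above gates only the first -p- rule, and the -c- rule fires whether or not a query string is present. The minimal Python sketch below models that scoping for three of the rules. The `rewrite` helper and the RULES table are hypothetical, Python's re is assumed to behave like Apache's PCRE for these simple patterns, and only the pieces these rules use are modeled ($N, %N and %{QUERY_STRING}; no flags).

```python
import re
from typing import List, Optional, Tuple

# Hypothetical model of three rules from the .htaccess above. Each entry is
# (conditions, pattern, substitution); a rule's conditions gate only that rule.
RULES: List[Tuple[List[str], str, str]] = [
    # RewriteCond %{QUERY_STRING} ^options\=(.*)$  (attached to this rule only)
    ([r"^options\=(.*)$"], r"^(.*)-p-(.*).html$", "product_info.php?products_id=$2%1"),
    ([], r"^(.*)-p-(.*).html$", "product_info.php?products_id=$2&%{QUERY_STRING}"),
    ([], r"^(.*)-c-(.*).html$", "index.php?cPath=$2&%{QUERY_STRING}"),
]

# $N = rule backreference, %N = backreference from the last matched condition.
TOKEN = re.compile(r"\$(\d)|%(\d)|%\{QUERY_STRING\}")


def _expand(target, rule_m, cond_m, query):
    def sub(tok):
        if tok.group(1) is not None:
            return rule_m.group(int(tok.group(1))) or ""
        if tok.group(2) is not None:
            return (cond_m.group(int(tok.group(2))) or "") if cond_m else ""
        return query  # %{QUERY_STRING}
    # One pass, so %XX escapes inside the query string are left untouched.
    return TOKEN.sub(sub, target)


def rewrite(path: str, query: str = "") -> Optional[str]:
    """Return the substitution of the first rule that applies, or None.

    Without [L], mod_rewrite keeps testing later rules against the rewritten
    path, but none of these patterns can match product_info.php or index.php,
    so stopping at the first applicable rule gives the same result.
    """
    for conds, pattern, target in RULES:
        rule_m = re.match(pattern, path)
        if rule_m is None:
            continue
        # Conditions are evaluated only after the rule's own pattern matches.
        cond_m = None
        for cond in conds:
            cond_m = re.match(cond, query)
            if cond_m is None:
                break  # condition failed: skip this rule, try the next one
        else:
            return _expand(target, rule_m, cond_m, query)
    return None
```

Under this model, a request for nokia-p-7.html?options={3}5 (a made-up example) becomes product_info.php?products_id=7{3}5, because %1 is the text captured by the condition (presumably an osCommerce attribute suffix), while nokia-p-7.html without ?options= skips the first rule and is handled by the second -p- rule. That would explain why the same pattern appears twice.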

If you can't get it fixed here, you might want to go over to forums.oscommerce.com and ask in that contribution's discussion area.
-= From the ashes shall rise a sooty tern =-

jwarea80

  • Newbie
  • Posts: 4
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #4 on: August 24, 2009, 07:15:56 PM »
Hi,

I have removed the RewriteCond line now, but it's still not working. The %{QUERY_STRING} is there to convert the URL back to a linking format that osCommerce understands. I also asked on the osCommerce forum, but got no help there either. Lunarpages wouldn't provide much support either, saying that they don't support users' scripts. I believe the problem lies on the server end, as I have the same Ultimate SEO URLs setup as many others on the osCommerce forum. If anyone has any clue, please please please help me solve this.

MrPhil

  • Senior Moderator
  • Berserker Poster
  • Posts: 6429
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #5 on: August 25, 2009, 07:11:42 AM »
Let's take your example, http://mysite.com/cell-phone-c-25.html. I don't believe that Apache will consider any part of it a QUERY_STRING, so the RewriteCond should fail (%{QUERY_STRING} == '', I think) and none of the RewriteRules should ever fire. That's why I suggested commenting out the RewriteCond to see if that helps.

Next, this should match the rule RewriteRule ^(.*)-c-(.*).html$ index.php?cPath=$2&%{QUERY_STRING}. What's fed to the regex parser is cell-phone-c-25.html, so I would expect $1 to be cell-phone and $2 to be 25. This should be replaced by (in the current directory, /) index.php?cPath=25&. I don't think there's any harm in that trailing '&', but as I said before, I think QUERY_STRING will be empty. Is a call to /index.php?cPath=25 enough to do something interesting? You said in your first post that it worked OK. Just for grins, you might replace "index" by "/index" in the rule and see if that makes any difference. Also change ".html" to "\.html" in the regex pattern.
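
The walk-through above can be checked outside Apache. This minimal Python sketch assumes Python's re matches these simple patterns the same way Apache's PCRE does; `per_dir_view` and `category_target` are hypothetical helpers that mimic what a rule in the root .htaccess sees and what the -c- rule produces. It shows that /cell-phone-c-25.html arrives with an empty query string, that $1 and $2 come out as cell-phone and 25, and why escaping the dot matters: unescaped, . matches any character, so a path such as cell-phone-c-25xhtml is rewritten as well.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# The -c- rule exactly as written, and the suggested variant with the dot escaped.
LOOSE = re.compile(r"^(.*)-c-(.*).html$")
STRICT = re.compile(r"^(.*)-c-(.*)\.html$")


def per_dir_view(url: str) -> Tuple[str, str]:
    """Return (path, query) as a rule in the root .htaccess sees them:
    the directory prefix (here just the leading slash) is stripped from the
    path, and the query string is kept apart, reachable only as %{QUERY_STRING}."""
    parts = urlsplit(url)
    return parts.path.lstrip("/"), parts.query


def category_target(pattern: "re.Pattern[str]", url: str) -> Optional[str]:
    """Apply index.php?cPath=$2&%{QUERY_STRING} if pattern matches, else None."""
    path, query = per_dir_view(url)
    m = pattern.match(path)
    if m is None:
        return None
    return "index.php?cPath=" + m.group(2) + "&" + query
```

With the escaped pattern, only paths that really end in .html are rewritten; the trailing & left by an empty query string is the same one discussed above.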

Are you still getting a 404 error, or something else now? Does the 404 error page (/404.php) show what directory and file it's trying to access? You may have to temporarily add [R=301] to the RewriteRule to see what the rewritten URL is. Can you confirm that this .htaccess file is in the right place and is functioning? Try putting something nonsensical in it to force an error (such as RewriteRule ^(.*)$  /this_will_fail?$1  [R=301]). At least you'll know that it's reading and processing the file. By the way, you need to clear your browser cache after changing .htaccess.

By the way, your first RewriteRule ends with %1. Does %1 have a value? The "options" value from the RewriteCond? I'm not familiar with this part of URL rewriting (the online documentation is infuriatingly vague), so it's possible it does (how I wish that .htaccess had some way to display stuff like that, for debugging!). Then you have a second RewriteRule which matches the same pattern as the first. My guess is that if the first matches, the second may not (as the URL has been rewritten by this time). What are you trying to do there?

My experience has been with Apache v1, so if this server has Apache v2, it's possible that it doesn't work quite the way I expect. Also, it's a very generic .htaccess, including php_value statements, which need to be moved to a php.ini file if you want to use them (note the use of PHP's register_globals setting, which is unneeded in recent versions of osCommerce).

jwarea80

  • Newbie
  • Posts: 4
Re: Help!!! osCommerce + SEF URL = googlebot 404
« Reply #6 on: August 26, 2009, 10:54:59 AM »
Thank you for the help, Mr. Phil!!!
Everything is OK now. It was caused by someone's contribution for integrating WordPress into osCommerce. I took a look at its code and realized it's filled with errors. Once I removed that contribution, everything started working again.