Lunarpages Web Hosting Forum

Author Topic: How-to: Train SpamAssassin  (Read 22925 times)

Danielle

  • Guest
How-to: Train SpamAssassin
« on: April 08, 2004, 01:29:27 PM »
Many thanks to w98 (a.k.a. id) for writing this how-to on SpamAssassin (SA) training, which can be found at the following location:

http://www.lunarforums.com/forum/index.php?topic=13958.0

Please note that the posts that follow in this thread refer to an older copy of the how-to, so please review the how-to in the thread above and post all messages there instead.

Thanks  :D
« Last Edit: August 18, 2005, 07:42:42 AM by Danielle »

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #1 on: April 09, 2004, 02:13:36 PM »
Hi Danielle,

Any chance you could simply provide a link to the other thread? Lopht and I have made some changes to the documentation and the script itself, so that would be a better place to send the users, perhaps?

Thanks,
Ian Douglas, aka "id", aka "w98"  :thumb:

Offline Rocknrob

  • Spacescooter Operator
  • *****
  • Posts: 42
    • http://www.hottestwebdesigns.com
How-to: Train SpamAssassin
« Reply #2 on: April 09, 2004, 07:10:58 PM »
Spaceship captain, help!
On step 3: where exactly do I put this inside of the folder?
required_hits 5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
I am nervous at this point. Should these lines have # in front of them or not?
Thanks,

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #3 on: April 09, 2004, 08:40:13 PM »
Code: [Select]
required_hits 5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information


All of that goes in your /home/lpaccount/.spamassassin/user_prefs file, assuming your LP account name is "lpaccount", of course.

The lines that begin with a # are just comments; you don't need those.

Essentially, here's a line-by-line description of what each portion of that example user_prefs file does:

required_hits 5
This tells SA that anything that scores 5.0 points or higher should be flagged as spam.

rewrite_subject 1
This tells SA to rewrite the start of the subject line if the message reaches the score set by the previous setting.

subject_tag {SPAM}
This tells SA to prepend the string "{SPAM}" to the subject line of any email that reaches the "required_hits" score.

bayes_path /home/lpaccount/.spamassassin/bayes
This is the start of the path for your Bayesian database, including the start of the filename. If this path ended in "/ianwashere", SA would look for files like "ianwashere_toks" and "ianwashere_seen", etc.

bayes_file_mode 0600
This sets the file permissions used for your Bayesian database files.

bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information

These lines tell SpamAssassin's Bayesian filter to ignore these headers when it learns from and scores a message, so that markup added by another scanner (here, MailScanner) does not skew the training. This is generally a good idea to have here.

-id
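
The description above can be sketched in a few lines of Python. This is only an illustration of how the file reads (blank lines and lines starting with # are skipped, every other line is a setting name followed by its value), not SpamAssassin's actual parser. The function names parse_user_prefs and bayes_files are made up for this example, and the two file suffixes are simply the ones named in the post above.

```python
def parse_user_prefs(text):
    """Return (setting, value) pairs, skipping blank lines and # comments."""
    settings = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out lines have no effect
        parts = line.split(None, 1)  # split on the first run of whitespace
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.append((parts[0], value))
    return settings


def bayes_files(bayes_path):
    """bayes_path is a filename prefix; these suffixes come from the post above."""
    return [bayes_path + "_toks", bayes_path + "_seen"]


EXAMPLE = """\
# How many hits before a mail is considered spam.
#required_hits 5
required_hits 5
rewrite_subject 1
subject_tag {SPAM}
bayes_path /home/lpaccount/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
"""

if __name__ == "__main__":
    for setting, value in parse_user_prefs(EXAMPLE):
        print(setting, "=>", value)
    print(bayes_files("/home/lpaccount/.spamassassin/bayes"))
```

Note that the commented-out "#required_hits 5" line is ignored, while the uncommented "required_hits 5" line is picked up as a setting.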

Offline Rocknrob

  • Spacescooter Operator
  • *****
  • Posts: 42
    • http://www.hottestwebdesigns.com
How-to: Train SpamAssassin
« Reply #4 on: April 09, 2004, 08:43:49 PM »
That didn't really help. I need to know where exactly within the code I should put it.
Code: [Select]
# SpamAssassin user preferences file.  See 'perldoc Mail::SpamAssassin::Conf'
# for details of what can be tweaked.
###########################################################################

# How many hits before a mail is considered spam.
#required_hits 5
# Whitelist and blacklist addresses are now file-glob-style patterns, so
# "friend@somewhere.com", "*@isp.com", or "*.domain.net" will all work.
# whitelist_from someone@somewhere.com

# Add your own customised scores for some tests below.  The default scores are
# read from the installed spamassassin rules files, but you can override them
# here.  To see the list of tests and their default scores, go to
# http://spamassassin.org/tests.html .
#
# score SYMBOLIC_TEST_NAME n.nn

# Speakers of Asian languages, like Chinese, Japanese and Korean, will almost
# definitely want to uncomment the following lines.  They will switch off some
# rules that detect 8-bit characters, which commonly trigger on mails using CJK
# character sets, or that assume a western-style charset is in use.
#
# score HEADER_8BITS 0
# score HTML_COMMENT_8BITS 0
# score SUBJ_FULL_OF_8BITS 0
# score UPPERCASE_25_50 0
# score UPPERCASE_50_75 0
# score UPPERCASE_75_100 0



Where inside of there?

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #5 on: April 09, 2004, 08:54:55 PM »
At the very end of the file. Or simply erase everything in there and replace it with my example, since my example is pretty much "factory default" anyway, other than the path to your Bayesian database files.
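
For anyone who would rather script the "at the very end of the file" option, here is a minimal sketch in Python. It assumes the account name "lpaccount" from the example; append_settings is a hypothetical helper written for this post, not part of SpamAssassin. It leaves the existing comments alone and only adds lines that are not already present.

```python
import os
import tempfile

# Settings from the example earlier in the thread ("lpaccount" is a placeholder).
SETTINGS = [
    "required_hits 5",
    "rewrite_subject 1",
    "subject_tag {SPAM}",
    "bayes_path /home/lpaccount/.spamassassin/bayes",
    "bayes_file_mode 0600",
]


def append_settings(prefs_path, lines):
    """Append each line to the end of the file unless that exact line exists."""
    existing = []
    if os.path.exists(prefs_path):
        with open(prefs_path) as fh:
            existing = [line.strip() for line in fh]
    missing = [line for line in lines if line not in existing]
    if missing:
        with open(prefs_path, "a") as fh:
            # Leading newline so the first new line never joins the old last line.
            fh.write("\n" + "\n".join(missing) + "\n")
    return missing


if __name__ == "__main__":
    # Demonstrate on a throwaway copy instead of the real user_prefs file.
    demo = os.path.join(tempfile.mkdtemp(), "user_prefs")
    with open(demo, "w") as fh:
        fh.write("# How many hits before a mail is considered spam.\n#required_hits 5\n")
    print(append_settings(demo, SETTINGS))
    with open(demo) as fh:
        print(fh.read())
```

Running it twice on the same file adds nothing the second time, so it is safe to repeat.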

leighsww

  • Guest
How-to: Train SpamAssassin
« Reply #6 on: April 17, 2004, 03:38:57 PM »
FANTASTIC, w98!!  :yey:

I can't believe you documented all that!!  Must have taken you many sleepless nights!

Excellent and THANKS for this amazing contribution!  :thumb:

Offline Tracie

  • MR-Disabled
  • Master Jedi
  • *
  • Posts: 1429
How-to: Train SpamAssassin
« Reply #7 on: April 17, 2004, 04:01:25 PM »
Excellent information!

Thanks w98!

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #8 on: April 17, 2004, 08:30:03 PM »
Actually, it only took about a day or two to write up, and I tested the documentation on my second account.

Next step will be to add documentation and screen shots of copying messages to/from Outlook.

Glad it's helping some of you out there. 8-)

Offline Mart

  • Pong! (the videogame) Master
  • *****
  • Posts: 25
How-to: Train SpamAssassin
« Reply #9 on: April 18, 2004, 07:04:33 AM »
I'd just like to add my thanks to the list. I have it all set up and it seems to be working nicely.  Now to see how SA responds to my training :).

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #10 on: April 18, 2004, 11:18:14 PM »
I've had it running since a day or so before posting my ideas, and almost everything ending up in my spam mailboxes is getting there with a BAYES_99 hit in the headers, meaning that SpamAssassin's Bayesian filter puts the probability that the message is spam at 99%-100%.

Of course, I get a LOT of spam at my domains, and having a catch-all address set up means I catch all that much more, since some spammers have misspelled my email address (e.g., what used to be "info@wild98.com" got misspelled as "o@wild98.com", and what used to be "ian@wild98.com" is now "an@wild98.com"), so I'm catching alllll kinds of spam.

Glad it's helping though, and glad everyone's getting use out of it. I have a few more things I'll be adding to the topic this week.
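
To check for BAYES_nn hits like the BAYES_99 mentioned above without reading every header by hand, something along these lines could work. It is a rough sketch: the sample X-Spam-Status value is invented for illustration (the exact layout depends on the SpamAssassin version), and bayes_tests is a made-up helper name.

```python
import re

# Matches test names such as BAYES_99 or BAYES_50 inside a header value.
BAYES_RE = re.compile(r"\bBAYES_\d{2}\b")


def bayes_tests(header_value):
    """Return every BAYES_nn test name found in an X-Spam-Status value."""
    return BAYES_RE.findall(header_value)


# Invented sample value, for illustration only.
SAMPLE = "Yes, hits=14.2 required=5.0 tests=BAYES_99,HTML_MESSAGE version=2.63"

if __name__ == "__main__":
    print(bayes_tests(SAMPLE))
```

An empty list means the Bayes tests did not fire for that message.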

Offline Tatami

  • Newbie
  • *
  • Posts: 2
How-to: Train SpamAssassin
« Reply #11 on: April 26, 2004, 11:42:39 PM »
Hello,

Sorry, I may have missed a step: when calling http://www.mydomain.com/cgi-bin/sa-learn.cgi, I get a server error msg (500).

Also, filtering doesn't appear to work when I spam myself...

Execute permissions are OK for the file, and paths should be OK. SA spam/spambox options enabled, and top-level myspam/myham boxes created...

==========================
!/usr/bin/perl

my $salearn = "/usr/bin/sa-learn" ;
$| ;

print "Content-type: text/plain\n\n" ;

print "Learning SPAM:\n" ;
print `$salearn -p /home/lpaccount/.spamassassin/user_prefs --mbox --spam --showdots /home/lpaccount/mail/myspam` ;
print "\n\n" ;

[ etc...]
==========================

Thanks for your input...

Offline w98

  • Galactic Royalty
  • *****
  • Posts: 443
    • http://iandouglas.com
How-to: Train SpamAssassin
« Reply #12 on: April 27, 2004, 07:12:24 AM »
The first line must be
Code: [Select]
#!/usr/bin/perl
Looks like you were missing the # at the start of the first line.
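
A quick way to catch this kind of slip before uploading is to check the first line of the script. The sketch below is only an illustration; has_shebang is a hypothetical helper, and /usr/bin/perl is the interpreter path used in this thread.

```python
def has_shebang(script_text, interpreter="/usr/bin/perl"):
    """True if the first line is '#!' followed by the expected interpreter."""
    lines = script_text.splitlines()
    if not lines:
        return False
    first_line = lines[0]
    if not first_line.startswith("#!"):
        return False
    # Allow optional whitespace after '#!' and switches after the path.
    return first_line[2:].split()[:1] == [interpreter]


if __name__ == "__main__":
    print(has_shebang("#!/usr/bin/perl\nprint 1;\n"))  # correct first line
    print(has_shebang("!/usr/bin/perl\nprint 1;\n"))   # missing the leading '#'
```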

Offline kwdavids

  • Galactic Royalty
  • *****
  • Posts: 324
    • Netsmart Technologies
How-to: Train SpamAssassin
« Reply #13 on: June 26, 2004, 01:17:41 PM »
I set up the Bayesian options in SpamAssassin and I fed it over 1000 spam messages, plus a hundred good emails. The training script seemed to work ok -- it counts the messages it processes.

However, none of the emails has a BAYES_nn in the X-Spam-Status header.

Here's my user_prefs:

Code: [Select]
# SpamAssassin config file

# How many hits before a message is considered spam.
required_hits 5.8

# Whether to change the subject of suspected spam
rewrite_subject  0

# Encapsulate spam in an attachment
report_safe             0

# Use terse version of the spam report
use_terse_report     0

# Enable the Bayes system
use_bayes               1

# Enable Bayes auto-learning
auto_learn              1

# Other Bayes stuff
bayes_path /home/[deleted actual value]/.spamassassin/bayes
bayes_file_mode 0600
bayes_ignore_header X-MailScanner
bayes_ignore_header X-MailScanner-SpamCheck
bayes_ignore_header X-MailScanner-SpamScore
bayes_ignore_header X-MailScanner-Information
Kevin
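
One thing worth ruling out here, offered as an assumption rather than a diagnosis: SpamAssassin's configuration documentation describes bayes_min_spam_num and bayes_min_ham_num settings, each defaulting to 200, and the Bayes tests stay inactive until both counts have been learned. With over 1000 spam but only about a hundred ham, the ham count would be below that default. The sketch below just encodes that comparison; bayes_ready is a made-up name, and the version installed on the host may use different thresholds.

```python
def bayes_ready(nspam, nham, min_spam=200, min_ham=200):
    """True once both learned counts reach their minimums.

    The defaults of 200 are the documented defaults for bayes_min_spam_num
    and bayes_min_ham_num; treat them as assumptions for any given host.
    """
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    # Roughly the counts described in the post above.
    print(bayes_ready(nspam=1000, nham=100))
```

If the installed version supports it, "sa-learn --dump magic" should report the learned spam and ham counts to plug in here.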

Offline voman

  • Trekkie
  • **
  • Posts: 17
How-to: Train SpamAssassin
« Reply #14 on: March 05, 2005, 09:39:14 AM »
Quote from: w98
The first line must be
Code: [Select]
#!/usr/bin/perl
Looks like you were missing the # at the start of the first line.


I am getting the same Internal Server Error message. I've followed the directions exactly. Here is my sa-learn.cgi file contents:

Code: [Select]
#!/usr/bin/perl

my $salearn = "/usr/bin/sa-learn" ;
$| ;

print "Content-type: text/plain\n\n" ;

print "Learning SPAM:\n" ;
print `$salearn -p /home/voman02/.spamassassin/user_prefs --mbox --spam --showdots /home/voman02/mail/myspam` ;
print "\n\n" ;

print "Learning HAM:\n" ;
print `$salearn -p /home/voman02/.spamassassin/user_prefs --mbox --ham --showdots /home/voman02/mail/myham` ;
print "\n\n" ;

exit ;
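
When the contents of a CGI script look right and the server still returns a 500 error, two things are commonly worth checking on the uploaded file itself: whether it is marked executable, and whether it was saved with Windows (CRLF) line endings, which can break the #! line. The sketch below checks both on a POSIX system. It is a guess at likely causes, not a confirmed fix for the script above, and cgi_file_problems is a hypothetical helper.

```python
import os
import stat
import tempfile


def cgi_file_problems(path):
    """Return a list of likely problems with a CGI script file (POSIX only)."""
    problems = []
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        problems.append("file is not executable by its owner")
    with open(path, "rb") as fh:
        first_line = fh.readline()
    if first_line.endswith(b"\r\n"):
        problems.append("file uses CRLF line endings")
    return problems


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than the real sa-learn.cgi.
    demo = os.path.join(tempfile.mkdtemp(), "sa-learn.cgi")
    with open(demo, "wb") as fh:
        fh.write(b"#!/usr/bin/perl\r\nprint 1;\r\n")
    os.chmod(demo, 0o644)
    for problem in cgi_file_problems(demo):
        print(problem)
```

An empty list only means these two checks passed; the server's error log remains the most reliable place to find the actual cause.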