This article is a part of the series on undesired email (spam, phishing, viruses, etc.). The material covers the Poisons and the Remedies.
By Stas Bekman.
Published: May 15th 2006
When the Bayesian analysis technique is used, it's the statistics that do all the work. The catch is that a Bayesian filter requires training: when you start it for the first time, you need to feed it both your good mail and your undesired email, telling it which is which. From that point on the filter will try to decide what's spam and what's not, and sort the email into different folders. You still need to review both folders regularly and tell the filter about any misplaced email (i.e. sometimes it will miss a spam, and sometimes it will put a valid email into the spam folder). Assuming that the email you receive is fairly homogeneous in nature, in a relatively short time it will start making fewer and fewer mistakes. However, since spammers are trying to outsmart the statistics, they come up with emails padded with gibberish content, which often cause a miss, and you get a spam in your INBOX.
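To make the train, classify and correct cycle more concrete, here is a minimal sketch of a naive Bayes filter in Python. It is only an illustration under simplifying assumptions: the class name `TinyBayesFilter`, its method names, the tokenizer and the add-one smoothing are all made up for this example, and it is not the algorithm used by any of the products listed further down.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (hypothetical, for illustration only)."""

    def __init__(self):
        # How often each word was seen in spam and in ham (good mail).
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # How many messages of each kind the user has fed to the filter.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase words and numbers are good enough as tokens.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Feed one message to the filter, labelled 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def correct(self, text, wrong_label, right_label):
        """User feedback: a message was filed under the wrong label."""
        self.word_counts[wrong_label].subtract(self.tokenize(text))
        self.message_counts[wrong_label] -= 1
        self.train(text, right_label)

    def spam_probability(self, text):
        """Return the estimated probability (0..1) that text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: how common this kind of mail is, with add-one smoothing.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words never give a zero probability.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; the clamp avoids overflow.
        difference = min(log_score["ham"] - log_score["spam"], 700)
        return 1 / (1 + math.exp(difference))


# Initial training: the user tells the filter which mail is good and which is not.
spam_filter = TinyBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")

print(spam_filter.spam_probability("buy cheap pills"))        # above 0.5
print(spam_filter.spam_probability("monday meeting agenda"))  # below 0.5
```

The `correct` method stands in for the ongoing review of both folders: every time the user moves a misplaced message, the word statistics shift a little, which is why the mistakes become rarer over time. It also hints at why gibberish padding can work: words the filter has never seen carry almost no evidence either way, so they dilute the words that would otherwise give the spam away.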
The main disadvantage here is that it requires constant training, even though after a certain point it will catch most of the spam and have almost no false positives. This approach works best if each user has their own filter, since different users receive different emails. As they say: someone's spam is someone else's ham.
In this approach it's the users who spend their time training the filter, so if your organisation has lots of users then you may be wasting a lot of time across the board. However, this solution usually doesn't cost the company anything, since the real knowledge base is provided by the users.
In my humble opinion this technique can be very useful if each user trains their own Bayesian filter. However, it doesn't scale as well as other techniques that remove most of the spam at the gateway: in a big organisation, each user spending a few minutes feeding the Bayesian filter adds up to a lot of time across the organisation.
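A quick back-of-the-envelope calculation shows how those few minutes add up. The numbers below (500 users, 3 minutes a day, 250 working days a year) are purely hypothetical and are not measurements from any real organisation; plug in your own.

```python
def yearly_training_hours(users, minutes_per_user_per_day, working_days_per_year):
    """Total hours per year an organisation spends training per-user filters."""
    total_minutes = users * minutes_per_user_per_day * working_days_per_year
    return total_minutes / 60


# Hypothetical organisation: 500 users, 3 minutes a day each, 250 working days.
hours = yearly_training_hours(users=500, minutes_per_user_per_day=3,
                              working_days_per_year=250)
print(hours)  # 6250.0
```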
Here are some vendors supporting this technique (including open-source solutions):
Kaspersky Internet Security (http://www.kaspersky.com)
CRM114 (http://crm114.sourceforge.net/)
Death2Spam (http://death2spam.net/)
POPFile (http://popfile.sourceforge.net/)
SpamAssassin (http://spamassassin.apache.org/)
SpamBayes (http://spambayes.sourceforge.net/)
SpamProbe (http://spamprobe.sourceforge.net/)
SpamSweep (http://www.bainsware.com/spamsweep/)
trimMail Inbox (http://www.trimmail.com/)
Please notify me if you know of others.
And here are some pointers for additional information on the subject:
A Plan for Spam (http://www.paulgraham.com/spam.html)
The Grumpy Editor's Guide to Bayesian Spam Filters (http://lwn.net/Articles/172491/)
Spam Filters (http://freshmeat.net/articles/view/964/)
Blocking over 98% of Spam Using Bayesian filtering technology (http://www.windowsecurity.com/whitepaper/anti_spam/Blocking_Spam_Bayesian_Filtering.html)
And here you can find books that provide in-depth coverage of Bayesian content filtering:
Continue reading about other Remedies or jump to the email-related Poisons section.