INFOWORLD GRIPE LINE BY ED FOSTER

BAYESIAN SPAM FILTERS (#14)
by Mateo on Mon Jun 28, 2004 at 01:28:56 AM PDT

I think it's possible to stop spam, and that content-based filters are the way to do it. The Achilles heel of the spammers is their message. They can circumvent any other barrier you set up. They have so far, at least. But they have to deliver their message, whatever it is. If we can write software that recognizes their messages, there is no way they can get around that. To the recipient, spam is easily recognizable. If you hired someone to read your mail and discard the spam, they would have little trouble doing it. How much do we have to do, short of AI, to automate this process?

I think we will be able to solve the problem with fairly simple algorithms. In fact, I've found that you can filter present-day spam acceptably well using nothing more than a Bayesian combination of the spam probabilities of individual words. Using a slightly tweaked Bayesian filter (the tweaks are described in the full essay linked below), we now miss fewer than 5 per 1000 spams, with 0 false positives.
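The following is a minimal Python sketch of combining per-word spam probabilities, assuming messages are already tokenized into lists of words. The function names, the clamping to [0.01, 0.99], the default probability of 0.4 for unseen words, the 15-word cutoff, and the double weighting of non-spam counts are illustrative assumptions in the spirit of the "slightly tweaked" filter mentioned above, not a faithful copy of the essay's code.

```python
from collections import Counter
from math import prod


def word_spam_probs(spam_docs, ham_docs, min_count=5, ham_weight=2.0):
    """Estimate P(spam | word) for each token from labeled training messages.

    Each document is a list of tokens. min_count and ham_weight are
    illustrative assumptions; weighting non-spam counts more heavily
    biases the filter against false positives.
    """
    spam_counts = Counter(t for doc in spam_docs for t in doc)
    ham_counts = Counter(t for doc in ham_docs for t in doc)
    n_spam, n_ham = max(len(spam_docs), 1), max(len(ham_docs), 1)
    probs = {}
    for token in set(spam_counts) | set(ham_counts):
        b, g = spam_counts[token], ham_counts[token]
        if b + g < min_count:
            continue  # too rare to give a trustworthy estimate
        spam_rate = min(1.0, b / n_spam)
        ham_rate = min(1.0, ham_weight * g / n_ham)
        p = spam_rate / (spam_rate + ham_rate)
        probs[token] = min(0.99, max(0.01, p))  # never fully certain
    return probs


def combined_spam_probability(tokens, probs, unknown=0.4, top_n=15):
    """Combine per-word probabilities with the naive Bayes rule
    P = prod(p) / (prod(p) + prod(1 - p)), using only the top_n most
    'interesting' words, i.e. those whose probability is farthest from 0.5."""
    ps = [probs.get(t, unknown) for t in set(tokens)]
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    ps = ps[:top_n]
    if not ps:
        return 0.5  # no evidence either way
    spam = prod(ps)
    ham = prod(1 - p for p in ps)
    return spam / (spam + ham)
```

Clamping keeps any single word from forcing the result to exactly 0 or 1, and scoring only the words farthest from 0.5 lets the most telling evidence dominate the verdict.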

The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam. You look at spams and you think, the gall of these guys to try sending me mail that begins "Dear Friend" or has a subject line that's all uppercase and ends in eight exclamation points. I can filter out that stuff with about one line of code.
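For contrast, a hand-written filter of the kind described above might look like the following Python sketch. The function name and the exact rules are hypothetical, chosen only to mirror the two examples in the paragraph.

```python
def looks_like_spam(subject: str, body: str) -> bool:
    """Hypothetical rule-based check mirroring the two examples in the text."""
    # Rule 1: the message opens with the bulk-mail greeting "Dear Friend".
    if body.lstrip().startswith("Dear Friend"):
        return True
    # Rule 2: the subject is all uppercase and ends in (at least) eight "!".
    return subject.isupper() and subject.endswith("!" * 8)
```

Each rule really is about one line, but a spammer can slip past it by changing a single detail, such as writing "Dear friend" or using seven exclamation points.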

Read more at http://www.paulgraham.com/spam.html

Mateo
Sony is my life
http://www.sony-digital-camera-driver.us



Copyright © 2006 InfoWorld, IDG Network. All rights reserved.
