By Stas Bekman.
Published: May 15th 2006
Spammers collect email addresses mainly by crawling the Internet and picking them up from articles, forums, mailing list archives, etc. Some people try to obfuscate their addresses when they put them online, but most attempts (like writing my email address as stas at stason dot org) are still circumvented by those crawlers, since most people use the same obfuscation techniques. Probably the best way to make it really hard for your address to be harvested is to put it in an image file - but really any of those techniques just makes it harder for legitimate users to figure out how to contact you. And unless you never email anybody, there is really no way to keep your address from the prying eyes of the spammers. At a certain point one of your best friends will leak your address by sending the latest kewl image or a chain letter to another 50 people using the CC field. Some of those 50 people will then forward all those addresses to another 5000 people in total, and so on, until one of the recipients is a spammer who adds your address to their database and sells it to others. If you do a bit of emailing here and there, it's usually a matter of a few days before your new address ends up in the spammers' database.
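To see why the popular "at/dot" obfuscation offers so little protection, here is a minimal Python sketch of how a crawler might undo it. The pattern and the function name deobfuscate are illustrative assumptions, not taken from any real harvester, and the sketch only covers the plain-word and bracketed variants.

```python
import re

# Hypothetical patterns for the common "user at domain dot tld" style,
# also accepting the bracketed variants (at), [at], (dot), [dot].
AT = r"\s*(?:\(at\)|\[at\]|\bat\b)\s*"
DOT = r"\s*(?:\(dot\)|\[dot\]|\bdot\b)\s*"
OBFUSCATED = re.compile(
    r"([\w.+-]+)" + AT + r"([\w-]+(?:" + DOT + r"[\w-]+)+)",
    re.IGNORECASE,
)


def deobfuscate(text):
    """Return plain addresses recovered from at/dot-style obfuscation."""
    found = []
    for match in OBFUSCATED.finditer(text):
        user = match.group(1)
        # Turn every "dot" separator in the domain part back into ".".
        domain = re.sub(DOT, ".", match.group(2), flags=re.IGNORECASE)
        found.append(f"{user}@{domain}")
    return found


print(deobfuscate("write to jane at example dot org today"))
```

A dozen lines are enough to reverse the trick, which is why an obfuscation scheme shared by many people protects none of them.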
The more recent trend for harvesting email addresses is the DHA (Directory Harvest Attack). Have you ever tried to open a new account on one of those free email domains just to discover that the chosen username is already taken? You try again and it's taken too, and so on until you come up with some really obscure username. That shows that many people use common names or names of common objects as their usernames. So spammers figured that if they take a dictionary of common words and try those (and various combinations) as usernames, they may discover quite a few legitimate usernames just by querying the mail server. The SMTP protocol has a special command, VRFY, which was designed to let senders check whether a username exists before attempting to send an email (just to be polite and not waste the receiving side's resources in case the user is no longer there). However, spammers found it to be a perfect tool for directory harvest attacks, so nowadays most MTAs have this command disabled. Spammers didn't stop there, though: nowadays they just try to deliver a spam message to dozens of usernames at a time and use the server's response to figure out whether each guess was valid (normally an MTA will report any invalid user attempt back to the sender). If you look at the logs of your MTA, you can see that every so often you get a burst of invalid recipients, all starting with the same prefix, e.g.:
firstname.lastname@example.org email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org email@example.com firstname.lastname@example.org
The attack will then continue through many other prefixes. Usually those attacks are mounted using a botnet, which makes them more distributed and harder to detect.
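The burst pattern described above can be spotted with a simple grouping pass over the rejected recipients. The following Python sketch assumes you have already extracted the rejected addresses from your MTA log (log formats differ between MTAs, so that step is left out); the function name find_prefix_bursts, the prefix length, and the threshold are all hypothetical choices for illustration.

```python
from collections import defaultdict


def find_prefix_bursts(rejected, prefix_len=3, threshold=5):
    """Group rejected addresses by the first prefix_len characters of the
    local part and return the prefixes with at least threshold distinct
    guesses, which hints at a dictionary-style harvest attempt."""
    groups = defaultdict(set)
    for address in rejected:
        local = address.split("@", 1)[0].lower()
        groups[local[:prefix_len]].add(local)
    return {
        prefix: sorted(names)
        for prefix, names in groups.items()
        if len(names) >= threshold
    }


# Made-up sample data, not taken from a real log.
sample = [
    "aaron@example.com",
    "aaronb@example.com",
    "aaronc@example.com",
    "bob@example.com",
]
print(find_prefix_bursts(sample, prefix_len=3, threshold=3))
```

Because a botnet spreads the guesses over many source addresses, this grouping is done on the recipients rather than on the sending IP.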
One way to deal with this problem is to instruct the server not to report invalid recipients to the sender, thus preventing the spammers from knowing which email addresses are valid and which are not. The problem is that this is going to badly hurt legitimate users, since now they won't get a report that their legitimate message wasn't delivered.
Here at mailchannels.com we have designed a special feature for TrafficControl to prevent DHAs -- our customers can configure how many invalid recipients are tolerated during a certain length of time (e.g. only 10 invalid recipients in one hour), so those attacks are quickly stopped without affecting legitimate users who occasionally send to users that no longer exist. Certainly a highly distributed attack mounted by a botnet is harder to detect, but as spammers get smarter so do we.
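The general idea of tolerating only a limited number of invalid recipients per time window can be sketched in a few lines of Python. This is not how TrafficControl is implemented; the class InvalidRecipientLimiter, its method names, and the choice to key the counters by sender IP are assumptions made purely for illustration.

```python
import time
from collections import defaultdict, deque


class InvalidRecipientLimiter:
    """Sliding-window counter of invalid recipients per sender (hypothetical)."""

    def __init__(self, max_invalid=10, window_seconds=3600):
        self.max_invalid = max_invalid
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # sender_ip -> timestamps

    def _expire(self, events, now):
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= now - self.window_seconds:
            events.popleft()

    def record_invalid(self, sender_ip, now=None):
        """Note that sender_ip just tried an invalid recipient."""
        now = time.time() if now is None else now
        events = self._events[sender_ip]
        events.append(now)
        self._expire(events, now)

    def is_blocked(self, sender_ip, now=None):
        """True once the sender has exceeded the tolerated number."""
        now = time.time() if now is None else now
        events = self._events[sender_ip]
        self._expire(events, now)
        return len(events) > self.max_invalid


limiter = InvalidRecipientLimiter(max_invalid=10, window_seconds=3600)
for second in range(11):
    limiter.record_invalid("192.0.2.1", now=second)
print(limiter.is_blocked("192.0.2.1", now=11))
```

A sender who mistypes an address now and then stays well under the limit, while a dictionary run crosses it within seconds. Counting per sender IP is the weak spot against a botnet, which is why a real deployment would need additional signals.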
Read about the Remedies to learn how to deal with the problem.
And here are some pointers for additional information on the subject:
Directory Harvest Attacks (http://www.pcmag.com/article2/0,1759,1543581,00.asp)
How do spammers harvest email addresses? (http://www.private.org.il/harvest.html)
Munging FAQ (http://members.aol.com/emailfaq/mungfaq.html)
Spambots: A Spambot Trap (http://www.neilgunton.com/spambot_trap/)