Spam is a growing problem internationally. We receive e-mails we never asked for from people we've never heard of. Typically these e-mails contain offers for services or products that we don't want: business proposals, golden investment opportunities, medicine or hot dates.
The senders are well-organized companies that send out enormous quantities of e-mails. Usually the return addresses are non-existent. And even if they do exist, sending replies requesting to be deleted from the address list is just like asking for more.
Spam e-mails often contain links for unsubscribing. Clicking such a link tells the spammer that your address is active. That puts you in the part of the spammer's database reserved for the most interesting future recipients of more spam and, of course, on the lists of e-mail addresses exchanged between these companies.
There is no question that the best way to avoid spam is to never get into the spam companies' lists of addressees.
Spammers use numerous methods for collecting e-mail addresses. They buy them from e-businesses that need an extra buck, they hack the databases of e-businesses that don't have proper security and they collect them off the pages of publicly available Web sites. This article focuses on protecting e-mail addresses that are published on publicly available Web sites.
E-mail address harvesting
Most Internet users know of Google.com (or other search engines) and understand the principles of search engine indexing. Google -- like most of the other engines -- uses a spider to collect pages for indexing and to expose them as search results.
A spider reads a Web page and stores the text for indexing, and it also analyzes the page to find hyperlinks to other Web pages. The spider follows these links and starts the process again -- storing for indexing and finding new hyperlinks. This goes on and on.
E-mail harvesters use the same basic technology. A harvester reads a page, analyzes it for links and moves on. It doesn't store the content for indexing; instead, it looks for e-mail addresses in the page, looking for this text pattern: firstname.lastname@example.org -- the signature of an e-mail address. The e-mail harvester puts all the addresses found into a database, and the spammer is ready for business.
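To make the pattern matching concrete, here is a minimal sketch of the address-finding step. It is an illustration only: the function name harvestAddresses, the sample markup and the deliberately simple regular expression are assumptions, not the code of any real harvester.

```javascript
// Hypothetical harvester step: pull anything that looks like an e-mail
// address out of raw page text. The pattern is deliberately simple and
// is an assumption for illustration, not a complete address grammar.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvestAddresses(pageSource) {
  // match() with the g flag returns every hit, or null when there is none.
  const found = pageSource.match(EMAIL_PATTERN) || [];
  // The same address often appears in both the href and the link text,
  // so remove duplicates before storing.
  return [...new Set(found)];
}

// Made-up sample markup using a placeholder address.
const sampleSource =
  '<a href="mailto:someone@example.com">someone@example.com</a><br>';

console.log(harvestAddresses(sampleSource));
```

A real harvester would combine this with the link-following loop described above, but the address extraction itself needs nothing more than a text scan like this one.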
Web pages are text and analyzing a text for such a pattern is actually quite easy. A web page containing e-mail addresses would look similar to this:
Joyce Chutchian, Site Editor
JChutchian@techtarget.com
Joyce has been working in co ... joining TechTarget.com, she ...
The HTML source for such a page would look like this:
<a href="mailto:JChutchian@techtarget.com" title="Joyce ...
<br>Joyce has been working in computer print publishing ... xperts program. She also monitors the discussion forums ...
That's how all Web sites are made. Text is what makes up most Web pages. Whether your Web server is Domino, Apache, MS-IIS or something else, it is text that is delivered to the browser.
There are a number of solutions to the problem. The easiest one is to not display e-mail addresses on the Web. But that might not be the best solution when you need to service your customers.
Instead, I suggest that you implement this simple solution: Obscure the e-mail address in a way that makes it extremely difficult for a piece of software to identify.
At convergens.dk, we had contact information looking like this on the site:
email@example.com
direkte tlf.: 2486 5668
location: Convergens Vest
That would have been generated by HTML like this:
<a href="mailto:firstname.lastname@example.org">email@example.com</a><br> direkte tlf: 2468 5668<br>
Today the HTML looks like this instead:
<script> varAt="@"; document.write("<a href=\"mail" + "to:jbr" + "" + varAt + "convergens.dk\" class=\"PersonInfo\">jbr" + varAt + "convergens.dk</a>") </script>
A harvester looking for the firstname.lastname@example.org pattern won't find the e-mail address, but the Web users looking at the page don't see the difference. Their experience of the page is the same as before. There is an address displayed, and clicking it launches the user's favorite e-mail client.
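The effect can be checked with a small sketch. It assumes a harvester that only scans the delivered page source with a simple pattern. The helper names findAddresses and buildObfuscatedLink are hypothetical, and the helper assumes the address contains no quotes or backslashes.

```javascript
// The kind of simple pattern a harvester might scan with
// (an assumption for illustration, not taken from a real tool).
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function findAddresses(text) {
  return text.match(EMAIL_PATTERN) || [];
}

// Hypothetical helper: split an address at "@" and emit a script snippet
// that reassembles the mail link in the browser. It assumes the address
// contains no quotes or backslashes, so no further escaping is done.
function buildObfuscatedLink(address) {
  const at = address.indexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error("not a usable e-mail address: " + address);
  }
  const shortName = address.slice(0, at);
  const domainName = address.slice(at + 1);
  return (
    '<script> varAt="@"; document.write("<a href=\\"mail" + "to:' +
    shortName + '" + varAt + "' + domainName + '\\">' +
    shortName + '" + varAt + "' + domainName + '</a>") </script>'
  );
}

const snippet = buildObfuscatedLink("jbr@convergens.dk");

// Scanning the page source as delivered by the server finds no address.
console.log(findAddresses(snippet).length);

// Only after the pieces are joined, as the browser does when it runs
// the script, does the familiar pattern appear.
console.log(findAddresses("jbr" + "@" + "convergens.dk"));
```

The only "@" left in the delivered source stands alone between quotes, so it has no name in front of it and no domain behind it for the pattern to latch on to. A harvester that actually executes scripts would still see the assembled address, so this protects against plain text scanning only.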
The Lotus Notes code that generates the HTML is a piece of computed text looking like this, where the field InternetAddress contains an e-mail address formatted in the usual way -- email@example.com:
varShortName := @Left(InternetAddress; "@");
varDomainName := @Right(InternetAddress; "@");
varSpacedEmail := @ReplaceSubstring(InternetAddress; "@"; " @ ");
varAt := "@";
@If(InternetAddress != "";
"<script> varAt=\"" + varAt + "\"; document.write(\"<a href=\\\"mail\" + \"to:" + varShortName + "\" + \"" + "\" + varAt + \"" + varDomainName + "\\\" class=\\\"PersonInfo\\\">" + varShortName + "\" + varAt + \"" + varDomainName + "</a>\") </script>";
"")
Credits
I learned of spam harvesting and the cure from Codestore.net, a great source for Notes/Domino information.
Jens Bruntt is a SearchDomino.com expert and a senior consultant at Convergens.dk.
This was first published in December 2003