Anti-spam: Protect your Web site against e-mail address harvesters

Web sites that offer contact e-mail addresses for site users also inadvertently expose the company employees to spam. Here's one way of taking care of the problem using Domino.

Spam is a growing problem internationally. We receive e-mails we never asked for from people we've never heard of. Typically these e-mails contain offers for services or products that we don't want: business proposals, golden investment opportunities, medicine or hot dates.

The senders are well-organized companies that send out enormous quantities of e-mails. Usually the return addresses are non-existent. And even if they do exist, sending replies requesting to be deleted from the address list is just like asking for more.

Spam e-mails often contain links for unsubscribing. Clicking such a link tells the spammer that your address is active. That puts you in the category of the spammer's database reserved for the most interesting future recipients of spam and, of course, on the lists of e-mail addresses these companies exchange with one another.

There is no question that the best way to avoid spam is to never get into the spam companies' lists of addressees.

Spammers use numerous methods for collecting e-mail addresses. They buy them from e-businesses that need an extra buck, they hack the databases of e-businesses that don't have proper security and they collect them off the pages of publicly available Web sites. This article focuses on protecting e-mail addresses that appear on publicly available Web sites.

E-mail address harvesting

Most Internet users know of Google.com (or other search engines) and understand the principles of search engine indexing. Google -- like most of the other engines -- uses a spider to collect pages for indexing and to expose them as search results.

A spider reads a Web page and stores the text for indexing, and it also analyzes the page to find hyperlinks to other Web pages. The spider follows these links and starts the process again -- storing for indexing and finding new hyperlinks. This goes on and on.
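The link-finding step can be sketched in a few lines of JavaScript. This is only an illustration: the function name `extractLinks` and the sample page are made up for this example, and a real spider would use a proper HTML parser rather than a regular expression.

```javascript
// Hypothetical helper: collect the href targets found in a page's HTML source.
// A regular expression is enough for a sketch; it is not a full HTML parser.
function extractLinks(html) {
  const links = [];
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    links.push(match[1]); // the captured link target
  }
  return links;
}

// Made-up sample page with two links.
const samplePage =
  '<p>See <a href="/about.html">About</a> and ' +
  '<a href="http://example.com/contact">Contact</a>.</p>';

console.log(extractLinks(samplePage));
// [ '/about.html', 'http://example.com/contact' ]
```

A spider would then fetch each of the returned addresses and repeat the same step on every new page.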

E-mail harvesters use the same basic technology. They read a page, analyze it for links and move on. They don't store the content for indexing; instead, they search the page for e-mail addresses by looking for this text pattern: text@text.text -- the signature of an e-mail address. The e-mail harvester puts all the addresses it finds into a database, and the spammer is ready for business.
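To show how little code such a search takes, here is a minimal JavaScript sketch. The regular expression is only a rough approximation of the text@text.text pattern, and the function name `harvestAddresses` and the example.com address are assumptions made for this illustration, not taken from any real harvester.

```javascript
// Rough approximation of the text@text.text pattern.
const addressPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// Hypothetical harvester step: return every distinct address found in a page.
function harvestAddresses(pageSource) {
  const found = pageSource.match(addressPattern) || [];
  return [...new Set(found)]; // drop duplicates
}

// Made-up page source containing an ordinary mailto link.
const pageSource =
  '<a href="mailto:someone@example.com">someone@example.com</a><br>';

console.log(harvestAddresses(pageSource));
// [ 'someone@example.com' ]
```

The address appears twice in the source (once in the link target, once as the visible text), but either occurrence alone would be enough for the harvester.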

Web pages are text, and analyzing text for such a pattern is actually quite easy. A Web page containing e-mail addresses would look similar to this:

Joyce Chutchian, Site Editor
JChutchian@techtarget.com

Joyce has been working in co
joining TechTarget.com, she

The HTML source for such a page would look like this:

<a href=mailto:"JChutchian@techtarget.com" title="Joyce
<br>Joyce has been working in computer print publishing
xperts program. She also monitors the discussion forums

That is how all Web sites are made: most of what makes up a Web page is text. Whether your Web server is Domino, Apache, MS-IIS or something else, it is text that is delivered to the browser.

The solution

There are a number of solutions to the problem. The easiest one is to not display e-mail addresses on the Web. But that might not be the best solution when you need to service your customers.

Instead, I suggest that you implement this simple solution: obscure the e-mail address in a way that makes it extremely difficult for a piece of software to identify.

At convergens.dk, we had contact information on our site that looked like this:

jbr@convergens.dk
direkte tlf.: 2486 5668
location: Convergens Vest

That would have been generated by HTML like this:

<a href="mailto:jbr@convergens.dk">jbr@convergens.dk</a><br>
direkte tlf: 2468 5668<br>

Today the HTML looks like this instead:

<script> varAt="@"; document.write("<a href=\"mail" + "to:jbr" + "" + varAt + "convergens.dk\" class=\"PersonInfo\">jbr" + varAt + "convergens.dk</a>")
</script>

An e-mail harvester looking for the text@text.text pattern won't find the e-mail address, because the address never appears in one piece in the page source; the browser assembles it when the script runs. Web users looking at the page don't see the difference. Their experience of the page is the same as before: an address is displayed, and clicking it launches their e-mail client.
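The effect can be checked with a short JavaScript sketch. The regular expression below is only an assumed approximation of what a harvester might use, and the two strings stand in for the page source as downloaded and for the markup the browser produces once the script has run.

```javascript
// Assumed approximation of the text@text.text pattern a harvester looks for.
const addressPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/g;

// The page source as a harvester downloads it (the script is not executed).
const obfuscatedSource =
  '<script> varAt="@"; document.write("<a href=\\"mail" + "to:jbr" + "" + varAt + ' +
  '"convergens.dk\\" class=\\"PersonInfo\\">jbr" + varAt + "convergens.dk</a>") </script>';

// What the browser ends up with after running the script.
const varAt = "@";
const rendered = "<a href=\"mail" + "to:jbr" + "" + varAt +
  "convergens.dk\" class=\"PersonInfo\">jbr" + varAt + "convergens.dk</a>";

// The raw source contains no complete address; the rendered link does.
console.log(obfuscatedSource.match(addressPattern)); // null
console.log(rendered.match(addressPattern));
// [ 'jbr@convergens.dk', 'jbr@convergens.dk' ]
```

The only "@" in the raw source sits between two quotation marks, so nothing around it looks like an address. A harvester that actually executes scripts would still see the rendered link, so this raises the cost of harvesting rather than making it impossible.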

The Lotus Notes code that generates the HTML is a piece of computed text looking like this, where the field InternetAddress contains an e-mail address formatted in the usual way -- text@text.text:

REM "Split the address into the parts before and after the @ sign.";
varShortName := @Left(InternetAddress; "@");
varDomainName := @Right(InternetAddress; "@");
varSpacedEmail := @ReplaceSubstring(InternetAddress; "@"; " @ ");
varAt := "@";

REM "Emit a script that assembles the mailto link in the browser.";
@If(InternetAddress != "";
  "<script> varAt=\"" + varAt + "\"; document.write(\"<a href=\\\"mail\" + \"to:"
  + varShortName + "\" + \"\" + varAt + \"" + varDomainName
  + "\\\" class=\\\"PersonInfo\\\">" + varShortName + "\" + varAt + \""
  + varDomainName + "</a>\") </script>";
  "")

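For readers who do not run Domino, the same idea can be sketched in JavaScript for any server that builds pages from templates. This is a hypothetical analogue of the computed text, not the author's code: the function name `obfuscatedMailLink` and its `cssClass` parameter are made up for this illustration, and the sketch assumes the address contains no quotation marks or other characters that would need escaping.

```javascript
// Hypothetical server-side helper: build the obfuscating <script> snippet
// for one address, mirroring the computed text in JavaScript.
function obfuscatedMailLink(internetAddress, cssClass = "PersonInfo") {
  if (!internetAddress) return "";          // same guard as the @If test
  const at = internetAddress.indexOf("@");
  if (at < 0) return "";                    // not an address; emit nothing
  const shortName = internetAddress.slice(0, at);
  const domainName = internetAddress.slice(at + 1);
  // The "@" is written only as the value of varAt, never inside the address.
  return '<script> varAt="@"; document.write("<a href=\\"mail" + "to:' + shortName +
    '" + "" + varAt + "' + domainName + '\\" class=\\"' + cssClass + '\\">' + shortName +
    '" + varAt + "' + domainName + '</a>") </script>';
}

console.log(obfuscatedMailLink("jbr@convergens.dk"));
```

Called with "jbr@convergens.dk", the function returns a snippet that splits the address in the same way as the hand-written script on the Convergens page, so the full address never appears in the generated source.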
Credits

I learned of spam harvesting and the cure from Codestore.net, a great source for Notes/Domino information.


Jens Bruntt is a SearchDomino.com expert and a senior consultant at Convergens.dk.

This was first published in December 2003
