Forcing A Server To Connect To The Original Server In A Cluster After Failure

A Notes/Domino customer has set up a cluster between two servers in order to
have a redundant mail routing system. The customer achieved this by having two
servers in one cluster, with one of them being the primary mail server and the
other being a backup server. This means that all users' Person documents point
to mail files on the primary server while the backup server has a replica of
all mail files on the primary server.

In order for inbound mail routing to work in the event of the primary server
being unavailable, the customer has added the following NOTES.INI parameter to
all servers in the organization that can connect to either the primary or
backup servers:


MailClusterFailover=1

This has the effect of forcing a server to try to deliver mail to the primary
server first and to deliver it to the backup server if the primary is
unavailable. The users will already have switched servers in line with their
CLUSTER.NCF file when the primary server is unavailable.

The delivery failover to the backup server does not happen until a number of
connection failures have occurred on any server (that has the
MailClusterFailover=1 option set in its NOTES.INI) trying to connect to the
primary server. This is expected as the cost of routing has to increase for
failover to occur. These failover times will also vary depending on whether the
connecting server is in the same Notes Named Network or not.

This customer's issue is that after delivery failover has occurred, if the
primary server is re-started, mail is still routed to the backup server until
an hour after the original failure. At that point, the connecting server again
tries to connect to the primary server. If the primary is still not available,
mail continues to be delivered to the backup server for another hour. The hour
is the default interval at which the cost of routing is reset in the
(memory-resident) Notes routing tables. The wait could be shorter depending on
where in the cycle the primary server is re-started. For example, if the
cost-of-routing table on a server that should be connecting to the primary
server was refreshed thirty (30) minutes ago, you should only have to wait
another thirty (30) minutes for correct mail routing to be re-established,
provided the primary server is back up and running.
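
The arithmetic in that example can be sketched in a few lines of Python. This
is only an illustration of the worst-case wait, assuming the routing table is
reset on a fixed cycle; the function name and its parameters are hypothetical
and are not part of Notes/Domino.

```python
def minutes_until_cost_reset(minutes_since_refresh, reset_interval=60):
    """Worst-case wait before the connecting server retries the primary.

    Assumes the in-memory cost-of-routing table is reset on a fixed cycle
    of `reset_interval` minutes (60 is the default described in the article).
    """
    if reset_interval <= 0:
        raise ValueError("reset_interval must be positive")
    if minutes_since_refresh < 0:
        raise ValueError("minutes_since_refresh must not be negative")
    # Position within the current cycle, then the time left until it ends.
    elapsed = minutes_since_refresh % reset_interval
    return reset_interval - elapsed


# Table refreshed 30 minutes ago, default 60-minute reset.
print(minutes_until_cost_reset(30))  # 30
```

The same function shows why a smaller reset interval shortens the wait: with
a 30-minute interval, the result can never exceed thirty minutes.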

This customer's requirement was for correct mail routing to be re-established
more quickly.

Solution:

There is a solution to this issue, but it must be carefully considered because
there is the possibility of this solution causing other problems (discussed
below). The solution is to decrease the default cost of routing reset time
using the following NOTES.INI parameter:

MailDynamicCostReset=60

If this parameter value is reduced from sixty (60) (the default) to thirty
(30), the maximum time before a connecting server will try to re-establish the
correct mail route is reduced to thirty (30) minutes. Again, this may depend on
whether the connecting server is in the same Notes Named Network. If it is not,
the Connection document details come into play.

This INI value needs to be changed on any server connecting to the primary or
backup mail servers. The risk in this solution is that if you reduce the value
too far, failover may never occur. For example, suppose you have a server in
another Notes Named Network that has a connection time of every thirty (30)
minutes, but a cost reset value of twenty (20) minutes. The connection would
not be able to fail enough times to increase the cost of routing to the level
where failover would occur.
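
A rough sanity check for this risk might look like the following Python
sketch. It is a simplified model, not Domino's actual routing logic: the
function name is hypothetical, and the number of failed connections needed
before failover (`failures_needed`) is an assumed parameter, since the real
threshold is not stated here.

```python
def reset_allows_failover(connect_interval, cost_reset, failures_needed=1):
    """Return True if enough connection attempts fit inside one reset window.

    connect_interval: minutes between connection attempts (for example, the
        schedule in the Connection document).
    cost_reset: the MailDynamicCostReset value, in minutes.
    failures_needed: ASSUMED number of failed attempts required before the
        cost of routing rises far enough for failover.
    """
    if connect_interval <= 0 or cost_reset <= 0:
        raise ValueError("intervals must be positive")
    # Whole connection attempts that can happen before the cost is reset.
    attempts_per_window = cost_reset // connect_interval
    return attempts_per_window >= failures_needed


# The example above: connect every 30 minutes, reset every 20 minutes.
print(reset_allows_failover(connect_interval=30, cost_reset=20))  # False
# The default reset of 60 minutes leaves room for two attempts.
print(reset_allows_failover(connect_interval=30, cost_reset=60))  # True
```

Under these assumptions, any candidate reset value could be checked against
each connecting server's connection interval before the NOTES.INI change is
rolled out.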

It is important to note that reducing the cost of routing reset value could
also have other implications for mail routing if you make use of cost of
routing in Connection documents.

Taking this customer's issue as a whole, the delay in re-establishing the
correct mail route does not actually affect the users' access to mail. If the
primary server is re-started, the backup server is still receiving the inbound
mail, and replication will push this received mail to the mail files on the
primary server. This means that no matter which server the user connects to,
they will be able to see all their new mail and send new messages.
