Cluster performance tuning allows you to set the point at which a server considers itself "too busy" and sends users to another server. The theory is simple, but putting it into practice is quite complicated and has driven many administrators to distraction. A quick search for "cluster and failover" at www.lotus.com/ldd/46dom.nsf will show what I mean. This tip is a primer on cluster tuning and should clear up some of the confusion about this topic.
The Availability Index (AI) is a measure of how available a Domino server is for more work. The range is 0 to 100%, with a higher number meaning that the server is "more available/less busy".
Domino R5 and R6 use different algorithms to calculate AI, so you cannot compare the two numbers you get from similar servers running the two software versions. In particular, R5 servers often showed AI of 90% or more, while the same machine, after upgrade to R6, shows 50% or less. This can lead some administrators to believe that R6 has slowed down their servers, which is not the case. The AI is just calculated differently.
In R5, the INI variable Server_Transinfo_Normalize allows you some control over how AI is calculated. The use of this variable was problematic, however, because you had to set it to a very large number on fast servers. The variable is obsolete for R6.
In R6, the Availability Index is calculated by taking samples of various server transaction times and averaging them. The gathering of the samples is controlled by the INI variables Server_Transinfo_Max and Server_Transinfo_Update_Interval, which you rarely need to change. The best-case average over a period of time is stored. As the server runs, new measurements are made of current transaction times, and a new response time average is compared with the best-case time. If the current time is equal to the best-case time, the server is considered fully available and AI is 100%. If the current time is longer than the best-case time, the server is becoming busy. If the current time is much longer than the best-case time, the server is very busy and AI approaches 0%.
The server Expansion Factor (EF) is the degree to which current response time is longer than best-case time. For example, suppose that the best-case average for the sample transactions is 10 milliseconds. If the current average for the same transaction set is 20 milliseconds, EF is 2. If the current average is 100 milliseconds, the EF is 10.
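The arithmetic above is just a ratio of two averages. Here is a minimal sketch in Python, purely for illustration; the function name expansion_factor is hypothetical and is not part of Domino.

```python
def expansion_factor(best_case_ms: float, current_ms: float) -> float:
    """Return how many times longer the current average is than the best case.

    Illustrative helper only; Domino computes this internally.
    """
    if best_case_ms <= 0:
        raise ValueError("best-case time must be positive")
    return current_ms / best_case_ms


# The two examples from the text: a 10 ms best case.
print(expansion_factor(10, 20))   # 2.0
print(expansion_factor(10, 100))  # 10.0
```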
The Availability Index is based on the Expansion Factor. An EF of 1.0 is considered to be an AI of 100%. This means that current transaction times are equal to the best-case sample and the server is fully available. At the other end of the scale, an EF of 64 is considered to be an AI of 0%, because transactions are taking 64 times longer than best case.
You may notice one glitch in this algorithm for calculating AI, however. For fast servers, an Expansion Factor of 64 may not be so bad. Suppose the best-case transaction sample takes 50 microseconds, and the same sample now takes five milliseconds. EF is 100, but the server is probably still performing well. For this reason, AI on fast servers can be artificially low, or may jump around wildly as response times vary by tiny amounts. The solution is to use a new INI variable in R6, which allows you to adjust the way that AI is calculated from EF. You can, in effect, stretch out the scale so EF has to be much higher before AI is considered 0%. You do this with the INI variable Server_Transinfo_Range.
The default value for Server_Transinfo_Range is 6. The Expansion Factor that is equal to an Availability Index of 0% is calculated by raising 2 to the number contained in this variable. Since 2**6 = 64, the default "fully loaded" Expansion Factor is 64. If you set Server_Transinfo_Range to 7, the Expansion Factor must be 128 before the server has an Availability Index of 0%. If you set the variable to 10, EF must be 1024 for AI to be 0%.
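To make the effect of Server_Transinfo_Range concrete, here is a small Python sketch. The first function follows directly from the rule above (2 raised to the range). The second is only my guess at how AI might fall between the two known endpoints: it assumes a logarithmic scale, which this tip does not confirm, so treat the in-between values as illustrative. Both function names are hypothetical and are not part of Domino.

```python
import math


def zero_availability_ef(transinfo_range: int = 6) -> int:
    """Expansion Factor at which AI reaches 0%: 2 raised to the range."""
    return 2 ** transinfo_range


def availability_index_sketch(ef: float, transinfo_range: int = 6) -> float:
    """Map an Expansion Factor to an AI percentage.

    Known endpoints (from the text): EF 1 -> 100%, EF 2**range -> 0%.
    ASSUMPTION: values in between are interpolated on a log2 scale.
    """
    if transinfo_range < 1:
        raise ValueError("range must be at least 1")
    if ef <= 1.0:
        return 100.0
    fraction_used = math.log2(ef) / transinfo_range
    return max(0.0, 100.0 * (1.0 - fraction_used))


print(zero_availability_ef(6))    # 64
print(zero_availability_ef(7))    # 128
print(zero_availability_ef(10))   # 1024
print(availability_index_sketch(1.0))   # 100.0
print(availability_index_sketch(64.0))  # 0.0
```

Under this assumed scale, an EF of 64 with the range set to 10 no longer reads as 0%, which is the "stretching" effect described above.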
In the experience of my customers, Server_Transinfo_Range = 10 is a reasonable value for modern mid-power servers. (I would be interested in hearing from other people about their experience adjusting this parameter for various types of servers.)
The server Availability Threshold (AT) is the point at which the server is considered "too busy" and new users are redirected to another cluster server. For example, if AT is 70 and AI is 80, users will not fail over to another server because this server is still available for them. If AI falls to 50, new users will fail over to another cluster member because AI is now below the threshold of 70. You control AT with the INI variable Server_Availability_Threshold.
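The failover decision described here comes down to a single comparison. Here is a minimal sketch in Python; should_fail_over is a hypothetical name, and the behavior when AI exactly equals AT is my assumption, since the text only says users fail over when AI is below the threshold.

```python
def should_fail_over(availability_index: float, availability_threshold: float) -> bool:
    """Return True when new users would be redirected to another cluster member.

    ASSUMPTION: redirection happens only when AI is strictly below AT.
    """
    return availability_index < availability_threshold


# The examples from the text, with AT = 70.
print(should_fail_over(80, 70))  # False: server is still available
print(should_fail_over(50, 70))  # True: AI has dropped below the threshold
# With AT = 0, the comparison can never be true, so users are never redirected.
print(should_fail_over(0, 0))    # False
```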
A few other pointers, which may be helpful:
- You can see the availability statistics discussed here from the Domino Administration client. Go to Server -> Statistics -> Server (for R6), or type "show stat server" on the Domino server console.
- You should use the Domino Administration client to set these INI variables, rather than hard-coding them into the server's notes.ini file. It is much easier to see and maintain a variable if it is visible from the Admin client. Go to Configuration -> Server -> Configurations (for R6).
- If you want to use one cluster member as a "primary server" and the other cluster members as "hot backup" machines, you should set AT to zero for the primary machine. This means that users will never be redirected off the primary server, unless it is completely down. This is the default setting for Server_Availability_Threshold.
For more information, see Domino Administration 6 Help -> Contents -> Clusters -> Managing -> Balancing.
Chuck Connell is president of CHC-3 Consulting, which helps organizations with all aspects of Domino and Notes.