Allow me to begin by repeating the key point I made in Part 1 of this article. Domino can run for hundreds of days, with no problems at all. I have personally seen this happen many times with Domino servers that I administer. If your Domino server is crashing more than once per quarter, or if crashes/hangs consume a large portion of your time, something is wrong and can be fixed.
Part 1 of this article focused on diagnosing why your server crashed or hung. In Part 2, I'll explain how to recover from the problem and how to prevent it from recurring.
Tip #1. For Windows platforms, power down the server completely and restart it cold. In theory, Windows Server 2003 protects the kernel from errant processes, so that a crash in one process does not destabilize the entire operating system. In practice, this protection does not always hold. Just restart the machine.
Tip #2. If you are running Domino as a Windows system service (which is the standard practice), be sure to restart Domino as a service as well, rather than manually from the user interface. If you reboot the machine, this should happen automatically when the system services start up again.
Tip #3. If the server is part of a cluster, set the server to be "restricted" while you perform the steps below. Do this with the console command set config server_restricted=n, where n is 0, 1, or 2. (0 = unrestricted, users can log on; 1 = users cannot log on until the server is restarted; 2 = server is restricted until this parameter is changed.)
Tip #4. Run the FIXUP task on all databases. Consider running COMPACT as well, although this is more time-consuming. If you know that crashes are occurring only for certain databases, you should definitely run COMPACT on those files.
Tip #5. Look for .TMP files. These might indicate that the server was in the middle of compacting a database when it crashed. If so, you'll need to re-compact this database.
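The search in Tip #5 can be scripted. The following is a minimal sketch, assuming the Domino data directory is readable from wherever the script runs; the function name find_tmp_files is hypothetical, and a .TMP file is only a hint of an interrupted compact, not proof of one.

```python
from pathlib import Path


def find_tmp_files(data_dir):
    """Return every file under data_dir whose extension is .tmp (any case)."""
    root = Path(data_dir)
    # rglob("*") walks subdirectories too, so files in mail folders are included.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".tmp"
    )
```

Call it with the path to your own data directory, for example find_tmp_files(r"C:\Lotus\Domino\Data") (that path is an assumption, so substitute yours), and review each file it returns before deciding which databases to compact again.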
Tip #6. Examine the tasks that the server is running, by looking in NOTES.INI at the ServerTasks line. Remove tasks that are not needed. For example, if the Domino server is not being used as a Web server, you do not need the HTTP task. If the server is not handling LDAP requests, you do not need that task.
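To review the task list from Tip #6 without opening an editor, something like the sketch below may help. It assumes ServerTasks is a single comma-separated line of the form ServerTasks=task1,task2; the function names read_server_tasks and without_tasks are hypothetical, and the sample task list is illustrative only.

```python
def read_server_tasks(ini_text):
    """Return the task names on the ServerTasks line of NOTES.INI text."""
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        # Match only the exact setting name, not ServerTasksAt1 and similar.
        if sep and name.strip().lower() == "servertasks":
            return [task.strip() for task in value.split(",") if task.strip()]
    return []


def without_tasks(tasks, unneeded):
    """Return tasks with the unneeded names removed (case-insensitive)."""
    drop = {name.lower() for name in unneeded}
    return [task for task in tasks if task.lower() not in drop]


# Illustrative sample only; your own NOTES.INI will differ.
sample = "[Notes]\nServerTasks=Update,Replica,Router,AMgr,AdminP,HTTP,LDAP\n"
trimmed = without_tasks(read_server_tasks(sample), ["HTTP", "LDAP"])
print("ServerTasks=" + ",".join(trimmed))
```

The script only prints a proposed line. Editing NOTES.INI itself is left as a manual step so that you can keep a backup copy of the original.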
Tip #7. If you have any "add-in" tasks on the server, such as third-party products or custom-written DLLs, remove these commands temporarily from NOTES.INI. You can restore them when the server is stable. If the server starts crashing after they are restored, you have found the problem.
Tip #8. Consider re-installing Domino on top of itself. This rarely hurts anything, and it just might fix the problem.
Tip #9. For repeated immediate failures, edit NOTES.INI and place an exclamation mark (!) in front of the ServerTasks line, save the file, then reboot. Once the server (hopefully) starts and runs, you can start all the other tasks manually, one by one. In this way, you can find out which task is responsible for your crashes. Also, starting the server in such a minimum configuration gives you the chance to run FIXUP and COMPACT more easily.
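The edit described in Tip #9 is small enough to do by hand, but if you manage several servers, a sketch like the following shows the idea. It works on the text of NOTES.INI rather than on the file itself, and the function names disable_server_tasks and enable_server_tasks are hypothetical. Back up the real file before changing it.

```python
def disable_server_tasks(ini_text):
    """Prefix the ServerTasks line with '!' so it is skipped at startup."""
    lines = []
    for line in ini_text.splitlines(keepends=True):
        # An already-disabled line starts with '!', so it is left alone.
        if line.lower().startswith("servertasks="):
            line = "!" + line
        lines.append(line)
    return "".join(lines)


def enable_server_tasks(ini_text):
    """Remove the leading '!' from a disabled ServerTasks line."""
    lines = []
    for line in ini_text.splitlines(keepends=True):
        if line.lower().startswith("!servertasks="):
            line = line[1:]
        lines.append(line)
    return "".join(lines)
```

Running disable_server_tasks twice does not add a second exclamation mark, and enable_server_tasks restores the original text once the faulty task has been identified.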
Tip #10. If you had set the server to a restricted state, remember to clear that setting so users can log on again.
Finally, one piece of advice: If your server's operation is mission-critical, consider setting up a cluster (if your server is not already in one). I strongly believe in Domino clustering. It creates a super-reliable computing system, especially if you put the cluster members in separate rooms on separate power circuits.
Chuck Connell is president of CHC-3 Consulting, which helps organizations with all aspects of Domino and Notes.