The key point to understand about Domino server crashes is that they don't happen very often. Domino can run for hundreds of days with no problems at all; I have seen this many times with the Domino servers I administer. If your Domino server is crashing more than once per quarter, or if crashes and hangs consume a large portion of your time, something is wrong. This two-part article will help you figure out why your server is crashing and how to resolve the problem.
Tip #1. Make sure your server configuration is as "clean" as it can be. You should be using modern, reliable hardware (although a Domino server usually does not need to be the fastest machine available). You should be running a stable, proven version of your operating system. Reduce the number of items in the Startup menu as much as possible. Do not allow anyone to use the machine as a workstation. Do not connect any unusual hardware devices to the server. Do not run any other applications on the machine. (There are two exceptions to this last rule. You might need a backup agent related to your backup software. Also, mainframe-type computers must share processes with other applications.)
Tip #2. Think about whether anything on the server computer has changed. Did you upgrade the operating system recently (such as a new service pack)? Have you installed any third-party products, new drivers or a new network card? If there is a crash, recent changes to a machine that had been stable are the most likely culprits.
Tip #3. Do not allow remote file access to the server disks. No one should be accessing the files on that machine other than the Domino software.
Tip #4. Check IBM's developerWorks page listing technical resources for Lotus software (which used to be known as www.notes.net) to see if there is a point release of Domino that is more recent than the one you are running. (To find this information, scroll down to the Support area of the page.) If so, consider installing it. But be wary of installing any software, including Domino, when it is still new. I advise my clients to hold off on any point release until it has been available for a few weeks. This allows other people to test it out, and you can look on the page above for discussions about any problems.
Tip #5. Look in the server's directory named IBM_TECHNICAL_SUPPORT and find the NSD file, which is created during a server crash. Search the NSD for the word FATAL to find the fatal thread. Send the NSD and log file to Lotus for analysis. If the server is hung, rather than crashed, you might have to run NSD manually after the server is halted. (Do so with nsd.exe on Windows platforms, and nsd.sh on Unix platforms.)
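The search for the word FATAL can be scripted so that you do not have to page through a large NSD by hand. The following is a minimal Python sketch, not an official tool. The function names (find_fatal_lines, scan_nsd_file) and the number of context lines are my own assumptions, and the path to the NSD file is something you supply yourself.

```python
from pathlib import Path


def find_fatal_lines(text, keyword="FATAL", context=2):
    """Return (line_number, snippet) pairs for each line containing keyword.

    line_number is 1-based; snippet includes `context` lines before and
    after the match so the surrounding thread information is visible.
    """
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if keyword in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits


def scan_nsd_file(path):
    """Read an NSD file (path supplied by the caller) and search it."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    text = Path(path).read_text(errors="replace")
    return find_fatal_lines(text)
```

To use it, call scan_nsd_file with the full path of the NSD file you found in IBM_TECHNICAL_SUPPORT and print each returned snippet. The script only locates the text; interpreting the fatal thread is still a job for you or for Lotus support.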
Tip #6. Look in the Domino server log files -- log.nsf, console.log and domlog.nsf -- for further clues about the reason for the crash. You might notice a pattern, such as crashes always occurring within the HTTP task.
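One rough way to spot such a pattern is to count how often each server task is named on lines that look like trouble. The Python sketch below assumes you have exported or copied the relevant log text to plain lines. The marker words ("panic", "fatal", "error") and the simple substring matching are my assumptions for illustration, not a documented Domino log format, so adjust them to what your own logs actually contain.

```python
from collections import Counter


def tally_task_mentions(lines, tasks, markers=("panic", "fatal", "error")):
    """Count trouble-looking log lines that mention each task name.

    lines   -- iterable of log lines (plain text)
    tasks   -- task names to look for, e.g. ["HTTP", "Router"]
    markers -- hypothetical words that flag a line as a problem line
    """
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        # Skip lines that do not contain any of the marker words.
        if not any(marker in lowered for marker in markers):
            continue
        for task in tasks:
            if task.lower() in lowered:
                counts[task] += 1
    return counts
```

Calling counts.most_common() on the result lists the tasks from most to least frequently implicated. A task that dominates the list across several crashes is a reasonable place to start investigating, though the count by itself proves nothing.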
Chuck Connell is president of CHC-3 Consulting, which helps organizations with all aspects of Domino and Notes. He thanks Patrick Iannuccilli, Hynek Kobelka and Jeremy Ang for their contributions to this article, via a discussion on the Notes technical resources page mentioned above.