
Readers: Domino stability should be tied to server platform

SearchDomino readers say that metrics measuring Domino server crashes are useful only when tied to the server platform -- and then provide real-world examples.

The current poll on SearchDomino asks readers how often their Domino server crashes. Several readers have written in to say that the poll would be more useful if it tied crash frequency to platforms.

One of those readers was Don Newman of Yorktown Technologies, who provided some interesting examples of just how closely Domino crashes are tied to the platform. "In over 12 years [of running Domino], I never had a crash on Linux or AIX," he wrote. Even Domino 3.2 on OS/2 "ran for over two years without a reboot," he noted.

Newman currently runs Domino 6.5.4 and 7.0 on Linux. He also has a single NT server that he says crashes at least once every three months if it isn't rebooted regularly.

"The only reason we have NT is because we're running Sametime," he said in a follow-up phone call. The problems with NT have nothing to do with Domino, he added, noting, "With NT, if you don't reboot every few weeks, it's going to crash. Years ago I had an NT desktop, and if it didn't get rebooted every few weeks, it would crash."

Newman was not the only reader reporting problems with NT. Pierre Clerc wrote in an e-mail, "Our Domino server itself has never crashed. But when Domino was installed on a NT server (with file sharing for Office documents), this crashed (for non-Domino reasons) twice a week." His company now runs Domino on a SUSE Linux 8.2 server, and, he notes, "In 14 months, this server (with the same file-sharing on Samba) has never crashed."

Problems with running Domino on the AS/400 were reported by Steven Rieger, manager of information systems, PSC Group, LLC. "I am a HUGE Domino fan," he wrote in an e-mail. "That being said, we push Domino to its limits here at my office and we crash it plenty. Domino on the AS/400, while a great combination for some environments, is not great for Web applications using HTTP and DIIOP [Domino Internet Inter-Orb Protocol]. There are memory leaks and issues when writing portlets for WebSphere Portal Server accessing Domino data. These cause the Domino server to crash and the portal server to run very slowly."

In a subsequent phone call, Rieger told SearchDomino that the company had resolved its Web performance issues by going to Windows servers. But like other administrators, he also found the Windows platform unstable. "We have to reboot our servers weekly to keep the Domino servers from crashing," he said. "Now are these issues Domino faults or OS faults? I say both."

Rieger went on to explain his point. "We were doing a project where we were shutting down Domino and Java, and the memory was not being released, so we knew it was an OS issue. OS developers need to realize that Domino developers were being forced to write debug code and performance utilities that can see memory loss. If the Domino developers work with the Windows developers, the Linux developers, the AS/400 developers, etc. to find the API calls, everyone will benefit."

No crash, just hanging

Carsten Hellweger e-mailed SearchDomino from Germany and also said that heavy-duty Web apps can be tough on Domino. "We love Domino, but it is not perfect yet, [particularly] in respect to Web server and Java Virtual Machine integration." He said he runs his Domino server as a mail and application server which also contains a small Web application, and it has never crashed. But when it comes to the servers of his customers who have deployed heavy-duty Web apps with many LotusScript and Java agents running, it's a slightly different story. "Those servers seldom crash but they hang frequently when Java agents are running," he wrote. "That is annoying, because you can't have the agent scaled and tell it, 'If you run, run slow and keep the other users untouched.' In the end, one user is waiting for a report to come out and all others are waiting with him. But if Lotus/IBM can get that straightened out, it will be an unbeatable system again."

On the subject of crashes, reader Milt Jones wrote in an e-mail that his Domino servers crash every one to two months, usually because a scheduled task does not shut down properly. After several weeks, these tasks build up and cause other tasks to fail. More than half of the server's functionality will still be working, but getting it 100% functional requires a reboot; killing and restarting individual processes, he noted, has not restored function.

Graceful crash

"This is, in my book, a crash -- albeit a graceful one," wrote Jones. "The superiority of Domino lies not only in its ability to function for long periods of time compared to other mail systems, but also its ability to limp along even when the server is hurting. But perhaps its greatest strength is the fact that it cleans itself up so nicely when there is a crash. Any one who has had a serious problem with Exchange knows that it can be down for extended periods of time while cleaning up serious database corruption. My worst Domino corruption took two hours off-line to fix. My worst Exchange corruption was three days of off-line, non-stop, command line utilities. You can guess what each user community thought during those times."

Finally, Domino architect David Hablewitz e-mailed us to say that when it comes to how frequently Domino servers crash, "many people are just guessing." He suggested we use this opportunity to tell administrators about a "great" tool that will let them know exactly how often it happens. The tool is called MTBF (Mean Time Between Failure), and it can be downloaded from the developerWorks Sandbox.

Note: MTBF exists as an unsupported download, but an IBM spokeswoman said the company "has internal resources available to help users deal with any problems they run into." There is no timeframe, she added, for when this tool might be included in Domino. "For now, since the tool is free and customers are comfortable with it, we will continue to support as is."
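For administrators who want a rough number before installing anything, the metric itself is simple arithmetic: observed time divided by the number of failures. The Python sketch below is only an illustration of that calculation. It is not the IBM MTBF tool and does not read Domino logs; the function name and the crash timestamps are hypothetical, and time spent recovering from each crash is ignored for simplicity.

```python
from datetime import datetime, timedelta

def mean_time_between_failures(crash_times, window_start, window_end):
    """Return the observation window divided by the number of crashes in it.

    Returns None when no crash falls inside the window, because the
    ratio is undefined without at least one failure.
    Simplification: downtime after each crash is not subtracted.
    """
    crashes = [t for t in crash_times if window_start <= t <= window_end]
    if not crashes:
        return None
    return (window_end - window_start) / len(crashes)

# Hypothetical crash timestamps an administrator might have noted by hand.
crashes = [
    datetime(2005, 2, 14, 3, 10),
    datetime(2005, 5, 2, 22, 45),
    datetime(2005, 8, 19, 9, 0),
    datetime(2005, 11, 30, 17, 30),
]
start = datetime(2005, 1, 1)
end = datetime(2005, 12, 31)

mtbf = mean_time_between_failures(crashes, start, end)
print(mtbf)  # 91 days, 0:00:00
```

Computing the figure separately for each platform would give the kind of platform-by-platform comparison readers asked for.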

One last tip on the subject of server crashes: In June, SearchDomino ran an article, "Six tips to help you diagnose why your Domino server crashed." Tip #3 in the article read: "Do not allow remote file access to the server disks. No one should be accessing the files on that machine other than the Domino software."

Reader George Paglia e-mailed us to ask if that included XCOPY. Here is the response of the article's author, Chuck Connell: "Yes. Suppose XCOPY overwrites one of the Domino database (NSF) files. You will corrupt the database and crash many users."
