True Domino Blooper #6: Asleep at the mouse wheel

This Notes administrator was asleep at the mouse wheel when he accidentally deleted more than 300 documents in his system. It didn't take long before he ran head-on into a mob of angry users.

I used to be a Notes admin working for a large global company with about 14,000-18,000 employees (depending on whose numbers you used) and over 200 Notes servers. I had full manager access to everything.

I was getting ready to clean up some old Person documents for my location (about 400 users) when a friend came into my office. We started chatting about the usual stuff (not work-related, of course).

Well, I guess I got particularly interested in one story, because I had held down my mouse button. Unbeknownst to me, I had selected the two or three documents I needed along with about 389 other ones, starting at the As and working down through the alphabet. Thinking I had the correct ones selected, I clicked the "Delete person" button. Still engrossed in my friend's story, I clicked "Yes" without verifying my selections. Duh!

After a few seconds of deleting, I realized something was wrong and stopped the process. I ended up deleting about 200 people AND submitting AdminP requests to remove their mail files.

Within about three seconds, I started getting phone calls from people who couldn't access their mail. I went into panic mode for a while because AdminP had instantly replicated the mail file removal requests to the regional hub, which was instantly replicating them to the other hubs and so on. I disabled replication on the NAB, then quickly copied the missing Person documents from another server and pasted them back into my local NAB so people could get their mail. Lucky for me, the NAB had also replicated back up to the regional hub, so people at other sites were having problems sending mail to my users because their Person documents were suddenly missing. I had to manually push my NAB to all the major hubs to fix that problem, which took a while, to say the least.

Then I had to deal with the impending AdminP doom. I deleted all the "Delete mail file" requests locally, but the "Delete in access control" requests were spreading to every other Admin request database AND being acted upon immediately. Pretty soon I started getting phone calls from people who were unable to access Notes DBs on various servers around the world.

The other problem was that the environment was so big and so spread out that it took a full 24 hours for all the changes to replicate everywhere in the world. That includes the 20 or so sites that were still on a 28.8 or worse dial-up connection. Plus, at the time, I was genuinely concerned that AdminP would automatically remove the mail files, and I was pretty sure that it would remove the Person documents I had just pasted back in, due to the long expiration time for requests.

I ended up getting all the AdminP documents deleted before anything worse happened (like delete in Reader/Author fields), but another side effect of this problem was that every server had a FULL replica of AdminP. We're talking about hundreds of thousands of documents here.

I had recommended in the past that each server just contain local docs, but that suggestion was ignored at the time. Now we had sites that were effectively down because all their bandwidth was eaten up replicating AdminP documents. I also had to manually add each one of the users back into the ACLs of every database they were in previously.

The details are painful, and in the end it took over a week for me to correct everything. And I think I blamed somebody else for it anyway!

