
Identifying duplicates in a working group doc database

What's the best way to identify duplicate documents in the working group documents database? The documents arrive in many ways: some are forwarded, some are pasted in, and some are created directly in the database.
I think the real issue is what you consider to be duplicate documents. Let's say, for example, that three different people see one article they think might be of interest to colleagues, and decide to add it to the database.

  • The first person uses the "e-mail a friend" option on the article's Web site to e-mail the link to the database, using the Web site's name as the subject line.

  • Person No. 2 copies the Web page to the Windows clipboard, pastes the contents into a new document, and gives it a subject such as "This is interesting."

  • The third person employs a browser's mail-forwarding feature to forward the entire document, using the article's title as the subject.

So, although you end up with three documents that might be considered duplicates, there's no easy way to determine programmatically that they are, in fact, duplicates.

If your situation is a little less free-form than this example, you might be able to create a view sorted on a combination of fields whose values, taken together, identify likely duplicates. You can then write an agent that walks through the view, compares each document's combined values with those of the previous document, and marks any match as a possible duplicate.
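The view-plus-agent approach above can be sketched outside Notes as follows. This is a minimal Python sketch, not LotusScript or the Notes API: a list of dictionaries stands in for the documents, sorting stands in for the view, and the field names `Subject` and `URL` and the flag `PossibleDuplicate` are hypothetical choices. Substitute whichever fields identify duplicates in your own database.

```python
from typing import Iterable

# Hypothetical key fields: pick the fields that, taken together,
# identify a duplicate in your situation.
KEY_FIELDS = ("Subject", "URL")


def combined_key(doc: dict) -> tuple:
    """Build the comparison key from the chosen fields, lightly normalized."""
    return tuple(str(doc.get(field, "")).strip().lower() for field in KEY_FIELDS)


def mark_possible_duplicates(docs: Iterable[dict]) -> list:
    """Sort the documents like a view, then flag each one whose key matches the previous one."""
    view = sorted(docs, key=combined_key)  # stands in for the sorted view
    previous_key = None
    for doc in view:
        key = combined_key(doc)
        # Like the agent: compare only with the immediately preceding document.
        doc["PossibleDuplicate"] = key == previous_key  # hypothetical flag field
        previous_key = key
    return view


if __name__ == "__main__":
    docs = [
        {"id": 1, "Subject": "Notes tips", "URL": "http://example.com/a"},
        {"id": 2, "Subject": "This is interesting", "URL": "http://example.com/a"},
        {"id": 3, "Subject": "notes tips ", "URL": "http://example.com/a"},
    ]
    for doc in mark_possible_duplicates(docs):
        print(doc["id"], doc["PossibleDuplicate"])
```

Because each document is compared only with its neighbor in sort order, this catches exact matches on the combined key and nothing more. In the three-person scenario, a pasted copy with the subject "This is interesting" would not be flagged, which is why the approach only suits less free-form data.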
