You can retrieve all the data from a Notes document, including information about attachments, embedded objects, images, text styles and so on, using the NotesDXLExporter class. You could also do it with the Notes C API, but that is a lot more work, and you would have to convert the result to a "flat" format in any case. So why not start with DXL and use XSLT to translate it into whatever representation you want? I believe DXL does not list file attachments and embedded objects inline within the DXL of the rich text item, since they are not technically stored in the item but only pointed to by it. They do appear elsewhere in the document's DXL, however, so you can still extract this information.
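As a rough illustration of pulling attachment information out of exported document DXL, here is a minimal Python sketch. It does not call Notes at all; it only parses DXL text that you have already produced with NotesDXLExporter. The sample DXL is hand-written and heavily simplified, and the layout it assumes (a `$FILE` item containing `object`, `file` and base64 `filedata` elements in the `http://www.lotus.com/dxl` namespace) should be checked against your own exports. The names `SAMPLE_DXL` and `list_attachments` are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Assumption: exported DXL uses this default namespace.
DXL_NS = {"dxl": "http://www.lotus.com/dxl"}

# Hand-written, simplified sample for illustration only; real DXL
# carries many more attributes and child elements.
SAMPLE_DXL = """<document xmlns='http://www.lotus.com/dxl' form='Memo'>
  <item name='Body'><richtext><pardef id='1'/><par def='1'>See the attached file.</par></richtext></item>
  <item name='$FILE'><object><file name='notes.txt'><filedata>aGVsbG8=</filedata></file></object></item>
</document>"""


def list_attachments(dxl_text):
    """Return (file name, decoded bytes) for each $FILE item in one document's DXL."""
    root = ET.fromstring(dxl_text)  # root is assumed to be the <document> element
    found = []
    for item in root.findall("dxl:item", DXL_NS):
        # Attachments sit in their own $FILE items, not inside the rich text item.
        if item.get("name") != "$FILE":
            continue
        for file_el in item.findall("dxl:object/dxl:file", DXL_NS):
            data_el = file_el.find("dxl:filedata", DXL_NS)
            if data_el is not None and data_el.text:
                raw = base64.b64decode(data_el.text.strip())
            else:
                raw = b""
            found.append((file_el.get("name"), raw))
    return found
```

Calling `list_attachments(SAMPLE_DXL)` on the sample returns a single entry for `notes.txt`. An XSLT stylesheet could select the same elements; Python is used here only to keep the sketch self-contained.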
You can also copy the contents of a rich text item as one big binary stream by using the "Notes connector" of the LC LSX to read from the Notes database, and insert that data into a BLOB field in a relational database. The stream contains just the "CD records" from the rich text field, that is, everything actually stored in the rich text itself. Embedded objects and attachments are not contained in the rich text; they are stored in separate $FILE items (to get technical about it, they are stored in a "stored object" repository within the NSF file, and the $FILE items are links to the actual storage). You can extract file attachments using the LC LSX Notes connector, as documented in the Domino Designer help (see "Notes connector properties"), but you probably cannot get embedded objects this way.
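The relational side of that approach can be sketched as follows. This is not LC LSX code and does not read from Notes; it assumes you already have the rich text stream as a sequence of bytes and only shows it being stored in, and read back from, a BLOB column. SQLite stands in for whatever relational database you use, and the table name, column names and function names are hypothetical.

```python
import sqlite3


def store_rich_text(conn, doc_id, item_name, cd_bytes):
    """Store one rich text item's raw byte stream in a BLOB column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rich_text ("
        "doc_id TEXT, item_name TEXT, cd_records BLOB, "
        "PRIMARY KEY (doc_id, item_name))"
    )
    # Parameter binding keeps the binary data intact; never build the SQL by hand.
    conn.execute(
        "INSERT OR REPLACE INTO rich_text VALUES (?, ?, ?)",
        (doc_id, item_name, cd_bytes),
    )
    conn.commit()


def load_rich_text(conn, doc_id, item_name):
    """Return the stored byte stream, or None if the item was never stored."""
    row = conn.execute(
        "SELECT cd_records FROM rich_text WHERE doc_id = ? AND item_name = ?",
        (doc_id, item_name),
    ).fetchone()
    return None if row is None else bytes(row[0])
```

The bytes come back exactly as stored, but they are still raw CD records, so anything that reads the column has to understand that format.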
The Midas Rich Text API, available at www.geniisoft.com, may provide an easier way to extract the rich text information you're after. Again, though, it's up to you to encode it in a form that will be useful for you.
This was first published in November 2004