Besides its obvious use in surveillance to uncover hidden patterns and possibly identify nefarious activity, the software could be used by commercial organizations to comb through e-mails and technician repair notes and identify emerging product quality and safety problems.
UIMA-compliant text analytic components can use WebSphere Information Integrator OmniFind Edition to clarify the meaning of terms and garner useful business information, suggested Nelson Mattos, vice president of Information Integration at IBM. He described UIMA as a framework composed of software components with well-defined interfaces. These components can identify the language of documents, find words and their roots as traditional keyword-based engines do, identify parts of speech, extract concepts (or "entities") and recognize relationships.
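To make the "components with well-defined interfaces" idea concrete, here is a minimal Python sketch of a pipeline in which each step reads and enriches a shared document. It is not UIMA's actual API: the names `Document`, `Annotator`, `LanguageIdentifier`, `Tokenizer`, `EntityExtractor` and `run_pipeline` are hypothetical, and the language and entity logic are toy stand-ins for real analytics.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Text plus the annotations that components attach to it (hypothetical)."""
    text: str
    annotations: dict = field(default_factory=dict)


class Annotator(Protocol):
    """The shared interface: every component implements process()."""
    def process(self, doc: Document) -> None: ...


class LanguageIdentifier:
    """Toy language check: looks for a few common English words."""
    ENGLISH_HINTS = {"the", "and", "is", "of", "to"}

    def process(self, doc: Document) -> None:
        words = set(doc.text.lower().split())
        doc.annotations["language"] = "en" if words & self.ENGLISH_HINTS else "unknown"


class Tokenizer:
    """Splits the text into word tokens."""
    def process(self, doc: Document) -> None:
        doc.annotations["tokens"] = re.findall(r"[A-Za-z0-9']+", doc.text)


class EntityExtractor:
    """Toy entity step: labels tokens found in a caller-supplied lookup table."""
    def __init__(self, gazetteer: dict):
        self.gazetteer = gazetteer

    def process(self, doc: Document) -> None:
        tokens = doc.annotations.get("tokens", [])
        doc.annotations["entities"] = [
            (tok, self.gazetteer[tok]) for tok in tokens if tok in self.gazetteer
        ]


def run_pipeline(doc: Document, annotators: list) -> Document:
    """Runs each component in order; later steps can use earlier annotations."""
    for annotator in annotators:
        annotator.process(doc)
    return doc


doc = run_pipeline(
    Document("The pump from Acme is leaking"),
    [LanguageIdentifier(), Tokenizer(), EntityExtractor({"Acme": "ORGANIZATION"})],
)
print(doc.annotations)
```

Because every step exposes the same `process` method, a component can be swapped out or added (a part-of-speech tagger, say) without changing the others, which is the kind of "plug in" arrangement the framework description suggests.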
The components can allow software developers to "plug in industry expertise that will help extract valuable metadata about documents," Mattos said. That helps address the age-old problem of semantics: A term such as "rock" can stand for music, stones or motion, depending on the context that surrounds it. [A shorthand definition of "metadata," as used here: data about data.]
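The "rock" example can be illustrated with a deliberately simple keyword-overlap heuristic: pick the sense whose cue words appear most often in the surrounding sentence. This is only a sketch of the general idea of context-based disambiguation, not a description of how OmniFind or any UIMA component works; the `SENSE_CUES` table and the `disambiguate` function are invented for illustration.

```python
import re

# Hypothetical cue words for each sense of "rock"; a real system would
# learn or curate far richer context models than this.
SENSE_CUES = {
    "music": {"band", "guitar", "concert", "album"},
    "stone": {"granite", "quarry", "mineral", "cliff"},
    "motion": {"cradle", "chair", "boat", "sway"},
}


def disambiguate(term, sentence, sense_cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the sentence,
    or None when no cue word appears in the context."""
    context = set(re.findall(r"[a-z']+", sentence.lower())) - {term}
    scores = {sense: len(cues & context) for sense, cues in sense_cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("rock", "The band played rock at the concert"))
print(disambiguate("rock", "The quarry ships granite rock"))
print(disambiguate("rock", "I like rock"))
```

The last call has no cue words to work with, so the function declines to guess rather than return an arbitrary sense.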
"OmniFind can find and understand the semantic meaning of words," said IBM's Mattos. He described UIMA and OmniFind as instances where sophisticated text analysis was going "mainstream."
"Because the system can 'understand' the facts in a document, we can dramatically improve the relevance of results," Mattos said. "That saves time for workers."
He also noted that because components can probe for underlying meaning, the system could, for example, analyze millions of call center records to uncover trends in maintenance-contract pricing issues.
Asserts Mattos: "It is a tremendous breakthrough in the search space."