Hotel Estoril Eden, Monte Estoril,
5-8 October 2005



Information Integration of Biological Data Sources

Mário J. Silva
Universidade de Lisboa, Faculdade de Ciências, Departamento de Informática, Portugal

The integration of biological sources of information has recently become a focus of data management research. Biological data are often represented in highly heterogeneous ways, and much of the data may be missing or contradictory. In this presentation, I will characterize the classes of data sources that have to be integrated in bioinformatics and the challenges ahead in dealing with this kind of scientific data, and I will discuss the limitations of previously proposed integration approaches.

Information integration of biological data sources frequently involves processing unstructured text obtained from the scientific literature. This involves selecting the documents to be scanned, mining the texts to extract relationships between biological entities, and finally combining the extracted information. I will present in more detail some of the methods recently proposed for handling biomedical texts in information integration. An important recent advance lies in using rules and examples obtained from public curated biological web sources as training data for the extraction process.
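The three steps named above (document selection, relationship extraction, combination) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the presentation: the toy corpus, the function names, and the small entity and phrase lists, which merely stand in for rules and examples that would in practice be derived from curated sources. A real extraction process would typically be trained rather than pattern-based.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; real input would be scientific abstracts or full texts.
DOCUMENTS = {
    "doc1": "We show that BRCA1 interacts with RAD51 in repair foci.",
    "doc2": "RAD51 binds to BRCA2. The weather was sunny.",
    "doc3": "Stock prices rose sharply this quarter.",
    "doc4": "Further evidence that BRCA1 interacts with RAD51 is presented.",
}

# Illustrative stand-ins for knowledge taken from a curated source (assumption).
KNOWN_ENTITIES = {"BRCA1", "BRCA2", "RAD51"}
RELATION_PHRASES = ["interacts with", "binds to"]


def select_documents(documents, entities):
    """Step 1: keep only documents that mention at least one known entity."""
    return {
        doc_id: text
        for doc_id, text in documents.items()
        if any(re.search(rf"\b{re.escape(e)}\b", text) for e in entities)
    }


def extract_relations(text, entities, phrases):
    """Step 2: find 'ENTITY phrase ENTITY' patterns in a single text."""
    entity_alt = "|".join(re.escape(e) for e in sorted(entities))
    phrase_alt = "|".join(re.escape(p) for p in phrases)
    pattern = re.compile(
        rf"\b({entity_alt})\s+({phrase_alt})\s+({entity_alt})\b"
    )
    return [match.groups() for match in pattern.finditer(text)]


def combine(relations_by_doc):
    """Step 3: merge extractions, recording which documents support each one."""
    evidence = defaultdict(set)
    for doc_id, relations in relations_by_doc.items():
        for relation in relations:
            evidence[relation].add(doc_id)
    return dict(evidence)


def integrate(documents, entities, phrases):
    """Run the three steps in sequence."""
    selected = select_documents(documents, entities)
    relations_by_doc = {
        doc_id: extract_relations(text, entities, phrases)
        for doc_id, text in selected.items()
    }
    return combine(relations_by_doc)


if __name__ == "__main__":
    result = integrate(DOCUMENTS, KNOWN_ENTITIES, RELATION_PHRASES)
    for relation, docs in sorted(result.items()):
        print(relation, sorted(docs))
```

Recording the supporting documents for each extracted relationship, rather than only the relationship itself, is one simple way to leave room for handling contradictory or weakly supported statements at the combination step.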