Poster #RP108
BioMinT: a database curator's assistant for biomedical text processing
Anne-Lise Veuthey*, Violaine Pillet*, Marc Zehnder*
*Swiss Institute of Bioinformatics, Geneva, Switzerland
BioMinT is an information retrieval system that supports literature screening for UniProtKB/Swiss-Prot curation. The core of the system is a meta-query engine wrapped around PubMed. For genes and proteins, the user can choose query terms from a comprehensive list of synonyms sorted by species. This functionality relies on GPSDB, a database of protein and gene names built from model organism resources, which contains 559'294 synonyms referring to 292'472 proteins.
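As an illustration of synonym-based query expansion, the following minimal Python sketch combines a list of synonyms into a single PubMed search term and builds a request URL for the public NCBI E-utilities `esearch` endpoint. It is not the BioMinT implementation: the function names, the use of the `[Title/Abstract]` field tag, the optional MeSH organism filter and the example synonyms are all assumptions made for the example, and no GPSDB data is used.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (not the BioMinT service itself).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_term(synonyms, mesh_organism=None):
    """OR together quoted synonyms; optionally restrict by a MeSH organism term.

    Hypothetical helper: the field tags chosen here are an assumption,
    not necessarily what BioMinT sends to PubMed.
    """
    unique = list(dict.fromkeys(synonyms))  # drop duplicates, keep order
    if not unique:
        raise ValueError("at least one synonym is required")
    clauses = ['"{}"[Title/Abstract]'.format(name) for name in unique]
    term = "(" + " OR ".join(clauses) + ")"
    if mesh_organism:
        term += ' AND "{}"[MeSH Terms]'.format(mesh_organism)
    return term


def build_esearch_url(term, retmax=20):
    """Return an esearch URL for the given term (no network access here)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Illustrative synonyms chosen by hand, not taken from GPSDB.
example_term = build_pubmed_term(["TP53", "p53", "TP53"], mesh_organism="Humans")
example_url = build_esearch_url(example_term)
```

In a curation workflow, the synonyms ticked by the user would replace the hand-written list, and the resulting URL would be fetched to obtain the matching PubMed identifiers.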
The documents retrieved from PubMed are filtered and ranked by relevance to the query, and the selected documents are then processed to extract sentences that carry information on various topics for database annotation. Currently, support vector machine (SVM) classifiers have been trained to extract sentences on eight topics, including protein function, domains, isoforms, subcellular location, tissue specificity, glycosylation and phosphorylation. The classifiers were trained on corpora of Medline abstracts with tagged sentences. In total, 18'827 sentences on 21 biological topics relevant to Swiss-Prot annotation have been tagged. These corpora have been made available to the text-mining community (http://biomint.isb-sib.ch/).
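To make the sentence-classification step concrete, the sketch below trains a linear SVM on bag-of-words features with a Pegasos-style stochastic subgradient method, using only the Python standard library. It is a toy under stated assumptions: the abstract does not say which SVM implementation, kernel or features BioMinT uses, the six training sentences are invented for the example (they are not from the BioMinT corpora), and all function names are hypothetical.

```python
import re


def features(sentence):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    counts = {}
    for token in re.findall(r"[a-z]+", sentence.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def score(weights, feats):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(tok, 0.0) * val for tok, val in feats.items())


def train_linear_svm(examples, epochs=50, lam=0.01):
    """Pegasos-style training: hinge loss with L2 regularisation.

    examples: list of (sentence, label) pairs, label is +1 or -1.
    Examples are visited in a fixed order to keep the run deterministic.
    """
    weights = {}
    step = 0
    for _ in range(epochs):
        for sentence, label in examples:
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            feats = features(sentence)
            margin = label * score(weights, feats)
            shrink = 1.0 - 1.0 / step         # equals 1 - eta * lam
            for tok in weights:
                weights[tok] *= shrink
            if margin < 1.0:                  # inside margin or misclassified
                for tok, val in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * label * val
    return weights


def is_on_topic(weights, sentence):
    """True if the sentence is predicted to belong to the topic."""
    return score(weights, features(sentence)) > 0.0


# Invented toy data for one topic (subcellular location): +1 on topic, -1 off.
TRAIN = [
    ("The protein localizes to the mitochondrial inner membrane.", 1),
    ("Patients were enrolled between 2001 and 2003.", -1),
    ("It is found in the nucleus.", 1),
    ("Statistical analysis was performed with standard methods.", -1),
    ("Localized at the plasma membrane of neurons.", 1),
    ("We thank our colleagues for helpful discussions.", -1),
]

weights = train_linear_svm(TRAIN)
```

One such binary classifier per annotation topic would mirror the per-topic extraction described above; a realistic setup would train on the tagged corpora and would normally rely on an established SVM library rather than this hand-written loop.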
BioMinT and GPSDB are available at: http://biomint.pharmadm.com/
The BioMinT project was funded by the European Commission (contract no. QLRI-CT-2002-02770).
