In-Silico Analysis of Proteins

Celebrating the 20th anniversary of Swiss-Prot

July 30 - August 04, 2006 : Fortaleza, Brazil

Poster #RP108

BioMinT: a database curator's assistant for biomedical text processing

Anne-Lise Veuthey*, Violaine Pillet*, Marc Zehnder*

*Swiss Institute of Bioinformatics, Geneva, Switzerland

BioMinT is an information retrieval system that supports literature screening for UniProtKB/Swiss-Prot curation. The core of the system is a meta-query engine wrapped around PubMed. For genes and proteins, the user can choose the query terms from a comprehensive list of synonyms sorted by species. This functionality relies on GPSDB, a database of protein and gene names built from model organism resources, which contains 559'294 synonyms referring to 292'472 proteins.
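As a rough illustration of how a synonym list can be turned into a single PubMed query, the following minimal Python sketch joins the selected names with OR and optionally restricts the result to one species. It is not the BioMinT implementation: the function name `build_pubmed_query`, the `mesh_species` parameter, the choice of the `[Title/Abstract]` and `[MeSH Terms]` field tags, and the example names are all assumptions made for illustration.

```python
def build_pubmed_query(synonyms, mesh_species=None, field="Title/Abstract"):
    """Build a boolean PubMed query string from a list of gene/protein names.

    Hypothetical helper: the field tags and the species restriction via a
    MeSH term are assumptions, not the documented BioMinT behaviour.
    """
    seen = set()
    terms = []
    for name in synonyms:
        cleaned = name.strip()
        key = cleaned.lower()
        # Skip empty strings and case-insensitive duplicates, keeping order.
        if key and key not in seen:
            seen.add(key)
            terms.append(f'"{cleaned}"[{field}]')
    if not terms:
        raise ValueError("at least one synonym is required")
    query = "(" + " OR ".join(terms) + ")"
    if mesh_species:
        query += f' AND "{mesh_species}"[MeSH Terms]'
    return query


# Illustrative names only; a real list would come from a synonym resource.
print(build_pubmed_query(["TP53", "p53", "tp53"], mesh_species="Humans"))
```

Quoting each name keeps multi-word synonyms together as phrases, and removing case-insensitive duplicates keeps the query short when several resources list the same name.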
The documents retrieved from PubMed are filtered and ranked by their relevance to the query, and the selected documents are then processed to extract sentences containing information on various topics for database annotation. Currently, support vector machine (SVM) classifiers have been trained to extract sentences concerning eight different topics, including protein function, domains, isoforms, subcellular location, tissue specificity, glycosylation and phosphorylation. The classifiers were trained on corpora composed of Medline abstracts with tagged sentences. A total of 18'827 sentences on 21 biological topics relevant for Swiss-Prot annotation have been tagged. These corpora have been made available to the text-mining community (http://biomint.isb-sib.ch/).
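To make the sentence-classification step more concrete, here is a minimal sketch of a linear SVM for one topic (subcellular location), trained by subgradient descent on the regularized hinge loss. It assumes bag-of-words features and a one-classifier-per-topic setup; the features, kernel and training procedure actually used in BioMinT are not described here, and the six training sentences below are invented for illustration rather than taken from the tagged corpora.

```python
STOPWORDS = {"the", "is", "in", "it", "of", "and", "to", "on", "a"}


def featurize(sentence):
    """Bag-of-words counts over lowercased tokens, minus a few stopwords."""
    feats = {}
    for raw in sentence.split():
        tok = raw.strip(".,;:()").lower()
        if tok and tok not in STOPWORDS:
            feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats


def decision(weights, bias, feats):
    """Signed distance-like score: positive means 'on topic'."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias


def train_linear_svm(examples, epochs=50, lam=0.01, lr=0.1):
    """Train on (sentence, label) pairs, with label +1 or -1."""
    weights, bias = {}, 0.0
    data = [(featurize(text), label) for text, label in examples]
    for _ in range(epochs):
        for feats, label in data:
            margin = label * decision(weights, bias, feats)
            # L2 regularization: shrink every weight a little at each step.
            for f in weights:
                weights[f] *= 1.0 - lr * lam
            # Hinge loss: update only when the margin is violated.
            if margin < 1.0:
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * v
                bias += lr * label
    return weights, bias


def score(model, sentence):
    weights, bias = model
    return decision(weights, bias, featurize(sentence))


# Invented toy data: +1 = subcellular location, -1 = any other topic.
TRAINING = [
    ("The protein localizes to the mitochondrial membrane.", 1),
    ("The protein is phosphorylated on serine residues.", -1),
    ("It is localized in the nucleus and cytoplasm.", 1),
    ("The enzyme catalyzes the hydrolysis of ATP.", -1),
    ("The enzyme is found in the endoplasmic reticulum membrane.", 1),
    ("It is expressed in liver and kidney.", -1),
]

model = train_linear_svm(TRAINING)
for text in ["The receptor is localized in the nucleus.",
             "It is phosphorylated on serine and threonine."]:
    label = "location" if score(model, text) > 0 else "other"
    print(f"{label}: {text}")
```

In practice one such binary classifier could be trained per annotation topic, so that a sentence may be assigned to several topics or to none; a library implementation would normally replace this hand-written trainer.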
BioMinT and GPSDB are available at: http://biomint.pharmadm.com/
The BioMinT project was funded by the European Commission (contract no. QLRI-CT-2002-02770).