
In-Silico Analysis of Proteins

Celebrating the 20th anniversary of Swiss-Prot

July 30 - August 04, 2006 : Fortaleza, Brazil

Poster #RP101

Analyzing the Effects of Generalizations Implicit within the BLAST Algorithm

Mileidy Gonzalez*, Stephen J. Freeland*

*University of Maryland Baltimore County, Baltimore, United States

Pairwise sequence comparison is one of the most widely used techniques in modern biology. Because it can be used to establish homology, it is implicit within many fundamental bioinformatics techniques (e.g. phylogenetic tree construction, genome annotation, threading). Although homology searching programs like BLAST have become extremely widespread, their performance has rarely been quantified by anyone other than the developers themselves. As a result, several key generalizations implicit within the BLAST algorithm remain largely untested. To address this knowledge gap, we map the entire network of mathematical functions used by the BLAST algorithm, highlighting the multiple points where assumptions (e.g. of standard amino acid composition or alignment length) are embedded within the derivation of significance scores. We use this map to design a series of tests that measure how varying these assumptions affects BLAST performance. We then identify the specific properties of searches that are most likely to compromise BLAST search efficiency, and provide a preliminary formula to estimate the effect for any query sequence. While these findings evaluate the potential for 'unreliability' of results, they also point to possible solutions to the current limitations of sequence comparison methods.
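
To illustrate where assumptions of this kind can enter a significance score, the sketch below uses the standard Karlin-Altschul relation E = K * m * n * exp(-lambda * S), on which BLAST E-values are based. It is not the authors' map of BLAST functions, nor their preliminary formula. The function names are hypothetical, and the values of lambda and K are illustrative assumptions (roughly those commonly quoted for BLOSUM62 with gap costs 11/1). In practice these parameters depend on the scoring system and on an assumed background amino acid composition.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score S into bits: (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring >= S: K * m * n * exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Illustrative (assumed) parameters; not taken from the poster.
    LAMBDA, K = 0.267, 0.041
    raw, m, n = 100, 250, 1_000_000

    print("bit score:", round(bit_score(raw, LAMBDA, K), 2))
    print("E-value, assumed lambda:", e_value(raw, m, n, LAMBDA, K))
    # If the sequences' composition implies a smaller effective lambda,
    # the same raw score is less significant than the default suggests.
    print("E-value, smaller lambda:", e_value(raw, m, n, 0.25, K))
```

In this formula the E-value scales linearly with the search space m * n and exponentially with lambda * S, so a small mismatch between the assumed and the effective lambda changes the reported significance far more than a comparable change in sequence lengths.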