Speaker: John Spouge, National Library of Medicine, National Institutes of Health
Title: A Rigorous Statistical Theory for Detecting Repeats In Biological Sequences

Abstract: Repeats in biological sequences are interesting when they cause human disease and annoying when they interfere with statistical inferences about the similarity and common function of two sequences. Their detection is therefore important in modern computational biology. A "simple repeat" consists of a particular word of length w repeated several times without gaps. Combinatorial analysis yields a complete solution for the problem of simple repeats [1]. Unfortunately, biology does not accommodate mathematical convenience, and point mutations in biological sequences usually corrupt individual letters in a simple repeat to form "inexact simple repeats". Fortunately, the theory of Markov additive processes yields explicit analytical results for inexact simple repeats [2,3]. Unfortunately, mutation also corrupts biological sequences by inserting unrelated sequences into inexact simple repeats, creating gaps. This talk gives a theory for the statistics of inexact simple repeats with gaps, again based on Markov additive processes. My preliminaries will discuss the biological relevance of repeats in sequences and mathematical approaches to the problem, and my summary will discuss their relation to some speculations about the theory of Markov additive processes on countable and uncountable Markov state spaces.
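
As an illustration of the definition only, and not of the statistical theory presented in the talk, a simple repeat (a word of length w repeated back to back without gaps) can be located with a direct scan. The sketch below is a minimal example under stated assumptions: the function name find_simple_repeats, the min_copies threshold, and the sample sequence are all hypothetical and do not come from the cited papers.

```python
def find_simple_repeats(seq, w, min_copies=2):
    """Scan left to right and report runs where a length-w word
    is repeated back to back, with no gaps and no mismatches.

    Hypothetical helper for illustration; returns a list of
    (start_index, word, number_of_copies) tuples.
    """
    if w < 1:
        raise ValueError("word length w must be at least 1")
    hits = []
    i = 0
    n = len(seq)
    while i + w <= n:
        word = seq[i:i + w]
        copies = 1
        # Extend the run while the next length-w block equals the word.
        # A short block at the end of seq can never equal the word.
        while seq[i + copies * w:i + (copies + 1) * w] == word:
            copies += 1
        if copies >= min_copies:
            hits.append((i, word, copies))
            i += copies * w  # continue after the run
        else:
            i += 1
    return hits


# Made-up sequence containing the word "CAG" three times in a row.
print(find_simple_repeats("TTCAGCAGCAGTT", 3))
```

An exact scan of this kind would miss inexact simple repeats and repeats with gaps, the cases the abstract addresses.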

[1] M. Regnier, "A unified approach to word occurrence probabilities" (2000) Discrete Applied Mathematics 104 : 259-280
[2] J.L. Spouge, "Markov additive processes and repeats in sequences" (2007) J Appl Prob 44 : 514-527
[3] J.L. Spouge, "Correction to Markov additive processes and repeats in sequences" (2007) J Appl Prob 44 : 1122

Time: Friday, Apr. 2, 2010, 1:30-2:30 p.m.

Place: Science and Tech I, Room 242

Department of Mathematical Sciences
George Mason University
4400 University Drive, MS 3F2
Fairfax, VA 22030-4444
Tel. 703-993-1460, Fax. 703-993-1491