A system for determining the statistical significance of the frequency of short DNA motif matches in a genome: an analytical approach
Date of Award
2011
Degree Name
M.S. in Computer Science
Department
Department of Computer Science
Advisor/Chair
Advisor: Sudhindra Gadagkar
Second Advisor
Advisor: Jenifer Seitzer
Abstract
A recurring problem in biology is evaluating the statistical significance of the observed frequency of candidate transcription factor binding site matches (T_o) in a genome. Because matches can overlap in the genome, their occurrences are not independent, which renders the usual chi-square test unsuitable. In this study, we develop generalized models for evaluating the expectation and variance of T, the number of matches, over a variety of probability spaces of randomly occurring sequences of elements (or symbols); these can then be used to perform a Z test. In addition, a Java software toolset was developed that provides basic tools for manipulating molecular sequences, along with code implementing the discovery algorithm and the statistical tools for each of the probability models considered. These sequence tools are then included in a proposed design for a workbench to discover sequence motifs in a genome.
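The Z test described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis's toolset: the class and method names are hypothetical, and it assumes the simplest probability space, a sequence of independent symbols with fixed frequencies. Under that assumption the expectation of T is (n - m + 1) times the motif probability, and the variance adds a covariance term for each shift at which the motif can overlap itself (the standard result for word counts in independent sequences). The observed count is written T_o here.

```java
import java.util.Map;

/** Illustrative sketch only; all names and the i.i.d. model are assumptions. */
final class MotifZSketch {

    /** Counts matches of motif in seq, allowing matches to overlap. */
    static int countOverlapping(String seq, String motif) {
        if (motif.isEmpty()) {
            throw new IllegalArgumentException("motif must not be empty");
        }
        int count = 0;
        // Resume the search one position after each hit so overlaps are counted.
        for (int i = seq.indexOf(motif); i >= 0; i = seq.indexOf(motif, i + 1)) {
            count++;
        }
        return count;
    }

    /** Probability of a word when symbols are drawn independently with the given frequencies. */
    static double wordProbability(String word, Map<Character, Double> freq) {
        double p = 1.0;
        for (char c : word.toCharArray()) {
            p *= freq.get(c); // every symbol of the word must be a key of freq
        }
        return p;
    }

    /** E[T] for a random sequence of length n: one indicator per start position. */
    static double expectedCount(int n, String motif, Map<Character, Double> freq) {
        return (n - motif.length() + 1) * wordProbability(motif, freq);
    }

    /** Var[T] for a random sequence of length n, including overlap covariances. */
    static double variance(int n, String motif, Map<Character, Double> freq) {
        int m = motif.length();
        int positions = n - m + 1;
        double p = wordProbability(motif, freq);
        double var = positions * p * (1.0 - p);
        // Start positions fewer than m apart share symbols, so they are correlated.
        for (int d = 1; d < m && d < positions; d++) {
            // The motif can occur at i and i + d only if it overlaps itself at shift d.
            boolean selfOverlap = motif.startsWith(motif.substring(d));
            double joint = selfOverlap
                    ? p * wordProbability(motif.substring(m - d), freq)
                    : 0.0;
            var += 2.0 * (positions - d) * (joint - p * p);
        }
        return var;
    }

    /** Z statistic comparing the observed count with its expectation. */
    static double zScore(int observed, double expected, double variance) {
        return (observed - expected) / Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Assumed uniform base frequencies; a real genome would need estimated ones.
        Map<Character, Double> uniform = Map.of('A', 0.25, 'C', 0.25, 'G', 0.25, 'T', 0.25);
        String seq = "ACGTATATATGCGCATATACGT"; // made-up sequence for illustration
        String motif = "ATAT";
        int observed = countOverlapping(seq, motif);
        double mean = expectedCount(seq.length(), motif, uniform);
        double var = variance(seq.length(), motif, uniform);
        System.out.printf("T_o=%d E[T]=%.4f Var[T]=%.4f Z=%.3f%n",
                observed, mean, var, zScore(observed, mean, var));
    }
}
```

A self-overlapping motif such as ATAT gets positive covariance terms and therefore a larger variance than a motif of the same length that cannot overlap itself, which is why a test that treats matches as independent can misstate significance.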
Keywords
Nucleotide sequence--Statistical methods--Computer simulation, Genomes--Models--Computer simulation, DNA--Models--Computer simulation
Rights Statement
Copyright © 2011, author
Recommended Citation
Pfeiffer, Philip Edward, "A system for determining the statistical significance of the frequency of short DNA motif matches in a genome: an analytical approach" (2011). Graduate Theses and Dissertations. 353.
https://ecommons.udayton.edu/graduate_theses/353