Impact of molecular evolutionary footprints on phylogenetic accuracy: a simulation study

Date of Award


Degree Name

Ph.D. in Biology


Department of Biology


Advisor: Sudhindra R. Gadagkar


An accurately inferred phylogeny is important to the study of molecular evolution. Factors affecting the accuracy of a phylogenetic tree can be traced to the several consecutive steps leading to the inference of the phylogeny. In this simulation-based study, our focus is on the impact of certain evolutionary features of the nucleotide sequences themselves in the alignment, rather than on any source of error arising during sequence alignment or from the choice of the method of phylogenetic inference. Nucleotide sequences can be characterized by summary statistics such as sequence length and base composition. When two or more such sequences are compared with each other (as in an alignment prior to phylogenetic analysis), additional evolutionary features come into play, such as the overall rate of nucleotide substitution, the ratio of two specific instantaneous rates of substitution (the rates at which transitions and transversions occur), and the shape parameter of the gamma distribution (which quantifies the extent of heterogeneity in substitution rate among sites in an alignment). We studied the implications of the following five sequence parameters, individually and in combination: sequence length, substitution rate, nucleotide base composition, the transition-transversion rate ratio, and rate heterogeneity among sites. We found that the transition-transversion rate ratio, or kappa, has a significant impact on phylogenetic accuracy, with a strong positive interaction with accuracy at high substitution rates, contrary to general belief. This work, based on a known expected tree, has implications for researchers in the field and would enable them to choose from among the multiple genes typically available today for accurate phylogenetic inference.
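As an illustration of how kappa and base composition enter a substitution model, the following is a minimal sketch of an HKY85-style instantaneous rate matrix. The choice of HKY85, the function names, and the scaling convention are assumptions made for this example; the abstract does not state which substitution model was used in the simulations.

```python
# Nucleotide order used throughout this sketch (an arbitrary convention).
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True when a change between two different bases stays within
    purines (A, G) or within pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)


def hky85_rate_matrix(kappa, freqs):
    """Build a 4x4 HKY85-style rate matrix (assumed model, not from the source).

    kappa -- transition-transversion rate ratio
    freqs -- equilibrium base frequencies in the order A, C, G, T
    The matrix is scaled so the expected substitution rate is 1 per site.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                # Transitions are kappa times faster than transversions.
                q[i][j] = freqs[j] * (kappa if is_transition(x, y) else 1.0)
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i])
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[value / mean_rate for value in row] for row in q]


if __name__ == "__main__":
    for row in hky85_rate_matrix(kappa=2.0, freqs=[0.25, 0.25, 0.25, 0.25]):
        print(["%.3f" % value for value in row])
```

Among-site rate heterogeneity would be layered on top of such a matrix by multiplying it by a per-site rate drawn from a gamma distribution with the chosen shape parameter; that step is omitted here.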
DNA sequences also diverge from their ancestral sequences through evolutionary events other than those mentioned above, namely deletions (removal of one or more nucleotides from a sequence) and insertions (addition of one or more nucleotides to a sequence), which appear as gaps in a sequence alignment. We also investigated the relationship between the number of gaps and phylogenetic accuracy when the gaps are introduced in an alignment to reflect indel (insertion/deletion) events during the evolution of DNA sequences. DNA sequence alignments were generated by computer simulation along a 16-taxon model tree, varying several sequence parameters, introducing both substitution and insertion/deletion events, and systematically varying the expected proportion of gapped sites. The resulting alignments were subjected to commonly used gap treatment methods and methods of phylogenetic inference. The results showed that, in general, there is a strong, almost deterministic relationship between the amount of gap in the data and the level of phylogenetic accuracy when the amount of gap is high. Our results also suggest that, as long as the gaps in the alignment are a consequence of indel events in the evolutionary history of the sequences, the accuracy of phylogenetic analysis is likely to improve if alignment gaps are categorized as arising from insertion events or deletion events and then treated separately in the analysis, and if the phylogenetic signal provided by indels is harnessed, for example, by treating the gaps as binary characters in Bayesian or Maximum Parsimony analyses, or in an integrated manner along with substitution events.
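The two gap-related quantities mentioned above, the proportion of gapped sites and gaps recoded as binary characters, can be sketched as follows. This is a deliberately simple per-column coding chosen for illustration; the function names and the toy alignment are hypothetical, and the study's actual gap treatment methods may differ.

```python
def gapped_columns(alignment):
    """Return the indices of alignment columns that contain at least one gap.

    alignment -- dict mapping taxon name to an aligned sequence string,
                 with '-' marking a gap (all sequences of equal length).
    """
    sequences = list(alignment.values())
    n_sites = len(sequences[0])
    return [i for i in range(n_sites) if any(seq[i] == "-" for seq in sequences)]


def gapped_site_proportion(alignment):
    """Fraction of alignment columns that contain at least one gap."""
    n_sites = len(next(iter(alignment.values())))
    return len(gapped_columns(alignment)) / n_sites


def binary_gap_characters(alignment):
    """Recode gaps as presence/absence characters, one per gapped column.

    For each gapped column, a taxon scores '1' if it has a gap there and
    '0' otherwise. This per-column scheme is an assumption of the sketch;
    it does not merge adjacent gap positions into single indel events.
    """
    columns = gapped_columns(alignment)
    return {
        taxon: "".join("1" if seq[i] == "-" else "0" for i in columns)
        for taxon, seq in alignment.items()
    }


if __name__ == "__main__":
    toy = {"t1": "AC-GT", "t2": "ACGGT", "t3": "A--GT"}  # hypothetical data
    print(gapped_site_proportion(toy))
    print(binary_gap_characters(toy))
```

The resulting 0/1 strings could be appended to the nucleotide data as a separate partition for analyses that accept binary characters.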


Phylogeny, Evolution (Biology)

Rights Statement

Copyright 2009, author