Deconstructing Phylogenetic Reconstruction: Effects of Assumption Violations on Evolutionary Inference
Bryan Kolaczkowski
Committee: John Conery (chair), Joseph Thornton (chair), Michal Young, Patrick Phillips
Dissertation Defense (May 2024)

Knowing how organisms are related evolutionarily is crucial for interpreting nearly all biological results. Evolutionary history is inferred using computational techniques that make simplifying assumptions about the evolutionary process. There is ample biological evidence that many of these assumptions are routinely violated, but little is known about the effects of assumption violations on phylogenetic inference.

Here I show how site-specific changes in evolutionary rates, an important evolutionary feature not incorporated into phylogenetic models, can cause existing methods to produce incorrect results. I develop a mixed branch length technique that produces more reliable inferences under realistic conditions. I outline a strategy to reduce the computational demands of the mixed branch length model through code optimization and algorithmic improvements.

Biologists also want to assess the confidence they should have in inferred phylogenies. Bayesian methods calculate posterior probabilities for phylogenetic hypotheses, i.e., the probability that a hypothesis is correct given the data, model, and prior probability distributions over model parameters. This produces an intuitively meaningful measure of statistical confidence, but concerns that posterior probabilities may regularly be too high have hampered acceptance of phylogenies produced using Bayesian methods. Understanding if, when, and why posterior probabilities are inflated is a crucial problem.
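
The posterior probability of a phylogenetic hypothesis can be written out with Bayes' theorem. The expression below is a sketch in generic notation: the symbols \tau (tree topology), \theta (branch lengths and other model parameters), and D (sequence data) are assumed here for illustration and are not taken from the dissertation.

```latex
P(\tau_i \mid D) =
  \frac{P(\tau_i) \int P(D \mid \tau_i, \theta)\, P(\theta \mid \tau_i)\, d\theta}
       {\sum_{j} P(\tau_j) \int P(D \mid \tau_j, \theta)\, P(\theta \mid \tau_j)\, d\theta}
```

The integral over \theta is where the prior distribution on branch lengths enters the calculation, so the resulting posterior depends on that prior whenever branch lengths are not fixed in advance.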

Here I show that although posterior probabilities are by definition correct assessments of subjective confidence given prior assumptions, they are accurate statements of objective confidence only when branch lengths are known in advance. When branch lengths are unknown, posterior probabilities can be either higher or lower than the long-run chance a hypothesis is correct. Posterior probabilities reported on actual phylogenies should therefore be interpreted only from a subjectivist standpoint.

My results suggest that phylogenetic techniques can produce incorrect phylogenies and assessments of statistical confidence due to assumption violations. Incorporating knowledge of how evolution works at the biological level into phylogenetic models can improve the quality of evolutionary inferences. The mixed branch length model incorporates an important feature of molecular evolution, potentially generating more accurate phylogenies than existing techniques.

This dissertation includes both my previously published and my co-authored materials.