Nina Luhmann (October 2013 - December 2016)
Reconstructing ancestral genomes is a long-standing computational biology problem with important applications to large-scale sequencing projects. Informally, the problem can be defined as follows: given a phylogenetic tree representing the evolutionary history leading to a set of extant genomes, we want to reconstruct the structure of the ancestral genomes corresponding to the internal nodes of the phylogeny. Global approaches simultaneously reconstruct ancestral gene orders at all internal nodes of the considered phylogeny, generally based on a parsimony criterion. However, while complex rearrangement models can give insights into the underlying evolutionary mechanisms, the problem is NP-hard for most rearrangement distances.
Besides the phylogeny and the extant genome sequences, a third source of data for reconstruction has recently become available. Thanks to progress in sequencing technologies, ancient DNA (aDNA) found in preserved remains can now be sequenced. One example is the genome of the ancestor of Yersinia pestis strains that is understood to have caused the Black Death pandemic. However, environmental conditions affect the sources of such paleogenomes and cause DNA molecules to degrade and fragment over time. This makes reconstructing the genome particularly challenging and yields only a fragmented assembly that requires additional scaffolding.
The goal of this project is to integrate new ancient sequencing information into the reconstruction of ancestral genomes. Comparison with the extant genomes in the phylogeny can scaffold the fragmented assembly of the aDNA data, while raising many questions about how to model the genomes and which rearrangement model to apply. This project aims to develop algorithms that address these problems in reconstruction and scaffolding, with a focus on the fragmented ancient assembly. We developed an extension of the exact algorithm for reconstructing genomes under the Single-Cut-or-Join (SCJ) rearrangement distance. It includes ancient DNA sequencing information in the reconstruction of ancestral genomes and also scaffolds the fragmented aDNA assembly while minimizing the SCJ distance in the tree. We then generalized this result in an approach that combines evolution under the SCJ model with prior information on the genome structure at internal nodes of the tree, e.g., derived from the available aDNA data. We further addressed the problem of closing gaps between assembled contigs in ancient Yersinia pestis genomes, taking advantage of related reference sequences. These are first steps towards an integrated phylogenetic assembly of paleogenomes.
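To make the SCJ distance mentioned above concrete, the following minimal sketch computes it between two genomes. It is an illustration only, not the project's reconstruction algorithm. It assumes that each genome is a list of linear chromosomes, that genes are written as signed strings such as "+a" or "-b", and that both genomes have the same gene content. The function names are hypothetical. Under these assumptions, the SCJ distance is the number of adjacencies present in exactly one of the two genomes.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene such as '+a' or '-b'.

    A gene in forward orientation reads tail -> head; a reversed gene reads
    head -> tail. The string encoding is an assumption made for this sketch.
    """
    sign, name = gene[0], gene[1:]
    tail, head = (name, "t"), (name, "h")
    return (tail, head) if sign == "+" else (head, tail)


def adjacencies(genome):
    """Collect the adjacencies of a genome given as a list of linear chromosomes.

    Each adjacency is an unordered pair of gene extremities that are
    neighbours on a chromosome. Chromosome ends (telomeres) are not stored.
    """
    result = set()
    for chromosome in genome:
        for left, right in zip(chromosome, chromosome[1:]):
            right_end_of_left = extremities(left)[1]
            left_end_of_right = extremities(right)[0]
            result.add(frozenset((right_end_of_left, left_end_of_right)))
    return result


def scj_distance(genome_a, genome_b):
    """SCJ distance: adjacencies to cut in A plus adjacencies to join to reach B."""
    return len(adjacencies(genome_a) ^ adjacencies(genome_b))


if __name__ == "__main__":
    original = [["+a", "+b", "+c"]]
    inverted = [["+a", "-b", "+c"]]    # gene b is inverted
    fragmented = [["+a", "+b"], ["+c"]]  # one adjacency is missing

    print(scj_distance(original, inverted))
    print(scj_distance(original, fragmented))
```

In this representation an inversion of a single gene costs two cuts and two joins, whereas a fragmented assembly differs from its scaffolded version by one join per missing adjacency, which is why scaffolding fits naturally into an adjacency-based model.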
Supervisors: Cedric Chauve (Simon Fraser University), Jens Stoye (Bielefeld University), Roland Wittler (Bielefeld University)