Our research focuses on one of the most intimate coevolutionary interactions: the interaction between a genome and its intra-genomic parasites, transposable elements. The term "transposable element" (TE) covers an extremely large diversity of mobile sequences that replicate with an RNA intermediate (e.g. retrotransposons) or without one (e.g. DNA transposons). TEs have considerably affected the size, function and structure of the genome in all eukaryotic lineages. Far from the common assumption that TEs are useless components of genomes, they have been a significant source of evolutionary novelties. It is thus important to understand the evolutionary processes that affect their dynamics in natural populations. Our research in this area revolves around two main questions: (1) what is the nature of the interactions between TEs and their hosts? and (2) why do the diversity and abundance of TEs differ so much among organisms?
Evolution and Impact of L1 Retrotransposons in Mammals
It can be argued that one of the main contributions of the human genome project was the realization that TEs have had a profound and defining impact on the structure of our genome. The dominant category of TEs in mammals, LINE-1 (or L1), has reached extremely high copy numbers in the human genome, possibly accounting for a third of our genome size. Although it had been known since the late 1980s that L1 is capable of transposition in modern humans, it was believed that new insertions were produced at an extremely low rate compared to mice and rats, which were known to host very prolific L1 families. In collaboration with Anthony Furano at the NIH, we found that L1 has been accumulating in the human genome at about the same rate per generation as in murine rodents. We also discovered that, in recent human history, L1 diversified into several distinct families, thus demonstrating that L1 is a dynamic and expanding component of the human genome (Boissinot et al. 2000).
To identify evolutionary changes in the sequence of L1 that could explain its replicative success, we performed the most comprehensive analysis of L1 evolution in humans, from the origin of primates to the present (Khan et al. 2006). We found that L1 elements have frequently recruited novel promoter sequences and that these events correlate with an increase in the rate of amplification. This pattern suggests that L1 elements could escape host-mediated repression of their transcription by recruiting novel promoters. Interestingly, L1 families coexisted for extended periods of evolutionary time only when they had different promoter sequences, raising the possibility that elements with similar promoters could compete for host-encoded transcription factors. Another interesting observation was that a region of L1, the coiled coil domain of the first open reading frame (ORF1), evolved adaptively and that rapid amino acid changes in this region correlate with increased replicative success (Boissinot and Furano 2001; Khan et al. 2006). By analogy with the adaptive evolution documented in host-pathogen interactions, we proposed that the rapid evolution of the coiled coil could be the signature of an arms race between L1 and a repressor of L1 transposition.
Another question we have been addressing is the effect of L1 activity on fitness. Using a population genetics approach, we have shown that full-length L1 elements, but not truncated ones, were subject to negative selection (Boissinot et al. 2006). Thus, one or more properties unique to full-length L1 elements negatively affect the fitness of humans. It was, however, unclear what the basis for the deleterious effect of L1 was, since three mechanisms are possible: the direct effect of insertions on gene activity, genetic rearrangements caused by ectopic recombination, or the retrotransposition process per se. Using a bioinformatics approach, we showed that long elements accumulate in low or non-recombining regions of the genome, whereas short elements are more homogeneously distributed. This observation strongly suggests that the deleterious effect of L1 elements results from their ability to mediate ectopic recombination (Song and Boissinot 2007) and provides a mechanism for the biased distribution of L1 elements.
More recently, we extended our analysis to two other mammalian species, the house mouse (Sookdeo et al. 2013) and the horse (in preparation). We showed that, as in humans, mouse and horse L1 have frequently recruited novel promoter sequences and that the simultaneous activity of non-homologous promoters seems to be one of the conditions for the coexistence of multiple L1 families. In contrast to human L1, there was little evidence of rapid amino acid replacement in the coiled coil of ORF1 in mice and horses, although this region is structurally unstable in these species. The similarities in the mode of evolution of L1 suggest that the nature of the interactions between L1 and its host might be conserved among mammals, yet some notable differences, particularly in the evolution of ORF1, suggest that the molecular mechanisms involved in host-L1 interactions might be different in these three species.
The Evolutionary Dynamics of Transposable Elements in Non-Mammalian Vertebrates
Vertebrate genomes vary considerably in size and structure, and understanding the cause(s) of these differences is fundamental for meaningfully interpreting genomic annotations. The genomes of mammals tend to be larger than those of non-mammalian vertebrates, and an open question is whether these differences depend on the relative success of TEs at accumulating in their host's genome. To address this issue, our laboratory is performing comparative analyses of the dynamics of TE amplification among vertebrates.
We first analyzed the abundance and diversity of TEs in the green anole lizard (Anolis carolinensis), the first non-avian reptile to have its genome sequenced, with a focus on retrotransposons. We found that the anole genome contains an extraordinary diversity of elements. Although diverse, retrotransposons in anoles are found in small numbers, and the vast majority of elements inserted recently (Novick et al. 2009). Similarly, we showed that DNA transposons exhibit an extreme diversity of structure, resulting from inter-element recombination, incorporation of extraneous DNA sequences and repeated horizontal transfer (Novick et al. 2010). Like retrotransposons, DNA transposons are predominantly represented by young families, whereas divergent families are exceedingly rare.
The pattern of TE diversity in anoles shows some striking similarities with that of teleostean fish, such as the threespine stickleback (Gasterosteus aculeatus), whose genome also contains a large diversity of young TEs found in very small numbers (Blass et al. 2012). This low abundance but large diversity of TEs in non-mammalian vertebrates contrasts drastically with the very low diversity and high copy numbers of mammalian L1 elements, suggesting a fundamental difference in the way mammals and non-mammalian vertebrates interact with their genomic parasites.
The young age of TE insertions in non-mammalian vertebrates is reminiscent of the situation in Drosophila, whose genome hosts a large diversity of TEs that tend to be young and present in low numbers. Theoretical and empirical models posit that copy number in Drosophila results from a rapid turnover of elements in which the insertion of new elements is offset by the selective loss of element-containing loci. Applied to vertebrates, this model predicts that the genetic load imposed by TEs is much heavier in fish and reptiles than in mammals. We examined whether this was the case in sticklebacks and anoles using a population genetics approach. Contrary to expectations, we determined that the vast majority of elements were fixed at the level of the species, which is not consistent with the turnover model (Blass et al. 2012; Tollis and Boissinot 2013). So, what can account for the lack of ancient elements and the small number of copies in fish and reptile genomes? We addressed this question by examining the pattern of decay of TE insertions. We found that the rate of DNA loss through large deletions is much higher in sticklebacks and anoles than in mammals, and can easily account for the lack of ancient copies in these genomes (Novick et al. 2009; Blass et al. 2012).
We also tested the impact of host demography on the fate of TE insertions. As TEs are obligatory parasites, any factor that affects the effective population size of the host will affect the fate of insertions. This is because a decrease in the effective population size of the host results in stronger genetic drift and less efficient selection. Thus, deleterious insertions are expected to be found at higher frequencies in small populations than in large ones. We tested this hypothesis in sticklebacks by comparing oceanic populations, which have a large effective population size, and lake populations, which have suffered a population bottleneck (Blass et al. 2012). We found that full-length inserts are more likely to be fixed (or at high frequency) in lake populations, which is consistent with the hypothesis that purifying selection against deleterious TEs is weaker in these small populations. In anoles, we found that full-length elements were reaching fixation in populations that underwent a rapid geographic expansion (Tollis and Boissinot 2013). This can be explained by a phenomenon called allele surfing, whereby drift acts strongly on the small populations at the forefront of the population expansion.
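The expectation that deleterious insertions fare better in small populations can be illustrated with a short numerical sketch. It is not the analysis used in the studies cited above. It assumes an idealized diploid Wright-Fisher population whose census size equals its effective size, additive (genic) selection, and Kimura's diffusion approximation for the fixation probability of a new mutation. The function names, the selection coefficient and the population sizes are hypothetical values chosen only for illustration.

```python
import math


def fixation_probability(n_e, s):
    """Kimura's diffusion approximation for a new mutation.

    Assumptions (illustrative, not taken from the cited studies):
    diploid population of effective size n_e, initial frequency
    1 / (2 * n_e), additive selection coefficient s (negative
    values mean the insertion is deleterious).
    """
    if s == 0:
        # Neutral case: fixation probability equals initial frequency.
        return 1.0 / (2 * n_e)
    # (1 - exp(-2s)) / (1 - exp(-4 * n_e * s)), written with expm1
    # for numerical accuracy when s is small. Very large values of
    # |4 * n_e * s| would overflow, so keep the inputs moderate.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def relative_fixation(n_e, s):
    """Fixation probability relative to a neutral insertion."""
    return fixation_probability(n_e, s) / fixation_probability(n_e, 0.0)


if __name__ == "__main__":
    s = -0.001  # hypothetical weakly deleterious insertion
    for n_e in (100, 1000, 10000):  # hypothetical population sizes
        print(n_e, relative_fixation(n_e, s))
```

Under these assumptions, the same weakly deleterious insertion behaves almost like a neutral one when the population is small and is very unlikely to fix when the population is large. This is the qualitative pattern expected for the lake versus oceanic comparison.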