GENERAL EVOLUTIONARY BIOINFORMATICS
The influence of natural selection on the evolution of the genome, at the level of DNA sequence, genome architecture, and the expression of the genome as a phenotype, has long been a matter of controversy. Now, in the post-genome era, the data needed to address this controversy rigorously are becoming available. The hypothesis that natural selection has influenced the evolution of protein-coding sequences is now much more widely accepted than it once was.
However, the genomes of eukaryotes such as fly and human are composed mostly (90-95%) of non-coding sequences, often termed ‘junk DNA.’ Even the insight that natural selection affects protein-coding sequences therefore leaves the vast majority of the genome unaccounted for.
Ongoing projects use available genome sequence data to investigate an array of general evolutionary genomic questions. For example, we have inferred the importance of non-coding sequences in Drosophila from the action of selection on them. Thus, we need to reconsider the role of adaptive evolution not only for protein-coding sequences, but also for the rest of the genome. In a series of studies on the Drosophila genome, we further documented selection on non-coding sequences.
Another analysis of the Drosophila genomes identified genome architectural features as correlates of the functional attributes and molecular evolution of genes. We showed that genes conforming to the stereotypical gene architecture depicted in textbooks are a small minority in the genome. Models describing the evolution of genes therefore need to consider an array of genome architectural properties, such as physical gene overlap.
We continue to take advantage of the rapid accumulation of genomic data to study the role of natural selection in shaping the genome at various hierarchical levels of its organization and its expression as phenotypes. We have also been involved in studies of Arabidopsis and human. Increasingly, we will focus on the analysis of available rodent genome sequences, or generate our own genomic data for rat species that others are not currently considering for genome sequencing.