Genomic Patterns of Quercus lobata
Sorel T. Fitz-Gibbon, Dept of Ecology and Evolutionary Biology, University of California, Los Angeles
Shawn Cokus, Dept of Molecular, Cell, and Developmental Biology, University of California, Los Angeles
Matteo Pellegrini, Dept of Molecular, Cell, and Developmental Biology, University of California, Los Angeles
Victoria L. Sork, Dept of Ecology and Evolutionary Biology, University of California, Los Angeles
The genome sequence of any organism carries a signature of that species' evolutionary history. We have sequenced, annotated, and analyzed the genome of the California endemic Quercus lobata (valley oak) and compared our results to the published genomes of two additional oaks, Quercus robur and Quercus suber, and to the published results of an analysis of more than thirty genomes from across the phylogenetic range of angiosperms. The genomes of many plant species, such as the well-studied Arabidopsis thaliana (thale cress) and Solanum lycopersicum (tomato), have regions of very low gene density around the centromere, while the chromosome arms contain tightly packed genes. In contrast, we found that oak genomes are more like grass genomes, with gene density distributed more evenly along the chromosomes and large, repeat-rich intergenic regions throughout the chromosome arms. This arrangement is also reflected in methylation patterns across chromosomes, which suggest a broad distribution of intergenic heterochromatin. We also found very high peaks of CHH methylation (mCHH) at the boundaries of genes, a trait suggested in maize to prevent highly expressed genes from triggering expression of neighboring, potentially parasitic, transposable elements. Among the angiosperms we compared, oaks lie at the extreme in having both large intergenic regions and strong methylation peaks at gene boundaries. We suggest that these two patterns may have played a role in the ability of oaks to spread to a wide range of habitats and may have contributed to their high number of species. We also explore potential roles of the large number of protein-coding genes in oak that contain a single DUF247 domain. Many of the more than 150 oak DUF247 genes are arrayed in blocks of up to 50 near neighbors. DUF247s and other similarly distributed genes have been implicated in various reproductive mechanisms, including self-incompatibility and reproductive isolation.