Sunday, 4 March 2012

Suprafamilial relationships among Rodentia and the phylogenetic effect of removing fast-evolving nucleotides in mitochondrial, exon and intron fragments.(Research article)(Report)

Authors: Claudine Montgelard (corresponding author) [1,3]; Ellen Forty [1]; Véronique Arnal [1,3]; Conrad A Matthee [2]

Background

Since the pioneering work of Brandt [1], a wealth of literature has been devoted to suprafamilial relationships among rodents. To date, however, no consensus has been reached based on morphological or paleontological evidence. Nearly a century after Brandt [1], Simpson ([2], p. 197) referred to the order Rodentia and stated that "their relationships are involved in an intricate web of convergence, divergence, parallelism, and other taxonomic pitfalls."

The addition of molecular data has contributed significantly to constructing a species tree for the order Rodentia, and the most up-to-date taxonomic arrangement includes at least 2277 species distributed among 33 families and five suborders [3]. Recently, Huchon et al. [4] recognized the Laotian rock rat (Laonastes aenigmamus) from Laos [5] as an additional family, Diatomyidae, closely related to the Ctenodactylidae. Despite this new addition, the number of rodent families initially recognized by Simpson [2] and Wood [6] has remained fairly stable (for review see [3]). The number of rodent clades identified above the familial level, however, has led to numerous inconsistencies and controversies (see [7, 8, 9]). In the present study we adopted the most up-to-date suprafamilial classification as reviewed by Carleton and Musser [3], who recognize five suborders (Sciuromorpha, Castorimorpha, Myomorpha, Anomaluromorpha and Hystricomorpha).

Hystricomorpha contains 19 families (78 genera and 291 species), and includes the previously problematic Ctenodactylidae [3] and the newly discovered Diatomyidae [4]. The latter two families were identified as the sister taxon of the 17 traditional families comprising the infraorder Hystricognathi [4, 10]. The monophyly of Hystricomorpha is currently supported by morphological, paleontological and molecular data (see review in [10, 11, 12, 13]). Sciuromorpha includes Gliridae, Aplodontidae and Sciuridae. The latter two families are closely related based on hard and soft morphological features [14, 15, 16, 17], albumin immunology [18] and sequence data (for example see [13, 19, 20, 21]). The myomorphous Gliridae is regarded as an early offshoot of Sciuromorpha, and this is supported by middle ear anatomy [14], arterial patterns [22] and previous molecular investigations (for example [19, 21, 23]). Castorimorpha also comprises three families: Castoridae, Heteromyidae and Geomyidae. This association was first suggested by Tullberg [24] and, although not well supported by morphology, has fairly strong molecular support (for example see [13, 19, 20, 21]). The two superfamilies Dipodoidea and Muroidea (including one and six families, respectively) comprise the suborder Myomorpha, and their close affinity is well established (see [3]). The Anomaluromorpha contains Anomaluridae and Pedetidae. The association between these two families is strongly supported by mitochondrial and nuclear data [4, 11, 21, 25], and this agrees with Winge [26] and Tullberg [24]. However, a recent paper by Horner et al. [27] based on the coding regions of the mitochondrial genome disagrees with these suggestions and places Anomaluridae (Pedetidae was not included) as a sister taxon of Hystricognathi.

Evolutionary associations among these five suborders are not well resolved [3], and even the monophyly of the order has been questioned in the past based on mtDNA analyses [28, 29]. The notion of paraphyly of the Rodentia, however, was short-lived and was never supported by morphology or by more comprehensive genetic studies [13, 20, 30, 31]. Based on available evidence, Carleton and Musser [3] suggested that Sciuromorpha, Myomorpha and Hystricomorpha are well established, while the monophyly and/or phylogenetic position of Castorimorpha and Anomaluromorpha is less secure. Subsequently, retroposed SINEs provided additional evidence for the monophyly of Myomorpha, Anomaluromorpha and Hystricomorpha, whereas no SINE has been identified for Castorimorpha or Sciuromorpha. A clade including Myomorpha, Anomaluromorpha and Castorimorpha (the "mouse-related clade" as defined by Huchon et al. [20]) was also confirmed by several unique SINE insertions [11, 32]. Unfortunately, no SINE has been found that resolves relationships among the three members of the "mouse-related clade" (Myomorpha, Anomaluromorpha and Castorimorpha). Finally, the phylogenetic relationships among the three major rodent groups (Sciuromorpha, the "mouse-related clade" and Hystricomorpha) are as yet unresolved.

The introduction of phylogenomics and whole organism genome sequencing (thousands of nucleotides or amino acids), coupled with the use of probabilistic methods based on models of sequence evolution, implicitly led to the belief that inconsistency in tree reconstructions would soon be a thing of the past. However, it is now clear that increasing the number of nucleotides does not always resolve incongruence in phylogenetics [33, 34, 35]. Even phylogenomic reconstructions can be biased and, as a consequence, produce well supported but incorrect tree topologies (for example [33]). In addition, gene tree reconstructions are based on numerous implicit assumptions that are seldom tested (for example gene orthology, a reversible time-homogeneous substitution process, and stationarity of base composition through time). Violations of these assumptions may lead to compositional bias, contrasted patterns of saturation and heterogeneous evolutionary rates among genes and lineages. Current phylogenetic reconstruction methods do not efficiently test and account for such biases, the consequence being reconstruction artefacts such as long branch attraction (see for example [36, 37, 38]). To avoid these pitfalls, some authors [34, 37, 39] emphasize the necessity of testing the quality and consistency of the data and recommend that sources of inconsistency (such as fast-evolving or compositionally biased positions) be excluded. This is more feasible with large datasets because removing a part of the data will theoretically leave enough informative positions to recover confidence and consistency.

The first aim of this paper is to test the current phylogenetic hypotheses surrounding the higher level relationships among rodent families. Moreover, by using a large dataset we hoped to decipher the remaining unsolved relationships among the five recognized rodent suborders. Secondly, we were particularly interested in comparing the contribution of three different datasets: two mitochondrial genes (Cytochrome b and 12S rRNA), two nuclear exons (exon 28 of von Willebrand factor - vWF; exon one of the interphotoreceptor retinoid-binding protein - IRBP) and four nuclear introns (Stem cell factor - MGF; protein kinase C - PRKC; β-spectrin non erythrocytic 1 - SPTBN; and Thyrotropin - THY). For each dataset, we determined the distribution of sites according to eight evolutionary rates and we documented how the removal of the fast-evolving positions influenced phylogenetic reconstructions.
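The idea of removing fast-evolving positions can be illustrated with a minimal Python sketch. It assumes that each alignment column has already been assigned to one of the eight rate categories by some external step; the function name `remove_fast_sites`, the toy sequences and the toy category assignments are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def remove_fast_sites(
    alignment: Dict[str, str],
    site_categories: List[int],
    max_category: int,
) -> Dict[str, str]:
    """Return a copy of the alignment keeping only columns whose
    rate category is less than or equal to max_category."""
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(site_categories)}:
        raise ValueError("every sequence needs exactly one category per column")
    # Indices of the columns that survive the filter.
    keep = [i for i, cat in enumerate(site_categories) if cat <= max_category]
    return {
        taxon: "".join(seq[i] for i in keep)
        for taxon, seq in alignment.items()
    }


# Hypothetical toy data: three taxa, eight columns, categories 1 (slowest) to 8 (fastest).
toy_alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGAACTT",
    "taxon_C": "ACTTACGA",
}
toy_categories = [1, 1, 7, 8, 1, 2, 7, 8]

# Drop only the fastest category (8); lowering max_category removes more columns.
filtered = remove_fast_sites(toy_alignment, toy_categories, max_category=7)
print(filtered)
```

Calling the function repeatedly with a decreasing `max_category` gives a series of progressively more conservative alignments, each of which can then be passed to a tree-building program for comparison.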

Results

Alignment, partition and heterogeneity of substitution rates

The alignments of the mitochondrial cyt b and 12S rRNA genes are respectively 1140 bp and 1042 bp long. A total of 56 bp in a loop region could not be aligned for the 12S rRNA fragment and was excluded (positions 933-987). The mitochondrial dataset comprised 2126 bp and was subdivided into 5 partitions: one for each codon position of cyt b (380 bp each), and stems (458 bp) and loops (528 bp) for the 12S rRNA region. The two nuclear exons, IRBP and vWF, represented 1299 bp and 1272 bp respectively. The resulting 2571 positions have been partitioned into the three codon positions either for each gene separately (3 partitions of 433 bp each for IRBP and of 424 bp each for vWF) or from the 2 genes concatenated (3 partitions of 857 bp each). For the introns (MGF, PRKC, SPTBN and THY), the number of base pairs for the full alignments and those remaining after removal of the poorly aligned positions with Gblocks, together with the number of positions in intronic and exonic regions, are indicated for each gene in Table 1 (also see Additional files 1 and 2 for intron alignment before and after Gblocks). Although the total length of each intron varied considerably between taxa (Table 1), the number of conserved positions used for phylogeny reconstruction was close to the mean length for each fragment. For each gene and each pair of taxa, we graphically compared the p-distances (percent divergence) before and after removal of poorly aligned positions using Gblocks. With the exception of PRKC, the slopes of the regression lines (MGF: 0.89, PRKC: 0.62, SPTBN: 0.83, THY: 0.76) indicated a fairly good correlation before and after the exclusion of poorly aligned regions.

Additional file 1: Intron alignment before Gblocks. Sequence alignment of each intron before removal of poorly aligned positions by Gblocks is given in nexus format.

Additional file 2: Intron alignment after Gblocks. Sequence alignment of each intron after removal of poorly aligned positions by Gblocks is given in nexus format.

Table 1 caption: Intron sequences [table omitted]
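The comparison of p-distances before and after alignment filtering can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the toy sequences and the set of dropped columns are hypothetical, gaps and ambiguous characters are simply skipped, and the "before" distances are placed on the x axis and the "after" distances on the y axis, which the text does not specify.

```python
from itertools import combinations
from typing import Dict, List

GAP_CHARS = set("-?N")  # characters ignored when comparing two sequences


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among columns where both sequences have a base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def pairwise_p_distances(alignment: Dict[str, str]) -> List[float]:
    """p-distances for every pair of taxa, in a fixed (sorted) pair order."""
    taxa = sorted(alignment)
    return [p_distance(alignment[x], alignment[y]) for x, y in combinations(taxa, 2)]


def regression_slope(x: List[float], y: List[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy alignment and a hypothetical set of columns removed by filtering.
before = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTTCGTAA",
    "taxon_C": "ACCTACGAAT",
}
dropped_columns = {4, 7, 9}
after = {
    taxon: "".join(base for i, base in enumerate(seq) if i not in dropped_columns)
    for taxon, seq in before.items()
}

slope = regression_slope(pairwise_p_distances(before), pairwise_p_distances(after))
print(f"slope of after-vs-before p-distances: {slope:.2f}")
```

Under this axis convention, a slope below 1 means that pairwise divergence shrinks after filtering, as expected when the removed columns are more variable than the retained ones.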

The estimated number of sites in each of the eight gamma rate categories for the three main data types (mitochondrial, exon and intron data) is presented in Table 2. Using TREE-PUZZLE, the proportion of invariable sites was estimated to be zero in each case. Thus, invariable positions are all included in the first gamma rate category, which encompasses the most sites for the three datasets, especially for the mitochondrial and exon genes (nearly 40% of sites). These latter two datasets show nearly no sites in rate categories 2 and 3 (0 for mitochondrial genes and 31 for exons), whereas introns show a noticeably homogeneous increase across categories 2 to 7 (between 7.9% and 12.9% of sites). Fastest-evolving sites (category 8) are more numerous for introns than for the other two data types (exon and mtDNA). These results indicate that mitochondrial and exonic regions behave similarly in terms of gamma rate distributions and that their rates vary greatly among sites: ~40% of the positions were invariable and ~12% reached a very high rate (5.42 and 3.91 for mitochondrial and exon genes, respectively). This heterogeneity is also evidenced by the shape parameter alpha of the gamma distribution, which is 0.20, 0.46 and 2.63 for the mitochondrial, exon and intron datasets, respectively. The differences between the fragments sequenced can best be explained by the coding nature of mitochondrial and exon genes when compared to the non-coding introns.

Table 2 caption: Gamma rate distribution for the mitochondrial (mito), exon and intron genes [table omitted]
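To give a feel for how the shape parameter alpha controls rate heterogeneity, the following Python sketch approximates the mean relative rate of eight equal-probability categories of a gamma distribution with mean 1. It is a Monte Carlo approximation written for illustration only and is not the routine used by TREE-PUZZLE; the function name, sample size and seed are assumptions.

```python
import random
from typing import List


def discrete_gamma_rates(
    alpha: float,
    n_categories: int = 8,
    n_samples: int = 200_000,
    seed: int = 0,
) -> List[float]:
    """Approximate the mean rate of each equal-probability category of a
    gamma distribution with shape alpha and mean 1, by sorting random draws
    and averaging them within equal-sized bins."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/alpha gives mean 1.
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples))
    size = n_samples // n_categories  # any remainder at the top end is ignored
    rates = []
    for k in range(n_categories):
        chunk = draws[k * size:(k + 1) * size]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Alpha values reported in the text for the three datasets.
for label, alpha in [("mito", 0.20), ("exon", 0.46), ("intron", 2.63)]:
    rates = discrete_gamma_rates(alpha)
    print(label, [round(r, 2) for r in rates])
```

A small alpha concentrates most categories near a rate of zero and pushes the last category to a high relative rate, whereas a larger alpha spreads the categories more evenly around 1, which is the contrast between the coding and intron datasets described above.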

For the …
