Thus far, we’ve been typically dealing with evolutionary processes that occur over short timescales – mostly not more than a few hundred generations. But this generational scale of evolutionary change pales in comparison to the vastness of geological time. In this lab, we will explore the evolutionary history of clades through tens (sometimes hundreds) of millions of years, via the major source of direct evidence on deep time: the fossil record.
For this tutorial, you must install the palebioDB
R
package.
Then, load all packages required into your workspace:
We’ve downloaded a fossil dataset for you to work with today
(dinos_fossil
). All occurrence data were originally
obtained from the Paleobiology database, a free-to-use resource that is
the standard repository for such data and which has been used by
hundreds or thousands of researchers to study the history and dynamics
of biodiversity in deep time.
We kept as many occurrences as possible from the original datasets, but removed some extraneous data (e.g. notes on preservation, notes on the paleoenvironment, the original reference for the data, etc.) to make them easier to use. We also removed occurrences with poor time resolution (i.e., the timespan between the maximum and minimum ages of the strata associated with that fossil), and we removed all fossil occurrences not identified to the species level.
To illustrate what these data look like, let’s read and view part of the dinosaur occurrence data. First, we’ll read the file:
Then, we’ll check the basic properties of the dinos
dataframe:
## [1] 15527 13
You can see that there are 15527 rows (each row is an occurrence) and 13 columns (each column is an attribute of each occurrence).
So what are these columns?
## [1] "phylum" "class" "order" "family"
## [5] "genus" "species" "early_interval" "late_interval"
## [9] "max_ma" "min_ma" "midpoint" "lng"
## [13] "lat"
It is always good to check the rows and columns (as we just did) to verify that data were correctly loaded.
In our data, we see columns for:
phylum
,
class
,
order
,
family
,
genus
,
species
= The taxon associated with that
occurrence for that hierarchical level.
early_interval
,
late_interval
= The name of the early
(furthest from present) and the name of the late (closest to present)
intervals associated with that occurrence. Some of those intervals
appear in the GSA geologic timescale (link
here).
min_ma
,
max_ma
= The maximum (oldest; farthest in
time from the present) and minimum (youngest; closest to present) ages
associated with that occurrence. Note that these ages are in units of
millions of years ago (abbreviated to “mya”), in absolute time.
midpoint
= The date (in millions of
years ago) of the exact halfway point of the interval the fossil was
found in. We will explain this variable a bit further in this
tutorial.
lng
, lat
= The longitude and latitude of the location the fossil was found in.
These coordinates are for the fossil’s location in the present.
The “paleo” latitude and longitude (which we removed prior to loading
the data in the package) of each fossil occurrence would be an estimate
of the location where that organism actually lived, given what we know
about plate tectonics and the age of the fossil.
More on how fossil dating works (i.e., how we can determine
min_ma
and
max_ma
) and its limitations are in the
technical note “Dating fossils in absolute time” at the end of this
tutorial.
Now, using the function head()
, we can view the first
n
rows of the dataset:
## phylum class order family
## 1 Chordata Ornithischia NO_ORDER_SPECIFIED Chaoyangsauridae
## 2 Chordata Saurischia Avetheropoda NO_FAMILY_SPECIFIED
## 3 Chordata Saurischia Avetheropoda NO_FAMILY_SPECIFIED
## 4 Chordata Saurischia Avetheropoda Tyrannosauridae
## 5 Chordata Saurischia Avetheropoda Tyrannosauridae
## genus species early_interval late_interval
## 1 Chaoyangsaurus Chaoyangsaurus youngi Late Tithonian Valanginian
## 2 Protarchaeopteryx Protarchaeopteryx robusta Late Barremian Early Aptian
## 3 Caudipteryx Caudipteryx zoui Late Barremian Early Aptian
## 4 Gorgosaurus Gorgosaurus libratus Late Campanian
## 5 Gorgosaurus Gorgosaurus libratus Late Campanian
## max_ma min_ma midpoint lng lat
## 1 150.8 132.90 141.85 123.9667 42.93330
## 2 130.0 122.46 126.23 120.7333 41.80000
## 3 130.0 122.46 126.23 120.7333 41.80000
## 4 83.5 70.60 77.05 -111.5287 50.74073
## 5 83.5 70.60 77.05 -111.5493 50.73701
You will notice that the head()
output wraps around your
console output screen because it is too wide to be displayed in one
viewing frame. The functions dim()
,
colnames()
, and head()
are useful for
inspecting the datasets that you’ve loaded.
The exercises below use the dinosaur data for illustrative purposes, but the functions we’ve provided (and associated exercises) will work for any of the other datasets (trilobites, ammonites, mammals).
As a warm-up, let’s count how many occurrences there are for a famous dinosaur genus, like Majungasaurus. An interesting quick note here is that there is a Majungasaurus full skeleton reconstruction in the University of Michigan Museum of Natural History that anyone can visit.
## [1] 25
Now let’s try its more famous relative, the T-rex. How many occurrences of T-rex are there in our dataset?
## [1] 70
Important to consider: most of what we know about the T-rex comes from these 70 fossil specimens!
When we are able to date both time boundaries of a rock section, we can determine the midpoint age of that rock section, which marks the exact halfway point of the interval.
To illustrate this, we will take a closer look at the fossil occurrences of Tyranossaurus rex. First, let’s make a copy of these occurrences and assign them to a new R object.
Trex <- dinos_fossil[dinos_fossil$species == "Tyrannosaurus rex",]
#this still has the same columns as the dinosaur dataset:
colnames(Trex)
## [1] "phylum" "class" "order" "family"
## [5] "genus" "species" "early_interval" "late_interval"
## [9] "max_ma" "min_ma" "midpoint" "lng"
## [13] "lat"
Now, let’s look at these occurrences.
In the above plot, each horizontal black line segment represents a fossil occurrence. The leftmost end of the line segment represents the maximum (oldest) estimated age of that occurrence, and the rightmost end of the line segment represents the minimum (youngest) estimated age of that occurrence. Thus, each segment captures the full temporal range of possible ages for that fossil occurrence. Because there are 70 total occurrences for T-rex, there are 70 line segments.
Looking at these occurrences, do you think that T-rex actually appeared at the maximum age of the oldest occurrence (the bottommost line segment that spans from 83.5 Ma to 70.6 Ma)?
Instead of plotting the early and late boundaries of the stages in which a species occurs, we can get a more reasonable duration for the T-rex by considering just the time spanning the earliest and latest midpoints of all of the occurrences.
Let’s re-draw the fossil record of the T-rex, but this time let’s only use the midpoints of the fossil occurrences to see if we can get a better idea of the actual duration of this species.
plotRawFossilOccs(Trex, use.midpoint = F, knitr = T)
segments(x0 = min(Trex$midpoint),
x1 = max(Trex$midpoint),
y0 = 35, y1 = 35, lwd=3, col="red"
)
What do you think? Is this a better approximation of the length of time in which T-rex lived (its “temporal range”)? This is not a graded question, but discuss this with your fellow students or your GSI for a moment.
There are many ways to try to reconstruct ages or temporal ranges of species from the fossil record. All of those reconstructions will present some amount of error, and this error will be smaller for fossil records that have better time resolution. The main caveats of these techniques are:
Problems associated with the midpoint technique can be gleaned from comparing panels I and II in figure 1.
For the purposes of this lab, we will assume that reconstructing ages using the interval between midpoints, as we did above for the T-rex, is good enough.
Now we will plot ALL of the dinosaurs in our dataset, estimating their temporal range (with error) with their interval midpoints. Each species will be a horizontal line in this graph.
How will the record look if, instead of looking at the species level, we plot the temporal ranges at the taxonomic level of family (even having all the uncertainties mentioned above)?
What about the taxonomic order level?
And, maybe more interesting: why do so many species (and, an even higher proportion of higher taxonomic levels) disappear abruptly around 66 Mya?
The K-Pg mass extinction was caused by the collision of a bolide with Earth. We can date this event relatively precisely because the magnitude of the impact left a very conspicuous layer of iridium (a mineral that is rare on Earth, but very common in meteors) throughout our planet. Because of these characteristics, we can securely conclude that the K-Pg mass extinction occurred 66 million years ago. We can annotate the K-Pg mass extinction event in the plot we made by using the function:
abline(v = 66)
Did the K-Pg mass extinction impact diversity at the species, genus, and family levels equally, or do we see different patterns of diversity loss among these taxonomic levels? If the pattern is different, might this have some biological meaning?
One question we might make is: what is the temporal pattern of fossil preservation? Does it change across millions of years? We can quickly (and very superficially) test this by plotting a histogram of fossil occurrence midpoints.
hist(dinos_fossil$midpoint,
breaks=100, #increasing breaks to see data in a better precision
xlab="Absolute time (Mya)", main = "Fossil Occurrence midpoints",
#reversing axis to represent past -> present
xlim=c(rev(range(dinos_fossil$midpoint)))
)
Paleontologists call this increase in fossil preservation towards the present the “pull of the recent”.
It is reasonable to imagine that not every species that has ever existed left a fossil. How can we quantify the fraction of species that become fossils? To estimate this, we will look at the number of current dinosaur species (a.k.a. birds; species list from Jetz et al, 2012) that have left at least one fossil occurrence.
# first we load the species list:
data("birds_spp")
# then we count the number of extant dinosaurs within our fossil dataset:
n_fossilized_dinos <- sum(birds_spp %in% dinos_fossil$species)
#then we calculate the proportion:
n_fossilized_dinos/length(birds_spp)
## [1] 0.1139798
We can use two simple methods to calculate diversity curves in the fossil record (see fig 1-III):
The standard method, which simply sums up taxa that occur in the same interval, and thus calculates the diversity within each interval.
The range-through method, which only sums up species that are observed before and after a given stratigraphic interval, and thus calculates the diversity at interval boundaries. This method guarantees we are measuring diversity that co-occurs in time.
Both methods interpolate (“fill in”) gaps in a species’ fossil record between occurrences, by counting a species as being present in any interval that is between the first and last fossil occurrences (FAD and LAD, respectively). Meaning: an interval that does not have an occurrence, but that is between the first and last intervals containing occurrences, is counted as containing that species; see taxon D in Figure 1 for a visual example.
The function calcFossilDivTT
do these calculations for
you, and you can use the argument method
to decide which
method will be employed.
# This line calculates diversity through the standard method:
SM <- calcFossilDivTT(dinos_fossil, method = "stdmethod")
# and this one calculates diversity using the range-through method:
RG <- calcFossilDivTT(dinos_fossil, method = "rangethrough")
# Now we can create an empty plot to store our estimates:
plot(NA,
xlim=rev(range(c(RG$age, SM$age))),
ylim=range(c(RG$div, SM$div)),
xlab="Absolute time (Mya)",
ylab="Diversity"
)
# and then add the standard method diversity:
lines(x=SM$age, y = SM$div, type = "l", lwd=2)
# and finally add the range-through diversity:
lines(x=RG$age, y = RG$div, type = "l", col="blue", lwd=2)
As you can see, even when we calculate diversity through corrected methods, the pull of the recent still distorts our perception of diversity trends: the number of bird fossils increases our diversity line so much that it limits our ability to visualize changes deeper in the past. We can account for this in two ways.
The first and simpler way is to manually restrict our y-axis. We will also add a mark representing the time of the K-Pg mass extinction with a red vertical line in the following plot.
#Creating a plot:
plot(NA,
xlim=rev(range(c(RG$age, SM$age))),
ylim=c(0,700), # here is the part of the code I changed, in
# comparison with the former chunk of code
xlab="Absolute time (Mya)",
ylab="Diversity"
)
# adding the standard method diversity:
lines(x=SM$age, y = SM$div, type = "l", lwd=2)
# adding the range-through diversity:
lines(x=RG$age, y = RG$div, type = "l", col="blue", lwd=2)
# Here, red line marks the K-Pg mass extinction:
abline(v=66, col="red")
Instead of restricting our y-axis, we could use the log scale to highlight proportional changes in diversity. In the below plot, we will plot species richness in log base 2. This means every increment of 1 unit on the log scale equals doubling the number of species, and each unit decrease on this scale equals halving the number of species.
# Plotting again our diversity curves (note
# this is exactly what we did in the previous section), but in log scale
plot(NA,
xlim=rev(range(c(RG$age, SM$age))),
ylim=range(c(0, max(log(c(RG$div, SM$div), base = 2)))), # note the base 2
xlab="Absolute time (Mya)",
ylab="Diversity")
# adding the standard method diversity:
lines(x=SM$age,
y = log(SM$div, base=2),
type = "l", lwd=2)
# adding the range-through diversity:
lines(x=RG$age,
y = log(RG$div, base = 2),
type = "l", col="blue", lwd=2)
# Here, red line marks the K-Pg mass extinction:
abline(v=66, col="red")
The density of fossil occurrences also varies in space. We can plot
this unevenness using the function pbdb_map_occur
from the
paleobioDB
R package.
# Plotting number of occurrences in space:
# you can change the size of the cells using the argument "res"
pbdb_map_occur(dinos_fossil, res = 1, cex = 0.3)
## class : SpatRaster
## dimensions : 169, 370, 1 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : -180, 190, -85.19218, 83.80782 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
## source(s) : memory
## name : sum
## min value : 1
## max value : 459
Just as a curiosity, we can also plot the spatial distribution of the T-rex using the dataset we created earlier in this tutorial.
## class : SpatRaster
## dimensions : 169, 370, 1 (nrow, ncol, nlyr)
## resolution : 1, 1 (x, y)
## extent : -180, 190, -85.19218, 83.80782 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
## source(s) : memory
## name : sum
## min value : 1
## max value : 22
In this tutorial, we briefly explored the fossil record of dinosaurs only, but we gave you 3 other datasets: Trilobites, Mammals, and Ammonites.
Explore them using the same set of functions and techniques that we used above, and then answer:
Apart from the incompleteness of the record, you may have realized that another layer of complexity in the fossil record is the lack of temporal resolution most fossil occurrences have. Sometimes the imprecision is larger than 20 million years, which is a large amount of time – but why do we have so much imprecision in those estimates? We will explore this in the last section of this tutorial.
As you may know, we can use the U238 − Pb206 decay series to estimate the absolute ages of igneous rocks (e.g., granite, basalt, volcanic ash, “frozen lava”, etc). Zircon is a silicate mineral that is widely-used for radiometric dating, because it forms crystals (e.g., when magma cools into rock) that selectively include U238 and exclude all the decay products. This happens because U238 is trapped in the zircon crystal after it forms. Then, the crystal begins to accumulate all of the decay products, but in this case, these products are kept trapped within the crystal structure. Physicists have determined that after 4.5 billion years (4,500,000,000 years; 4 × 109 years), roughly 50% of a sample of pure U238 will have decayed into Pb206. Thus, by looking at the ratio of U238 and Pb206 in a sample, we can estimate the ages of an igneous rock and thus gain insights into the age of fossil-bearing sedimentary layers.
To apply this method, we first have to estimate the decay rate of U238. An exponential decay process can be modeled as
where Ut is the amount of U238 at time t, U0 is the starting amount, t is the elapsed time (since the formation of the zircon crystal), and k is the radioactive decay constant for U238. Consequently, we need to estimate k for any dating we may wish to do. We can estimate k because we know the half-life ( = $t_{\frac{1}{2}}$) of U238: this is the value of t where an initial quantity of U0 = 1 will have decayed by half, or Ut = 0.5. Rearranging the equation gives:
$k = \frac{log 2}{t_{\frac{1}{2}}}$
Note we have to use the natural log (i.e. base = e) in the equation above. More on the method of half-lives can be found here.
Now, using the decay constant k obtained above, we can apply equation (1) to date the rock layers given an estimate of the ratio of U238 to Pb206. These estimates are typically made using ionizing mass spectrometry; note also that we need the ratio of the number of atoms of U and Pb, not the ratio of masses. Suppose, for a zircon crystal, that we have estimated the ratios of uranium to lead as WU : WPb. This lets us directly estimate Ut and U0 in the decay equation above. If you think of WU as the amount of uranium currently in the sample, then the total amount that you started with had to have been WU plus WPb, because every Pb206 atom is the result of decay from exactly one U238 atom. Thus, WU = Ut and U0 = WU + WPb.
Know, we can come back to equation (1), solve for t, and get to the equation that allow you to estimate the age of a rock layer, given you measured certain amounts of WU and WPb.
As an extra reading, you can find an interesting outreach article about zircon dating here.
A last important point is that not all fossiliferous sites have the conditions that allow us to do zircon dating, which means that paleontologists have to use other techniques to date those fossils. This involves a lot of work and usually adds uncertainty on the dating process. If you have 14 minutes of your free time, and want to see how important and yet difficult it is to date a fossil, you could watch this short video from the youtube channel PBS Eons.
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., & Mooers, A. O. (2012). The global diversity of birds in space and time. Nature, 491(7424), 444-448.
Upham, N. S., Esselstyn, J. A., & Jetz, W. (2019). Inferring the mammal tree: species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS biology, 17(12), e3000494.