---
title: "Introduction to SDAR"
author: "John Ortiz"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to SDAR}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L, pillar.sigfig = 5)
library(SDAR)
```

## Introduction to Stratigraphic Data Analysis (SDAR)

To explore the functionality of SDAR, we will use the publicly available dataset from the Saltarin well.
`saltarin_beds` is the example dataset available within SDAR;
it gives a lithologic description of the borehole Saltarin 1A, located in the Llanos Basin in
eastern Colombia (4.612 N, 70.495 W). The stratigraphic well Saltarin 1A drilled 671 meters of the
Miocene succession of the eastern Llanos Basin, corresponding to the Carbonera (124.1 m; 407.1
ft), Leon (105.1 m; 344.8 ft), and Guayabo (441.8 m; 1449.5 ft) Formations (Bayona et al., 2008).
The Saltarin core was described at a scale of 1:50 for
identification of grain-size trends, sedimentary structures, clast composition, thickness of
lamination, bioturbation patterns, and macrofossils, all of which are used for
identifying individual lithofacies and for sedimentological and stratigraphic analyses
(Jaramillo et al., 2017).
The command `data(saltarin_beds)` loads the dataset `saltarin_beds` into the current R session.
```{r}
library(SDAR) # Load SDAR library
data(saltarin_beds) # load Saltarin demo dataset
class(saltarin_beds)
# check the content and structure of the saltarin_beds dataset
nrow(saltarin_beds) # number of rock layers
ncol(saltarin_beds) # number of variables recording composition and texture description of each layer
names(saltarin_beds) # variable names of composition and texture description of each layer
```
Note that `saltarin_beds` is a data frame object with 686 layers (rows) and 22 variables
(columns) storing the thickness, composition, and texture description of each layer, following the
format suggested by SDAR (for more details about the specific types of data required by SDAR,
see the `SDAR_data_model` vignette).
In order to draw a stratigraphic layer in SDAR,
the minimum information required for each layer is `bed_number`, thickness (i.e., it is
defined by a `base` and a `top`), `rock_type`, `prim_litho`, and `grain_size`.
In summary, a table with the structure presented in Table 1 must be provided.
This example is from a borehole core where depths are measured down from the surface;
therefore `base` is greater than `top`.
| bed_number | base | top | rock_type | prim_litho | grain_size |
|---|---|---|---|---|---|
| 1 | 671 | 670.2 | sedimentary | claystone | clay |
| 2 | 670.2 | 669.4 | covered | | |
| 3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand |
| 4 | 669.18 | 667.6 | sedimentary | limestone | wackestone |
| 5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder |
| 6 | 667.2 | 666.2 | sedimentary | shale | silt |
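
As a rough illustration, the six beds of Table 1 can be typed directly into a data frame with base R. This is only a sketch: the object name `table1_beds` is made up here, and entering the empty cells of the covered interval as `NA` is an assumption about how blank cells are usually represented after import, not a documented SDAR requirement.

```r
# Sketch: the beds of Table 1 as a base R data frame.
# 'table1_beds' is an illustrative name; NA stands in for the empty cells (assumption).
table1_beds <- data.frame(
  bed_number = 1:6,
  base       = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top        = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type  = c("sedimentary", "covered", "sedimentary",
                 "sedimentary", "sedimentary", "sedimentary"),
  prim_litho = c("claystone", NA, "sandstone", "limestone", "conglomerate", "shale"),
  grain_size = c("clay", NA, "medium sand", "wackestone", "boulder", "silt"),
  stringsAsFactors = FALSE
)

# Depths are measured down from the surface, so every base is greater than its top.
all(table1_beds$base > table1_beds$top)

# The thickness of each bed follows from its base and top.
table1_beds$thickness <- table1_beds$base - table1_beds$top
```

Computing the thickness from `base` and `top` rather than storing it separately keeps the table consistent with the minimum structure described above.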
NOTE: The SDAR project includes the development of a graphical user interface to connect this
R package with a database management system; for this reason, the data structure and headers
(column names) should be followed so that they match the database structure.
To improve communication between geoscientists,
SDAR implements conventions defined by sedimentologists for drawing lithology patterns and for
describing grain size, color, and so on. Details on the information required to define a layer, and the
sources of the conventions implemented, are provided in the vignette "SDAR data model".
Data stored in Excel files can be imported into R with the `readxl` package.
To install the **readxl** package from CRAN:
```{r, eval = FALSE}
install.packages("readxl")
```
In order to import an Excel file, navigate to your working directory (for example, with `setwd()`),
or add the full path where your file is stored to the `read_excel` function.
```{r, eval = FALSE}
library(readxl) # load the readxl package
my_beds <- read_excel("file_name.xlsx") # file in your working directory
my_beds <- read_excel("Path where your Excel file is stored/file_name.xlsx") # setting the full path
# Note that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either the forward slash or a double backslash (\\).
my_beds <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx") # full path example on Windows systems
```
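
One way to sidestep the separator question is to let R assemble the path. The sketch below uses base R's `file.path()`, which joins its arguments with a forward slash, and `file.exists()` to check the result before reading. The folder and file names are the placeholder ones from the chunk above, and the variable name `xlsx_path` is chosen here for illustration.

```r
# Build the path from its components; file.path() joins them with "/".
xlsx_path <- file.path("C:", "Users", "john", "Desktop", "File_name.xlsx")
xlsx_path

# Only attempt the import when the file is really there
# (this is a placeholder path, so the condition is normally FALSE).
if (file.exists(xlsx_path)) {
  my_beds <- readxl::read_excel(xlsx_path)
}
```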
#### Additional external data examples
The Saltarin well example dataset available within SDAR is also accessible in Excel format. It is stored
in the installed files folder `inst/extdata`; to find `inst/extdata/SDAR_v0.95_beds_saltarin.xlsx`,
call `system.file("extdata", "mydata.xlsx", package = "mypackage")`, replacing the placeholder
file and package names with the real ones.
```{r}
# Read the SDAR beds external data example (Excel file format)
library(readxl)
fpath <- system.file("extdata", "SDAR_v0.95_beds_saltarin.xlsx", package = "SDAR")
beds_data <- read_excel(fpath)
nrow(beds_data) # number of rock layers
names(beds_data) # variable names of composition and texture description of each layer
```
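
`system.file()` returns an empty string when the requested file cannot be found, so a misspelled file name may only surface later as a confusing read error. A small guard makes the failure explicit. It is sketched here with a deliberately wrong file name (`no_such_file.xlsx` is made up for the illustration) looked up in the `base` package, which is always installed.

```r
# A file name that does not exist inside the installed 'base' package.
fpath <- system.file("extdata", "no_such_file.xlsx", package = "base")

# system.file() signals "not found" with an empty string rather than an error.
found <- nzchar(fpath)
if (!found) {
  message("File not found in the installed package; check the file and package names.")
}
```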
### Data validation - the `strata` class
Validating data means checking whether a dataset meets all the requirements it must fulfill,
and the `strata` function makes it easy to check whether your stratigraphic data satisfy
the SDAR data model. The **SDAR** package introduces a new S4 object class called `strata`
to store stratigraphic data. This S4 class gives a rigorous definition of a `strata` object:
a valid object meets all the requirements specified in the definition
(e.g., the columns must be named **bed_number**, **base**, **top**, **rock_type**,
**prim_litho**, and **grain_size**, and **base** and **top** must be numeric).
Defining this S4 class reduces errors, because the class records both the type of information
the object contains and its validity (Wickham, 2014).
The `strata` function provides an additional argument called `datum`, which lets users define
the horizontal reference datum. The options are `"base"` and `"top"`: `"base"` applies when thickness is
measured up from the bottom of, e.g., an outcrop section; `"top"` applies when depths are measured
down from the surface, e.g., in boreholes and cores. The default is `datum = "top"`.
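
To make the column requirements listed above concrete, they can be mimicked with a few lines of base R. The helper `check_beds()` and the toy table `bad_beds` are hypothetical names introduced for this sketch; this is not how `strata` is implemented, only an illustration of the kind of rule it enforces.

```r
# Hypothetical helper (not part of SDAR): lists problems with the required columns.
required_cols <- c("bed_number", "base", "top", "rock_type", "prim_litho", "grain_size")

check_beds <- function(beds) {
  problems <- character(0)
  missing_cols <- setdiff(required_cols, names(beds))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing column:", missing_cols))
  }
  # 'base' and 'top' must be numeric when they are present
  for (col in intersect(c("base", "top"), names(beds))) {
    if (!is.numeric(beds[[col]])) {
      problems <- c(problems, paste(col, "must be numeric"))
    }
  }
  problems
}

# Toy input with two deliberate mistakes: 'base' stored as text, no 'grain_size' column.
bad_beds <- data.frame(
  bed_number = 1:2,
  base       = c("671", "670.2"),
  top        = c(670.2, 669.4),
  rock_type  = c("sedimentary", "covered"),
  prim_litho = c("claystone", NA),
  stringsAsFactors = FALSE
)
problems <- check_beds(bad_beds)
problems
```

Collecting every problem before reporting, instead of stopping at the first one, lets all the issues in a table be fixed in a single pass.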
```{r}
# the strata function automatically validates the input dataset
# and returns a strata class object.
validated_beds <- strata(saltarin_beds)
# check the class of the object generated by the strata function
class(validated_beds)
```
The previous chunk of code validates the input dataset `saltarin_beds` and returns a new `strata`
class object, **validated_beds**. The absence of warnings or errors, together with the message
`beds data has been validated successfully`, means that every row (bed/layer) in
the input data satisfies the expectations of the **SDAR data model** (an error would occur,
for example, if we misspelled sandstone). By default, all errors and warnings are printed to
the R console when the validation rules are checked against the input data. The following is an
example of such an error: **Error: Check row numbers 3, 7. values (sandtone, mudston)
are 'prim_litho' not register in 'litho.table'.** (sandstone and mudstone are misspelled,
so the error is caught and shown in the R console). Stratigraphic overlap between beds/layers
is not allowed; if overlap occurs, the `strata` function prints an error on screen and returns
a data frame with the overlapping intervals.
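
The overlap rule can be illustrated with a small base R sketch. The helper `find_overlaps()` is hypothetical and is not the check that `strata` runs internally. It assumes depths measured down from the surface (`datum = "top"`, so `base` is greater than `top`) and only compares each bed with the bed directly above it.

```r
# Hypothetical helper (not part of SDAR): flags beds that start above the
# base of the bed directly over them. Assumes base > top for every bed.
find_overlaps <- function(beds) {
  beds <- beds[order(beds$top), ]             # shallowest bed first
  prev_base <- c(-Inf, head(beds$base, -1))   # base of the bed directly above
  beds[beds$top < prev_base, , drop = FALSE]  # starts before the bed above ends
}

# Toy beds: bed 3 reaches down to 669.5, but bed 2 already starts at 669.4.
beds <- data.frame(
  bed_number = 1:3,
  base       = c(671, 670.2, 669.5),
  top        = c(670.2, 669.4, 669.18)
)
overlaps <- find_overlaps(beds)
overlaps
```

Beds that merely touch (the top of one equal to the base of the next) are not flagged, which matches the way consecutive layers share a boundary in Table 1.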
In order to validate data from an outcrop / stratigraphic section, set the parameter `datum = "base"`:
```{r, eval = FALSE}
# datum = "base" must be selected when stratigraphic distance above datum
# increases upwards (toward younger levels, as a stratigraphic section).
outcrop_validated_beds <- strata(my_outcrop_beds, datum = "base")
```
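
The two conventions can be told apart by comparing `base` and `top`: with depths measured downward, `base` is greater than `top`, and with heights measured upward it is the other way round. The function `guess_datum()` below is a hypothetical helper written for this sketch, and the two toy tables are invented; it is not an SDAR function.

```r
# Hypothetical helper (not part of SDAR): infers the datum convention of a beds table.
guess_datum <- function(beds) {
  if (all(beds$base > beds$top)) {
    "top"          # depths measured down from the surface (boreholes, cores)
  } else if (all(beds$top > beds$base)) {
    "base"         # heights measured up from the bottom (outcrop sections)
  } else {
    NA_character_  # mixed or zero-thickness rows: no consistent convention
  }
}

core_beds    <- data.frame(base = c(671, 670.2), top = c(670.2, 669.4))
outcrop_beds <- data.frame(base = c(0, 0.8),     top = c(0.8, 1.6))

core_datum    <- guess_datum(core_beds)
outcrop_datum <- guess_datum(outcrop_beds)
```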
### Methods within the `strata` class
In this version of the **SDAR** package, the methods associated with the `strata` class are `plot` and `summary`.
Once the stratigraphic data are loaded into R and successfully validated as a `strata` object, we can
plot the object to visualise the information. The `plot` method provides different
outputs depending on the parameter settings. The `summary` method displays a synopsis of the content
of the `strata` object, including the total number of layers, the thickness of the study section, and
the number of layers by lithology type and grain size.
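
The quantities named above (number of layers, total thickness, and counts per category) can be approximated with base R on a plain data frame. This sketch does not call or reproduce the `summary` method; the object names are illustrative and the values are the six example beds shown earlier in this vignette.

```r
# Sketch with base R only: the kind of figures a synopsis of a beds table contains.
beds <- data.frame(
  base      = c(671, 670.2, 669.4, 669.18, 667.6, 667.2),
  top       = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type = c("sedimentary", "covered", "sedimentary",
                "sedimentary", "sedimentary", "sedimentary"),
  stringsAsFactors = FALSE
)

n_layers        <- nrow(beds)                  # total number of layers
total_thickness <- sum(beds$base - beds$top)   # thickness of the section
layers_by_type  <- table(beds$rock_type)       # number of layers per rock type
layers_by_type
```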
#### Plot method for `strata` class
The minimal information required to plot a stratigraphic column using **SDAR** is a table with the
structure presented in **Table 1**. Once a dataset has been defined and validated as a `strata` object,
the plot method `plot.strata` is accessed automatically when `plot` is called.
```{r, eval = FALSE}
# Code to generate example presented in Figure 1.
library(SDAR) # load SDAR library
data(saltarin_beds) # load Saltarin beds dataset
validated_beds <- strata(saltarin_beds) # validate the saltarin_beds dataset
plot(validated_beds) # plot a stratigraphic log with the SDAR default options
# The default parameters are: `datum = "top"`, `data.units = "feet"`,
# `scale = 100`, and `barscale = 2`
```
In order to include and represent grading information in SDAR, the columns `grading`,
`grain_size_base`, and `grain_size_top` must be included in the beds/layers table.
| bed_number | base | top | rock_type | prim_litho | grain_size | grading | grain_size_base | grain_size_top |
|---|---|---|---|---|---|---|---|---|
| 1 | 671 | 670.2 | sedimentary | claystone | clay | | | |
| 2 | 670.2 | 669.4 | covered | | | | | |
| 3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand | normal | coarse sand | fine / medium sand |
| 4 | 669.18 | 667.6 | sedimentary | limestone | wackestone | normal | packstone | wackestone |
| 5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder | inverse | cobble | boulder |
| 6 | 667.2 | 666.2 | sedimentary | shale | silt | | | |
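
Before plotting, it can help to confirm that every bed with a `grading` value also carries both grain-size limits. The sketch below uses base R on beds 3 to 6 of the grading table; the object names are illustrative, and representing the empty cells as `NA` is an assumption rather than a documented SDAR rule.

```r
# Sketch: grading columns for beds 3 to 6 of the table above (NA = empty cell, assumption).
graded_beds <- data.frame(
  bed_number      = 3:6,
  grading         = c("normal", "normal", "inverse", NA),
  grain_size_base = c("coarse sand", "packstone", "cobble", NA),
  grain_size_top  = c("fine / medium sand", "wackestone", "boulder", NA),
  stringsAsFactors = FALSE
)

# Beds that declare a grading trend.
has_grading <- !is.na(graded_beds$grading)

# Beds that declare grading but lack one of the two grain-size limits.
incomplete <- has_grading &
  (is.na(graded_beds$grain_size_base) | is.na(graded_beds$grain_size_top))

graded_beds$bed_number[incomplete]  # empty when every graded bed is complete
```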
| base | top | index |
|---|---|---|
| 669.4 | 669.2 | intense |
| 668.6 | 668.2 | moderate |
| 665.2 | 665.0 | moderate |
| 661.4 | 659.9 | low |
| 637.5 | 637.0 | low |

| base | top | sed_structure |
|---|---|---|
| 671 | 670.2 | cross bedding |
| 671.5 | 671.5 | climbing ripples |
| 669.4 | 669.18 | lenticular lamination |
| 668.2 | 667.6 | normal grading |
| 667.2 | 666.2 | wavy lamination |
In order to import an Excel file, navigate to your working directory (for example, with `setwd()`),
or add the full path where your file is stored to the `read_excel` function, and specify the sheet to read
either by its name or by its position (a number).
```{r, eval = FALSE}
# Specify the sheet by its name
my_int_data <- read_excel("file_name.xlsx", sheet = "data") # file in your working directory
my_int_data <- read_excel("Path where your Excel file is stored/file_name.xlsx", sheet = "data") # full path
# Specify the sheet by its index
my_int_data <- read_excel("file_name.xlsx", sheet = 1)
# Note that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either the forward slash or a double backslash (\\).
# full path example on Windows systems
my_int_data <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx", sheet = "data")
```
The **Saltarin intervals dataset** is available in Excel format. It is stored
in the installed files folder `inst/extdata`; to find `inst/extdata/SDAR_v0.95_intervals_saltarin.xlsx`,
call `system.file("extdata", "mydata.xlsx", package = "mypackage")`, replacing the placeholder
file and package names with the real ones.
```{r}
# Read the bioturbation external data example (Saltarin intervals Excel file format)
fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR")
bioturbation_data <- read_excel(fpath, sheet = "bioturbation") # import bioturbation sheet
nrow(bioturbation_data) # number of bioturbated intervals
bioturbation_data # header of Saltarin bioturbation dataset
```
Import the remaining sheets of the Saltarin intervals dataset:
```{r}
# import core_number data
core_number_data <- read_excel(fpath, sheet = "core_number")
# import samples data
samples_data <- read_excel(fpath, sheet = "samples")
# import sedimentary structures data
sed_structures_data <- read_excel(fpath, sheet = "sed_structures")
# import fossils data
fossils_data <- read_excel(fpath, sheet = "fossils")
# import other symbols data
other_symbols_data <- read_excel(fpath, sheet = "other_symbols")
# import lithostratigraphy data
litho_data <- read_excel(fpath, sheet = "lithostra")
# import chronostratigraphy data
crono_data <- read_excel(fpath, sheet = "chronostra")
```
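
The sheet-by-sheet calls above can also be written as a loop. The sketch below uses `readxl::excel_sheets()` to list the sheet names and reads each one into a named list. The list name `intervals` is an illustrative choice, and the guard skips the import when SDAR or readxl is not installed.

```r
# Path to the intervals workbook shipped with SDAR ("" if it cannot be found).
fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR")

intervals <- list()
if (nzchar(fpath) && requireNamespace("readxl", quietly = TRUE)) {
  sheet_names <- readxl::excel_sheets(fpath)  # names of all sheets in the workbook
  intervals <- lapply(sheet_names, function(s) readxl::read_excel(fpath, sheet = s))
  names(intervals) <- sheet_names             # access a sheet as intervals[["fossils"]]
}
```

Keeping the sheets in one list avoids a separate variable per feature, at the cost of slightly longer expressions when passing them to `plot`.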
#### Display interval features
Plot setting parameters allow users to add features to the graphic log (e.g., sedimentary
structures, fossil content, unit names). Each of these additional features is displayed as symbols,
graphic bars, or points on the right or left side of the lithological column. **Figure 4** shows how
SDAR represents the interval attributes.
```{r, eval = FALSE}
# Code to generate example presented in Figure 4.
plot(validated_beds, data.units = "meters",
     subset.base = 664, subset.top = 649,
     bioturbation = bioturbation_data,
     fossils = fossils_data,
     sed.structures = sed_structures_data,
     other.sym = other_symbols_data,
     samples = samples_data,
     ncore = core_number_data,
     lithostrat = litho_data,
     chronostrat = crono_data,
     symbols.size = 0.8)
# To keep this example fast, only a subset of the data is plotted. To plot the complete
# Saltarin well dataset, remove the subset.base = 664 and subset.top = 649 parameters.
```
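
To see which intervals fall inside the depth window used above, the interval tables can be filtered with base R. This sketch uses the five bioturbation intervals listed earlier in this vignette and keeps only those lying entirely between 664 and 649. It illustrates the depth window only: how `plot` treats intervals that straddle the window limits is not covered here, and the variable names are illustrative.

```r
# Bioturbation intervals from the example table (depths measured downward).
bioturbation <- data.frame(
  base  = c(669.4, 668.6, 665.2, 661.4, 637.5),
  top   = c(669.2, 668.2, 665.0, 659.9, 637.0),
  index = c("intense", "moderate", "moderate", "low", "low"),
  stringsAsFactors = FALSE
)

subset_base <- 664  # deepest limit of the window
subset_top  <- 649  # shallowest limit of the window

# Keep intervals that lie entirely inside the window.
in_window <- bioturbation$base <= subset_base & bioturbation$top >= subset_top
bioturbation[in_window, ]
```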