1 Smithsonian Tropical Research Institute, Balboa, Ancón, Republic of Panama; 2 Corporación Geológica ARES, Bogotá, Colombia; 3 Servicio Geológico Colombiano, Bogotá, Colombia.
SDAR is a fast and consistent tool for plotting and facilitating the analysis of stratigraphic and sedimentological data, designed to plot detailed stratigraphic sections and to perform quantitative stratigraphic analyses.
Stratigraphic Columns (SC) are the most useful and common way to represent field descriptions of rock sequences and well logs (e.g., grain size, thickness of rock packages, fossil content and lithological components). In these representations, the width of the SC varies according to grain size (i.e., the wider the strata, the coarser the rocks; Miall 1990; Tucker 2011), and the thickness of each layer is represented on the vertical axis of the diagram. Typically, these representations are drawn ‘manually’ using vector graphics editors (e.g., Adobe Illustrator®, CorelDRAW®, Inkscape). Nowadays, various software packages plot SCs automatically, but versatile open-source tools are lacking, and it is very difficult both to store and to analyse stratigraphic information.
This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package designed both for plotting and for facilitating the analysis of stratigraphic data in R (R Core Team 2019). SDAR uses simple stratigraphic data and takes advantage of the flexible plotting tools available in R to produce detailed SCs. The main benefits of SDAR are:
To install the SDAR package from CRAN:
install.packages("SDAR")
The standard workflow in SDAR consists of
saltarin_beds
To explore the functionalities of SDAR, we will use the publicly available dataset of the Saltarin well. saltarin_beds is the example dataset available within SDAR; it gives a lithologic description of borehole Saltarin 1A, located in the Llanos Basin in eastern Colombia (4.612 N, 70.495 W). The stratigraphic well Saltarin 1A drilled 671 meters of the Miocene succession of the eastern Llanos basin, corresponding to the Carbonera (124.1 m; 407.1 ft), Leon (105.1 m; 344.8 ft), and Guayabo Formations (441.8 m; 1449.5 ft) (Bayona et al. 2008). The Saltarin core was described at a scale of 1:50 to identify grain-size trends, sedimentary structures, clast composition, lamination thickness, bioturbation patterns, and macrofossils, all of which are used to identify individual lithofacies and for sedimentological and stratigraphic analyses (Jaramillo et al. 2017).
The command data(saltarin_beds) will load the dataset saltarin_beds into the current R session.
library(SDAR) # Load SDAR library
data(saltarin_beds) # load Saltarin demo dataset
class(saltarin_beds)
#> [1] "data.frame"
# check the content and the structure of the saltarin_beds dataset
nrow(saltarin_beds) # number of rock layers
#> [1] 686
ncol(saltarin_beds) # number of variables recording composition and texture description of each layer
#> [1] 22
names(saltarin_beds) # variable names of composition and texture description of each layer
#> [1] "bed_number" "base" "top"
#> [4] "rock_type" "prim_litho" "grain_size"
#> [7] "prim_litho_percent" "sec_litho" "grain_size_sec_litho"
#> [10] "sec_litho_percent" "base_contact" "grading"
#> [13] "grain_size_base" "grain_size_top" "sorting"
#> [16] "roundness" "matrix" "cement"
#> [19] "fabric" "munsell_color" "Rcolor"
#> [22] "notes"
Note that saltarin_beds is a data frame with 686 layers (rows) and 22 variables (columns) storing the thickness, composition and texture description of each layer, following the format suggested by SDAR (for details on the specific types of data required by SDAR, see the SDAR_data_model vignette).
To draw a stratigraphic layer in SDAR, the minimum information required for each layer is bed_number, thickness (defined by a base and a top), rock_type, prim_litho, and grain_size. In summary, a table with the structure presented in Table 1 must be provided.
This example is from a borehole core where depths are measured down from the surface; therefore, “base” is greater than “top”.
bed_number | base | top | rock_type | prim_litho | grain_size |
---|---|---|---|---|---|
1 | 671 | 670.2 | sedimentary | claystone | clay |
2 | 670.2 | 669.4 | covered | | |
3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand |
4 | 669.18 | 667.6 | sedimentary | limestone | wackestone |
5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder |
6 | 667.2 | 666.2 | sedimentary | shale | silt |
# header of the mandatory fields of "saltarin_beds" dataset to draw a graphic log using SDAR
head(saltarin_beds[,1:6])
#> bed_number base top rock_type prim_litho grain_size
#> 1 1 671.00 670.20 sedimentary claystone clay
#> 2 2 670.20 669.40 sedimentary siltstone silt
#> 3 3 669.40 669.18 sedimentary siltstone silt
#> 4 4 669.18 667.60 sedimentary claystone clay
#> 5 5 667.60 667.20 sedimentary siltstone silt
#> 6 6 667.20 666.20 sedimentary siltstone silt
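To make the minimum data model concrete, the sketch below builds the six mandatory columns of Table 1 as a plain base-R data frame (the object name beds is illustrative) and derives each bed's thickness as base minus top, which holds because depths in this example increase downward. It does not call SDAR; it only shows the shape of the input table that SDAR expects.

```r
# Minimal beds table with the six mandatory SDAR columns, values from Table 1.
# Depths are measured down from the surface, so base > top for every bed.
beds <- data.frame(
  bed_number = 1:6,
  base       = c(671.0, 670.2, 669.4, 669.18, 667.6, 667.2),
  top        = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2),
  rock_type  = c("sedimentary", "covered", "sedimentary",
                 "sedimentary", "sedimentary", "sedimentary"),
  prim_litho = c("claystone", NA, "sandstone", "limestone",
                 "conglomerate", "shale"),
  grain_size = c("clay", NA, "medium sand", "wackestone",
                 "boulder", "silt"),
  stringsAsFactors = FALSE
)

# Bed thickness is the distance between base and top (rounded to remove
# floating-point noise from the subtraction)
beds$thickness <- round(beds$base - beds$top, 2)
beds$thickness       # 0.80 0.80 0.22 1.58 0.40 1.00
sum(beds$thickness)  # 4.8, i.e. 671.0 - 666.2
```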
NOTE: The SDAR project includes the development of a graphic user interface to connect this R package with a database management system; for this reason the structure of the data and headers (column names) should be followed in order to match the database structure.
To improve communication between geoscientists, SDAR implements conventions defined by sedimentologists for drawing lithology patterns and for describing grain size, color, and so on. Details on the information required to define a layer, and the sources of the conventions implemented, are provided in the vignette “SDAR data model”.
We provide in the SDAR repository a template of the data format used by SDAR as a Microsoft Excel spreadsheet, SDAR_v0.95_beds_template.xlsx. This is the format SDAR suggests for storing the thickness, composition and texture description of rock layers (beds). The data for each bed should be presented as a row, with one column for each parameter entered for that bed (e.g., thickness, lithology, grain size and so on).
The simplest way to get your stratigraphic data into R for use with SDAR is to fill out the SDAR beds Excel template and import the file into R. There are several functions to load Excel files into R; below are the steps to import an Excel file using the readxl package.
To install the readxl package from CRAN:
install.packages("readxl")
To import an Excel file, navigate to your working directory (for example, with setwd()), or pass the full path of your file to the read_excel function.
library(readxl) # load the readxl package
my_beds <- read_excel("file_name.xlsx") # file in your working directory
my_beds <- read_excel("Path where your Excel file is stored/file_name.xlsx") # setting the full path
# Notice that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either forward slashes or double backslashes (\\).
my_beds <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx") # full path example on Windows
The Saltarin well example dataset available within SDAR is also accessible in Excel format in the installed files folder inst/extdata. To locate inst/extdata/SDAR_v0.95_beds_saltarin.xlsx, call system.file("extdata", "mydata.xlsx", package = "mypackage"), replacing the file and package names as in the code below.
# Read the SDAR beds external data example (Excel file format)
library(readxl)
fpath <- system.file("extdata", "SDAR_v0.95_beds_saltarin.xlsx", package = "SDAR")
beds_data <- read_excel(fpath)
nrow(beds_data) # number of rock layers
#> [1] 686
names(beds_data) # variable names of composition and texture description of each layer
#> [1] "bed_number" "base" "top"
#> [4] "rock_type" "prim_litho" "grain_size"
#> [7] "prim_litho_percent" "sec_litho" "grain_size_sec_litho"
#> [10] "sec_litho_percent" "base_contact" "grading"
#> [13] "grain_size_base" "grain_size_top" "sorting"
#> [16] "roundness" "matrix" "cement"
#> [19] "fabric" "munsell_color" "Rcolor"
#> [22] "notes"
strata class
Validating data is all about checking whether a dataset meets all the requirements it must fulfill, and the strata function makes it easy to check whether your stratigraphic data satisfy the SDAR data model. The SDAR package introduces a new S4 object class called strata to store stratigraphic data. This S4 class gives a rigorous definition of a strata object: a valid object of this class meets all the requirements specified in the definition (e.g., the columns must be named bed_number, base, top, rock_type, prim_litho, and grain_size, and base and top must be numeric). Defining this S4 class reduces errors, because the class specifies the type of information the object contains and checks its validity (Wickham 2014).
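The two rules quoted above (required column names, and numeric base and top) can be illustrated with a few lines of base R. check_beds() is a hypothetical helper written for this sketch; it is not part of SDAR and does not reproduce the full set of checks performed by strata().

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_beds <- function(beds) {
  required <- c("bed_number", "base", "top", "rock_type",
                "prim_litho", "grain_size")
  problems <- character(0)
  # Rule 1: every mandatory column must be present with the expected name
  missing_cols <- setdiff(required, names(beds))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("missing columns:",
                                  paste(missing_cols, collapse = ", ")))
  }
  # Rule 2: base and top must be stored as numbers
  for (col in intersect(c("base", "top"), names(beds))) {
    if (!is.numeric(beds[[col]])) {
      problems <- c(problems, paste0("'", col, "' must be numeric"))
    }
  }
  problems
}

good <- data.frame(bed_number = 1, base = 671, top = 670.2,
                   rock_type = "sedimentary", prim_litho = "claystone",
                   grain_size = "clay")
bad <- good
bad$top <- "670.2"        # a depth stored as text
names(bad)[5] <- "litho"  # a misnamed mandatory column

check_beds(good)  # character(0): no problems found
check_beds(bad)   # one message per broken rule
```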
The strata class provides an additional argument called datum, which allows users to define the horizontal reference datum. The options are base or top: base is the case when thickness is measured up from the bottom of, e.g., an outcrop section; top is the case when depths are measured down from the surface, e.g., boreholes and cores. The default option is datum = "top".
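The practical difference between the two datum options is the sign of base minus top. The hypothetical helper bed_thickness() below, which is not an SDAR function, computes a bed's thickness under each convention; it assumes only what the paragraph states, namely that depths increase downward for datum = "top" and heights increase upward for datum = "base".

```r
# Hypothetical helper: thickness of one bed under either datum convention
bed_thickness <- function(base, top, datum = c("top", "base")) {
  datum <- match.arg(datum)  # defaults to "top", like SDAR's datum argument
  if (datum == "top") {
    base - top   # borehole/core depths: the base is deeper, so base > top
  } else {
    top - base   # outcrop heights: the top is higher, so top > base
  }
}

bed_thickness(671.0, 670.2, datum = "top")  # borehole bed, about 0.8
bed_thickness(0.0, 0.8, datum = "base")     # a 0.8 thick bed measured from an outcrop base
```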
# strata function automatically validates the input dataset
# and returns a strata class object.
validated_beds <- strata(saltarin_beds)
#> 'beds data has been validated successfully'
# check the class of the object generated by the strata function
class(validated_beds)
#> [1] "strata"
#> attr(,"package")
#> [1] "SDAR"
The previous chunk of code validated the input dataset saltarin_beds and returned a new strata class object, validated_beds. The absence of warnings or errors (the message beds data has been validated successfully) means that every row (bed/layer) of the input data satisfies the expectations of the SDAR data model (an error would occur, for example, if we misspelled sandstone). By default, all errors and warnings are printed on the R console when the validation rules are checked against the input data. The following example contains an error specification: Error: Check row numbers 3, 7. values (sandtone, mudston) are ‘prim_litho’ not register in ‘litho.table’. (Note that sandstone and mudstone are misspelled; therefore, the error is caught and shown in the R console.)
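The misspelling check can be pictured as a lookup against a controlled vocabulary. In the sketch below, allowed_litho is a short, hypothetical stand-in for SDAR's litho.table (the real table is defined inside the package and is longer); the code only shows how rows 3 and 7 in the error message above would be flagged.

```r
# Hypothetical subset of accepted primary lithologies (not SDAR's litho.table)
allowed_litho <- c("claystone", "siltstone", "sandstone", "mudstone",
                   "shale", "limestone", "conglomerate", "coal")

# prim_litho column with two misspelled entries (rows 3 and 7)
prim_litho <- c("claystone", "siltstone", "sandtone", "coal",
                "sandstone", "claystone", "mudston")

# Rows whose lithology is not in the vocabulary (NA, e.g. covered beds, is skipped)
bad_rows <- which(!is.na(prim_litho) & !(prim_litho %in% allowed_litho))
bad_rows              # 3 7
prim_litho[bad_rows]  # "sandtone" "mudston"
```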
Stratigraphic overlapping of beds/layers is not allowed; if overlapping occurs, the strata function prints an error on screen and returns a data frame with the overlapping intervals.
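One way to picture the overlap rule is to sort beds by depth and compare each bed with the one directly above it. find_overlaps() below is a hypothetical base-R helper, not SDAR's implementation; it compares only adjacent beds after sorting, which is enough to show the idea but is not necessarily what strata() does.

```r
# Hypothetical beds with depths measured down from the surface (base > top);
# bed 3's base (669.5) lies below bed 2's top (669.4), so they overlap.
beds <- data.frame(bed_number = 1:4,
                   base = c(671.0, 670.2, 669.5, 669.18),
                   top  = c(670.2, 669.4, 669.18, 667.6))

find_overlaps <- function(beds) {
  # Sort from deepest to shallowest bed
  ord <- beds[order(beds$base, decreasing = TRUE), ]
  n <- nrow(ord)
  if (n < 2) {
    return(data.frame(bed = integer(0), bed_above = integer(0),
                      overlap = numeric(0)))
  }
  # Overlap when the base of the bed above is deeper than this bed's top
  hits <- which(ord$base[-1] > ord$top[-n])
  data.frame(bed       = ord$bed_number[hits],
             bed_above = ord$bed_number[hits + 1],
             overlap   = ord$base[hits + 1] - ord$top[hits])
}

find_overlaps(beds)  # beds 2 and 3 overlap by about 0.1
```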
To validate data from an outcrop or stratigraphic section, where thickness is measured up from the bottom, set the parameter datum = "base".
strata class
In this version of the SDAR package, the methods associated with the strata class are plot and summary. Once the stratigraphic data are loaded into R and successfully validated as a strata class object, we can plot them to visualise the information. The plot method provides different outputs depending on the parameter settings. The summary method displays a synopsis of the content of the strata object, including the total number of layers, the thickness of the study section, and the number of layers by lithology type and grain size.
strata class
The minimal information required to plot a stratigraphic column using SDAR is a table with the structure presented in Table 1. Once the dataset has been defined and validated as a strata class object, calling plot automatically dispatches to the plot method plot.strata.
# Code to generate example presented in Figure 1.
library(SDAR) # load SDAR library
data(saltarin_beds) # load Saltarin beds dataset
validated_beds <- strata(saltarin_beds) # validates the saltarin_beds dataset
plot(validated_beds) # plot a stratigraphic log with the SDAR default options
# The default parameters are: `datum = "top"`, `data.units = "feet"`,
# `scale = 100`, and `barscale = 2`
strata class. The Saltarin dataset was previously
The scale plotting parameter enables users to employ different drawing scales (graphic vertical scaling). It defines the vertical scale at which the graphic log is drawn, from 1:1 to any desired scale (e.g., 1:50, 1:200, 1:500). In addition, the data.units parameter allows users to specify the unit of measure of the stratigraphic thickness in the input data (thickness measured in the field), that is, whether the data were measured in meters or feet; the default unit is "feet".
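The effect of scale on the drawing is simple arithmetic: at 1:S, a thickness t is drawn t/S long. The hypothetical helper drawn_height_cm() below (not an SDAR function) applies that ratio, converting feet to metres first, and shows why a whole well at 1:300 needs a very tall page. It ignores margins, attribute tracks, and the bar scale that SDAR adds around the log.

```r
# Hypothetical helper: length on paper (cm) of a thickness drawn at 1:scale
drawn_height_cm <- function(thickness, scale, units = c("feet", "meters")) {
  units <- match.arg(units)  # defaults to "feet", mirroring SDAR's data.units
  metres <- if (units == "feet") thickness * 0.3048 else thickness  # 1 ft = 0.3048 m
  metres / scale * 100       # length on paper in metres, converted to centimetres
}

drawn_height_cm(1, scale = 50, units = "meters")     # 2 cm per metre at 1:50
drawn_height_cm(671, scale = 300, units = "meters")  # about 223.7 cm for the whole well
drawn_height_cm(100, scale = 100, units = "feet")    # 30.48 cm
```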
# Code to generate example presented in Figure 2.
plot(validated_beds, data.units="meters", scale=300, barscale=5)
# plot the Saltarin dataset at a 1:300 scale in meters (the unit used to describe the Saltarin well),
# with thickness marks and labels every 5 meters; by default the bar scale is
# plotted on the left side of the lithology track.
Given that the stratigraphic information is stored in numerical format, SDAR can draw a specific interval of a given outcrop section or borehole log. The plot parameters that provide this functionality are:
* subset.base: defines the lower limit of the stratigraphic interval of interest.
* subset.top: defines the upper limit of the stratigraphic interval of interest.
# Code to generate the example presented in Figure 3.
plot(validated_beds, data.units="meters", subset.base=614, subset.top=597)
subset.base and subset.top parameters [614 - 597 meters] are plotted.
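Selecting a depth window is a filter on the numeric base and top columns. The sketch below applies the 614 to 597 window used for Figure 3 to a small hypothetical beds table; it keeps only beds lying entirely inside the window, and how SDAR itself treats beds that straddle a limit is not shown here.

```r
# Hypothetical beds with depths measured down from the surface (base > top)
beds <- data.frame(bed_number = 1:5,
                   base = c(620, 614, 605, 597, 590),
                   top  = c(614, 605, 597, 590, 585))

subset_base <- 614  # lower (deeper) limit of the window
subset_top  <- 597  # upper (shallower) limit of the window

# Keep only beds lying entirely inside the window
inside <- beds$base <= subset_base & beds$top >= subset_top
beds[inside, ]  # beds 2 and 3
```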
Often the grain size is not constant throughout a rock layer; for that reason, detailed field descriptions include the grain-size variation. Usually, the grain size is described at the bottom and at the top of the layer. Grading commonly consists of an upward decrease in grain size (normal grading); however, certain sedimentary processes result in an upward increase in grain size (inverse grading). When grading is normal or inverse, the grain sizes of the base and the top must be provided in the format presented in Table 2.
To include and represent grading information in SDAR, the columns grading, grain_size_base, and grain_size_top must be included in the beds/layers table.
bed_number | base | top | rock_type | prim_litho | grain_size | grading | grain_size_base | grain_size_top |
---|---|---|---|---|---|---|---|---|
1 | 671 | 670.2 | sedimentary | claystone | clay | | | |
2 | 670.2 | 669.4 | covered | | | | | |
3 | 669.4 | 669.18 | sedimentary | sandstone | medium sand | normal | coarse sand | fine / medium sand |
4 | 669.18 | 667.6 | sedimentary | limestone | wackestone | normal | packstone | wackestone |
5 | 667.6 | 667.2 | sedimentary | conglomerate | boulder | inverse | cobble | boulder |
6 | 667.2 | 666.2 | sedimentary | shale | silt | | | |
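The requirement that graded beds carry both a base and a top grain size can be checked before the data reach SDAR. The base-R sketch below uses the graded rows of Table 2, with bed 5's top grain size deliberately left blank to show what a failing row looks like; the object names are illustrative, and this is not a check that SDAR is claimed to run in this form.

```r
# Graded rows from Table 2; bed 5's grain_size_top is blanked on purpose
graded <- data.frame(
  bed_number      = c(3, 4, 5),
  grading         = c("normal", "normal", "inverse"),
  grain_size_base = c("coarse sand", "packstone", "cobble"),
  grain_size_top  = c("fine / medium sand", "wackestone", NA),
  stringsAsFactors = FALSE
)

# Beds with normal or inverse grading need both a base and a top grain size
needs_sizes  <- graded$grading %in% c("normal", "inverse")
missing_size <- is.na(graded$grain_size_base) | is.na(graded$grain_size_top)
graded$bed_number[needs_sizes & missing_size]  # 5
```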
The previous sections presented how SDAR represents the information associated with beds. This section presents how SDAR integrates interval attributes (e.g., bioturbation, sedimentary structures).
An interval is defined over a stratigraphic range by a base and a top; the main requirement for setting an interval is that the recorded geological feature (e.g., sedimentary structures, bioturbation, unit name, fossil content) is present throughout the defined stratigraphic range.
To define intervals, the user must specify a stratigraphic base, a top, and the recorded feature of each interval, as presented in Table 3. Each row in this data array describes a stratigraphic interval and the feature recorded in it (for details on the specific types of data required by SDAR, see the SDAR_data_model vignette). The interval features that can be integrated in this version of SDAR are:
* core number
* samples
* visual oil stain
* bioturbation
* sedimentary structures
* fossils
* other symbols
* lithostratigraphy
* chronostratigraphy
base | top | index |
---|---|---|
669.4 | 669.2 | intense |
668.6 | 668.2 | moderate |
665.2 | 665.0 | moderate |
661.4 | 659.9 | low |
637.5 | 637.0 | low |

base | top | sed_structure |
---|---|---|
671 | 670.2 | cross bedding |
671.5 | 671.5 | climbing ripples |
669.4 | 669.18 | lenticular lamination |
668.2 | 667.6 | normal grading |
667.2 | 666.2 | wavy lamination |
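Because beds and intervals share the same depth axis, an interval can be related to the beds it crosses with a simple range-intersection test. The base-R sketch below uses the beds of Table 1 and the first two bioturbation rows of Table 3; beds_in_interval() is a hypothetical helper for illustration, not an SDAR function.

```r
# Beds from Table 1 and the first two bioturbation intervals from Table 3
# (depths measured down from the surface, so base > top in both tables)
beds <- data.frame(bed_number = 1:6,
                   base = c(671.0, 670.2, 669.4, 669.18, 667.6, 667.2),
                   top  = c(670.2, 669.4, 669.18, 667.6, 667.2, 666.2))
bioturbation <- data.frame(base  = c(669.4, 668.6),
                           top   = c(669.2, 668.2),
                           index = c("intense", "moderate"))

# A bed and an interval intersect when each one's top is shallower than
# the other's base (ranges that only touch are not counted)
beds_in_interval <- function(int_base, int_top, beds) {
  beds$bed_number[beds$top < int_base & beds$base > int_top]
}

hits <- Map(beds_in_interval, bioturbation$base, bioturbation$top,
            MoreArgs = list(beds = beds))
hits  # intense bioturbation falls in bed 3, moderate in bed 4
```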
We provide in the SDAR repository a template of the data format used by SDAR as a Microsoft Excel spreadsheet, SDAR_v0.95_intervals_template.xlsx. This is the format SDAR suggests for storing interval information (e.g., bioturbation, sedimentary structures, and so on).
To import a sheet from an Excel file, navigate to your working directory (for example, with setwd()), or pass the full path of your file to the read_excel function, and specify the sheet to read either by its name or by its position.
# Specify sheet by its name
my_int_data <- read_excel("file_name.xlsx", sheet= "data") # on your working directory
my_int_data <- read_excel("Path where your Excel file is stored/file_name.xlsx", sheet= "data") # full path
# Specify sheet by its index
my_int_data <- read_excel("file_name.xlsx", sheet= 1)
# Notice that the separator between folders is a forward slash (/), as on Linux and Mac systems.
# On Windows, use either forward slashes or double backslashes (\\).
# full path example in windows systems
my_int_data <- read_excel("C:\\Users\\john\\Desktop\\File_name.xlsx", sheet= "data")
The Saltarin intervals dataset is available in Excel format in the installed files folder inst/extdata. To locate inst/extdata/SDAR_v0.95_intervals_saltarin.xlsx, call system.file("extdata", "mydata.xlsx", package = "mypackage"), replacing the file and package names as in the code below.
# Read the bioturbation external data example (Saltarin intervals Excel file format)
fpath <- system.file("extdata", "SDAR_v0.95_intervals_saltarin.xlsx", package = "SDAR")
bioturbation_data <- read_excel(fpath, sheet = "bioturbation") # import bioturbation sheet
nrow(bioturbation_data) # number of bioturbated intervals
#> [1] 151
bioturbation_data # header of Saltarin bioturbation dataset
#> # A tibble: 151 × 3
#> base top index
#> <dbl> <dbl> <chr>
#> 1 669.4 669.2 intense
#> 2 668.6 668.2 moderate
#> 3 665.15 664.95 moderate
#> 4 661.4 659.9 low
#> # ℹ 147 more rows
Import Saltarin intervals dataset
# import core_number data
core_number_data <- read_excel(fpath, sheet = "core_number")
# import samples data
samples_data <- read_excel(fpath, sheet = "samples")
# import sedimentary structures data
sed_structures_data <- read_excel(fpath, sheet = "sed_structures")
# import fossils data
fossils_data <- read_excel(fpath, sheet = "fossils")
# import other symbols data
other_symbols_data <- read_excel(fpath, sheet = "other_symbols")
# import lithostratigraphy data
litho_data <- read_excel(fpath, sheet = "lithostra")
# import chronostratigraphy data
crono_data <- read_excel(fpath, sheet = "chronostra")
Plot setting parameters allow users to add features to the graphic log (e.g., sedimentary structures, fossil content, unit names). Each of these additional features is displayed as symbols, graphic bars, or points on the right or left side of the lithological column. Figure 4 presents the way SDAR represents the interval attributes.
# Code to generate example presented in Figure 4.
plot(validated_beds, data.units="meters",
subset.base=664, subset.top=649,
bioturbation=bioturbation_data,
fossils=fossils_data,
sed.structures=sed_structures_data,
other.sym=other_symbols_data,
samples=samples_data,
ncore=core_number_data,
lithostrat=litho_data,
chronostrat=crono_data,
symbols.size=0.8)
# To keep this example fast, only a subset of the data is plotted. To plot the
# complete Saltarin well dataset, drop the subset.base=664 and subset.top=649 parameters.
Figures 1-4 present examples of graphic logs generated automatically by SDAR after the stratigraphic information has been correctly loaded and validated in R. Graphic logs generated by SDAR are exported as PDF files (fully editable with any vector drawing application). Each log is presented on a single page, and the paper size is automatically updated when the vertical scale changes or when different sets of attributes are plotted on the right or left side of the lithological column (check the working directory for the PDF output file).
If you see problems with the PDF output, remember that the problem is much more likely to be in your viewer than in R. Try another viewer if possible; browsers such as Mozilla Firefox and Google Chrome provide excellent PDF rendering engines.
strata class data
This section presents the functionality of the summary method. When the summary function is executed on a strata class object, the results are printed in the R console. The summary function displays a synopsis of the content of the strata object: the total number of layers, the thickness of the SC, the thickness of covered intervals, and the thickness, percentage and number of layers by lithology type in the studied SC. With grain.size=TRUE, the same breakdown is also reported by grain size. The results of running the summary function on the example dataset are printed below.
summary(validated_beds)
#>
#> Number of beds: 610
#> Number of covered intervals 76
#>
#> Thickness of the section: 671.0
#> Thickness of covered intervals: 77.9
#>
#> Summary by lithology:
#>
#> Thickness Percent (%) Number beds
#> sandstone 233.3 34.77 330
#> claystone 211.6 31.53 130
#> siltstone 143.4 21.37 138
#> coal 3.1 0.46 8
#> conglomerate 1.8 0.27 4
#> covered 77.9 11.61 76
summary(validated_beds, grain.size=TRUE)
#>
#> Number of beds: 610
#> Number of covered intervals 76
#>
#> Thickness of the section: 671.0
#> Thickness of covered intervals: 77.9
#>
#> Summary by lithology:
#>
#> Thickness Percent (%) Number beds
#> sandstone 233.3 34.77 330
#> claystone 211.6 31.53 130
#> siltstone 143.4 21.37 138
#> coal 3.1 0.46 8
#> conglomerate 1.8 0.27 4
#> covered 77.9 11.61 76
#>
#> Summary by Grain Size:
#>
#> Thickness Percent (%) Number beds
#> clay 194.0 28.92 123
#> clay / silt 43.7 6.51 28
#> silt 88.6 13.21 89
#> silt / very fine sand 88.3 13.16 101
#> very fine sand 71.6 10.68 122
#> very fine / fine sand 32.4 4.83 49
#> fine sand 27.5 4.10 37
#> fine / medium sand 20.3 3.03 18
#> medium sand 9.2 1.37 11
#> medium / coarse sand 5.6 0.83 8
#> coarse sand 5.5 0.82 15
#> coarse / very coarse sand 3.7 0.55 3
#> very coarse / granule 1.5 0.22 3
#> granule 1.1 0.16 3
#> covered 77.9 11.61 76
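The numbers in these tables follow from simple aggregation: percentages are relative to the total section thickness, covered intervals included (for example, 233.3 / 671.0 gives the 34.77 % reported for sandstone). The base-R sketch below reproduces that kind of table for a small hypothetical beds table with a column named litho; it is an illustration, not SDAR's summary method.

```r
# Hypothetical beds table (depths measured down, base > top), 10 units thick
beds <- data.frame(
  base  = c(10, 8, 7, 4, 3, 1),
  top   = c( 8, 7, 4, 3, 1, 0),
  litho = c("sandstone", "claystone", "sandstone",
            "covered", "siltstone", "sandstone"),
  stringsAsFactors = FALSE
)
beds$thickness <- beds$base - beds$top

# Total thickness per lithology (covered intervals are kept as their own group)
by_litho <- aggregate(thickness ~ litho, data = beds, FUN = sum)
# Number of beds per lithology, matched by name
by_litho$n_beds <- as.vector(table(beds$litho)[by_litho$litho])
# Percentage of the whole section, including covered intervals
by_litho$percent <- round(100 * by_litho$thickness / sum(beds$thickness), 2)

by_litho[order(-by_litho$thickness), ]  # sandstone: 6 units, 3 beds, 60 %
```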
This project has been sponsored by Carlos Jaramillo (Smithsonian Tropical Research Institute). Financial support for this research was provided by COLCIENCIAS (partly funding the master's studies of the main author), Fundación para la Investigación de la Ciencia y la Tecnología del Banco de la República (Colombia), Corporación Geológica ARES (Colombia), the Smithsonian Tropical Research Institute, the Anders Foundation, the 1923 Fund, and Gregory D. and Jennifer Walston Johnson.
The Saltarin 1A well dataset used in this analysis was provided by Alejandro Mora of HOCOL S.A.
Bayona, G., Valencia, A., Mora, A., Rueda, M., Ortiz, J., and Montenegro, O. (2008). Estratigrafia y procedencia de las rocas del Mioceno en la parte distal de la cuenca antepais de los Llanos de Colombia. Geologia Colombiana, 33, 23-46.
Jaramillo, C., Romero, I., D’Apolito, C., Bayona, G., Duarte, E., Louwye, S., Escobar, J., Luque, J., Carrillo-Briceno, J., Zapata, V., Mora, A., Schouten, S., Zavada, M., Harrington, G., Ortiz, J., and Wesselingh, F. (2017). Miocene Flooding Events of Western Amazonia. Science Advances, 3, e1601693.
Miall, A. D. (1990). Principles of Sedimentary Basin Analysis. Springer-Verlag.
R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Tucker, M. E. (2011). Sedimentary Rocks in the Field: A Practical Guide. Geological Field Guide. Wiley.
Wickham, H. (2014). Advanced R. Chapman & Hall/CRC The R Series. Chapman and Hall/CRC, 1st edition.