Package 'fingerPro'

Title: Sediment Source Fingerprinting
Description: Quantifies the provenance of the sediments in a catchment or study area. Based on a comprehensive characterization of the sediment sources and the end sediment mixtures, a mixing model algorithm is applied to the sediment mixtures in order to estimate the relative contribution of each potential source. The package includes several statistical methods, such as the Kruskal-Wallis test, discriminant function analysis ('DFA') and principal component analysis ('PCA') plots, to select the optimal subset of tracer properties. The variability within each sediment source is also considered to estimate the statistical distribution of the source contributions.
Authors: Ivan Lizaga [aut, cre], Borja Latorre [aut], Leticia Gaspar [aut], Ana Navas [aut], Vince Q Vu [ctb]
Maintainer: Ivan Lizaga <[email protected]>
License: GPL (>= 2)
Version: 1.1
Built: 2024-11-21 06:54:03 UTC
Source: CRAN

Sediment Source Fingerprinting

Description

Soil erosion is one of the biggest threats to food production and a major cause of reservoir siltation around the world. Information on sediment, nutrient and pollutant transport is required for effective control strategies. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting has proved to be a valuable tool. It offers the potential to assess sediment provenance as a basis for developing management plans and preventing erosion. The technique focuses on methods that apportion the sediment sources from a composite sample of sediment mixture material. We developed an R package as a tool to quantify the provenance of the sediments in a catchment. A mixing model algorithm is applied to the sediment mixture samples in order to estimate the relative contribution of each potential source. The package consists of a set of functions used to: i) characterise and pre-process the data and select the optimum subset of tracers; ii) unmix sediment samples and quantify the apportionment of each source; iii) assess the effect of the source variability; and iv) visualize and export the results.

Author(s)

Ivan Lizaga, Borja Latorre, Leticia Gaspar, Ana Navas

Maintainer: Ivan Lizaga <[email protected] // [email protected]>

See Also

https://github.com/eead-csic-eesa

Examples

#Created on 22/08/2018

#If you want to use your own data
#setwd("the directory that contains your dataset")
#data <- read.table('your dataset', header = TRUE, sep = '\t')
#install.packages("fingerPro")
#library(fingerPro)
#Example of the data included in the fingerPro package
#Load the dataset called "catchment" 

# "Catchment": this dataset has been selected from a Mediterranean catchment for 
#this purpose and contains high-quality radionuclides and geochemistry data.
#AG (cropland)
#PI and PI1 (pine forest; at first they look different, but when you display the LDA plot
#you will see that the wiser decision is to join both pines as the same source)
#SS (subsoil)
data <- catchment
#boxPlot(data, columns = 1:6, ncol = 3)
#correlationPlot(data, columns = 1:5, mixtures = TRUE)
LDAPlot(data, P3D=FALSE)
#variables are collinear
#select the optimum set of tracers by implementing the statistical tests 
data <- rangeTest(data)
data <- KWTest(data)
data <- DFATest(data)
#Check how the selected tracers discriminate between sources
LDAPlot(data, P3D=FALSE)
#change P3D=FALSE to P3D=TRUE to visualize the 3D LDAPlot
#2D and 3D LDAPlots suggest that two of the sources have to be combined
#reload the original dataset "catchment"
data <- catchment
# Combine sources PI1 and PI based on the previous LDAPlot
data$Land_Use[data$Land_Use == 'PI1'] <- 'PI'
#select the optimum set of tracers by implementing the statistical tests 
data <- rangeTest(data)
data <- KWTest(data)
data <- DFATest(data)
LDAPlot(data, P3D=FALSE)
PCAPlot(data)
#Now the optimum tracer properties selected discriminate well, so proceed with the unmix function
result <- unmix(data, samples = 100L, iter = 100L)
#Display the results
plotResults(result, y_high = 5, n = 1)
writeResults(result)

Box and whiskers plot

Description

The boxplot compactly shows the distribution of a continuous variable. It displays five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.

Usage

boxPlot(data, columns = 1:ncol(data) - 2, ncol = 3)

Arguments

data

Data frame containing source and mixtures data

columns

Numeric vector containing the index of the columns in the chart (the first column refers to the first variable)

ncol

Number of charts per row
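
The five summary statistics and the outlying points can be illustrated with base R alone. The sketch below is not the package's plotting code; the tracer values are hypothetical and `boxplot.stats()` from the base 'grDevices' package stands in for whatever the chart computes internally.

```r
# Hypothetical tracer values for a single source group (not package data).
tracer <- c(12.1, 13.4, 11.8, 14.0, 12.9, 13.1, 25.0)

# boxplot.stats() returns, in order: lower whisker, lower hinge, median,
# upper hinge and upper whisker. Points beyond the whiskers are in $out.
bs <- boxplot.stats(tracer)
five <- bs$stats
outliers <- bs$out

print(five)
print(outliers)
```

Here the value 25.0 lies more than 1.5 times the hinge spread above the upper hinge, so it is reported as an outlying point instead of stretching the whisker.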


Land use and fingerprinting properties in a Mediterranean catchment

Description

A dataset containing the different tracer properties of the different land uses in a Mediterranean catchment and one mixture sample located at the output of the catchment. The variables are as follows:

Usage

catchment

Format

A data frame with 22 rows and 23 variables:

id

reference number id of each sample analysed

Land_Use

grouping variable, in this study refers to the different land uses in the catchment

Pbex, K40, Bi214, Ra226, Th232, U238, Nb, Sr, Rb, Pb, Zn, Fe, Mn, Cr, V, Ti, Ca, K, Al, Si, Mg

value of the tracer property for each sample


Correlation matrix chart

Description

The function displays a correlation matrix of the properties, split by source, to help the user decide on the tracer selection.

Usage

correlationPlot(data, columns = c(1:ncol(data) - 1), mixtures = F)

Arguments

data

Data frame containing source and mixtures data

columns

Numeric vector containing the index of the columns in the chart (the first column refers to the grouping variable)

mixtures

Boolean to include or exclude the mixture samples in the chart
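
As a rough illustration of a correlation matrix "divided by the different sources", the following base R sketch computes one correlation matrix per group. The data frame `toy`, its column names and its values are hypothetical, and the code does not reproduce the chart drawn by correlationPlot.

```r
set.seed(1)

# Hypothetical data: a grouping column followed by three tracer columns.
toy <- data.frame(
  Land_Use = rep(c("AG", "PI", "SS"), each = 6),
  Pbex = rnorm(18, mean = 20, sd = 3),
  K40  = rnorm(18, mean = 400, sd = 40),
  Rb   = rnorm(18, mean = 90, sd = 10)
)
tracers <- c("Pbex", "K40", "Rb")

# Split the tracer columns by source and compute one matrix per source.
cor_by_source <- lapply(split(toy[tracers], toy$Land_Use), cor)

print(round(cor_by_source$AG, 2))
```

Tracers that are strongly correlated within every source carry largely redundant information, which is one reason to inspect these matrices before selecting tracers.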


Discriminant function analysis test

Description

Performs a stepwise forward variable selection using the Wilks' lambda criterion.

Usage

DFATest(data, niveau = 0.1)

Arguments

data

Data frame containing source and mixtures data

niveau

level for the approximate F-test decision

Value

Data frame only containing the variables that pass the DFA test
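
The idea of a forward selection driven by Wilks' lambda can be sketched in base R. This is a minimal illustration under stated assumptions, not the code behind DFATest: the helper names `wilks_lambda` and `forward_wilks` are hypothetical, the built-in `iris` data stands in for a source data set (Species as the grouping variable, the four measurements as tracers), and the stopping rule is the standard partial F-test compared against `niveau`.

```r
# Wilks' lambda = det(within-group SSCP) / det(total SSCP); smaller values
# mean better separation between groups.
wilks_lambda <- function(X, g) {
  X <- as.matrix(X)
  total <- crossprod(scale(X, center = TRUE, scale = FALSE))
  within <- Reduce(`+`, lapply(split.data.frame(X, g), function(m) {
    crossprod(scale(m, center = TRUE, scale = FALSE))
  }))
  det(within) / det(total)
}

# Hypothetical helper: at each step add the variable giving the smallest
# lambda, and stop when its partial F-test p-value exceeds niveau.
forward_wilks <- function(X, g, niveau = 0.1) {
  g <- factor(g)
  n <- nrow(X)
  k <- nlevels(g)
  selected <- character(0)
  lambda_prev <- 1
  repeat {
    candidates <- setdiff(colnames(X), selected)
    p <- length(selected)
    if (length(candidates) == 0 || n - k - p <= 0) break
    lambdas <- sapply(candidates, function(v) {
      wilks_lambda(X[, c(selected, v), drop = FALSE], g)
    })
    best <- names(which.min(lambdas))
    partial <- min(lambdas) / lambda_prev
    f_stat <- (1 - partial) / partial * (n - k - p) / (k - 1)
    p_val <- pf(f_stat, k - 1, n - k - p, lower.tail = FALSE)
    if (p_val > niveau) break
    selected <- c(selected, best)
    lambda_prev <- min(lambdas)
  }
  selected
}

selected <- forward_wilks(iris[, 1:4], iris$Species, niveau = 0.1)
print(selected)
```

A larger `niveau` makes the test more permissive and tends to retain more variables; a smaller value retains fewer.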


Biplot for Principal Components using ggplot2

Description

Biplot for Principal Components using ggplot2

Usage

ggbiplot(pcobj, choices = 1:2, scale = 1, pc.biplot = TRUE,
  obs.scale = 1 - scale, var.scale = scale, groups = NULL,
  ellipse = FALSE, ellipse.prob = 0.68, labels = NULL, labels.size = 3,
  alpha = 1, var.axes = TRUE, circle = FALSE, circle.prob = 0.69,
  varname.size = 3, varname.adjust = 1.5, varname.abbrev = FALSE)

Arguments

pcobj

an object returned by prcomp() or princomp()

choices

which PCs to plot

scale

covariance biplot (scale = 1), form biplot (scale = 0). When scale = 1, the inner product between the variables approximates the covariance and the distance between the points approximates the Mahalanobis distance.

pc.biplot

for compatibility with biplot.princomp()

obs.scale

scale factor to apply to observations

var.scale

scale factor to apply to variables

groups

optional factor variable indicating the groups that the observations belong to. If provided, the points will be colored according to groups

ellipse

draw a normal data ellipse for each group?

ellipse.prob

size of the ellipse in Normal probability

labels

optional vector of labels for the observations

labels.size

size of the text used for the labels

alpha

alpha transparency value for the points (0 = transparent, 1 = opaque)

var.axes

draw arrows for the variables?

circle

draw a correlation circle? (only applies when prcomp was called with scale = TRUE and when var.scale = 1)

varname.size

size of the text for variable names

varname.adjust

adjustment factor for the placement of the variable names, >= 1 means farther from the arrow

varname.abbrev

whether or not to abbreviate the variable names

circle.prob

size of the correlation circle in Normal probability

Value

a ggplot2 plot


Input sediment mixtures

Description

The function selects and extracts the sediment mixtures of the dataset.

Usage

inputSample(data)

Arguments

data

Data frame containing source and mixtures data


Input sediment sources

Description

The function selects and extracts the source samples of the dataset.

Usage

inputSource(data)

Arguments

data

Data frame containing source and mixtures data


Kruskal-Wallis rank sum test

Description

This function excludes from the original data frame the properties which do not show significant differences between sources.

Usage

KWTest(data, pvalue = 0.05)

Arguments

data

Data frame containing source and mixtures data

pvalue

p-value threshold

Value

Data frame only containing the variables that pass the Kruskal-Wallis test
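
The filtering step can be approximated with `kruskal.test()` from the base 'stats' package. The sketch below uses a hypothetical data frame `toy` with made-up values, and it only mirrors the idea of dropping properties whose p-value is not below the threshold; the internals of KWTest may differ.

```r
# Hypothetical data: Pbex differs clearly between sources, Flat does not.
toy <- data.frame(
  Land_Use = rep(c("AG", "PI", "SS"), each = 8),
  Pbex = c(seq(9, 11, length.out = 8),
           seq(19, 21, length.out = 8),
           seq(29, 31, length.out = 8)),
  Flat = rep(c(48, 52, 50, 49, 51, 53, 47, 50), times = 3)
)
tracers <- c("Pbex", "Flat")
pvalue <- 0.05

# One Kruskal-Wallis test per tracer, grouped by source.
p_values <- sapply(toy[tracers], function(x) {
  kruskal.test(x, factor(toy$Land_Use))$p.value
})

# Keep the grouping column plus the tracers that pass the test.
keep <- names(p_values)[p_values < pvalue]
filtered <- toy[c("Land_Use", keep)]

print(p_values)
print(names(filtered))
```

In this toy case `Flat` has identical values in every source, so it cannot discriminate between them and is removed.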


Linear discriminant analysis chart

Description

The function performs a linear discriminant analysis and displays the data in the relevant dimensions.

Usage

LDAPlot(data, P3D = FALSE)

Arguments

data

Data frame containing source and mixtures data

P3D

Boolean to switch between a 2-dimensional and a 3-dimensional chart
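
To show what "the relevant dimensions" of a linear discriminant analysis look like numerically, the sketch below uses `MASS::lda()` (the 'MASS' package ships with standard R distributions). The built-in `iris` data is only a stand-in for a source data set, and the code is not the implementation of LDAPlot.

```r
# iris stands in for source data: Species plays the role of the grouping
# variable and the four measurements play the role of tracers.
fit <- MASS::lda(Species ~ ., data = iris)

# Discriminant scores: one row per sample, one column per function.
scores <- predict(fit)$x

print(colnames(scores))
print(head(round(scores, 2), 3))
```

The number of discriminant functions is at most the number of groups minus one, so three groups yield two functions (LD1 and LD2) and a third axis only exists with four or more groups. Calling `plot(scores, col = as.integer(iris$Species))` on the result gives a simple 2D scatter of the groups.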


Principal component analysis chart

Description

The function performs a principal component analysis on the given data matrix and displays a biplot of the results for each source, drawn with the vqv.ggbiplot package, to help the user decide on the tracer selection.

Usage

PCAPlot(data, components = c(1:2))

Arguments

data

Data frame containing source and mixtures data

components

Numeric vector containing the index of the two principal components in the chart
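
The quantities behind a principal component chart can be obtained with `prcomp()` from the base 'stats' package. In this sketch `iris` is again a stand-in data set, and centring and scaling the variables is an assumption made here, not a documented behaviour of PCAPlot.

```r
# Centre and scale the four stand-in "tracers" before the PCA.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Pick the two components to display, as the components argument does.
components <- c(1, 2)
scores <- pca$x[, components]

# Proportion of the total variance explained by each component.
explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(explained, 3))
print(head(round(scores, 2), 3))
```

Checking `explained` for the chosen components indicates how much of the total variability the two-dimensional chart actually represents.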


Displays the results on the screen

Description

The function draws a density chart of the relative contribution of the potential sediment sources for each sediment mixture in the dataset.

Usage

plotResults(data, y_high = 6.5, n = 1)

Arguments

data

Data frame containing the relative contribution of the potential sediment sources for each sediment mixture in the dataset

y_high

Numeric value setting the vertical height of the y-axis

n

Number of charts per row
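
A density chart of source contributions can be approximated with `density()` from the base 'stats' package. The contributions below are simulated and hypothetical (the layout of the object returned by unmix is not reproduced here), and treating `y_high` as the upper limit of the y-axis is an assumption.

```r
set.seed(7)

# Hypothetical contributions of one source across 100 iterations,
# expressed as proportions between 0 and 1.
contribution <- rbeta(100, shape1 = 8, shape2 = 12)

# Kernel density estimate evaluated on the interval [0, 1].
d <- density(contribution, from = 0, to = 1)

# Assumed role of y_high: fixed upper limit so that charts are comparable.
y_high <- 6.5
y_limits <- c(0, y_high)

print(range(d$x))
print(max(d$y) <= y_high)
```

Passing the result to `plot(d, ylim = y_limits)` draws the curve; a narrow, tall peak indicates a contribution that is stable across iterations.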


Range test

Description

Excludes the properties for which the value of the sediment mixture(s) falls outside the range between the minimum and maximum values in the sediment sources.

Usage

rangeTest(data)

Arguments

data

Data frame containing source and mixtures data

Value

Data frame containing sediment sources and mixtures
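
The range check can be written in a few lines of base R. The data frames `sources` and `mixture` below are hypothetical, and comparing the mixture against the minimum and maximum of the individual source samples is an assumption about how the bounds are defined.

```r
# Hypothetical source samples and one mixture sample.
sources <- data.frame(
  Land_Use = c("AG", "AG", "PI", "PI", "SS", "SS"),
  Pbex = c(10, 12, 20, 22, 30, 32),
  K40  = c(400, 420, 380, 390, 450, 460)
)
mixture <- data.frame(Land_Use = "mixture", Pbex = 18, K40 = 500)
tracers <- c("Pbex", "K40")

# A tracer passes when every mixture value lies within the source range.
in_range <- sapply(tracers, function(v) {
  all(mixture[[v]] >= min(sources[[v]]) & mixture[[v]] <= max(sources[[v]]))
})
kept <- tracers[in_range]

# Return sources and mixture together, keeping only the passing tracers.
filtered <- rbind(sources, mixture)[c("Land_Use", kept)]

print(kept)
print(dim(filtered))
```

K40 is dropped because the mixture value (500) exceeds the largest source value (460); no combination of the sources could reproduce it.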


Unmix sediment mixtures

Description

Assesses the relative contribution of the potential sediment sources for each sediment mixture in the dataset.

Usage

unmix(data, samples = 100L, iter = 100L, seed = 123456L)

Arguments

data

Data frame containing sediment source and mixtures

samples

Number of samples in each hypercube dimension

iter

Iterations in the source variability analysis

seed

Seed for the random number generator

Value

Data frame containing the relative contribution of the sediment sources for each sediment mixture and iteration
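
The general shape of an unmixing run with a source variability analysis can be sketched as follows. This is a simplified stand-in and not the algorithm implemented in unmix: candidate proportions are drawn at random on the simplex instead of on a hypercube grid, the misfit function (sum of squared relative differences) is an assumption, and all numbers and helper names are hypothetical.

```r
set.seed(123456)

# Hypothetical source means and standard errors (rows = sources,
# columns = three tracers), plus one mixture sample.
src_mean <- rbind(AG = c(10, 400, 90), PI = c(20, 380, 70), SS = c(30, 450, 110))
src_se   <- rbind(AG = c(0.5, 8, 2),   PI = c(0.6, 6, 2),   SS = c(0.7, 9, 3))
mixture  <- c(19, 405, 88)

# Assumed goodness of fit: squared relative differences between the
# observed mixture and the mixture predicted from proportions w.
misfit <- function(w, means) {
  sum(((mixture - as.vector(w %*% means)) / mixture)^2)
}

# Proportions drawn uniformly on the simplex (non-negative, summing to 1).
random_weights <- function(n, k) {
  g <- matrix(rexp(n * k), ncol = k)
  g / rowSums(g)
}

# Best-fitting proportions among n_candidates random candidates.
unmix_once <- function(means, n_candidates = 2000) {
  w <- random_weights(n_candidates, nrow(means))
  fit <- apply(w, 1, misfit, means = means)
  w[which.min(fit), ]
}

# Source variability: perturb the source means in each iteration.
iter <- 50
result <- t(replicate(iter, {
  noise <- matrix(rnorm(length(src_mean), 0, src_se), nrow = nrow(src_mean))
  unmix_once(src_mean + noise)
}))
colnames(result) <- rownames(src_mean)

print(round(colMeans(result), 2))
```

Each row of `result` is one iteration, so the spread of a column reflects how sensitive that source's estimated contribution is to the variability within the sources.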


Mixing model

Description

Mixing model

Usage

unmix_c(sources, samples, trials = 100L, iter = 100L,
  seed = 69512L)

Arguments

sources

Data frame containing sediment sources data

samples

Data frame containing sediment mixtures data

trials

Number of samples in each hypercube dimension

iter

Iterations in the source variability analysis

seed

Seed for the random number generator

Value

Data frame containing the relative contribution of the sediment sources for each sediment mixture and iteration


Save the results

Description

The function saves the results in the workspace file for all the sediment mixture samples and for each sediment mixture sample separately

Usage

writeResults(data)

Arguments

data

Data frame containing the relative contribution of the potential sediment sources for each sediment mixture in the dataset