Title: | Model-Based Clustering for Functional Data with Covariates |
---|---|
Description: | Routines for model-based functional cluster analysis for functional data with optional covariates. The idea is to cluster functional subjects (often called functional objects) into homogenous groups by using spline smoothers (for functional data) together with scalar covariates. The spline coefficients and the covariates are modelled as a multivariate Gaussian mixture model, where the number of mixtures corresponds to the number of clusters. The parameters of the model are estimated by maximizing the observed mixture likelihood via an EM algorithm (Arnqvist and Sjöstedt de Luna, 2019) <arXiv:1904.10265>. The clustering method is used to analyze annual lake sediment from lake Kassjön (Northern Sweden) which cover more than 6400 years and can be seen as historical records of weather and climate. |
Authors: | Natalya Pya Arnqvist[aut, cre], Per Arnqvist [aut, cre], Sara Sjöstedt de Luna [aut] |
Maintainer: | Natalya Pya Arnqvist <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1-1 |
Built: | 2024-11-28 06:46:06 UTC |
Source: | CRAN |
fdaMocca
provides functions for model-based functional cluster analysis for functional data with optional covariates. The aim is to cluster a set of independent functional subjects (often called functional objects) into homogenous groups by using basis function representation of the functional data and allowing scalar covariates. A functional subject is defined as a curve and covariates. The spline coefficients and the (potential) covariates are modelled as a multivariate Gaussian mixture model, where the number of mixtures corresponds to the number of (predefined) clusters. The model allows for different cluster covariance structures for the basis coefficients and for the covariates. The parameters of the model are estimated by maximizing the observed mixture likelihood using an EM-type algorithm (Arnqvist and Sjöstedt de Luna, 2019).
Package: | fdaMocca |
Type: | Package |
License: | GPL (>= 2) |
Per Arnqvist, Sara Sjöstedt de Luna, Natalya Pya Arnqvist
Maintainer: Natalya Pya Arnqvist<[email protected]>
Arnqvist, P., Bigler, C., Renberg, I., Sjöstedt de Luna, S. (2016). Functional clustering of varved lake sediment to reconstruct past seasonal climate. Environmental and Ecological Statistics, 23(4), 513–529.
Abramowicz, K., Arnqvist, P., Secchi, P., Sjöstedt de Luna, S., Vantini, S., Vitelli, V. (2017). Clustering misaligned dependent curves applied to varved lake sediment for climate reconstruction. Stochastic Environmental Research and Risk Assessment. Volume 31.1, 71–85.
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
Function to extract the information criteria AIC and BIC, as well as the average Shannon entropy over all functional objects, for a fitted functional clustering mocca
. The Shannon entropy is computed over the posterior probability distribution of belonging to a specific cluster given the functional object (see Arnqvist and Sjöstedt de Luna, 2019, for further details).
criteria.mocca(x)
criteria.mocca(x)
x |
fitted model objects of class |
A table with the AIC, BIC and Shannon entropy values of the fitted model.
Per Arnqvist
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
## see examples in mocca()
## see examples in mocca()
Function to estimate model parameters by maximizing the observed log likelihood via an EM algorithm. The estimation procedure is based on an algorithm proposed by James and Sugar (2003).
The function is not normally called directly, but rather service routines for mocca
. See the description of the mocca
function for more detailed information of arguments.
estimate.mocca(data,K=5,q=6,h=2,random=TRUE,B=NULL,svd=TRUE, use.covariates=FALSE,stand.cov=TRUE,index.cov=NULL, lambda=1.4e-4,EM.maxit=50, EMstep.tol=1e-8,Mstep.maxit=10, Mstep.tol=1e-4, EMplot=TRUE,trace=TRUE,n.cores=NULL)
estimate.mocca(data,K=5,q=6,h=2,random=TRUE,B=NULL,svd=TRUE, use.covariates=FALSE,stand.cov=TRUE,index.cov=NULL, lambda=1.4e-4,EM.maxit=50, EMstep.tol=1e-8,Mstep.maxit=10, Mstep.tol=1e-4, EMplot=TRUE,trace=TRUE,n.cores=NULL)
data |
a list containing at least five objects (vectors) named as |
K |
number of clusters (default: |
q |
number of B-splines used to describe the individual curves. Evenly spaced knots are used (default: |
h |
a positive integer, parameter vector dimension in low-dimensionality representation of the curves (spline coefficients).
|
random |
|
B |
an |
svd |
|
use.covariates |
|
stand.cov |
|
index.cov |
a vector of indices indicating which covariates should be used when modelling. If |
lambda |
a positive real number, smoothing parameter value to be used when estimating B-spline coefficients. |
EM.maxit |
a positive integer which gives the maximum number of iterations for a EM algorithm (default: EM.maxit=50). |
EMstep.tol |
the tolerance to use within iterative procedure of the EM algorithm (default: EMstep.tol=1e-8). |
Mstep.maxit |
a positive scalar which gives the maximum number of iterations for an inner loop of the parameter estimation in M step (default: Mstep.maxit=20). |
Mstep.tol |
the tolerance to use within iterative procedure to estimate model parameters (default: Mstep.tol=1e-4). |
EMplot |
|
trace |
|
n.cores |
number of cores to be used with parallel computing. |
A list is returned with the following items:
loglik |
the maximized log likelihood value. |
sig2 |
estimated residual variance for the spline coefficients (for the model without covariates), or a vector of the estimated residual variances for the spline coefficients and for the covariates (for the model with covariates). |
conv |
indicates why the EM algorithm terminated: 0: indicates successful completion. 1: indicates that the iteration limit |
iter |
number of iterations of the EM algorithm taken to get convergence. |
score.hist |
a matrix of the succesive values of the scores: residual variances and log likelihood, up until convergence. |
parameters |
a list containing all the estimated parameters: |
vars |
a list containing results from the E step of the algorithm: the posterior probabilities for each subject |
data |
a list containing all the original data plus re-arranged functional data and covariates (if supplied) needed for EM-steps. |
design |
a list of spline basis matrices with and without covariates:
|
initials |
a list of initial settings: |
Per Arnqvist, Natalya Pya Arnqvist, Sara Sjöstedt de Luna
James, G.M., Sugar, C.A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98.462, 397–408.
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
Function to extract the log-likelihood for a fitted functional clustering mocca
model (fitted by mixture likelihood maximization).
Note: estimate.mocca
uses loglik.EMmocca()
for calculating the log likelihood at each iterative step.
## S3 method for class 'mocca' logLik(object,...)
## S3 method for class 'mocca' logLik(object,...)
object |
fitted model objects of class |
... |
unused in this case |
The log-likehood value as logLik
object.
Per Arnqvist
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
This function fits a functional clustering model to observed independent functional subjects, where a functional subject consists of a function and possibly a set of covariates. Here, each curve is projected onto a finite dimensional basis and clustering is done on the resulting basis coefficients. However, rather than treating basis coefficients as parameters, mixed effect modelling is used for the coefficients. In the model-based functional clustering approach the functional subjects (i.e. the spline/basis coefficients and the potential covariates) are assumed to follow a multivariate Gaussian mixture model, where the number of distributions in the mixture model corresponds to the number of (predefined) clusters, . Given that a functional subject belongs to a cluster
, the basis coefficients and covariate values are normally distributed with a cluster-specific mean and covariance structure.
An EM-style algorithm based on James and Sugar (2003) is implemented to fit the Gaussian mixture model for a prespecified number of clusters K
. The model allows for different cluster covariance structure for the spline coefficients and model coefficients for the covariates. See Arnqvist and Sjöstedt de Luna (2019) for details about differences to the clustering model and its implementation.
The routine calls estimate.mocca
for the model fitting.
mocca(data=stop("No data supplied"), K = 5, q = 6, h = 2, use.covariates=FALSE,stand.cov=TRUE,index.cov=NULL, random=TRUE, B=NULL,svd=TRUE,lambda=1.4e-4, EM.maxit=50, EMstep.tol=1e-6,Mstep.maxit=20,Mstep.tol=1e-4,EMplot=TRUE, trace=FALSE,n.cores=NULL)
mocca(data=stop("No data supplied"), K = 5, q = 6, h = 2, use.covariates=FALSE,stand.cov=TRUE,index.cov=NULL, random=TRUE, B=NULL,svd=TRUE,lambda=1.4e-4, EM.maxit=50, EMstep.tol=1e-6,Mstep.maxit=20,Mstep.tol=1e-4,EMplot=TRUE, trace=FALSE,n.cores=NULL)
data |
a list containing at least three objects (vectors) named as i) suppose we observe ii) iii) iv) v) vi) if supplied, |
K |
number of clusters (default: |
q |
number of B-splines for the individual curves. Evenly spaced knots are used (default: |
h |
a positive integer, parameter vector dimension in the low-dimensionality representation of the curves (spline coefficients).
|
use.covariates |
|
stand.cov |
|
index.cov |
a vector of indices indicating which covariates should be used when modelling. If |
random |
|
B |
an |
svd |
|
lambda |
a positive real number, smoothing parameter value to be used when estimating B-spline coefficients. |
EM.maxit |
a positive integer which gives the maximum number of iterations for a EM algorithm (default:
|
EMstep.tol |
the tolerance to use within iterative procedure of the EM algorithm (default: |
Mstep.maxit |
a positive scalar which gives the maximum number of iterations for an inner loop of the parameter estimation in M step (default: |
Mstep.tol |
the tolerance to use within iterative procedure to estimate model parameters (default: Mstep.tol=1e-4). |
EMplot |
|
trace |
|
n.cores |
number of cores to be used with parallel computing. If |
A model-based clustering with covariates (mocca) for the functional subjects (curves and potentially covariates) is a gaussian mixture model with components.
Let
be the true function (curve) of the
subject, for a set of
independent subjects. Assume that for each subject we have a vector of observed values of the function
at times
obtained with some measurement errors. We are interested in clustering the subjects into
(homogenous) groups. Let
be the observed value of the
th curve at time point
. Then
where are assumed to be independent and normally distributed measurement errors
with mean
and variance
. Let
,
and
be the
-dimensional vectors for subject
, corresponding to the observed values, true values and measurement errors, respectively. Then, in matrix notation, the above could be written as
where
We further assume that the smooth function
can be expressed as
where
is a
-dimensional vector of known basis functions evaluated at time t, e.g. B-splines, and
a
-dimensional vector of unknown (random) coefficients. The
's are modelled as
where is a vector of expected spline coefficients for a cluster
and
denotes the unknown cluster membership, with
,
. The random vector
corresponds to subject-specific within-cluster variability.
Note that this variability is allowed to be different in different clusters, due to
. If desirable, given that subject
belongs to cluster
, a further parametrization of
may prove useful, for producing low-dimensional representations of the curves as suggested by James and Sugar (2003):
where and
are
- and
-dimensional vectors respectively, and
is a
matrix with
. Choosing
may be valuable, especially for sparse data. In order to ensure identifiability, some restrictions need to be put on the parameters. Imposing the restriction that
implies that
can be viewed as the overall mean curve. Depending on the choice of
and
, further restrictions may need to be imposed in order to have identifiability of the parameters (
and
are confounded if no restrictions are imposed).
In vector-notation we thus have
where is an
matrix with
on the
row,
We will also assume that the
's,
's and the
's are independent. Hence, given that subject
belongs to cluster
we have
Based on the observed data , the parameters
of the model can be estimated by maximizing the observed likelihood
where
and
is the normal density given above. Note that here
will denote all scalar, vectors and matrices of parameters to be estimated. An EM-type algorithm is used to maximize the likelihood above.
If additional covariates have been observed for each subject besides the curves, they can also be included in the model when clustering the subjects. Given that the subject belongs to cluster
the
covariates
are assumed to have mean value
and moreover
where we assume that
is the random deviation within cluster and
independent remaining unexplained variability.
Note that this model also incorporates the dependence between covariates and the random curves via the random basis coefficients. See Arnqvist and Sjöstedt de Luna (2019) for further details.
EM-algorithm is implemented to maximize the mixture likelihood.
The method is applied to annually varved lake sediment data from the lake Kassjön in Northern Sweden. See an example and also varve
for the data description.
The function returns an object of class "mocca"
with the following elements:
loglik |
the maximized log likelihood value. |
sig2 |
estimated residual variance for the functional data (for the model without covariates), or a vector of the estimated residual variances for the functional data and for the covariates (for the model with covariates). |
conv |
indicates why the EM algorithm terminated: 0: indicates successful completion. 1: indicates that the iteration limit |
iter |
number of iterations of the EM algorithm taken to get convergence. |
nobs |
number of subjects/curves. |
score.hist |
a matrix of the succesive values of the scores, residual variances and log likelihood, up until convergence. |
pars |
a list containing all the estimated parameters: |
vars |
a list containing results from the E step of the algorithm: the posterior probabilities for each subject |
data |
a list containing all the original data plus re-arranged functional data and covariates (if supplied). |
design |
a list of spline basis matrices with and without covariates:
|
initials |
a list of initial settings: |
Per Arnqvist, Natalya Pya Arnqvist, Sara Sjöstedt de Luna
Arnqvist, P., Bigler, C., Renberg, I., Sjöstedt de Luna, S. (2016). Functional clustering of varved lake sediment to reconstruct past seasonal climate. Environmental and Ecological Statistics, 23(4), 513–529.
Abramowicz, K., Arnqvist, P., Secchi, P., Sjöstedt de Luna, S., Vantini, S., Vitelli, V. (2017). Clustering misaligned dependent curves applied to varved lake sediment for climate reconstruction. Stochastic Environmental Research and Risk Assessment. Volume 31.1, 71–85.
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
James, G.M., Sugar, C.A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98.462, 397–408.
## example with lake sediment data from lake Kassjön... library(fdaMocca) data(varve) ## reduced data set ## run without covariates... m <- mocca(data=varve,K=3,n.cores=2) m ## some summary information... summary(m) criteria.mocca(m) AIC(m) BIC(m) ## various plots... plot(m) plot(m,select=2) plot(m,type=2,years=c(-750:750)) plot(m,type=2,probs=TRUE,pts=TRUE,years=c(-750:750)) plot(m,type=2,pts=TRUE,select=c(3,1),years=c(-750:750)) plot(m,type=3) plot(m,type=3,covariance=FALSE) ## model with two covariates... ## note, it takes some time to analyze the data... m1 <- mocca(data=varve, use.covariates=TRUE,index.cov=c(2,3), K=3,n.cores=2) m1 ## summary information... summary(m1) criteria.mocca(m1) ## various plots... plot(m1) plot(m1,type=2,pts=TRUE,years=c(-750:750)) plot(m1,type=3) plot(m1,type=3,covariance=FALSE) plot(m1,type=3,covariates=TRUE) ## simple simulated data... data(simdata) set.seed(2) m2 <- mocca(data=simdata,K=2,q=8,h=1,lambda=1e-10,n.cores=2,EMstep.tol=1e-3) summary(m2) criteria.mocca(m2) plot(m2) plot(m2,select=2) ## even simpler simulated data ##(reduced from 'simdata', EMstep.tol set high, q lower to allow automatic testing)... library(fdaMocca) data(simdata0) set.seed(2) m3 <- mocca(data=simdata0,K=2,q=5,h=1,lambda=1e-10,n.cores=2,EMstep.tol=.5, EMplot=FALSE,B=simdata0$B) summary(m3) #plot(m3) #plot(m3,select=2))
## example with lake sediment data from lake Kassjön... library(fdaMocca) data(varve) ## reduced data set ## run without covariates... m <- mocca(data=varve,K=3,n.cores=2) m ## some summary information... summary(m) criteria.mocca(m) AIC(m) BIC(m) ## various plots... plot(m) plot(m,select=2) plot(m,type=2,years=c(-750:750)) plot(m,type=2,probs=TRUE,pts=TRUE,years=c(-750:750)) plot(m,type=2,pts=TRUE,select=c(3,1),years=c(-750:750)) plot(m,type=3) plot(m,type=3,covariance=FALSE) ## model with two covariates... ## note, it takes some time to analyze the data... m1 <- mocca(data=varve, use.covariates=TRUE,index.cov=c(2,3), K=3,n.cores=2) m1 ## summary information... summary(m1) criteria.mocca(m1) ## various plots... plot(m1) plot(m1,type=2,pts=TRUE,years=c(-750:750)) plot(m1,type=3) plot(m1,type=3,covariance=FALSE) plot(m1,type=3,covariates=TRUE) ## simple simulated data... data(simdata) set.seed(2) m2 <- mocca(data=simdata,K=2,q=8,h=1,lambda=1e-10,n.cores=2,EMstep.tol=1e-3) summary(m2) criteria.mocca(m2) plot(m2) plot(m2,select=2) ## even simpler simulated data ##(reduced from 'simdata', EMstep.tol set high, q lower to allow automatic testing)... library(fdaMocca) data(simdata0) set.seed(2) m3 <- mocca(data=simdata0,K=2,q=5,h=1,lambda=1e-10,n.cores=2,EMstep.tol=.5, EMplot=FALSE,B=simdata0$B) summary(m3) #plot(m3) #plot(m3,select=2))
The function
takes a mocca
object produced by mocca()
and creates cluster means plots or covariance structure within each cluster.
## S3 method for class 'mocca' plot(x,type=1, select =NULL,transform=FALSE,covariance=TRUE, covariates =FALSE,lwd=2,ylab="",xlab="",main="",ylim=NULL, ncolors=NULL,probs=FALSE,pts=FALSE,size=50, years=NULL, years.names=NULL, ...)
## S3 method for class 'mocca' plot(x,type=1, select =NULL,transform=FALSE,covariance=TRUE, covariates =FALSE,lwd=2,ylab="",xlab="",main="",ylim=NULL, ncolors=NULL,probs=FALSE,pts=FALSE,size=50, years=NULL, years.names=NULL, ...)
x |
a |
type |
determines what type of plots to print. For |
select |
allows the plot for a single cluster mean to be selected for printing with |
transform |
logical, informs whether svd back-transformation of the spline model matrix should be applied (see Arnqvist and Sjöstedt de Luna, 2019). |
covariance |
logical, informs whether covariance ( |
covariates |
logical, informs whether covariates should be added when printing the covariance structure of the spline coefficients |
lwd |
defines the line width. |
ylab |
If supplied then this will be used as the |
xlab |
If supplied then this will be used as the |
main |
Used as title for plots if supplied. |
ylim |
If supplied then this pair of numbers are used as the |
ncolors |
defines the number of colors ( |
probs |
logical, used with |
pts |
logical, used with |
size |
the bin size used with |
years |
a vector of years used with annual data and needed for |
years.names |
a character vector that gives names of the years needed for |
... |
other graphics parameters to pass on to plotting commands. |
The function generates plots.
Per Arnqvist, Sara Sjöstedt de Luna, Natalya Pya Arnqvist
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
## see ?mocca help files
## see ?mocca help files
The default print method for a mocca
object.
## S3 method for class 'mocca' print(x, ...)
## S3 method for class 'mocca' print(x, ...)
x , ...
|
fitted model objects of class |
Prints out whether the model is fitted with or without covariates, the number of clusters, the estimated residual variance for the functional data and for the scalar covariates (if present), the number of covariates (if present), the maximized value of the log likelihood, and the number of subjects/curves.
No return value, the function prints out several fitted results.
Per Arnqvist, Natalya Pya Arnqvist, Sara Sjöstedt de Luna
simdata
is a simple test data set simulated from two clusters and consisting of 100 curves, with 50 curves belonging to one cluster and 50 to another. The test data set is a copy of the simdata
of James and Sugar (2003).
simdata
is a list of three vectors called as x
, curve
and time
.
simdata0
is a reduced dataset with only six curves in each cluster.
The data are from James and Sugar (2003).
James, G.M., Sugar, C.A. (2003). Clustering for sparsely sampled functional data. Journal of the American Statistical Association, 98.462, 397–408.
Takes a mocca
object produced by mocca()
and produces various useful
summaries from it.
## S3 method for class 'mocca' summary(object,...) ## S3 method for class 'summary.mocca' print(x,digits = max(3, getOption("digits") - 3),...)
## S3 method for class 'mocca' summary(object,...) ## S3 method for class 'summary.mocca' print(x,digits = max(3, getOption("digits") - 3),...)
object |
a fitted |
x |
a |
digits |
controls the number of digits printed in the output. |
... |
other arguments. |
summary.mocca
produces the following list of summary information for a mocca
object.
N |
number of observations |
K |
number of clusters |
r |
number of scalar covariates if model with covariates |
sig2 |
residual variance estimate for the functional data and for the scalar covariates (if the model is with covariates) |
p |
total number of the estimated parameters in the model |
tab_numOfCurves_cluster |
number of objects/curves in each cluster as a table. Here 'hard' clustering is applied, where each object/curve belongs to a cluster with the highest posterior probability. |
covariates_est |
mean value estimates for scalar covariates given cluster belongings (if the model is with covariates) |
t.probs |
estimated probabilities of belonging to each cluster |
crita |
a table with the maximized log likelihood, AIC, BIC and Shannon entropy values of the fitted model |
Per Arnqvist, Sara Sjöstedt de Luna, Natalya Pya Arnqvist
## see ?mocca help files
## see ?mocca help files
Annually varved lake sediment data from the lake Kassjön in Northern Sweden. The Kassjön data are used to illustrate the ideas of the model-based functional clustering with covariates. The varved sediment of lake Kassjön covers approximately 6400 years and is believed to be historical records of weather and climate. The varves (years) are clustered into similar groups based on their seasonal patterns (functional data) and additional covariates, all potentially carrying information on past climate/weather.
The Kassjön data has been analyzed in several papers. In Petterson et al. (2010, 1999, 1993) the sediment data was captured with image analysis and after preprocessing, the data was recorded as gray scale values with yearly deliminators, thus giving 6388 years (-4386 – 1901), or varves with 4–36 gray scale values per year. In Arnqvist et al. (2016) the shape/form of the yearly grey scale observations was modeled as curve functions and analyzed in a non-parametric functional data analysis approach. In Abramowicz et al. (2016) a Bagging Voronoi K-Medoid Alignment (BVKMA) method was proposed to group the varves into different "climates". The suggested procedure simultaneously clusters and aligns spatially dependent curves and is a nonparametric statistical method that does not rely on distributional or dependency structure assumptions.
varve
data is a list containing six objects named as x
, time
, timeindex
, curve
, grid
, covariates
. See mocca
for explanation of these objects.
varve_full
has observed subjects (years/varve), where for each varve we observed one function (seasonal pattern) and four covariates.
varve
is simply a reduced data set with only subjects.
The varve patterns have the following origin. During spring, in connection to snow melt and spring runoff, minerogenic material is transported from the catchment area into the lake through four small streams, which gives rise to a bright colored layer, giving high gray-scale values (Petterson et al., 2010). During summer, autochthonous organic matter, sinks to the bottom and creates a darker layer (lower gray-scale values). During the winter, when the lake is ice-covered, fine organic material is deposited, resulting in a thin blackish winter layer (lowest gray-scale values). There is substantial within- and between year variation, reflecting the balance between minerogenic and organic material. The properties of each varve reflect, to a large extent, weather conditions and internal biological processes in the lake the year that the varve was deposited. The minerogenic input reflects the intensity of the spring run-off, which is dependent on the amount of snow accumulated during the winter, and hence the variability in past winter climate.
The data consists of (subjects) years and the
observations per year
ranges from 4 to 37. A few years were missing, see Arnqvist et al. (2016) for details. For
each year
we observe the centered seasonal pattern in terms of grey scale values (
)'s
at
time points (pixels). We also register (the four covariates) the mean grey scale
within each year, the varve width
and the minerogenic accumulation rate (
)
corresponding to the total amount of minerogenic material per
in the varve (year)
,
see Petterson et al. (2010) for details. In order to make the seasonal patterns comparable
we first put them on the same time scale
, such that pixel position
at year
corresponds to position
To make
the patterns more comparable (with respect to climate) they were further aligned by
landmark registration, synchronizing the first spring peaks, that are directly related to
the spring flood that occurs approximately the same time each year.
As in previous analysis (Arnqvist et al., 2016) the first peak of each year is aligned
towards a common spring peak with an affine warping, that is, if we denote the common
spring peak as and the yearly spring peak as
and let
,
then we will have the warped time points according to
for
and
for
. The common spring peak and the
yearly spring peaks are given in Arnqvist et al. (2016).
Focusing on the functional forms of the seasonal patterns we finally centered them
within years and worked with (the centered values)
where
is the mean grey scale value of varve (year)
. In addition
to the seasonal patterns we also include four covariates: i)
, the mean grey scale; ii)
, the varve width (proportional to
); iii)
, the minerogenic accumulation rate
corresponding to the accumulated amount of minerogenic material per
in varve
; and iv)
, the landmark which is the distance from the start of the year to the first peak, interpreted as the start of the spring, for details see (Petterson et al., 2010, and and Arnqvist et al. 2016).
varve_full
is a full data set with years/curves spanning the time period 4486 B.C. until 1901 A.D..
varve
is a reduced data set with years/curves covering the time period 750 BC to 750 AD.
Arnqvist, P., Bigler, C., Renberg, I., Sjöstedt de Luna, S. (2016). Functional clustering of varved lake sediment to reconstruct past seasonal climate. Environmental and Ecological Statistics, 23(4), 513–529.
Abramowicz, K., Arnqvist, P., Secchi, P., Sjöstedt de Luna, S., Vantini, S., Vitelli, V. (2017). Clustering misaligned dependent curves applied to varved lake sediment for climate reconstruction. Stochastic Environmental Research and Risk Assessment. Volume 31.1, 71–85.
Arnqvist, P., and Sjöstedt de Luna, S. (2019). Model based functional clustering of varved lake sediments. arXiv preprint arXiv:1904.10265.
Petterson, G., Renberg, I., Sjöstedt de Luna, S., Arnqvist, P., and Anderson, N. J. (2010). Climatic influence on the inter-annual variability of late-Holocene minerogenic sediment supply in a boreal forest catchment. Earth surface processes and landforms. 35(4), 390–398.
Petterson, G., B. Odgaard, and I. Renberg (1999). Image analysis as a method to quantify sediment components. Journal of Paleolimnology 22. (4), 443–455.
Petterson, G., I. Renberg, P. Geladi, A. Lindberg, and F. Lindgren (1993). Spatial uniformity of sediment accumulation in varved lake sediments in Northern Sweden. Journal of Paleolimnology. 9(3), 195–208.