Title: | Curve Registration for Exponential Family Functional Data |
---|---|
Description: | A method for performing joint registration and functional principal component analysis for curves (functional data) that are generated from exponential family distributions. This mainly implements the algorithms described in 'Wrobel et al. (2019)' <doi:10.1111/biom.12963> and further adapts them to potentially incomplete curves where (some) curves are not observed from the beginning and/or until the end of the common domain. Curve registration can be used to better understand patterns in functional data by separating curves into phase and amplitude variability. This software handles both binary and continuous functional data, and is especially applicable in accelerometry and wearable technology. |
Authors: | Julia Wrobel [aut, cre] , Alexander Bauer [aut], Erin McDonnell [aut], Fabian Scheipl [ctb], Jeff Goldsmith [aut] |
Maintainer: | Julia Wrobel <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.1.0 |
Built: | 2024-11-28 06:58:19 UTC |
Source: | CRAN |
This function generates amplitudes for simulated accelerometer data.
amp_curve(grid, period = 2 * pi, spline_based = FALSE)
amp_curve(grid, period = 2 * pi, spline_based = FALSE)
grid |
Grid of x values over which to evaluate the function. |
period |
Controls the period of the mean curve |
spline_based |
If FALSE curve is constructed using sine and cosine functions, if TRUE, curve is constructed using B-spline basis. |
A numeric vector.
Function used in the FPCA step for registering binary functional data,
called by register_fpca
when family = "binomial"
.
This method uses a variational EM algorithm to estimate scores and principal components for
binary functional data.
The number of functional principal components (FPCs) can either be specified
directly (argument npc
) or chosen based on the explained share of
variance (npc_varExplained
). In the latter case, the explained share of
variance and accordingly the number of FPCs is estimated before the main
estimation step by once running the FPCA with npc = 20
(and
correspondingly Kt = 20
). Doing so, we approximate the overall
variance in the data Y
with the variance represented by the FPC basis
with 20 FPCs.
bfpca( Y, npc = NULL, npc_varExplained = NULL, Kt = 8, maxiter = 50, t_min = NULL, t_max = NULL, print.iter = FALSE, row_obj = NULL, seed = 1988, periodic = FALSE, error_thresh = 1e-04, verbose = 1, subsample = TRUE, ... )
bfpca( Y, npc = NULL, npc_varExplained = NULL, Kt = 8, maxiter = 50, t_min = NULL, t_max = NULL, print.iter = FALSE, row_obj = NULL, seed = 1988, periodic = FALSE, error_thresh = 1e-04, verbose = 1, subsample = TRUE, ... )
Y |
Dataframe. Should have variables id, value, index. |
npc |
The number of functional principal components (FPCs)
has to be specified either directly as |
npc_varExplained |
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
maxiter |
Maximum number of iterations to perform for EM algorithm. Default is 50. |
t_min |
Minimum value to be evaluated on the time domain. |
t_max |
Maximum value to be evaluated on the time domain. |
print.iter |
Prints current error and iteration |
row_obj |
If NULL, the function cleans the data and calculates row indices.
Keep this NULL if you are using standalone |
seed |
Set seed for reproducibility. Defaults to 1988. |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
error_thresh |
Error threshold to end iterations. Defaults to 0.0001. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
subsample |
if the number of rows of the data is greater than 10 million rows, the 'id' values are subsampled to get the mean coefficients. |
... |
Additional arguments passed to or from other functions |
An object of class fpca
containing:
fpca_type |
Information that FPCA was performed with the 'variationEM' approach, in contrast to registr::gfpca_twoStep. |
t_vec |
Time vector over which the mean |
knots |
Cutpoints for B-spline basis used to rebuild |
efunctions |
|
evalues |
Estimated variance of the FPC scores. |
evalues_sum |
Approximation of the overall variance in |
npc |
number of FPCs. |
scores |
|
alpha |
Estimated population-level mean. |
mu |
Estimated population-level mean. Same value as |
subject_coefs |
B-spline basis coefficients used to construct subject-specific means.
For use in |
Yhat |
FPC approximation of subject-specific means, before applying the response function. |
Y |
The observed data. |
family |
|
error |
vector containing error for each iteration of the algorithm. |
Julia Wrobel [email protected], Jeff Goldsmith [email protected], Alexander Bauer [email protected]
Jaakkola, T. S. and Jordan, M. I. (1997). A variational approach to Bayesian logistic regression models and their extensions. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics.
Tipping, M. E. (1999). Probabilistic Visualisation of High-dimensional binary data. Advances in neural information processing systems, 592–598.
Y = simulate_functional_data()$Y # estimate 2 FPCs bfpca_obj = bfpca(Y, npc = 2, print.iter = TRUE, maxiter = 25) plot(bfpca_obj) # estimate npc adaptively, to explain 90% of the overall variation bfpca_obj2 = bfpca(Y, npc_varExplained = 0.9, print.iter = TRUE, maxiter = 30) plot(bfpca_obj2)
Y = simulate_functional_data()$Y # estimate 2 FPCs bfpca_obj = bfpca(Y, npc = 2, print.iter = TRUE, maxiter = 25) plot(bfpca_obj) # estimate npc adaptively, to explain 90% of the overall variation bfpca_obj2 = bfpca(Y, npc_varExplained = 0.9, print.iter = TRUE, maxiter = 30) plot(bfpca_obj2)
Internal main preparation function for bfpca
bfpca_argPreparation( Y, Kt, time, t_min, t_max, periodic, seed, subsample, verbose )
bfpca_argPreparation( Y, Kt, time, t_min, t_max, periodic, seed, subsample, verbose )
Y , time , t_min , t_max
|
Internal objects created in |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
seed |
Set seed for reproducibility. Defaults to 1988. |
subsample |
if the number of rows of the data is greater than 10 million rows, the 'id' values are subsampled to get the mean coefficients. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
List with elements knots
, Theta_phi
, xi
,
alpha_coefs
.
Main optimization function for bfpca
. If npc_varExplained
is specified, the function simply returns a list with elements npc
(chosen number of FPCs), evalues
(estimated variances of the first 'npc'
FPCs) and evalues_sum
(sum of the estimated variances of the first 20
FPCs, as approximation of the overall variance).
bfpca_optimization( npc, npc_varExplained = NULL, Kt, maxiter, print.iter, seed, periodic, error_thresh, verbose, Y, rows, I, knots, Theta_phi, xi, alpha_coefs )
bfpca_optimization( npc, npc_varExplained = NULL, Kt, maxiter, print.iter, seed, periodic, error_thresh, verbose, Y, rows, I, knots, Theta_phi, xi, alpha_coefs )
npc |
The number of functional principal components (FPCs)
has to be specified either directly as |
npc_varExplained |
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
maxiter |
Maximum number of iterations to perform for EM algorithm. Default is 50. |
print.iter |
Prints current error and iteration |
seed |
Set seed for reproducibility. Defaults to 1988. |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
error_thresh |
Error threshold to end iterations. Defaults to 0.0001. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
Y , rows , I , knots , Theta_phi , xi , alpha_coefs
|
Internal objects created in
|
list with elements t_vec
, Theta_phi_mean
, alpha_coefs
,
efunctions
, evalues
, evalues_sum
, scores
,
subject_coef
, fittedVals
, error
. See documentation of
fpca_gauss
for details.
This function gets derivative of a spline basis. Adapted from bs()
function in splines
package.
bs_deriv( x, knots, degree = 3L, Boundary.knots = range(x), derivative = 1, intercept = TRUE )
bs_deriv( x, knots, degree = 3L, Boundary.knots = range(x), derivative = 1, intercept = TRUE )
x |
a numeric vector of values at which to evaluate the B-spline functions or derivatives. |
knots |
the internal breakpoints that define the spline. |
degree |
degree of the piecewise polynomial—default is 3 for cubic splines. |
Boundary.knots |
boundary points at which to anchor the B-spline basis. Set to [0,1] if you want this to be your domain. |
derivative |
a positive integer value that specifies which derivative to take. Defaults to 1 for 1st derivative. Value of 0 returns the original set of b-spline basis functions. |
intercept |
if TRUE, an intercept is included in the basis; default is TRUE |
A matrix containing:
basis |
A B-spline basis that can be used to approximate the derivative of a function. |
Julia Wrobel [email protected]
Reduce the resolution of a numeric vector by specifying the number of
significant_digits
to which the numbers should be rounded.
Internal function used to coarsen the index vector before estimating the
two-step GFPCA with gfpca_twoStep
.
coarsen_index(index, significant_digits)
coarsen_index(index, significant_digits)
index |
Numeric vector of index values. |
significant_digits |
Positive integer value. |
Numeric vector of rounded index values.
Alexander Bauer [email protected]
index_vector = c(0.7892, 0.2984, 0.328) registr:::coarsen_index(index_vector, 1) registr:::coarsen_index(index_vector, 3) index_vector2 = c(2803, -7639, 13) registr:::coarsen_index(index_vector2, 1) registr:::coarsen_index(index_vector2, 3)
index_vector = c(0.7892, 0.2984, 0.328) registr:::coarsen_index(index_vector, 1) registr:::coarsen_index(index_vector, 3) index_vector2 = c(2803, -7639, 13) registr:::coarsen_index(index_vector2, 1) registr:::coarsen_index(index_vector2, 3)
Constraints ensure monotonicity of spline coefficients for warping functions
for use with constrOptim()
function.
constraints(Kh, t_min = 0, t_max = 1, warping = "nonparametric")
constraints(Kh, t_min = 0, t_max = 1, warping = "nonparametric")
Kh |
Number of B-spline basis functions used to estimate warping functions h. |
t_min |
Minimum value to be evaluated on the time domain. |
t_max |
Maximum value to be evaluated on the time domain. |
warping |
If |
An list containing:
ui |
A constraint matrix. |
ci |
A constraint vector. |
Julia Wrobel [email protected], Erin McDonnell [email protected]
Internal function for the estimation of the covariance matrix of the latent
process using the approach of Hall et al. (2008). Used in the
two-step GFPCA approach implemented in gfpca_twoStep
.
This function is an adaptation of the implementation of Jan
Gertheiss and Ana-Maria Staicu for Gertheiss et al. (2017), with focus on
higher (RAM) efficiency for large data settings.
cov_hall( Y, index_evalGrid, Kt = 25, Kc = 10, family = "gaussian", diag_epsilon = 0.01, make_pd = TRUE )
cov_hall( Y, index_evalGrid, Kt = 25, Kc = 10, family = "gaussian", diag_epsilon = 0.01, make_pd = TRUE )
Y |
Dataframe. Should have values id, value, index. |
index_evalGrid |
Grid for the evaluation of the covariance structure. |
Kt |
Number of P-spline basis functions for the estimation of the marginal mean. Defaults to 25. |
Kc |
Number of marginal P-spline basis functions for smoothing the covariance surface. Defaults to 10. |
family |
One of |
diag_epsilon |
Small constant to which diagonal elements of the covariance matrix are set if they are smaller. Defaults to 0.01. |
make_pd |
Indicator if positive (semi-)definiteness of the returned
latent covariance should be ensured via |
The implementation deviates from the algorithm described in Hall (2008) in one crucial step – we compute the crossproducts of centered observations and smooth the surface of these crossproducts directly instead of computing and smoothing the surface of crossproducts of uncentered observations and subsequently subtracting the (crossproducts of the) mean function. The former seems to yield smoother eigenfunctions and fewer non-positive-definite covariance estimates.
If the data Y
or the crossproduct matrix contain more than
100,000
rows or elements, the estimation of the marginal mean or
the smoothing step of the covariance matrix are performed by
using the discretization-based estimation algorithm in bam
rather than the gam
estimation algorithm.
Covariance matrix with dimension time_evalGrid x time_evalGrid
.
Alexander Bauer [email protected] and Fabian Scheipl, based on work of Jan Gertheiss and Ana-Maria Staicu
Hall, P., Müller, H. G., & Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 703–723.
Gertheiss, J., Goldsmith, J., & Staicu, A. M. (2017). A note on modeling sparse exponential-family functional response curves. Computational statistics & data analysis, 105, 46–52.
data(growth_incomplete) index_grid = c(1.25, seq(from = 2, to = 18, by = 1)) cov_matrix = registr:::cov_hall(growth_incomplete, index_evalGrid = index_grid)
data(growth_incomplete) index_grid = c(1.25, seq(from = 2, to = 18, by = 1)) cov_matrix = registr:::cov_hall(growth_incomplete, index_evalGrid = index_grid)
Compute the crossproduct in a fast way for highly irregular grids
(index values are mostly unique).
Only used internally in cov_hall()
.
crossprods_irregular(Y)
crossprods_irregular(Y)
Y |
Dataframe with the centered observations. Should have values id, centered, index. |
Compute the crossproduct in a fast way for mostly regular grids
(index values are mostly *not* unique).
Only used internally in cov_hall()
.
crossprods_regular(Y)
crossprods_regular(Y)
Y |
Dataframe with the centered observations. Should have values id, centered, index. |
refund
objectFunction used for data cleaning.
data_clean(data)
data_clean(data)
data |
Dataframe. Should have values id, value, index. |
An list containing:
Y |
The original data sorted by id and index. |
Y_rows |
A dataframe containing the first and last row for each subject. |
Compute the derivative of the logit function for a given point x
.
## S3 method for class 'inv.logit' deriv(x)
## S3 method for class 'inv.logit' deriv(x)
x |
Value at which the derivative is computed |
Alexander Bauer [email protected]
This internal function is called in gfpca_twoStep
, fpca_gauss
and bfpca
to determine the number of functional principal components
based on their share of explained variance.
determine_npc(evalues, npc_criterion)
determine_npc(evalues, npc_criterion)
evalues |
Vector of estimated variances of the FPC scores. |
npc_criterion |
Either (i) a share between 0 and 1, or (ii) a vector with
two elements for the targeted explained share of variance and a cut-off scree
plot criterion, both between 0 and 1. For the latter, e.g.,
|
Integer for the number of fucntional principal components
Internal function. In the joint iterations between registration and GFPCA,
the optimization with constrOptim()
in the registration step sometimes
leads to slightly improper solutions, which cause the optimization to
throw an error in the following optimization step. This function corrects
the parameter vector if one of the following slight inconsistencies occurs
that can mess with the optimization of constrOptim()
:
- two neighboring values of the parameter vector are too similar
- the initial values of the parameter vector are smaller than t_min
,
the minimum of the underlying time domain
- the last values of the parameter vector are greater than t_max
,
the maximum of the underlying time domain
- one parameter value is slightly greater than its following value, i.e.
the parameter vector is not monotone.
ensure_proper_beta(beta, t_min, t_max)
ensure_proper_beta(beta, t_min, t_max)
beta |
Parameter vector. |
t_min , t_max
|
Minimum and maximum of the underlying time domain in the registration step. |
A slightly changed parameter vector that ensures a proper solution in the optimization of the registration step.
Alexander Bauer [email protected]
beta_improper = c(0.24, 1.000047, 1.000002) registr:::ensure_proper_beta(beta_improper, t_min = 0, t_max = 1)
beta_improper = c(0.24, 1.000047, 1.000002) registr:::ensure_proper_beta(beta_improper, t_min = 0, t_max = 1)
Calculations derived using maximum likelihood estimation.
expectedScores(Y, mu, psi, theta, theta_quad)
expectedScores(Y, mu, psi, theta, theta_quad)
Y |
vector of observations for the current subject. |
mu |
vector of spline coefficients for the population mean. |
psi |
matrix of spline coefficients for the principal component basis functions. |
theta |
spline basis functions for the current subject. |
theta_quad |
quadratic form of theta for the current subject. |
A list with expected score mean and variance for the current subject.
Function calculates value of variational parameter using maximum likelihood.
expectedXi(theta, mu, mi, psi, Ci)
expectedXi(theta, mu, mi, psi, Ci)
theta |
spline basis functions for the current subject. |
mu |
vector of spline coefficients for the population mean. |
mi |
vector of expected mean scores for the current subject. |
psi |
matrix of spline coefficients for the principal component basis functions. |
Ci |
expected covariance matrix of scores for the current subject. |
A vector of variational parameters for the current subject.
Function used in the FPCA step for registering functional data,
called by register_fpca
when family = "gaussian"
.
Parameters estimated based on probabilistic PCA framework originally
introduced by Tipping and Bishop in 1999.
The number of functional principal components (FPCs) can either be specified
directly (argument npc
) or chosen based on the explained share of
variance (npc_varExplained
). In the latter case, the explained share of
variance and accordingly the number of FPCs is estimated before the main
estimation step by once running the FPCA with npc = 20
(and
correspondingly Kt = 20
). Doing so, we approximate the overall
variance in the data Y
with the variance represented by the FPC basis
with 20 FPCs.
fpca_gauss( Y, npc = NULL, npc_varExplained = NULL, Kt = 8, maxiter = 20, t_min = NULL, t_max = NULL, print.iter = FALSE, row_obj = NULL, seed = 1988, periodic = FALSE, error_thresh = 1e-04, subsample = TRUE, verbose = 1, ... )
fpca_gauss( Y, npc = NULL, npc_varExplained = NULL, Kt = 8, maxiter = 20, t_min = NULL, t_max = NULL, print.iter = FALSE, row_obj = NULL, seed = 1988, periodic = FALSE, error_thresh = 1e-04, subsample = TRUE, verbose = 1, ... )
Y |
Dataframe. Should have variables id, value, index. |
npc , npc_varExplained
|
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
maxiter |
Maximum number of iterations to perform for EM algorithm. Default is 50. |
t_min |
Minimum value to be evaluated on the time domain. |
t_max |
Maximum value to be evaluated on the time domain. |
print.iter |
Prints current error and iteration |
row_obj |
If NULL, the function cleans the data and calculates row indices.
Keep this NULL if you are using standalone |
seed |
Set seed for reproducibility. Defaults to 1988. |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
error_thresh |
Error threshold to end iterations. Defaults to 0.0001. |
subsample |
if the number of rows of the data is greater than 10 million rows, the 'id' values are subsampled to get the mean coefficients. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
... |
Additional arguments passed to or from other functions |
An object of class fpca
containing:
fpca_type |
Information that FPCA was performed with the 'variationEM' approach, in contrast to registr::gfpca_twoStep. |
t_vec |
Time vector over which the mean |
knots |
Cutpoints for B-spline basis used to rebuild |
efunctions |
|
evalues |
Estimated variance of the FPC scores. |
evalues_sum |
Approximation of the overall variance in |
npc |
number of FPCs. |
scores |
|
alpha |
Estimated population-level mean. |
mu |
Estimated population-level mean. Same value as |
subject_coefs |
B-spline basis coefficients used to construct subject-specific means.
For use in |
Yhat |
FPC approximation of subject-specific means. |
Y |
The observed data. |
family |
|
sigma2 |
Estimated error variance |
Julia Wrobel [email protected], Jeff Goldsmith [email protected], Alexander Bauer [email protected]
Tipping, M. E. and Bishop, C (1999). Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society Series B,, 592–598.
data(growth_incomplete) # estimate 2 FPCs fpca_obj = fpca_gauss(Y = growth_incomplete, npc = 2) plot(fpca_obj) # estimate npc adaptively, to explain 90% of the overall variation fpca_obj2 = fpca_gauss(Y = growth_incomplete, npc_varExplained = 0.9) plot(fpca_obj, plot_FPCs = 1:2)
data(growth_incomplete) # estimate 2 FPCs fpca_obj = fpca_gauss(Y = growth_incomplete, npc = 2) plot(fpca_obj) # estimate npc adaptively, to explain 90% of the overall variation fpca_obj2 = fpca_gauss(Y = growth_incomplete, npc_varExplained = 0.9) plot(fpca_obj, plot_FPCs = 1:2)
Internal main preparation function for fpca_gauss
fpca_gauss_argPreparation( Y, Kt, time, t_min, t_max, periodic, seed, subsample, verbose )
fpca_gauss_argPreparation( Y, Kt, time, t_min, t_max, periodic, seed, subsample, verbose )
Y , time , t_min , t_max
|
Internal objects created in |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
seed |
Set seed for reproducibility. Defaults to 1988. |
subsample |
if the number of rows of the data is greater than 10 million rows, the 'id' values are subsampled to get the mean coefficients. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
List with elements knots
, Theta_phi
, alpha_coefs
.
Main optimization function for fpca_gauss
. If npc_varExplained
is specified, the function simply returns a list with elements npc
(chosen number of FPCs), evalues
(estimated variances of the first 'npc'
FPCs) and evalues_sum
(sum of the estimated variances of the first 20
FPCs, as approximation of the overall variance).
fpca_gauss_optimization( npc, npc_varExplained = NULL, Kt, maxiter, print.iter, seed, periodic, error_thresh, verbose, Y, rows, I, knots, Theta_phi, alpha_coefs )
fpca_gauss_optimization( npc, npc_varExplained = NULL, Kt, maxiter, print.iter, seed, periodic, error_thresh, verbose, Y, rows, I, knots, Theta_phi, alpha_coefs )
npc |
The number of functional principal components (FPCs)
has to be specified either directly as |
npc_varExplained |
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If |
maxiter |
Maximum number of iterations to perform for EM algorithm. Default is 50. |
print.iter |
Prints current error and iteration |
seed |
Set seed for reproducibility. Defaults to 1988. |
periodic |
If TRUE, uses periodic b-spline basis functions. Default is FALSE. |
error_thresh |
Error threshold to end iterations. Defaults to 0.0001. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
Y , rows , I , knots , Theta_phi , alpha_coefs
|
Internal objects created in
|
list with elements t_vec
, Theta_phi_mean
, alpha_coefs
,
efunctions
, evalues
, evalues_sum
, scores
,
subject_coef
, fittedVals
, sigma2
. See documentation of
fpca_gauss
for details.
Function for applying FPCA to different exponential family distributions.
Used in the FPCA step for registering functional data,
called by register_fpca
when fpca_type = "two-step"
.
The method implements the 'two-step approach' of Gertheiss et al. (2017)
and is based on the approach of Hall et al. (2008) to estimate functional
principal components.
The number of functional principal components (FPCs) can either be specified
directly (argument npc
) or chosen based on the explained share of
variance (npc_criterion
). Using the latter, we approximate the overall
variance in the data Y
with the variance represented by the smoothed
covariance surface estimated with cov_hall
.
Note that the Eigenvalue decomposition of this covariance surface
sometimes leads to a long tail of subordinate FPCs with small eigenvalues.
Such subordinate dimensions seem to often represent phase rather than
amplitude variation, and can be cut off by specifying the second element of
argument npc_criterion
.
This function is an adaptation of the implementation of Jan
Gertheiss for Gertheiss et al. (2017), with focus on higher (RAM) efficiency
for large data settings.
gfpca_twoStep( Y, family = "gaussian", npc = NULL, npc_criterion = NULL, Kt = 8, t_min = NULL, t_max = NULL, row_obj = NULL, index_significantDigits = 4L, estimation_accuracy = "high", start_params = NULL, periodic = FALSE, verbose = 1, ... )
gfpca_twoStep( Y, family = "gaussian", npc = NULL, npc_criterion = NULL, Kt = 8, t_min = NULL, t_max = NULL, row_obj = NULL, index_significantDigits = 4L, estimation_accuracy = "high", start_params = NULL, periodic = FALSE, verbose = 1, ... )
Y |
Dataframe. Should have values id, value, index. |
family |
One of |
npc , npc_criterion
|
The number of functional principal components (FPCs)
has to be specified either directly as |
Kt |
Number of B-spline basis functions used to estimate mean functions and functional principal components. Default is 8. |
t_min |
Minimum value to be evaluated on the time domain. |
t_max |
Maximum value to be evaluated on the time domain. |
row_obj |
If NULL, the function cleans the data and calculates row indices.
Keep this NULL if you are using standalone |
index_significantDigits |
Positive integer |
estimation_accuracy |
One of |
start_params |
Optional start values for gamm4. Not used if
|
periodic |
Only contained for full consistency with |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
... |
Additional arguments passed to |
For family = "poisson"
the values in Y
are rounded before
performing the GFPCA to ensure integer data. This is done to ensure reasonable
computation times. Computation times tend to explode when estimating the
underlying high-dimensional mixed model with continuous Poisson data based
on the gamm4
package.
If negative eigenvalues are present, the respective eigenfunctions are dropped and not considered further.
An object of class fpca
containing:
fpca_type |
Information that FPCA was performed with the 'two-step' approach, in contrast to registr::fpca_gauss or registr::bfpca. |
t_vec |
Time vector over which the mean |
knots |
Cutpoints for B-spline basis used to rebuild |
efunctions |
|
evalues |
Estimated variance of the FPC scores. |
evalues_sum |
Sum of all (nonnegative) eigenvalues of the smoothed
covariance surface estimated with |
npc |
number of FPCs. |
scores |
|
alpha |
Estimated population-level mean. |
mu |
Estimated population-level mean. Same value as |
subject_coefs |
Always |
Yhat |
FPC approximation of subject-specific means, before applying the response function. |
Y |
The observed data. |
family |
|
gamm4_theta |
Estimated parameters of the mixed model. |
Alexander Bauer [email protected], based on work of Jan Gertheiss
Gertheiss, J., Goldsmith, J., & Staicu, A. M. (2017). A note on modeling sparse exponential-family functional response curves. Computational statistics & data analysis, 105, 46–52.
Hall, P., Müller, H. G., & Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 703–723.
data(growth_incomplete) # estimate 2 FPCs fpca_obj = gfpca_twoStep(Y = growth_incomplete, npc = 2, family = "gaussian") plot(fpca_obj) # estimate npc adaptively, to explain 90% of the overall variation fpca_obj2 = gfpca_twoStep(Y = growth_incomplete, npc_criterion = 0.9, family = "gaussian") plot(fpca_obj2, plot_FPCs = 1:2)
data(growth_incomplete) # estimate 2 FPCs fpca_obj = gfpca_twoStep(Y = growth_incomplete, npc = 2, family = "gaussian") plot(fpca_obj) # estimate npc adaptively, to explain 90% of the overall variation fpca_obj2 = gfpca_twoStep(Y = growth_incomplete, npc_criterion = 0.9, family = "gaussian") plot(fpca_obj2, plot_FPCs = 1:2)
This function creates subject-specific time grid
grid_subj_create(coefs, D)
grid_subj_create(coefs, D)
coefs |
Spline basis coefficients for reconstructing the subject-specific grid. |
D |
Number of grid points per subject. |
A numeric vector.
This dataset from the Berkeley Growth Study comprises the height
development of 39 boys and 54 girls between ages 1 and 18.
It is based on the dataset fda::growth
and focuses not on the observed
heights, but on the first derivatives of the curves. Before taking the
first derivative, the curves were slightly smoothed.
To showcase the functionality of the registr
package regarding the
analysis of incomplete curves, the growth curves were artificially made
incomplete. For each child, leading incompleteness was simulated by drawing
a random initial age in the first quarter of the domain.
Also, trailing incompleteness was simulated by drawing
a random cut-off age in the second half of the domain.
data(growth_incomplete)
data(growth_incomplete)
A dataframe made up of
A unique subject identifier;
Observed age of the child's height;
First derivative of the height development in the given age.
Ramsay, J. O., and Silverman, B. W. (2006), Functional Data Analysis, 2nd ed., Springer, New York.
Tuddenham, R. D., and Snyder, M. M. (1954). Physical growth of California boys and girls from birth to age 18. University of California Publications in Child Development, 1, 183-364.
Dependent on the specific type of warping functions, this function creates
a vector of initial parameters. For "nonparametric"
warpings that
are based on a given spline basis matrix, the initial parameters are defined
s.t. the resulting (inverse) warping function equals a diagonal line.
For "piecewise_linear2"
warpings a fixed parameter vector is returned.
initial_params(warping = "nonparametric", K, t_vec)
initial_params(warping = "nonparametric", K, t_vec)
warping |
If |
K |
Spline basis matrix defined over the interval |
t_vec |
Vector of the observed and potentially irregular time grid. |
Simple function for use within other C++ functions.
lambdaF(x)
lambdaF(x)
x |
The value to which you apply the function |
A numeric value that has been transformed.
Loss function for registration step optimization
loss_h( Y, Theta_h, mean_coefs, knots, beta.inner, family, t_min, t_max, t_min_curve, t_max_curve, incompleteness = NULL, lambda_inc = NULL, periodic = FALSE, Kt = 8, warping = "nonparametric", priors = FALSE, prior_sd = NULL )
loss_h( Y, Theta_h, mean_coefs, knots, beta.inner, family, t_min, t_max, t_min_curve, t_max_curve, incompleteness = NULL, lambda_inc = NULL, periodic = FALSE, Kt = 8, warping = "nonparametric", priors = FALSE, prior_sd = NULL )
Y |
vector of observed points. |
Theta_h |
B-spline basis for inverse warping functions. |
mean_coefs |
spline coefficient vector for mean curve. |
knots |
knot locations for B-spline basis used to estimate mean and FPC basis function. |
beta.inner |
spline coefficient vector to be estimated for warping function h. |
family |
One of |
t_min , t_max
|
minimum and maximum value to be evaluated on the time domain. |
t_min_curve , t_max_curve
|
minimum and maximum value of the observed time domain of the (potentially incomplete) curve. |
incompleteness |
Optional specification of incompleteness structure.
One of |
lambda_inc |
Penalization parameter to control the amount of
overall dilation of the domain.
The higher this lambda, the more the registered domains are forced to have the
same length as the observed domains.
Only used if |
periodic |
If |
Kt |
Number of B-spline basis functions used to estimate mean functions. Default is 8. |
warping |
If |
priors |
For |
prior_sd |
For |
The scalar value taken by the loss function.
Julia Wrobel [email protected], Erin McDonnell [email protected], Alexander Bauer [email protected]
Gradient of loss function for registration step
loss_h_gradient( Y, Theta_h, mean_coefs, knots, beta.inner, family = "gaussian", incompleteness = NULL, lambda_inc = NULL, t_min, t_max, t_min_curve, t_max_curve, Kt = 8, periodic = FALSE, warping = "nonparametric" )
loss_h_gradient( Y, Theta_h, mean_coefs, knots, beta.inner, family = "gaussian", incompleteness = NULL, lambda_inc = NULL, t_min, t_max, t_min_curve, t_max_curve, Kt = 8, periodic = FALSE, warping = "nonparametric" )
Y |
vector of observed points. |
Theta_h |
B-spline basis for inverse warping functions. |
mean_coefs |
spline coefficient vector for mean curve. |
knots |
knot locations for B-spline basis used to estimate mean and FPC basis function. |
beta.inner |
spline coefficient vector to be estimated for warping function h. |
family |
One of |
incompleteness |
Optional specification of incompleteness structure.
One of |
lambda_inc |
Penalization parameter to control the amount of
overall dilation of the domain.
The higher this lambda, the more the registered domains are forced to have the
same length as the observed domains.
Only used if |
t_min |
minimum and maximum value to be evaluated on the time domain. |
t_max |
minimum and maximum value to be evaluated on the time domain. |
t_min_curve |
minimum and maximum value of the observed time domain of the (potentially incomplete) curve. |
t_max_curve |
minimum and maximum value of the observed time domain of the (potentially incomplete) curve. |
Kt |
Number of B-spline basis functions used to estimate mean functions. Default is 8. |
periodic |
If |
warping |
If |
A numeric vector of spline coefficients for the gradient of the loss function.
Julia Wrobel [email protected], Alexander Bauer [email protected]
This function generates mean for simulated accelerometer data.
mean_curve(grid, period = 2 * pi, spline_based = FALSE)
mean_curve(grid, period = 2 * pi, spline_based = FALSE)
grid |
Grid of x values over which to evaluate the function. |
period |
Controls the period of the mean curve |
spline_based |
If FALSE curve is constructed using sine and cosine functions, if TRUE, curve is constructed using B-spline basis. |
A numeric vector.
This function generates mean for simulated functional data.
mean_sim(grid)
mean_sim(grid)
grid |
Grid of x values over which to evaluate the function. |
Subset of 24 hours of activity data for 50 subjects from 2003-2004 National Health and Nutrition Examination Survey (NHANES). Each subject is observed over 24 hours on a Sunday and wore the activity collection device for a minimum of 10 hours. Activity is measured each minute over 24 hours.
data(nhanes)
data(nhanes)
A dataframe made up of
A unique subject identifier;
Age of survey participant;
Gender of survey participant;
Observed time of activity measurement. Integers from 1 to 1440, indicating minutes from midnight to midnight;
Binary value of zero or one indicating inactivity or activity;
Raw activity count.
This function uses a 2-knot piecewise linear model to calculate inverse warping
functions for registration. The parameters knot1_x
and knot1_y
control the x and y locations of the first knot, and the parameters
knot1_x
and knot1_y
control the x and y locations of the second
knot. The designation (inverse) is intended to communicate that these
functions take data from the unregistered space to the registered space,
consistent with functional data literature on registration.
piecewise_linear2_hinv(grid, knot_locations = c(0.25, 0.3, 0.75, 0.9))
piecewise_linear2_hinv(grid, knot_locations = c(0.25, 0.3, 0.75, 0.9))
grid |
grid of values over which to evaluate the function. |
knot_locations |
controls the x and y locations of the two knots. |
Erin McDonnell [email protected]
S3 plot method for class fpca
.
Plot FPCA results by visualizing the variation of the individual FPCs around
the global mean. based on an object created with function
fpca_gauss
, bfpca
or gfpca_twoStep
.
The shares of explained variance are included in the plot titles if
x
contains an element evalues_sum
.
## S3 method for class 'fpca' plot( x, plot_FPCs = 1:x$npc, sd_factor = 2, response_function = NULL, add_symbols = TRUE, subtitle = TRUE, xlim = NULL, ylim = NULL, xlab = "t [registered]", ylab = "y", ... )
## S3 method for class 'fpca' plot( x, plot_FPCs = 1:x$npc, sd_factor = 2, response_function = NULL, add_symbols = TRUE, subtitle = TRUE, xlim = NULL, ylim = NULL, xlab = "t [registered]", ylab = "y", ... )
x |
Object of class |
plot_FPCs |
Optional index vector of the FPCs to be plotted.
Defaults to all FPCs contained in |
sd_factor |
Numeric factor with which the standard deviations of each FPC's scores are multiplied to display its variation in the plots. Defaults to 2. |
response_function |
Optional response function to be applied before
plotting the curves. Defaults to |
add_symbols |
Indicator if '+' and '-' symbols should be added to the plot to highlight the direction of the displayed FPCs. Defaults to TRUE. |
subtitle |
If TRUE (default) the parameter |
xlim , ylim
|
Optional numeric vectors with limits for the x and y axis. |
xlab , ylab
|
Optional titles for the x and y axis. |
... |
Additional arguments passed to |
@return If multiple FPCs are plotted, returns a grid of ggplot
plots, created with cowplot::plot_grid
. If only one FPC is plotted,
returns a single ggplot
plot.
Alexander Bauer [email protected]
data(growth_incomplete) fpca_obj = fpca_gauss(Y = growth_incomplete, npc = 2) if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("cowplot", quietly = TRUE)) { library(ggplot2) plot(fpca_obj) }
data(growth_incomplete) fpca_obj = fpca_gauss(Y = growth_incomplete, npc = 2) if (requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("cowplot", quietly = TRUE)) { library(ggplot2) plot(fpca_obj) }
This function generates the first principal component for simulated functional data.
psi1_sim(grid)
psi1_sim(grid)
grid |
Grid of x values over which to evaluate the function. |
This function generates the second principal component for simulated functional data.
psi2_sim(grid)
psi2_sim(grid)
grid |
Grid of x values over which to evaluate the function. |
Function combines constrained optimization and GFPCA to estimate warping functions for
exponential family curves. See argument family
for which families are
supported. Warping functions are calculated by the function registr
.
The GFPCA step can be performed either using the variational EM-based GFPCA
approaches of Wrobel et al. (2019) (fpca_type = "variationalEM"
, default)
or the mixed model-based two-step approach of Gertheiss et al. (2017)
(fpca_type = "two-step"
).
Warping functions by default are forced to start and end on the diagonal to be
domain-preserving. This behavior can be changed by setting
incompleteness
to some other value than NULL and a reasonable lambda_inc
value.
For further details see the accompanying vignette.
The number of functional principal components (FPCs) can either be specified
directly (argument npc
) or chosen based on the explained share of
variance in each iteration (argument npc_criterion
).
By specifying cores > 1
the registration call can be parallelized.
register_fpca( Y, Kt = 8, Kh = 4, family = "gaussian", incompleteness = NULL, lambda_inc = NULL, Y_template = NULL, max_iterations = 10, npc = NULL, npc_criterion = NULL, fpca_type = "variationalEM", fpca_maxiter = 50, fpca_seed = 1988, fpca_error_thresh = 1e-04, fpca_index_significantDigits = 4L, cores = 1L, verbose = 1, ... )
register_fpca( Y, Kt = 8, Kh = 4, family = "gaussian", incompleteness = NULL, lambda_inc = NULL, Y_template = NULL, max_iterations = 10, npc = NULL, npc_criterion = NULL, fpca_type = "variationalEM", fpca_maxiter = 50, fpca_seed = 1988, fpca_error_thresh = 1e-04, fpca_index_significantDigits = 4L, cores = 1L, verbose = 1, ... )
Y |
Dataframe. Should have values id, value, index. |
Kt |
Number of B-spline basis functions used to estimate mean functions
and functional principal components. Default is 8. If
|
Kh |
Number of B-spline basis functions used to estimate warping functions h. Default is 4. |
family |
One of |
incompleteness |
Optional specification of incompleteness structure.
One of |
lambda_inc |
Penalization parameter to control the amount of
overall dilation of the domain.
The higher this lambda, the more the registered domains are forced to have the
same length as the observed domains.
Only used if |
Y_template |
Optional dataframe with the same structure as |
max_iterations |
Number of iterations for overall algorithm. Defaults to 10. |
npc , npc_criterion
|
The number of functional principal components (FPCs)
has to be specified either directly as |
fpca_type |
One of |
fpca_maxiter |
Only used if |
fpca_seed |
Only used if |
fpca_error_thresh |
Only used if |
fpca_index_significantDigits |
Only used if |
cores |
Number of cores to be used. If |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
... |
Additional arguments passed to registr and to the gfpca functions
(if |
Requires input data Y
to be a dataframe in long format with variables
id
, index
, and value
to indicate subject IDs,
observation times on the domain, and observations, respectively.
One joint iteration consists of a GFPCA step and a registration step.
As preprocessing, one initial registration step is performed.
The template function for this registration step is defined by argument
Y_template
.
After convergence or max_iterations
is reached, one final GFPCA step
is performed.
An object of class registration
containing:
Y |
The observed data plus variables |
fpca_obj |
List of items from FPCA step. |
family |
Used exponential family. |
index_warped |
List of the (warped) index values for each iteration.
Has |
hinv_innerKnots |
List of inner knots for setting up the spline bases
for the inverse warping functions. Only contains |
hinv_beta |
Matrix of B-spline basis coefficients used to construct the
subject-specific inverse warping functions. From the last performed
registration step. For details see |
convergence |
List with information on the convergence of the joint
approach. Containing the following elements: |
Julia Wrobel [email protected] Jeff Goldsmith [email protected], Alexander Bauer [email protected]
### complete binomial curves Y = simulate_unregistered_curves(I = 20, D = 200) # estimation based on Wrobel et al. (2019) reg = register_fpca(Y, npc = 2, family = "binomial", fpca_type = "variationalEM", max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) ggplot(reg$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") plot(reg$fpca_obj, response_function = function(x) { 1 / (1 + exp(-x)) }) } # estimation based on Gertheiss et al. (2017) reg2 = register_fpca(Y, npc = 2, family = "binomial", fpca_type = "two-step", max_iterations = 5, fpca_index_significantDigits = 4) # example using accelerometer data from nhanes 2003-2004 study data(nhanes) nhanes_short = nhanes[nhanes$id %in% unique(nhanes$id)[1:5],] reg_nhanes = register_fpca(nhanes_short, npc = 2, family = "binomial", max_iterations = 5) ### incomplete Gaussian curves data(growth_incomplete) # Force the warping functions to start and end on the diagonal reg2a = register_fpca(growth_incomplete, npc = 2, family = "gaussian", incompleteness = NULL, max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(reg2a$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(reg2a$Y, aes(x = t_hat, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Allow the warping functions to not start / end on the diagonal. # The higher lambda_inc, the more the starting points and endpoints are forced # towards the diagonal. reg2b = register_fpca(growth_incomplete, npc = 2, family = "gaussian", incompleteness = "full", lambda_inc = 0.1, max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(reg2b$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(reg2b$Y, aes(x = t_hat, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } ### complete Gamma curves Y = simulate_unregistered_curves(I = 20, D = 100) Y$value = exp(Y$latent_mean) registr_gamma = register_fpca(Y, npc = 2, family = "gamma", fpca_type = "two-step", gradient = FALSE, max_iterations = 3)
### complete binomial curves Y = simulate_unregistered_curves(I = 20, D = 200) # estimation based on Wrobel et al. (2019) reg = register_fpca(Y, npc = 2, family = "binomial", fpca_type = "variationalEM", max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) ggplot(reg$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") plot(reg$fpca_obj, response_function = function(x) { 1 / (1 + exp(-x)) }) } # estimation based on Gertheiss et al. (2017) reg2 = register_fpca(Y, npc = 2, family = "binomial", fpca_type = "two-step", max_iterations = 5, fpca_index_significantDigits = 4) # example using accelerometer data from nhanes 2003-2004 study data(nhanes) nhanes_short = nhanes[nhanes$id %in% unique(nhanes$id)[1:5],] reg_nhanes = register_fpca(nhanes_short, npc = 2, family = "binomial", max_iterations = 5) ### incomplete Gaussian curves data(growth_incomplete) # Force the warping functions to start and end on the diagonal reg2a = register_fpca(growth_incomplete, npc = 2, family = "gaussian", incompleteness = NULL, max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(reg2a$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(reg2a$Y, aes(x = t_hat, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Allow the warping functions to not start / end on the diagonal. # The higher lambda_inc, the more the starting points and endpoints are forced # towards the diagonal. reg2b = register_fpca(growth_incomplete, npc = 2, family = "gaussian", incompleteness = "full", lambda_inc = 0.1, max_iterations = 5) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(reg2b$Y, aes(x = tstar, y = t_hat, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(reg2b$Y, aes(x = t_hat, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } ### complete Gamma curves Y = simulate_unregistered_curves(I = 20, D = 100) Y$value = exp(Y$latent_mean) registr_gamma = register_fpca(Y, npc = 2, family = "gamma", fpca_type = "two-step", gradient = FALSE, max_iterations = 3)
Software for registering functional data from the exponential family of distributions.
Function used in the registration step of an FPCA-based approach for
registering exponential-family, potentially incomplete functional data,
called by register_fpca
.
This method uses constrained optimization to estimate spline
coefficients for warping functions, where the objective function for optimization comes from
maximizing the EF likelihood subject to monotonicity constraints on the warping functions.
You have to either specify obj
, which is a fpca
object from an earlier step, or Y
, a dataframe in long format with variables
id
, index
, and value
to indicate subject IDs, times, and observations,
respectively.
Warping functions by default are forced to start and end on the diagonal to be
domain-preserving. This behavior can be changed by setting
incompleteness
to some other value than NULL and a reasonable lambda_inc
value.
For further details see the accompanying vignette.
By specifying cores > 1
the registration call can be parallelized.
registr( obj = NULL, Y = NULL, Kt = 8, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL, lambda_inc = NULL, Y_template = NULL, beta = NULL, t_min = NULL, t_max = NULL, row_obj = NULL, periodic = FALSE, warping = "nonparametric", gamma_scales = NULL, cores = 1L, subsample = TRUE, verbose = 1, ... )
registr( obj = NULL, Y = NULL, Kt = 8, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL, lambda_inc = NULL, Y_template = NULL, beta = NULL, t_min = NULL, t_max = NULL, row_obj = NULL, periodic = FALSE, warping = "nonparametric", gamma_scales = NULL, cores = 1L, subsample = TRUE, verbose = 1, ... )
obj |
Current estimate of FPC object. Can be NULL only if Y argument is selected. |
Y |
Dataframe. Should have values id, value, index. |
Kt |
Number of B-spline basis functions used to estimate mean functions. Default is 8. |
Kh |
Number of B-spline basis functions used to estimate warping functions h. Default is 4. |
family |
One of |
gradient |
If |
incompleteness |
Optional specification of incompleteness structure.
One of |
lambda_inc |
Penalization parameter to control the amount of
overall dilation of the domain.
The higher this lambda, the more the registered domains are forced to have the
same length as the observed domains.
Only used if |
Y_template |
Optional dataframe with the same structure as |
beta |
Current estimates for beta for each subject. Default is NULL. |
t_min |
Minimum value to be evaluated on the time domain. if 'NULL', taken to be minimum observed value. |
t_max |
Maximum value to be evaluated on the time domain. if 'NULL', taken to be maximum observed value. |
row_obj |
If NULL, the function cleans the data and calculates row indices.
Keep this NULL if you are using standalone |
periodic |
If |
warping |
If |
gamma_scales |
Only used for |
cores |
Number of cores to be used. If |
subsample |
if the number of rows of the data is greater than 10 million rows, the 'id' values are subsampled to get the mean coefficients. |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
... |
additional arguments passed to or from other functions |
The template function for the registration is defined by argument obj
or Y_template
, depending on if obj
is NULL or not, respectively.
An list containing:
Y |
The observed data. The variables |
loss |
Value of the loss function after registraton. |
hinv_innerKnots |
List of inner knots for setting up the spline bases
for the inverse warping functions. Only contains |
hinv_beta |
Matrix of B-spline basis coefficients used to construct
subject-specific inverse warping functions. See examples on how to
reconstruct a warping function based on |
Julia Wrobel
Julia Wrobel [email protected], Erin McDonnell [email protected], Alexander Bauer [email protected]
### complete binomial curves Y = simulate_unregistered_curves() register_step = registr(obj = NULL, Y = Y, Kt = 6, Kh = 4, family = "binomial", gradient = TRUE) ### incomplete Gaussian curves data(growth_incomplete) # Force the warping functions to start and end on the diagonal to preserve the domain register_step2a = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL) if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) ggplot(register_step2a$Y, aes(x = tstar, y = index, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(register_step2a$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Example for how to recreate an estimated inverse warping function given # the output of registr(). Focus on id "boy01". id = "boy01" index_obsRange_i = range(growth_incomplete$index[growth_incomplete$id == id]) index = seq(min(index_obsRange_i), max(index_obsRange_i), length.out = 100) # (note that 'index' must contain both the observed min and max in index_obsRange_i) Theta_h_i = splines::bs(index, knots = register_step2a$hinv_innerKnots[[id]], intercept = TRUE) index_reg = as.vector(Theta_h_i %*% register_step2a$hinv_beta[,id]) warp_dat_i = data.frame(index_observed = index, index_registered = index_reg) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(warp_dat_i, aes(x = index_observed, y = index_registered)) + geom_line() + ggtitle("Extracted warping function for id 'boy01'") } # Allow the warping functions to not start / end on the diagonal. # The higher lambda_inc, the more the starting points and endpoints are # forced towards the diagonal. register_step2b = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = "full", lambda_inc = 1) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(register_step2b$Y, aes(x = tstar, y = index, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(register_step2b$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Define the template function only over a subset of the curves # (even though not very reasonable in this example) template_ids = c("girl12","girl13","girl14") Y_template = growth_incomplete[growth_incomplete$id %in% template_ids,] register_step2c = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, Y_template = Y_template, incompleteness = "full", lambda_inc = 1) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(register_step2c$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") }
### complete binomial curves Y = simulate_unregistered_curves() register_step = registr(obj = NULL, Y = Y, Kt = 6, Kh = 4, family = "binomial", gradient = TRUE) ### incomplete Gaussian curves data(growth_incomplete) # Force the warping functions to start and end on the diagonal to preserve the domain register_step2a = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL) if (requireNamespace("ggplot2", quietly = TRUE)) { library(ggplot2) ggplot(register_step2a$Y, aes(x = tstar, y = index, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(register_step2a$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Example for how to recreate an estimated inverse warping function given # the output of registr(). Focus on id "boy01". id = "boy01" index_obsRange_i = range(growth_incomplete$index[growth_incomplete$id == id]) index = seq(min(index_obsRange_i), max(index_obsRange_i), length.out = 100) # (note that 'index' must contain both the observed min and max in index_obsRange_i) Theta_h_i = splines::bs(index, knots = register_step2a$hinv_innerKnots[[id]], intercept = TRUE) index_reg = as.vector(Theta_h_i %*% register_step2a$hinv_beta[,id]) warp_dat_i = data.frame(index_observed = index, index_registered = index_reg) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(warp_dat_i, aes(x = index_observed, y = index_registered)) + geom_line() + ggtitle("Extracted warping function for id 'boy01'") } # Allow the warping functions to not start / end on the diagonal. # The higher lambda_inc, the more the starting points and endpoints are # forced towards the diagonal. register_step2b = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = "full", lambda_inc = 1) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(register_step2b$Y, aes(x = tstar, y = index, group = id)) + geom_line(alpha = 0.2) + ggtitle("Estimated warping functions") ggplot(register_step2b$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") } # Define the template function only over a subset of the curves # (even though not very reasonable in this example) template_ids = c("girl12","girl13","girl14") Y_template = growth_incomplete[growth_incomplete$id %in% template_ids,] register_step2c = registr(obj = NULL, Y = growth_incomplete, Kt = 6, Kh = 4, family = "gaussian", gradient = TRUE, Y_template = Y_template, incompleteness = "full", lambda_inc = 1) if (requireNamespace("ggplot2", quietly = TRUE)) { ggplot(register_step2c$Y, aes(x = index, y = value, group = id)) + geom_line(alpha = 0.2) + ggtitle("Registered curves") }
This internal function is only to be used from within registr
.
It performs the main optimization step with constrOptim
for the
registration of one curve.
registr_oneCurve( obj = NULL, Y = NULL, Kt = 8, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL, lambda_inc = NULL, beta = NULL, t_min = NULL, t_max = NULL, periodic = FALSE, warping = "nonparametric", gamma_scales = NULL, global_knots = NULL, mean_coefs = NULL, ..., verbose = 1, just_return_list = FALSE )
registr_oneCurve( obj = NULL, Y = NULL, Kt = 8, Kh = 4, family = "gaussian", gradient = TRUE, incompleteness = NULL, lambda_inc = NULL, beta = NULL, t_min = NULL, t_max = NULL, periodic = FALSE, warping = "nonparametric", gamma_scales = NULL, global_knots = NULL, mean_coefs = NULL, ..., verbose = 1, just_return_list = FALSE )
obj |
Current estimate of FPC object. Can be NULL only if Y argument is selected. |
Y |
Dataframe. Should have values id, value, index. |
Kt |
Number of B-spline basis functions used to estimate mean functions. Default is 8. |
Kh |
Number of B-spline basis functions used to estimate warping functions h. Default is 4. |
family |
One of |
gradient |
If |
incompleteness |
Optional specification of incompleteness structure.
One of |
lambda_inc |
Penalization parameter to control the amount of
overall dilation of the domain.
The higher this lambda, the more the registered domains are forced to have the
same length as the observed domains.
Only used if |
beta |
Current estimates for beta for each subject. Default is NULL. |
t_min |
Minimum value to be evaluated on the time domain. if 'NULL', taken to be minimum observed value. |
t_max |
Maximum value to be evaluated on the time domain. if 'NULL', taken to be maximum observed value. |
periodic |
If |
warping |
If |
gamma_scales |
Only used for |
global_knots |
knots for the basis/splines, passed to [pbs::pbs()] or [stats::bs()] |
mean_coefs |
Mean coefficients for the mean of all curves or GFPCA based. May extract from 'obj' object |
... |
additional arguments passed to or from other functions |
verbose |
Can be set to integers between 0 and 4 to control the level of detail of the printed diagnostic messages. Higher numbers lead to more detailed messages. Defaults to 1. |
just_return_list |
Do not use. For developers only |
An list containing:
hinv_innerKnots |
Inner knots for setting up the spline basis for the inverse warping function. |
hinv_beta |
Estimated B-spline basis coefficients used to construct subject-specific inverse warping functions. |
t_hat |
Vector of registered time domain. |
loss |
Loss of the optimal solution. |
Julia Wrobel [email protected], Erin McDonnell [email protected], Alexander Bauer [email protected]
This function simulates functional data. The data it outputs is generated from a mean function and two orthogonal principal component basis functions. The mean and principal components are based on sine and cosine functions. Subject-specific scores for each PC are drawn from normal distributions with standard deviation lambda1 and lambda2.
simulate_functional_data( lambda1 = 2, lambda2 = 1, I = 50, D = 100, seed = 1988, vary_D = FALSE )
simulate_functional_data( lambda1 = 2, lambda2 = 1, I = 50, D = 100, seed = 1988, vary_D = FALSE )
lambda1 |
Standard deviation for PC1 scores. |
lambda2 |
Standard deviation for PC2 scores. |
I |
Number of subjects. Defaults is 50. |
D |
Number of grid points per subject. Default is 100. |
seed |
Seed for reproducibility. Default is 1988. |
vary_D |
Indicates if grid length vary by subject. If FALSE all subjects have grid length D. |
A list containing:
Y |
Simulated dataframe with variables id, value, index, and latent_mean. |
psi1 |
True values for first principal component. |
psi2 |
True values for second principal component. |
alpha |
True values for population-level mean. |
A list containing:
Y |
A dataframe of simulated data. |
psi1 |
The first simulated eigenfunction. |
psi2 |
The second simulated eigenfunction. |
alpha |
The population mean. |
Julia Wrobel [email protected]
This function simulates unregistered curves, providing the time values for both the unregistered curves (t_star) and the registered curves (t). Curves all have one peak, the location of which is shifted on the unregistered domain, meant to mimic accelerometer data.
simulate_unregistered_curves( I = 50, D = 100, lambda = 15, seed = 1988, period = 2 * pi, spline_based = FALSE, phase_variation = TRUE )
simulate_unregistered_curves( I = 50, D = 100, lambda = 15, seed = 1988, period = 2 * pi, spline_based = FALSE, phase_variation = TRUE )
I |
Number of subjects. Defaults is 50. |
D |
Number of grid points per subject. Default is 100. |
lambda |
Standard deviation for subject-specific amplitudes. |
seed |
Seed for reproducibility. Default is 1988. |
period |
Controls the period of the mean curve |
spline_based |
If FALSE curve is constructed using sine and cosine functions, if TRUE, curve is constructed using B-spline basis. |
phase_variation |
If TRUE, creates phase variation (registered curves are observed on uneven grid). If FALSE, no phase variation. |
A simulated dataframe with variables id, value, index, latent_mean, and t. Index is the domain on which curves are unregistered and t is the domain on which curves are registered.
Julia Wrobel [email protected], Jeff Goldsmith [email protected]
Calculations quadratic form of theta with diagonalized variational parameter in the center.
squareTheta(xi, theta)
squareTheta(xi, theta)
xi |
vector of variational parameters for the current subject. |
theta |
spline basis functions for the current subject. |
A matrix of the quadratic form of theta for the current subject.