Title: | Distributional Data Analysis Techniques for Biosensor Data |
---|---|
Description: | Unified and user-friendly framework for using new distributional representations of biosensors data in different statistical modeling tasks: regression models, hypothesis testing, cluster analysis, visualization, and descriptive analysis. Distributional representations are a functional extension of compositional time-range metrics and we have used them successfully so far in modeling glucose profiles and accelerometer data. However, these functional representations can be used to represent any biosensor data such as ECG or medical imaging such as fMRI. Matabuena M, Petersen A, Vidal JC, Gude F. "Glucodensities: A new representation of glucose profiles using distributional data analysis" (2021) <doi:10.1177/0962280221998064>. |
Authors: | Juan C. Vidal [aut, cre] , Marcos Matabuena [aut] , Marta Karas [ctb] |
Maintainer: | Juan C. Vidal <[email protected]> |
License: | GPL-2 |
Version: | 1.0 |
Built: | 2024-12-03 06:50:33 UTC |
Source: | CRAN |
Biosensor data have the potential to improve disease control and detection. However, the analysis of these data under free-living conditions is not feasible with current statistical techniques. To address this challenge, we introduce a new functional representation of biosensor data, termed the glucodensity, together with a data analysis framework based on distances between them. The new data analysis procedure is illustrated through an application in diabetes with continuous-time glucose monitoring (CGM) data. In this domain, we show marked improvement with respect to state-of-the-art analysis methods. In particular, our findings demonstrate that (i) the glucodensity possesses an extraordinary clinical sensitivity to capture the typical biomarkers used in the standard clinical practice in diabetes; (ii) previous biomarkers cannot accurately predict glucodensity, so that the latter is a richer source of information and; (iii) the glucodensity is a natural generalization of the time in range metric, this being the gold standard in the handling of CGM data. Furthermore, the new method overcomes many of the drawbacks of time in range metrics and provides more indepth insight into assessing glucose metabolism.
Juan C. Vidal [email protected]
Marcos Matabuena [email protected]
Performs energy clustering with Wasserstein distance using quantile distributional representations as covariates.
clustering(data, clusters=3, iter_max=10, restarts=1)
clustering(data, clusters=3, iter_max=10, restarts=1)
data |
A biosensor object. |
clusters |
Number of clusters. |
iter_max |
Maximum number of iterations. |
restarts |
Number of restarts. |
An object of class bclustering:
data
A data frame with biosensor raw data.
result
A kgroups object (see energy library).
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) clus = clustering(data, clusters=3)
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) clus = clustering(data, clusters=3)
Predicts the cluster of each element of the objects list
clustering_prediction(clustering, objects)
clustering_prediction(clustering, objects)
clustering |
A gl.clustering object. |
objects |
Matrix of objects to cluster. |
The clusters to which these objects are assigned.
Generates a quantile regression model V + V2 * v + tau * V2 * Q0 where Q0 is a truncated random variable, v = 2 * X, tau = 2 * X, V ~ Unif(-1, 1), V2 ~ Unif(-1, -1), V3 ~ Unif(0.8, 1.2), and E(V|X) = tau * Q0;
generate_data(n = 100, Qp = 100, Xp = 5)
generate_data(n = 100, Qp = 100, Xp = 5)
n |
Sample size. |
Qp |
Dimension of the quantile. |
Xp |
Dimension of covariates where X_i~Unif(0,1). |
A biosensor object:
data
NULL.
densities
NULL.
quantiles
A functional data object (fdata) with the empirical quantile estimation.
variables
A data frame with Xp covariates.
data = generate_data(n=100, Qp=100, Xp=5) names(data) head(data$quantiles) head(data$variables) plot(data$quantiles, main="Quantile curves")
data = generate_data(n=100, Qp=100, Xp=5) names(data) head(data$quantiles) head(data$variables) plot(data$quantiles, main="Quantile curves")
Hypothesis testing between two random samples of distributional representations to detect differences in scale and localization (ANOVA test) or distributional differences (Energy distance).
hypothesis_testing(data1, data2, permutations=100)
hypothesis_testing(data1, data2, permutations=100)
data1 |
A biosensor object. First population. |
data2 |
A biosensor object. Second population. |
permutations |
Number of permutations used in the energy distance calibration test. |
An object of class biotest:
p1_mean
Quantile mean of the first population.
p1_variance
Quantile variance of the first population.
p2_mean
Quantile mean of the second population.
p2_variance
Quantile variance of the second population.
energy_pvalue
P-value of the energy distance test.
anova_pvalue
P-value of the ANOvA-Fréchet test.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data1 = load_data(file1, file2) file3 = system.file("extdata", "data_2.csv", package = "biosensors.usc") file4 = system.file("extdata", "variables_2.csv", package = "biosensors.usc") data2 = load_data(file3, file4) htest = hypothesis_testing(data1, data2)
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data1 = load_data(file1, file2) file3 = system.file("extdata", "data_2.csv", package = "biosensors.usc") file4 = system.file("extdata", "variables_2.csv", package = "biosensors.usc") data2 = load_data(file3, file4) htest = hypothesis_testing(data1, data2)
R function to read biosensors data from a csv files.
load_data(filename_fdata, filename_variables = NULL)
load_data(filename_fdata, filename_variables = NULL)
filename_fdata |
A csv file with the functional data. The csv file must have long format with, at least, the following three columns: id, time, and value, where the id identifies the individual, the time indicates the moment in which the data was captured, and the value is a monitor measure. |
filename_variables |
A csv file with the clinical variables. The csv file contains a row per individual and must have a column id identifying this individual. |
A biosensor object:
data
A data frame with biosensor raw data.
densities
A functional data object (fdata) with a non-parametric density estimation.
quantiles
A functional data object (fdata) with the empirical quantile estimation.
variables
A data frame with the covariates.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) names(data) head(data$quantiles) head(data$variables) plot(data$quantiles, main="Quantile curves")
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) names(data) head(data$quantiles) head(data$variables) plot(data$quantiles, main="Quantile curves")
Functional non-parametric Nadaraya-Watson prediction with 2-Wasserstein distance.
nadayara_prediction(nadaraya, Qpred, hs=NULL)
nadayara_prediction(nadaraya, Qpred, hs=NULL)
nadaraya |
A Nadaraya regression object. |
Qpred |
Quantile curves that will be used in the predictions |
hs |
Smoothing parameters for the predictions, by default hs = seq(0.8, 15, length = 200) |
An object of class bnadarayapred:
prediction
The Nadaraya-Watson prediction for the test data at each value of hs.
hs
Hs values used for the prediction.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) nada = nadayara_regression(data, "BMI") # Example of prediction with the column mean of quantiles npre = nadayara_prediction(nada, t(colMeans(data$quantiles$data)))
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) nada = nadayara_regression(data, "BMI") # Example of prediction with the column mean of quantiles npre = nadayara_prediction(nada, t(colMeans(data$quantiles$data)))
Functional non-parametric Nadaraya-Watson regression with 2-Wasserstein distance, using as predictor the distributional representation and as response a scalar outcome.
nadayara_regression(data, response)
nadayara_regression(data, response)
data |
A biosensor object. |
response |
The name of the scalar response. The response must be a column name in data$variables. |
An object of class bnadaraya:
prediction
The Nadaraya-Watson prediction for each point of the training data at each h=seq(0.8, 15, length=200).
r2
R2 estimation for the training data at each h=seq(0.8, 15, length=200).
error
Standard mean-squared error after applying leave-one-out cross-validation for the training data at each h=seq(0.8, 15, length=200).
data
A data frame with biosensor raw data.
response
The name of the scalar response.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) nada = nadayara_regression(data, "BMI")
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) nada = nadayara_regression(data, "BMI")
Performs the Wasserstein regression using quantile functions.
regmod_prediction(data, xpred)
regmod_prediction(data, xpred)
data |
A bregmod object. |
xpred |
A kxp matrix of input values for regressors for prediction, where k is the number of points we do the prediction and p is the dimension of the input variables. |
A kxm array. Qpred(l, :) is the regression prediction of Q given X = xpred(l, :)' where m is the dimension of the grid of quantile function.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = regmod_regression(data, "BMI") # Example of prediction xpred = as.matrix(25) g1rmp = regmod_prediction(regm, xpred)
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = regmod_regression(data, "BMI") # Example of prediction xpred = as.matrix(25) g1rmp = regmod_prediction(regm, xpred)
Performs the Wasserstein regression using quantile functions.
regmod_regression(data, response)
regmod_regression(data, response)
data |
A biosensor object. |
response |
The name of the scalar response. The response must be a column name in data$variables. |
An object of class bregmod containing the components:
beta
The beta coefficient functions of the fitting.
prediction
The prediction for each training data.
residuals
The residuals for each prediction value.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = regmod_regression(data, "BMI")
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = regmod_regression(data, "BMI")
Performs a Ridge regression.
ridge_regression(data, response, w=NULL, method="manhattan", type="gaussian")
ridge_regression(data, response, w=NULL, method="manhattan", type="gaussian")
data |
A biosensor object. |
response |
The name of the scalar response. The response must be a column name in data$variables. |
w |
A weight function. |
method |
The distance measure to be used (@seealso parallelDist::parDist). By default manhattan distance. |
type |
The kernel type ("gaussian" or "lapla"). By default gaussian distance. |
An object containing the components:
best_alphas
Best coefficients obtained with leave-one-out cross-validation criteria.
best_kernel
The kernel matrix of the best solution.
best_sigma
The sigma parameter of the best solution.
best_lambda
The lambda parameter of the best solution.
sigmas
The sigma parameters used in the fitting according to the median heuristic fitting criteria.
predictions
A matrix of predictions.
r2
R-square of the different models fitted.
error
Mean squared-error of the different models fitted.
predictions_cross
A matrix of predictions obtained with leave-one-out cross-validation criteria.
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = ridge_regression(data, "BMI")
# Data extracted from the paper: Hall, H., Perelman, D., Breschi, A., Limcaoco, P., Kellogg, R., # McLaughlin, T., Snyder, M., Glucotypes reveal new patterns of glucose dysregulation, PLoS # biology 16(7), 2018. file1 = system.file("extdata", "data_1.csv", package = "biosensors.usc") file2 = system.file("extdata", "variables_1.csv", package = "biosensors.usc") data = load_data(file1, file2) regm = ridge_regression(data, "BMI")