---
title: "MariNET - A Novel Framework for Inferring Dynamic Network Relationships from Longitudinal EHRs Using Linear Mixed Models"
author:
- name: "Marina Vargas-Fernández"
  affiliation:
  - Department of Statistics and Operational Research. University of Granada
  - GENYO, Centre for Genomics and Oncological Research
  email: marina.vargas@genyo.es
- name: "Jordi Martorell-Marugán"
  affiliation:
  - GENYO, Centre for Genomics and Oncological Research
  - Andalusian Foundation for Biomedical Research in Eastern Andalusia (FIBAO)
  email: jordi.martorell@genyo.es
- name: "Pedro Carmona-Sáez"
  affiliation:
  - Department of Statistics and Operational Research. University of Granada
  - GENYO, Centre for Genomics and Oncological Research
  email: pcarmona@ugr.es
package: MariNET
date: "`r BiocStyle::doc_date()`"
abstract: >
  The rapid digitization of healthcare data, particularly through electronic health records (EHRs), has created unprecedented opportunities for biomedical research. EHRs contain rich, heterogeneous, and longitudinal data that, when analyzed at a systems level, can reveal complex patterns underlying disease progression, comorbidities, and patient trajectories. However, the high-dimensional and interdependent nature of these data poses significant analytical challenges, particularly when accounting for temporal dependencies and hierarchical structures inherent in longitudinal studies. Traditional methods, such as Gaussian Graphical Modeling and Vector Autoregression, often fall short in addressing these complexities due to strict assumptions of independence and stationarity, limiting their applicability to real-world EHR data. To overcome these limitations, we introduce MariNET, a novel methodology that leverages linear mixed models (LMM) to infer network relationships from longitudinal EHR data.
  By incorporating weights derived from LMMs, our method effectively handles correlated observations and provides a robust framework for analyzing dynamic interactions among clinical variables over time. This approach not only enhances the understanding of temporal dependencies in healthcare data but also offers a scalable and practical solution for uncovering clinically relevant insights.
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
    toc_float: true
    number_sections: true
    css: styles.css
bibliography: references.bib  # Link the bibliography file
vignette: >
  %\VignetteIndexEntry{MariNET - A Novel Framework for Inferring Dynamic Network Relationships from Longitudinal EHRs Using Linear Mixed Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction to MariNET

The `MariNET` package provides tools for analyzing longitudinal clinical data using linear mixed models (LMM) and visualizing the results as networks. This vignette demonstrates how to use the package to perform longitudinal analysis and generate network plots.

The purpose of this vignette is to showcase the functionality of the package, including:

- Fitting linear mixed models to clinical data
- Visualizing the results as networks
- Comparing different network structures

# Installation

You can install the `MariNET` package from CRAN using:

```{r setup}
#install.packages("MariNET")
library("MariNET")
```

# Loading Data

In this section, we load the dataset included in the package. The sample data come from a previous assessment study of the relationships between COVID-19 and clinical variables related to mental health and social contact [@fried2022mental].

```{r load-data}
# Load the dataset from the package
data(example_data)

# Display the first few rows
head(example_data)
```

# Linear Mixed effects Model network

This package focuses on the use of linear mixed models for network construction. The methodology is not specific to clinical data: it can be applied to data from other fields, since the origin of the data does not affect the method's applicability [@lme42015].

For network construction, a separate linear mixed model is fitted for each clinical variable, with that variable as the response and the remaining variables as predictors. This process is repeated for each variable, as in previous studies [@van2018network].

```{r}
# Extract column names from the dataset
# These represent all available variables in the dataset
varLabs <- colnames(example_data)

# Define the variables to be removed from the analysis
# These variables are not included as nodes in the network visualization
remove <- c("id", "day", "beep", "conc")

# Filter out the unwanted variables
# Keeps only the variables that are not in the "remove" vector
varLabs <- varLabs[!varLabs %in% remove]

# Print the final selection of variables to be used as nodes in the network
print(varLabs)
```

The function *lmm_analysis()* is the main tool of this package. It takes the following arguments:

- *clinical_data*: Data frame containing clinical data and metadata for participants, including an identifier such as *participant_id*. Make sure the identifier is the first column of the data frame.
- *variables_to_scale*: Character vector with the names of the variables to analyze. These variables must be numeric because they are scaled.
- *random_effects*: A character string specifying the random effects formula (default: "(1 | participant_id)").

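
As a rough illustration of the per-variable modeling described above, the sketch below fits one model per node with `lme4::lmer()` and stores the fixed-effect t-values as edge weights. It is not the implementation of *lmm_analysis()*: the helper name `lmm_network_sketch`, the simulated `toy` data, and the convention of placing predictors in rows and responses in columns are all assumptions made for this example.

```r
library(lme4)

# Hypothetical helper (not part of MariNET): one LMM per node,
# with the t-values of the fixed effects used as edge weights.
lmm_network_sketch <- function(data, vars, random_effects = "(1 | id)") {
  # Standardize the node variables so that effects are on a comparable scale
  data[vars] <- lapply(data[vars], function(x) as.numeric(scale(x)))

  adj <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
  for (target in vars) {
    predictors <- setdiff(vars, target)
    # Build e.g. "a ~ b + c + (1 | id)"
    f <- as.formula(paste(
      target, "~", paste(predictors, collapse = " + "), "+", random_effects
    ))
    fit <- lmer(f, data = data)
    # Assumed convention: rows are predictors, columns are responses
    adj[predictors, target] <- coef(summary(fit))[predictors, "t value"]
  }
  adj
}

# Simulated toy data: 20 participants with 10 observations each
set.seed(1)
n_id <- 20
n_obs <- 10
toy <- data.frame(id = factor(rep(seq_len(n_id), each = n_obs)))
idx <- as.integer(toy$id)
n <- nrow(toy)
toy$a <- rnorm(n_id)[idx] + rnorm(n)                # participant shift + noise
toy$b <- 0.5 * toy$a + rnorm(n_id)[idx] + rnorm(n)  # depends on a
toy$c <- rnorm(n_id)[idx] + rnorm(n)                # unrelated to a and b

toy_net <- lmm_network_sketch(toy, c("a", "b", "c"))
round(toy_net, 2)
```

Because the weights in this sketch are t-values, they are not bounded, and the matrix is generally not symmetric: the weight of `a` when predicting `b` differs from the weight of `b` when predicting `a`.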
```{r}
# Perform Linear Mixed Model (LMM) analysis on the dataset
# This function iterates over the selected variables (varLabs) and models their relationships
# while accounting for individual-level variability using a random effect.
model <- lmm_analysis(
  example_data,              # Input dataset containing clinical/longitudinal data
  varLabs,                   # Character vector of variables to be analyzed in the model
  random_effects = "(1|id)"  # Specifies a random intercept for each individual (id)
)

# Print the model results (optional, useful for debugging or reviewing output)
# print(model)
```

# Network visualization

To color the nodes of the plot by group, a grouping structure has to be added to the data. This means assigning each variable to a group and choosing a color per group, so that related symptoms can be told apart. Visualization is based on the qgraph package [@R-qgraph].

```{r qgraph-plot2, fig.width=3, fig.height=2, dpi=300}
# Define the community structure for the variables
# Assigns labels to different groups based on symptoms or categories
community_structure <- c(
  rep("Stress", 8),   # First 8 variables belong to the "Stress" group
  rep("Social", 6),   # Next 6 variables belong to the "Social" group
  rep("Covid-19", 4)  # Last 4 variables belong to the "Covid-19" group
)

# Create a dataframe linking variable names to their assigned community group
structure <- data.frame(varLabs, community_structure)

# Define labels for the network plot (using variable names)
labels <- varLabs

# Load the qgraph package for network visualization
library(qgraph)

# Generate the network plot using qgraph
qgraph(
  model,                                        # Adjacency matrix or network model input
  groups = structure$community_structure,       # Assign colors based on community groups
  labels = labels,                              # Display variable names as node labels
  legend = TRUE,                                # Include a legend in the plot
  layout = "spring",                            # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"),  # Define colors for different groups
  legend.cex = 0.3                              # Adjust the size of the legend text
)
```

# Comparison between models

Because the weighted matrix is built from t-values, its entries are not bounded between -1 and 1. As a result, it is not directly comparable with the usual network modeling methods, which rely on correlation and pairwise estimation. For comparability, the adjacency matrix is normalized by scaling its values by their range. The normalized weighted matrices are then subtracted to reveal the differences.

```{r qgraph-plot3, fig.width=3, fig.height=2, dpi=300}
# Fit a second Linear Mixed Model (LMM) with a more complex random effects structure
# This model accounts for repeated measures within individuals (id) over different days (day)
# and also considers an additional random effect for the variable "conc" (context or condition)
model2 <- lmm_analysis(
  example_data,                             # Input dataset containing clinical/longitudinal data
  varLabs,                                  # Character vector of variables to be analyzed in the model
  random_effects = "(1|id/day) + (1|conc)"  # Random effects structure:
  # (1|id/day) -> Nested random effect for each day within an individual
  # (1|conc)   -> Additional random effect for "conc" variable
)

# Generate a network visualization from the second LMM model
qgraph(
  model2,                                       # Adjacency matrix or network model derived from LMM
  groups = structure$community_structure,       # Assign colors based on predefined symptom groups
  labels = labels,                              # Display variable names as node labels
  legend = TRUE,                                # Include a legend in the plot
  layout = "spring",                            # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"),  # Define colors for different variable groups
  legend.cex = 0.3                              # Adjust the legend text size to avoid oversized labels
)
```

The adjacency matrices are subtracted from each other, and the normalization between -1 and 1 is performed inside the *differentiation()* function. This function takes two adjacency matrices as input, which must have the same dimensions and node names.

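
The normalize-then-subtract idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of *differentiation()*: the helper names `normalize_adjacency` and `difference_sketch` are hypothetical, the toy matrices are made up, and dividing by the largest absolute weight is just one possible way to map values into [-1, 1]. The package's exact scaling may differ.

```r
# Hypothetical helper (not the MariNET implementation):
# rescale weights into [-1, 1] by dividing by the largest absolute weight.
normalize_adjacency <- function(adj) {
  max_abs <- max(abs(adj))
  if (max_abs == 0) return(adj)  # nothing to rescale in an empty network
  adj / max_abs
}

# Hypothetical helper: element-wise difference of two normalized networks
difference_sketch <- function(adj1, adj2) {
  # Both matrices must describe the same nodes in the same order
  stopifnot(
    identical(dim(adj1), dim(adj2)),
    identical(dimnames(adj1), dimnames(adj2))
  )
  normalize_adjacency(adj1) - normalize_adjacency(adj2)
}

# Made-up t-value matrices for three nodes
nodes <- c("a", "b", "c")
net1 <- matrix(c( 0, 4, -2,
                  4, 0,  1,
                 -2, 1,  0),
               nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))
net2 <- matrix(c( 0, 8, -2,
                  8, 0,  4,
                 -2, 4,  0),
               nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

toy_diff <- difference_sketch(net1, net2)
toy_diff
```

In this toy case the strongest edge (`a`-`b`) has a difference of 0, because it is the largest weight in both networks even though its raw t-value doubles. The difference matrix therefore reflects changes in relative edge strength, not in absolute t-values.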
```{r qgraph-plot4, fig.width=3, fig.height=2, dpi=300}
# Compute the difference between the two Linear Mixed Model (LMM) networks
# This highlights changes in relationships when considering different random effect structures
difference <- differentiation(model, model2)

# Generate a network visualization of the differences between the two models
qgraph(
  difference,                                   # Adjacency matrix representing differences between model and model2
  groups = structure$community_structure,       # Assign colors based on predefined variable groups
  labels = labels,                              # Display variable names as node labels
  legend = TRUE,                                # Include a legend in the plot
  layout = "spring",                            # Use a force-directed "spring" layout for better visualization
  color = c("orange", "lightblue", "#008080"),  # Define colors for different variable groups
  legend.cex = 0.3                              # Adjust legend text size to keep it readable
)
```

# Additional information

The following chunk prints the R session information, including the R version, loaded packages, and system details.

```{r}
sessionInfo()
```

# References

```{r, echo=FALSE, results="asis"}