This script is a modified version of Minica & Neale 2018 AGES workshop In this script we will test the (bidirectional) causal effect between two phenotypes, think of BMI on SBP (systolic blood pressure) and vice-versa, and polygenic scores for each. In the interest of brevity we will always refer to BMI as variable X, and SBP as variable Y, with the respective instruments iX and iY.
There are four main take away messages that you should focus: 1. How to setup a simulation in OpenMX using mxGenerateData 2. That a two-stages least squares test can be specified in SEM in OpenMX 3. That MR-DoC recovers the same estimates from 2sls test 4. That in the presence of horizontal pleiotropy estimates are biased in the 2sls test and unbiased in MR-DoC
The script contains four scenarios, - no pleiotropy between the instrument and the outcome (Scenario 1), - where there is pleiotropy (Scenario 2), - presence of a bidirectional causal effect (Scenario 3) - absence of a causal effect (Scenario 4) See presentation slides for path diagrams. If you are using RStudio you can navigate using the outline dropdown menu Setting the stage ————————————————————
rm(list=ls()) # Emptying the R environment, not recommended usually but useful
# for this workshop
# loading required packages
library(OpenMx)
library(MASS)
library(dplyr)
options(digits = 2, scipen = 999) # we dont want scientific notation
mxOption(NULL, "Default optimizer", "SLSQP")
The models are specified in three objects, top (for common parts), MZ and DZ spend some time recognizing the elements in the model, by now you should have seen similar code. Notice two minor style changes, I am naming the objects at the beginning and the matrices labels spelled out in the positions they will end up in the matrix. The final objects (mrdoc1, mrdoc2) will be reused throughout the script. The matrix containing regressions for causal paths and instruments
BE <- mxMatrix(name = "BE", type = "Full",nrow=3, ncol = 3, byrow = TRUE,
labels = c(NA, "g2", "b1",
"g1", NA, "b2",
NA, NA, NA),
free = c(FALSE, FALSE, TRUE,
TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE),
dimnames = list(c("X", "Y", "iX"),
c("X", "Y", "iX")))
ACE decomposition
A <- mxMatrix(name = 'A', type='Symm', nrow=3,ncol = 3,byrow = TRUE,
labels=c("ax2", "covA", NA,
"covA","ay2", NA,
NA, NA, "sigma_x"),
free=c(TRUE, TRUE, FALSE,
TRUE, TRUE, FALSE,
FALSE,FALSE,TRUE),
dimnames = list(c("X", "Y", "iX"),
c("X", "Y", "iX")))
C <- mxMatrix(name = 'C', type='Symm',nrow=3, ncol = 3,byrow = TRUE,
labels =c("cx2", "covC", NA,
"covC","cy2", NA,
NA, NA, NA),
free=c(TRUE, TRUE, FALSE,
TRUE, TRUE, FALSE,
FALSE,FALSE,FALSE),
dimnames = list(c("X", "Y", "iX"),
c("X", "Y", "iX")))
E <- mxMatrix(name = 'E', type='Symm', nrow=3, ncol = 3,byrow = TRUE,
labels =c("ex2", "covE",NA,
"covE","ey2", NA,
NA, NA, NA),
free= c(TRUE, FALSE,FALSE,
FALSE,TRUE, FALSE,
FALSE,FALSE,FALSE),
dimnames = list(c("X", "Y", "iX"),
c("X", "Y", "iX")))
The filter matrix is used to remove the PRS from _T2 in MZs, if we kept it the matrix would become redundant resulting in the Hessian not positive
filter <- mxMatrix(name = 'filter', type='Full', nrow=5, ncol=6, free=FALSE,
byrow = TRUE,
values=c(1,0,0,0,0,0,
0,1,0,0,0,0,
0,0,1,0,0,0,
0,0,0,1,0,0,
0,0,0,0,1,0),
dimnames = list(c("X_T1", "Y_T1", "iX_T1","X_T2", "Y_T2"),
c("X_T1", "Y_T1", "iX_T1","X_T2", "Y_T2", "iX_T2")))
This lambda (LY) matrix is fixing total variances to 1 for PRSs and phenotypes
LY <- mxMatrix(name = 'LY', type='Full',nrow=3, ncol = 3, free = FALSE,
values = diag(3), labels = NA,
dimnames = list(c("X", "Y", "iX"),
c("X", "Y", "iX")))
Means objects
mean_dz <- mxMatrix(name = 'mean_dz', type='Full', nrow=1, ncol=6,
free=TRUE, values= 0, byrow = TRUE,
labels=c('mX1','mY2','miX1','mX1','mY2','miX1'))
mean_mz <- mxAlgebra('mean_mz', expression = mean_dz%*%t(filter))
Identity matrix for the algebras
I <- mxMatrix(name = 'I', type='Iden', nrow= 3,ncol= 3)
algebras <- list(
mxAlgebra('A_' , expression = LY %&% solve(I - BE)%&%A),
mxAlgebra('C_' , expression = LY %&% solve(I - BE)%&%C),
mxAlgebra('E_' , expression = LY %&% solve(I - BE)%&%E),
mxAlgebra('SPh' , expression = A_ + C_ + E_),
mxAlgebra('variance_mz_', expression = rbind(
cbind(SPh, A_+C_),
cbind(A_+C_, SPh))),
mxAlgebra('variance_dz', expression= rbind(
cbind(SPh, .5%x%A_+C_),
cbind(.5%x%A_+C_, SPh))),
mxAlgebra('variance_mz', expression= filter%&%variance_mz_))
top_mr1 <- mxModel("top", BE, A, C, E, filter, LY, mean_dz, mean_mz, I, algebras)
Preparing the objects for the multiple groups (MZ, DZ) analysis
MZ_mr1 = mxModel("MZ",mxFitFunctionML(),
mxExpectationNormal(covariance = "top.variance_mz", means = "top.mean_mz",
dimnames = c("X_T1", "Y_T1", "iX_T1",
"X_T2", "Y_T2")))
DZ_mr1 = mxModel("DZ", mxFitFunctionML(),
mxExpectationNormal(covariance = "top.variance_dz", means = "top.mean_dz",
dimnames = c("X_T1", "Y_T1", "iX_T1",
"X_T2", "Y_T2", "iX_T2")))
Combining objects to generate the final model
Scenario 1: no pleiotropy ————————————————— Generate simulated data for mrdoc1 Fit the model in unrelateds using a structural equation model Fit the model in twins - using the MR-doc Compare the NCPs of the models in unrelateds and twins Look at the 2 stage least squares results Let’s generate simulated data where b2 (the pleiotropic path) is zero, in other words, no pleiotropy present in the data. As such we need to set the true values, this is a way of doing this:
true_model <- mrdoc1 |>
omxSetParameters(labels = c("g1","b1", "b2",
"ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
"covA", "covC", "covE","sigma_x"),
values = c(0.316, 0.316, 0,
0.424, 0.671, 0.671, 0.424, 0.519, 0.519,
0.411,0.221,0, 1)) |>
omxSetParameters(labels = c("b2"), free = FALSE) # remember, no pleiotropy
Simulating data according to the exact covariance matrix from mrdoc1 (above), this is done using the switch empirical = T, this should be quick
The data does not comes with the instrument column for the second twin, we have to duplicate the column. In your analysis, this step will not happen, we are doing this because of how I coded the simulation for this workshop.
sim_data$MZ <- mutate(sim_data$MZ, iX_T2 = iX_T1)
dnpmz <- sim_data$MZ
dnpdz <- sim_data$DZ
dim(dnpdz) #
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 1.17 1.63 -0.43 1.469 1.88 -0.28
## 2 -0.79 -0.79 0.39 -2.307 -2.29 -1.81
## 3 0.18 1.21 0.65 0.286 1.38 0.04
## 4 -1.74 -0.73 0.64 -0.908 -1.79 0.79
## 5 0.96 1.04 -0.36 -1.348 0.06 -1.50
## 6 -0.85 -0.56 -0.47 -0.074 -0.40 -1.21
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-5.0 Min. :-4.4 Min. :-3.4 Min. :-4.9 Min. :-4.8 Min. :-3.05
## 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.7 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.64
## Median :-0.1 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.07
## Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.00
## 3rd Qu.: 0.8 3rd Qu.: 1.0 3rd Qu.: 0.7 3rd Qu.: 0.9 3rd Qu.: 1.0 3rd Qu.: 0.64
## Max. : 4.5 Max. : 5.4 Max. : 3.9 Max. : 4.6 Max. : 4.2 Max. : 2.83
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 -0.064 1.70 -1.32 -0.39 1.160 -1.32
## 2 -1.285 -1.36 -0.60 -1.43 0.021 -0.60
## 3 0.299 0.44 -1.39 1.17 0.449 -1.39
## 4 -0.733 -2.93 -0.94 -0.61 -2.126 -0.94
## 5 1.016 -0.11 0.30 1.44 0.035 0.30
## 6 -1.071 0.18 -1.67 -1.21 -0.562 -1.67
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-4.2 Min. :-5.5 Min. :-3.4 Min. :-5.4 Min. :-6.7 Min. :-3.4
## 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.7 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.7
## Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0
## 3rd Qu.: 0.9 3rd Qu.: 1.0 3rd Qu.: 0.7 3rd Qu.: 0.8 3rd Qu.: 1.0 3rd Qu.: 0.7
## Max. : 3.8 Max. : 4.3 Max. : 2.9 Max. : 4.3 Max. : 4.7 Max. : 2.9
We now add the data to the model object created before. Notice that
the plus sign is overloaded in OpenMx as it is, for example, in ggplot2.
So you can combine objects with the syntax below instead of
model = mxModel(model, mxData(data, type = "raw"))
This
syntax is optional, but helps reducing the lenght of the script
mrdoc1$MZ <- mrdoc1$MZ + mxData(dnpmz, type = "raw")
mrdoc1$DZ <- mrdoc1$DZ + mxData(dnpdz, type = "raw")
m1 <- mrdoc1 |>
# Remember, no b2 in data, no b2 in the model
omxSetParameters(labels = "b2", free = FALSE) |>
# The poing of examining the data a few lines above is to set sensible
# starting values for the model. Here, in the interest of brevity, we ask
# OpenMx to pick starting values for us.
mxAutoStart()
m1 <- mxRun(m1)
## Running mrdoc1 with 14 parameters
# In your analyses, make sure to test for local identification frequently:
# mxCheckIdentification(m1)$status
# Model is locally identified
# [1] TRUE
summary(m1)
## Summary of mrdoc1
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 top.BE Y X 0.316000000123 0.028
## 2 b1 top.BE X iX 0.316000000040 0.021
## 3 ax2 top.A X X 0.423575999118 0.077
## 4 covA top.A X Y 0.410588999655 0.064
## 5 ay2 top.A Y Y 0.670328999037 0.089
## 6 sigma_x top.A iX iX 0.999000000056 0.026
## 7 cx2 top.C X X 0.670329000971 0.074
## 8 covC top.C X Y 0.220779000217 0.054
## 9 cy2 top.C Y Y 0.423576000776 0.080
## 10 ex2 top.E X X 0.518481000185 0.023
## 11 ey2 top.E Y Y 0.518481000190 0.023
## 12 mX1 top.mean_dz 1 X_T1 -0.000000000056 0.026
## 13 mY2 top.mean_dz 1 Y_T1 -0.000000000034 0.030
## 14 miX1 top.mean_dz 1 iX_T1 -0.000000000032 0.021
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 14 10986 32383
## Saturated: NA NA NA
## Independence: NA NA NA
## Number of observations/statistics: 2000/11000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 10411 32411 32411
## BIC: -51120 32490 32445
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:49
## Wall clock time: 0.17 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
One way of assessing whether the causal path is significant is by dropping it
## Running nog1 with 13 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 mrdoc1 <NA> 14 32383 10986 32411 NA NA NA
## 2 mrdoc1 nog1 13 32499 10987 32525 115 1 0.0000000000000000000000000066
Q1: We dropped the causal path, what is the interpretation for the above result? ANSWER: The causal path is significant, model m2 was worse. Can we specify a model for a 2-stage least squares test using SEM and OpenMX?
IVModel = mxModel("MR", type = "RAM", manifestVars = c("X", "Y", "I"),
latentVars = c("eX", "eY"),
# Path from instrument to exposure
mxPath(from = "I" , to = "X", arrows = 1, label = "b1"),
# Path from exposure to outcome, g1
mxPath(from = "X", to = "Y", label = "g1"),
# Latent error+ setting up variance and means for variables
mxPath(from = c("I"), arrows = 2, label = "vI"),
mxPath(from = c("eX", "eY"), to = c("X","Y"), value = 1, free = FALSE),
# Variance of residual errors
mxPath(from = c("eX", "eY"), arrows = 2, free = TRUE,
labels = c("vX", "vY")),
mxPath(connect = "unique.bivariate", from = c("eX", "eY"), arrows = 2,
values = 0.2, labels = "re"), # Correlation among residuals
mxPath("one", to = c("X","Y", "I"), labels = c("mX", "mY", "mI")))
# The above specification closely follows the slides in the presentation
# If you have umx installed you can look at it:
# library(umx)
# plot(IVModel)
Let’s generate a new data set by taking only twin 1 from mzs and dzs
dat1 <- rbind(dnpmz[,1:3],dnpdz[,1:3]) |>
# we need to rename the variables to match the dimension names set in the
# IVmodel above
rename(X = X_T1,
Y = Y_T1,
I = iX_T1)
Adding the data to the model
Know your data, check variable skewness, kurtosis and means, but in the interest of brevity, let’s autostart
## Running MR with 9 parameters
## Summary of MR
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 A Y X 0.3159990 0.090
## 2 b1 A X I 0.3159993 0.028
## 3 vI S I I 0.9990026 0.032
## 4 vX S eX eX 1.6123920 0.051 !
## 5 re S eX eY 0.6313694 0.150
## 6 vY S eY eY 1.6123882 0.124
## 7 mX M 1 X 0.0000090 0.028 !
## 8 mY M 1 Y -0.0000113 0.028 !
## 9 mI M 1 I 0.0000033 0.022 !
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 9 5991 18603
## Saturated: 9 5991 NA
## Independence: 6 5994 NA
## Number of observations/statistics: 2000/6000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 6621 18621 18621
## BIC: -26934 18672 18643
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:50
## Wall clock time: 0.038 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
unrel2 <- omxSetParameters(name = "no_causal", unrel, label="g1", free = FALSE,
values = 0)
unrel2 <- mxRun(unrel2)
## Running no_causal with 8 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 MR <NA> 9 18603 5991 18621 NA NA NA
## 2 MR no_causal 8 18612 5992 18628 9.1 1 0.0025
Q2: The line above is a Likelihood ratio test with 1 degree of freedom, what is the meaning of a significant p-value in this case? ANSWER: The causal path is significant, dropping it made the model (no_causal) significantly worse
# Now let's compare with a typical 2sls test
TSLS1=lm(X~I,data=dat1)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat1)
summary(TSLS2)
##
## Call:
## lm(formula = Y ~ Xhat, data = dat1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.477 -1.017 0.004 1.018 5.287
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.00000000000000000429 0.03297416855661412793 0.00 1.0000
## Xhat 0.31599999999999939249 0.10440084815354150338 3.03 0.0025 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.5 on 1998 degrees of freedom
## Multiple R-squared: 0.00456, Adjusted R-squared: 0.00407
## F-statistic: 9.16 on 1 and 1998 DF, p-value: 0.0025
# In the specialized ivreg package the syntax would be:
# library(ivreg)
# TSLS2=ivreg(Y~X|I,data=dat1)
Q3: Check out the estimate for Xhat in the previous summary and compare to unrel estimate. Did MR-DoC estimated same values as 2sls? ANSWER: Yes, the estimates are equal (0.316) Now let’s check the power to reject the hypothesis of g1=0 using the non-centrality parameter for related and unrelated individuals.
lambdam1=mxCompare(m1,m2)[2,7]
dfs=mxCompare(m1,m2)[2,8]
alpha=0.05
ca=qchisq(alpha,dfs,ncp=0,lower.tail=F)
powerm1=pchisq(ca,dfs,ncp=lambdam1,lower.tail=F)
powerm1
## [1] 1
lambdaunrel=mxCompare(unrel,unrel2)[2,7]
dfs=mxCompare(unrel,unrel2)[2,8]
alpha=0.05 # user specified: type I error prob.
ca=qchisq(alpha,dfs,ncp=0,lower.tail=F)
powerUnrel=pchisq(ca,dfs,ncp=lambdaunrel,lower.tail=F)
powerUnrel
## [1] 0.86
Q4: Which method had better power to detect the causal effect? ANSWER: MR-DoC has a power of 0.99, 2sls has a power of 0.86 Scenario 2: Pleiotropy ————————————————- Next consider the scenario with pleiotropy, assume re=0 and test g1 = 0 Check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1. Do we detect a causal effect if we don’t account for pleiotropy (SEM, 2stage least squares, 2-sample MR)?
sim_data2 <- mrdoc1 |>
omxSetParameters(labels = c("g1","b1", "b2",
"ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
"covA", "covC", "covE","sigma_x"),
values = c(0.143, 0.316, 0.127,
0.424, 0.67, 0.670, 0.424, 0.519, 0.519,
0.411,0.221,0, 1)) |>
mxGenerateData( nrows = 1000, empirical = TRUE)
sim_data2$MZ <- mutate(sim_data2$MZ, iX_T2 = iX_T1)
dwpmz <- sim_data2$MZ
dwpdz <- sim_data2$DZ
dim(dwpdz) #
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 -0.035 -0.27 1.03 -1.1 1.461 1.16
## 2 0.481 1.56 1.29 1.5 0.156 2.26
## 3 -1.018 -1.55 -0.65 -1.4 0.011 -0.76
## 4 1.193 -0.93 -0.64 2.0 -0.079 0.68
## 5 1.600 2.11 0.62 -1.3 -0.314 1.37
## 6 0.050 -0.89 -0.50 1.4 0.766 0.80
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-3.9 Min. :-4.2 Min. :-3.06 Min. :-4.1 Min. :-4.2 Min. :-3.01
## 1st Qu.:-0.9 1st Qu.:-0.9 1st Qu.:-0.68 1st Qu.:-0.9 1st Qu.:-0.9 1st Qu.:-0.66
## Median : 0.0 Median : 0.0 Median : 0.00 Median : 0.0 Median : 0.0 Median :-0.05
## Mean : 0.0 Mean : 0.0 Mean : 0.00 Mean : 0.0 Mean : 0.0 Mean : 0.00
## 3rd Qu.: 0.9 3rd Qu.: 1.0 3rd Qu.: 0.69 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.68
## Max. : 4.4 Max. : 4.3 Max. : 2.92 Max. : 3.8 Max. : 4.1 Max. : 2.96
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 -0.89 -1.5548 -0.490 -0.1428 -3.586 -0.490
## 2 -0.14 1.3320 -0.553 0.7621 0.332 -0.553
## 3 -1.22 -0.0653 -0.034 -0.0076 1.245 -0.034
## 4 0.72 -0.0053 0.166 1.6579 0.067 0.166
## 5 -0.92 -0.8504 0.171 -1.3165 -1.570 0.171
## 6 -1.78 0.9096 -1.363 -2.7041 -1.469 -1.363
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-3.7 Min. :-4.6 Min. :-3.0 Min. :-4.0 Min. :-4.1 Min. :-3.0
## 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.6 1st Qu.:-0.9 1st Qu.:-1.0 1st Qu.:-0.6
## Median :-0.1 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0
## 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.6 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.6
## Max. : 4.6 Max. : 4.4 Max. : 3.7 Max. : 4.7 Max. : 4.2 Max. : 3.7
pleio = mrdoc1
pleio$MZ <- mrdoc1$MZ + mxData(dwpmz, type = "raw")
pleio$DZ <- mrdoc1$DZ + mxData(dwpdz, type = "raw")
pleio <- mxRun(pleio)
## Running mrdoc1 with 15 parameters
## Summary of mrdoc1
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 top.BE Y X 0.14300289 0.031
## 2 b1 top.BE X iX 0.31599910 0.022
## 3 b2 top.BE Y iX 0.12700095 0.025
## 4 ax2 top.A X X 0.42359100 0.077 !
## 5 covA top.A X Y 0.41058422 0.067 !
## 6 ay2 top.A Y Y 0.66931525 0.090 !
## 7 sigma_x top.A iX iX 0.99900149 0.026
## 8 cx2 top.C X X 0.66932329 0.074 !
## 9 covC top.C X Y 0.22077728 0.054 !
## 10 cy2 top.C Y Y 0.42358333 0.080 !
## 11 ex2 top.E X X 0.51847605 0.023
## 12 ey2 top.E Y Y 0.51848682 0.023 !
## 13 mX1 top.mean_dz 1 X_T1 0.00000085 0.026
## 14 mY2 top.mean_dz 1 Y_T1 -0.00000048 0.027
## 15 miX1 top.mean_dz 1 iX_T1 -0.00000751 0.021 !
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 15 10985 32379
## Saturated: NA NA NA
## Independence: NA NA NA
## Number of observations/statistics: 2000/11000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 10409 32409 32409
## BIC: -51117 32493 32445
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:51
## Wall clock time: 0.14 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
parameters used for simulation g1 = 0.143 b1 = 0.316 b2 = 0.127 Q5: check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1? Hint: look at the true values used for simulation ANSWER: Yes. MR-DoC: Test g1=0 using a likelihood ratio test #########
pleio2 <- omxSetParameters(name = "no_causal", pleio, label="g1", free = FALSE,
values = 0)
pleio2 <- mxRun(pleio2)
## Running no_causal with 14 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 mrdoc1 <NA> 15 32379 10985 32409 NA NA NA
## 2 mrdoc1 no_causal 14 32400 10986 32428 21 1 0.0000043
Let’s run in unrelateds: —————————————————
# take twin 1 from mz and dz
dat2 <- rbind(dwpmz[,1:3],dwpdz[,1:3])|>
rename(X = X_T1,
Y = Y_T1,
I = iX_T1)
unrel3 <- IVModel + mxData(dat2, type = "raw")
# in the interest of brevity, let's autostart the model
unrel3 <- mxAutoStart(unrel3)
unrel3 <- mxRun(unrel3)
## Running MR with 9 parameters
## Summary of MR
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 A Y X 0.54484392 0.083 !
## 2 b1 A X I 0.31600496 0.028 !
## 3 vI S I I 0.99903761 0.032 !
## 4 vX S eX eX 1.61131910 0.051 !
## 5 re S eX eY -0.01613821 0.137 !
## 6 vY S eY eY 1.36412244 0.043 !
## 7 mX M 1 X -0.00000021 0.028
## 8 mY M 1 Y -0.00000017 0.026
## 9 mI M 1 I 0.00000305 0.022 !
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 9 5991 18600
## Saturated: 9 5991 NA
## Independence: 6 5994 NA
## Number of observations/statistics: 2000/6000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 6618 18618 18618
## BIC: -26937 18669 18640
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:52
## Wall clock time: 0.053 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
parameters used for simulation g1 = 0.143 b1 = 0.316 b2 = 0.127 Q6: Does the 2sls model recover correctly the parameters used for simulation? ANSWER: No, it overestimated g1
unrel4 <- omxSetParameters(name = "no_causal", unrel3, label="g1", free = FALSE,
values = 0)
unrel4 <- mxRun(unrel4)
## Running no_causal with 8 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 MR <NA> 9 18600 5991 18618 NA NA NA
## 2 MR no_causal 8 18633 5992 18649 32 1 0.000000014
## Which model has highest power? (look at chisq difference) -----------------
## MR-DoC in Twins
chisq_Twins=mxCompare(pleio,pleio2)[2,7]
chisq_Twins
## [1] 21
## [1] 32
Q7: Conclusion? Larger diff, higher power ANSWER: Power in the model with unrelated was higher.
## Fit the model using two stage least squares ####################
TSLS1=lm(X~I,data=dat2)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat2)
summary(TSLS2)
##
## Call:
## lm(formula = Y ~ Xhat, data = dat2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.622 -0.922 -0.037 0.920 4.710
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0000000000000000135 0.0302219807176829502 0.00 1
## Xhat 0.5448987341772134618 0.0956870349706873818 5.69 0.000000014 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.4 on 1998 degrees of freedom
## Multiple R-squared: 0.016, Adjusted R-squared: 0.0155
## F-statistic: 32.4 on 1 and 1998 DF, p-value: 0.0000000142
Check above that the 2sls test using either glm or SEM still finds same estimates. Q8: Do we detect a causal effect if we account for pleiotropy (SEM, 2stage least squares)? ANSWER: The models using unrelated detects a causal effect but not accounting for pleiotropy the estimate was biased. Scenario 3. Bidirectional causation ——————————————- Specifying the mrdoc2 model. This is similar to mrdoc1 except larger matrices (one more instrument) Matrix for causal paths
BE <- mxMatrix(name = "BE", type = "Full",nrow=4, ncol = 4,byrow = TRUE,
labels = c(NA, "g2", "b1", "b4",
"g1", NA, "b2", "b3",
NA, NA, NA, NA,
NA, NA, NA, NA),
free = c(FALSE, TRUE, TRUE, FALSE,
TRUE, FALSE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE),
dimnames = list(c("X", "Y", "iX", "iY"),
c("X", "Y", "iX", "iY")))
A, C and E decomposition
A <- mxMatrix(name = 'A', type='Symm', nrow=4, ncol = 4,byrow = TRUE,
labels=c("ax2","covA",NA,NA,
"covA","ay2",NA,NA,
NA,NA,"x2" ,"rf",
NA,NA,"rf","y2"),
free=c(TRUE,TRUE,FALSE,FALSE,
TRUE,TRUE,FALSE,FALSE,
FALSE,FALSE,TRUE,TRUE,
FALSE,FALSE,TRUE,TRUE),
dimnames = list(c("X", "Y", "iX", "iY"),
c("X", "Y", "iX", "iY")))
C <- mxMatrix(name = 'C', type='Symm',nrow=4, ncol = 4,byrow = TRUE,
labels =c("cx2", "covC",NA,NA,
"covC","cy2" ,NA,NA,
NA, NA, NA,NA,
NA, NA, NA,NA),
free=c(TRUE,TRUE, FALSE,FALSE,
TRUE,TRUE, FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE),
dimnames = list(c("X", "Y", "iX", "iY"),
c("X", "Y", "iX", "iY")))
E <- mxMatrix(name = 'E', type='Symm', nrow=4, ncol = 4,byrow = TRUE,
labels =c("ex2", "covE",NA,NA,
"covE","ey2" ,NA,NA,
NA, NA, NA,NA,
NA, NA, NA,NA),
free= c(TRUE,TRUE, FALSE,FALSE,
TRUE,TRUE, FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE),
dimnames = list(c("X", "Y", "iX", "iY"),
c("X", "Y", "iX", "iY")))
A filter matrix, as we need to remove the PRSs from twin 2
filter <- mxMatrix(name = 'filter', type='Full', nrow=6, ncol=8, free=FALSE,
byrow = TRUE,
values=c(1,0,0,0,0,0,0,0,
0,1,0,0,0,0,0,0,
0,0,1,0,0,0,0,0,
0,0,0,1,0,0,0,0,
0,0,0,0,1,0,0,0,
0,0,0,0,0,1,0,0),
dimnames = list(c("X_T1", "Y_T1", "iX_T1","iY_T1",
"X_T2", "Y_T2"),
c("X_T1", "Y_T1", "iX_T1","iY_T1","X_T2",
"Y_T2", "iX_T2","iY_T2")))
LY <- mxMatrix(name = 'LY', type='Full',nrow=4, ncol = 4, free = FALSE,
values = diag(4), labels = NA,
dimnames = list(c("X", "Y", "iX", "iY"),
c("X", "Y", "iX", "iY")))
The object with the means
mean_dz <- mxMatrix(name = 'mean_dz', type='Full', nrow=1, ncol=8, free=TRUE,
byrow = TRUE, values = 0, labels=c('mX1','mY2','miX1','miY2',
'mX1','mY2','miX1','miY2') )
Removing the PRS from the MZ means
algebras <- list( mxAlgebra('mean_mz', expression = mean_dz %*% t(filter)),
# Identity matrix to algebra calculations
mxMatrix(name = 'I', type='Iden', nrow= 4,ncol= 4 ),
# The needed matrices for calculating the variances
mxAlgebra('A_' , expression = LY %&% solve(I - BE) %&% A),
mxAlgebra('C_' , expression = LY %&%solve(I - BE) %&% C),
mxAlgebra('E_' , expression = LY %&%solve(I - BE) %&% E),
mxAlgebra('full_variance' , expression= A_ + C_ + E_),
mxAlgebra('variance_mz_', expression=rbind(
cbind(full_variance, A_ + C_),
cbind(A_ + C_, full_variance))),
mxAlgebra('variance_dz', expression=rbind(
cbind(full_variance, 0.5%x%A_ + C_),
cbind(0.5%x%A_ + C_, full_variance))),
mxAlgebra('variance_mz', expression= filter%&%variance_mz_))
top_mr2 <- mxModel("top", BE, A, C, E, filter, LY, mean_dz, algebras)
Preparing the objects for the multiple groups (MZ, DZ) analysis
MZ_mr2 = mxModel("MZ", mxFitFunctionML(),
mxExpectationNormal(covariance = "top.variance_mz",means = "top.mean_mz",
dimnames = c("X_T1", "Y_T1", "iX_T1", "iY_T1",
"X_T2", "Y_T2")))
DZ_mr2 = mxModel("DZ", mxFitFunctionML(),
mxExpectationNormal(covariance = "top.variance_dz", means = "top.mean_dz",
dimnames = c("X_T1", "Y_T1", "iX_T1", "iY_T1",
"X_T2", "Y_T2", "iX_T2", "iY_T2")))
Combining objects to generate the final model
Simulating using mrdoc2, now we are simulating data so that there is an effect of X on Y and vice-versa. g1, g2 !=0
sim_data3 <- mrdoc2 |>
omxSetParameters(labels = c("g1","g2", "b1", "b3",
"ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
"covA", "covC", "covE","rf", "x2", "y2"),
values = c(0.184, 0.111, 0.411, 0.519,
0.424, 0.670, 0.670, 0.424, 0.519, 0.519,
0.411,0.221,0.221,0.111, 1,1)) |>
mxGenerateData( nrows = 1000, empirical = TRUE)
Remember, we need to duplicate the columns for the second twin
sim_data3$MZ <- mutate(sim_data3$MZ, iX_T2 = iX_T1, iY_T2 = iY_T1)
dg2mz <- sim_data3$MZ
dg2dz <- sim_data3$DZ
dim(dg2dz) #
## [1] 1000 8
## X_T1 Y_T1 iX_T1 iY_T1 X_T2 Y_T2 iX_T2 iY_T2
## 1 1.09 0.83 0.49 -0.84 2.06 1.510 0.51 -1.2
## 2 0.75 -0.86 1.13 -1.56 -0.40 -1.701 -0.11 -1.6
## 3 1.24 1.41 -0.85 -0.65 -2.17 1.506 -1.24 1.1
## 4 2.16 2.24 -1.12 0.26 1.53 1.849 -0.55 1.2
## 5 0.29 0.40 -0.45 0.68 -0.65 0.032 -1.81 1.0
## 6 2.65 2.96 -0.69 -0.67 2.66 1.472 0.78 -1.6
## X_T1 Y_T1 iX_T1 iY_T1 X_T2 Y_T2 iX_T2
## Min. :-5.6 Min. :-5.0 Min. :-3.5 Min. :-2.8 Min. :-5.0 Min. :-4.5 Min. :-2.9
## 1st Qu.:-1.0 1st Qu.:-1.0 1st Qu.:-0.6 1st Qu.:-0.7 1st Qu.:-1.0 1st Qu.:-1.1 1st Qu.:-0.7
## Median : 0.0 Median :-0.1 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0
## 3rd Qu.: 1.0 3rd Qu.: 1.1 3rd Qu.: 0.7 3rd Qu.: 0.6 3rd Qu.: 0.9 3rd Qu.: 1.1 3rd Qu.: 0.7
## Max. : 4.3 Max. : 4.5 Max. : 3.0 Max. : 3.4 Max. : 4.7 Max. : 4.5 Max. : 3.2
## iY_T2
## Min. :-3.12
## 1st Qu.:-0.64
## Median : 0.00
## Mean : 0.00
## 3rd Qu.: 0.67
## Max. : 3.08
## [1] 1000 8
## X_T1 Y_T1 iX_T1 iY_T1 X_T2 Y_T2 iX_T2 iY_T2
## 1 1.59 1.517 -0.498 0.330 2.19 1.234 -0.498 0.330
## 2 0.40 -0.608 -0.982 -0.358 0.71 -0.108 -0.982 -0.358
## 3 -0.39 -1.330 0.519 -0.828 -2.66 -3.425 0.519 -0.828
## 4 1.14 0.557 0.686 0.709 -0.17 0.517 0.686 0.709
## 5 0.60 0.081 2.273 -0.552 0.37 0.406 2.273 -0.552
## 6 -0.28 -1.163 0.085 0.033 0.17 -0.097 0.085 0.033
## X_T1 Y_T1 iX_T1 iY_T1 X_T2 Y_T2 iX_T2
## Min. :-4.9 Min. :-5.3 Min. :-2.72 Min. :-4.0 Min. :-4.6 Min. :-5.2 Min. :-2.72
## 1st Qu.:-0.9 1st Qu.:-1.1 1st Qu.:-0.72 1st Qu.:-0.6 1st Qu.:-0.9 1st Qu.:-1.1 1st Qu.:-0.72
## Median : 0.0 Median : 0.0 Median :-0.04 Median : 0.0 Median : 0.0 Median : 0.0 Median :-0.04
## Mean : 0.0 Mean : 0.0 Mean : 0.00 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.00
## 3rd Qu.: 1.0 3rd Qu.: 1.0 3rd Qu.: 0.70 3rd Qu.: 0.7 3rd Qu.: 1.0 3rd Qu.: 1.0 3rd Qu.: 0.70
## Max. : 5.3 Max. : 5.0 Max. : 2.97 Max. : 2.9 Max. : 5.6 Max. : 4.3 Max. : 2.97
## iY_T2
## Min. :-4.0
## 1st Qu.:-0.6
## Median : 0.0
## Mean : 0.0
## 3rd Qu.: 0.7
## Max. : 2.9
bidir <- mrdoc2
bidir$MZ <- bidir$MZ + mxData(dg2mz, type = "raw")
bidir$DZ <- bidir$DZ + mxData(dg2dz, type = "raw")
bidir <- mxAutoStart(bidir)
bidir <- mxRun(bidir)
## Running mrdoc2 with 20 parameters
You can check for local identification: (this is slow, do this at home) mxCheckIdentification(bidir)$status
## Summary of mrdoc2
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 top.BE Y X 0.18404298 0.054 !
## 2 g2 top.BE X Y 0.11099381 0.042 !
## 3 b1 top.BE X iX 0.41099670 0.023 !
## 4 b3 top.BE Y iY 0.51899828 0.023 !
## 5 ax2 top.A X X 0.42357732 0.088 !
## 6 covA top.A X Y 0.41058220 0.076 !
## 7 ay2 top.A Y Y 0.66934882 0.101 !
## 8 x2 top.A iX iX 0.99894780 0.026 !
## 9 rf top.A iX iY 0.11087160 0.018 !
## 10 y2 top.A iY iY 0.99894780 0.026 !
## 11 cx2 top.C X X 0.66931127 0.080 !
## 12 covC top.C X Y 0.22074242 0.074 !
## 13 cy2 top.C Y Y 0.42355736 0.085 !
## 14 ex2 top.E X X 0.51847349 0.036 !
## 15 covE top.E X Y 0.22076944 0.041
## 16 ey2 top.E Y Y 0.51848643 0.038 !
## 17 mX1 top.mean_dz 1 X_T1 -0.00000019 0.029
## 18 mY2 top.mean_dz 1 Y_T1 0.00000018 0.031
## 19 miX1 top.mean_dz 1 iX_T1 0.00000012 0.021
## 20 miY2 top.mean_dz 1 iY_T1 -0.00000015 0.021
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 20 13980 40029
## Saturated: NA NA NA
## Independence: NA NA NA
## Number of observations/statistics: 2000/14000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 12069 40069 40070
## BIC: -66231 40181 40118
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:53
## Wall clock time: 0.15 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
simulated parameters g1 = 0.184 g2 = 0.111 b1 = 0.411 b3 = 0.519 Q9: Check the results: does MR-DoC model recover correctly the parameters g1, g2, b1, and b3 ANSWER: Yes, the estimates matches the true values set for the data. MR-DoC: Test g1=0 using a likelihood ratio test
bidir2 <- omxSetParameters(name = "drop_g1g2", bidir, label=c("g1", "g2"),
free = FALSE, values = 0)
bidir2 <- mxRun(bidir2)
## Running drop_g1g2 with 18 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 mrdoc2 <NA> 20 40029 13980 40069 NA NA NA
## 2 mrdoc2 drop_g1g2 18 40045 13982 40081 16 2 0.00032
Q10: How many degrees of freedom for this LRT? ANSWER: Two degrees of freedom, we dropped both g1 and g2. Q11: Are the causal paths significant? ANSWER: Dropping g1 and g2 resulted in a significantly worse model, therefore the bidirectional relationship is significant.
We are proceeding to check if running 2sls in each direction matches the true values we set in the simulation. Remember the model includes indirect pleiotropy of type PS2 -> rf -> PS1 -> X
# take twin 1 from mz and dz
dat4=rbind(dg2mz[,1:4],dg2dz[,1:4]) |>
rename(X = X_T1,
Y = Y_T1,
I = iX_T1)
unrel7 <- IVModel + mxData(dat4, type = "raw")
# in the interest of brevity, let's autostart the model
unrel7 <- mxAutoStart(unrel7)
unrel7 <- mxRun(unrel7)
## Running MR with 9 parameters
## Summary of MR
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 A Y X 0.31920390 0.067
## 2 b1 A X I 0.42609788 0.031
## 3 vI S I I 0.99899914 0.032
## 4 vX S eX eX 1.90053222 0.060
## 5 re S eX eY 0.82566402 0.134
## 6 vY S eY eY 1.61915635 0.122
## 7 mX M 1 X -0.00000031 0.031
## 8 mY M 1 Y -0.00000026 0.028
## 9 mI M 1 I -0.00000001 0.022
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 9 5991 18772
## Saturated: 9 5991 NA
## Independence: 6 5994 NA
## Number of observations/statistics: 2000/6000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 6790 18790 18791
## BIC: -26765 18841 18812
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:54
## Wall clock time: 0.039 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
## Let's run in unrelateds Y -> X: ---------------------------------------------
# take twin 1 from mz and dz
dat5=rbind(dg2mz[,1:4],dg2dz[,1:4]) |>
rename(Y = X_T1, # notice the inversion here
X = Y_T1, # notice the inversion here
I = iY_T1)
The inversion above is only necessary so we don’t have to rewrite the model from scratch. Bear with me as I rename the parameters in the base model This will help interpreting results
IVModelYX = omxSetParameters(name = "IVModelYX", IVModel, label = c("g1","b1", "vX", "vY", "mX", "mY"),
newlabel = c("g2","b3", "vY", "vX", "mY", "mX"))
unrel8 <- IVModelYX + mxData(dat5, type = "raw")
# in the interest of brevity, let's autostart the model
unrel8 <- mxAutoStart(unrel8)
unrel8 <- mxRun(unrel8)
## Running IVModelYX with 9 parameters
## Summary of IVModelYX
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g2 A Y X 0.1957359896 0.052
## 2 b3 A X I 0.5383903493 0.032
## 3 vI S I I 0.9990000291 0.032
## 4 vY S eX eX 2.0688193334 0.065
## 5 re S eX eY 1.0285950507 0.118
## 6 vX S eY eY 1.5888873870 0.119
## 7 mY M 1 X 0.0000005475 0.032
## 8 mX M 1 Y 0.0000004254 0.028
## 9 mI M 1 I 0.0000000055 0.022
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 9 5991 18628
## Saturated: 9 5991 NA
## Independence: 6 5994 NA
## Number of observations/statistics: 2000/6000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 6646 18646 18647
## BIC: -26909 18697 18668
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:54
## Wall clock time: 0.04 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
The result of the summary immediately above is of the relationship of Y to X, in other words g2. simulated parameters g1 = 0.184 g2 = 0.111 b1 = 0.411 b3 = 0.519 Q12: Does MR-SEM models (unrel7 or X->Y, and model unrel8 Y <- X) recover correctly the parameters used for simulation? What happened to the estimates? ANSWER: Both g1 and g2 were overestimated in the 2sls solution. Scenario 4: Pleiotropy & no causal effect STRETCH GOAL! ——————- We are back to MR-DoC1 and we will be simulating data without the causal effect Compare the results obtained with MR-DoC, unrelateds SEM, 2SLS, 2-sample MR.
sim_data4 <- mrdoc1 |>
omxSetParameters(labels = c("g1","b1", "b2",
"ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
"covA", "covC", "covE","sigma_x"),
values = c(0, 0.316, 0.316,
0.424, 0.670, 0.670, 0.424, 0.519, 0.519,
0.411,0.221,0, 1)) |>
mxGenerateData( nrows = 1000, empirical = TRUE)
sim_data4$MZ <- mutate(sim_data4$MZ, iX_T2 = iX_T1)
dg1mz <- sim_data4$MZ
dg1dz <- sim_data4$DZ
dim(dg1dz) #
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 1.4663 3.46 -0.58 0.94 1.08 -0.34
## 2 1.1069 0.95 -0.82 0.27 -0.72 -0.30
## 3 0.0035 -0.99 -1.19 0.13 -0.92 -0.77
## 4 1.2180 2.96 2.56 0.69 -0.87 -0.13
## 5 0.2032 1.14 -0.32 1.35 0.17 -0.18
## 6 -0.5218 -0.32 0.42 -1.66 -0.60 -1.00
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-4.4 Min. :-3.9 Min. :-3.05 Min. :-4.3 Min. :-4.1 Min. :-3.2
## 1st Qu.:-0.9 1st Qu.:-0.9 1st Qu.:-0.69 1st Qu.:-0.9 1st Qu.:-0.9 1st Qu.:-0.7
## Median : 0.0 Median : 0.0 Median : 0.03 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 0.0 Mean : 0.0 Mean : 0.00 Mean : 0.0 Mean : 0.0 Mean : 0.0
## 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.68 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.7
## Max. : 5.0 Max. : 3.5 Max. : 2.79 Max. : 4.0 Max. : 4.1 Max. : 3.2
## [1] 1000 6
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## 1 1.22 0.95 0.12 1.50 1.355 0.12
## 2 1.12 1.18 -0.78 0.70 0.932 -0.78
## 3 0.68 0.56 0.32 1.88 0.046 0.32
## 4 0.23 -1.34 0.50 0.45 -1.950 0.50
## 5 -0.37 -0.20 0.33 1.70 -2.129 0.33
## 6 -0.23 -1.57 1.10 -0.43 -0.576 1.10
## X_T1 Y_T1 iX_T1 X_T2 Y_T2 iX_T2
## Min. :-4.5 Min. :-4.7 Min. :-3.1 Min. :-3.9 Min. :-4.4 Min. :-3.1
## 1st Qu.:-0.9 1st Qu.:-0.9 1st Qu.:-0.7 1st Qu.:-0.9 1st Qu.:-0.8 1st Qu.:-0.7
## Median : 0.1 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0 Median : 0.0
## Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0 Mean : 0.0
## 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.7 3rd Qu.: 0.9 3rd Qu.: 0.9 3rd Qu.: 0.7
## Max. : 4.1 Max. : 4.0 Max. : 3.5 Max. : 4.6 Max. : 4.2 Max. : 3.5
no_causal <- mrdoc1
no_causal$MZ <- mrdoc1$MZ + mxData(dg1mz, type = "raw")
no_causal$DZ <- mrdoc1$DZ + mxData(dg1dz, type = "raw")
no_causal <- mxRun(no_causal)
## Running mrdoc1 with 15 parameters
## Summary of mrdoc1
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 top.BE Y X -0.00000169 0.031
## 2 b1 top.BE X iX 0.31600043 0.022
## 3 b2 top.BE Y iX 0.31600035 0.025
## 4 ax2 top.A X X 0.42356291 0.077
## 5 covA top.A X Y 0.41059142 0.067
## 6 ay2 top.A Y Y 0.66932379 0.090 !
## 7 sigma_x top.A iX iX 0.99899888 0.026
## 8 cx2 top.C X X 0.66934112 0.074
## 9 covC top.C X Y 0.22078101 0.054
## 10 cy2 top.C Y Y 0.42358392 0.080
## 11 ex2 top.E X X 0.51848479 0.023
## 12 ey2 top.E Y Y 0.51848004 0.023
## 13 mX1 top.mean_dz 1 X_T1 0.00000296 0.026
## 14 mY2 top.mean_dz 1 Y_T1 0.00000083 0.026
## 15 miX1 top.mean_dz 1 iX_T1 0.00000136 0.021
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 15 10985 32379
## Saturated: NA NA NA
## Independence: NA NA NA
## Number of observations/statistics: 2000/11000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 10409 32409 32409
## BIC: -51117 32493 32445
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:55
## Wall clock time: 0.15 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
Q13: Check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1? ANSWER: Yes, it does.
## MR-DoC: Test g1=0 using a likelihood ratio test #########
no_causal2 <- omxSetParameters(name = "drop_g1", no_causal, label="g1",
free = FALSE, values = 0)
no_causal2 <- mxRun(no_causal2)
## Running drop_g1 with 14 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 mrdoc1 <NA> 15 32379 10985 32409 NA NA NA
## 2 mrdoc1 drop_g1 14 32379 10986 32407 -0.000000047 1 1
Q14: Do we detect a causal effect with a true g1 value set to zero? ANSWER: As expected, no.
## Let's run in unrelateds: ---------------------------------------------------
# take twin 1 from mz and dz
dat3 <- rbind(dg1mz[,1:3],dg1dz[,1:3])|>
rename(X = X_T1,
Y = Y_T1,
I = iX_T1)
unrel5 <- IVModel + mxData(dat3, type = "raw")
# in the interest of brevity, let's autostart the model
unrel5 <- mxAutoStart(unrel5)
unrel5 <- mxRun(unrel5)
## Running MR with 9 parameters
parameters used for simulation g1 = 0 b1 = 0.316 b2 = 0.316
## Summary of MR
##
## free parameters:
## name matrix row col Estimate Std.Error A
## 1 g1 A Y X 1.0000224830708 0.099
## 2 b1 A X I 0.3160040126595 0.028 !
## 3 vI S I I 0.9990013057179 0.032
## 4 vX S eX eX 1.6114040656051 0.051 !
## 5 re S eX eY -0.9800670433984 0.166
## 6 vY S eY eY 1.9600849023070 0.204
## 7 mX M 1 X -0.0000046579304 0.028 !
## 8 mY M 1 Y -0.0000042512894 0.031 !
## 9 mI M 1 I -0.0000000000029 0.022
##
## Model Statistics:
## | Parameters | Degrees of Freedom | Fit (-2lnL units)
## Model: 9 5991 18600
## Saturated: 9 5991 NA
## Independence: 6 5994 NA
## Number of observations/statistics: 2000/6000
##
## Information Criteria:
## | df Penalty | Parameters Penalty | Sample-Size Adjusted
## AIC: 6618 18618 18618
## BIC: -26937 18669 18640
## CFI: NA
## TLI: 1 (also known as NNFI)
## RMSEA: 0 [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:55
## Wall clock time: 0.04 secs
## optimizer: SLSQP
## OpenMx version number: 2.20
## Need help? See help(mxSummary)
Q15: Does MR-SEM model recover correctly the parameters used for simulation? ANSWER: No, g1 = 0 + pleiotropy severily biased the causal path in the models that used unrelated data.
unrel6 <- omxSetParameters(name = "no_causal", unrel5, label="g1", free = FALSE,
values = 0)
unrel6 <- mxRun(unrel6)
## Running no_causal with 8 parameters
## base comparison ep minus2LL df AIC diffLL diffdf p
## 1 MR <NA> 9 18600 5991 18618 NA NA NA
## 2 MR no_causal 8 18720 5992 18736 120 1 0.00000000000000000000000000059
## Fit the model using two stage least squares ####################
TSLS1=lm(X~I,data=dat3)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat3)
summary(TSLS2)
##
## Call:
## lm(formula = Y ~ Xhat, data = dat3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.725 -0.880 0.026 0.831 3.737
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.00000000000000000517 0.02839894364232584470 0.0 1
## Xhat 1.00000000000000177636 0.08991504358428256682 11.1 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.3 on 1998 degrees of freedom
## Multiple R-squared: 0.0583, Adjusted R-squared: 0.0578
## F-statistic: 124 on 1 and 1998 DF, p-value: <0.0000000000000002
The 2sls regression again recover the same biased estimates as MR-SEM. In the presence of pleiotropy MR-DoC outperforms the other methods.