Mendelian randomization using the twin design

This script is a modified version of Minica & Neale 2018 AGES workshop In this script we will test the (bidirectional) causal effect between two phenotypes, think of BMI on SBP (systolic blood pressure) and vice-versa, and polygenic scores for each. In the interest of brevity we will always refer to BMI as variable X, and SBP as variable Y, with the respective instruments iX and iY.

There are four main take away messages that you should focus: 1. How to setup a simulation in OpenMX using mxGenerateData 2. That a two-stages least squares test can be specified in SEM in OpenMX 3. That MR-DoC recovers the same estimates from 2sls test 4. That in the presence of horizontal pleiotropy estimates are biased in the 2sls test and unbiased in MR-DoC

The script contains four scenarios, - no pleiotropy between the instrument and the outcome (Scenario 1), - where there is pleiotropy (Scenario 2), - presence of a bidirectional causal effect (Scenario 3) - absence of a causal effect (Scenario 4) See presentation slides for path diagrams. If you are using RStudio you can navigate using the outline dropdown menu Setting the stage ————————————————————

rm(list=ls())  # Emptying the R environment, not recommended usually but useful
# for this workshop


# loading required packages
library(OpenMx)
library(MASS)
library(dplyr)

options(digits = 2, scipen = 999)  # we dont want scientific notation
mxOption(NULL, "Default optimizer", "SLSQP")

The models are specified in three objects, top (for common parts), MZ and DZ spend some time recognizing the elements in the model, by now you should have seen similar code. Notice two minor style changes, I am naming the objects at the beginning and the matrices labels spelled out in the positions they will end up in the matrix. The final objects (mrdoc1, mrdoc2) will be reused throughout the script. The matrix containing regressions for causal paths and instruments

BE <- mxMatrix(name = "BE", type = "Full",nrow=3,  ncol = 3, byrow = TRUE,
               labels = c(NA,   "g2", "b1",
                          "g1", NA,   "b2",
                          NA,   NA,   NA),
               free = c(FALSE, FALSE, TRUE,
                        TRUE,  FALSE, TRUE,
                        FALSE, FALSE, FALSE),
               dimnames = list(c("X", "Y", "iX"),
                               c("X", "Y", "iX")))

ACE decomposition

A <-  mxMatrix(name = 'A', type='Symm', nrow=3,ncol = 3,byrow = TRUE,
               labels=c("ax2", "covA", NA,
                        "covA","ay2",  NA,
                        NA,    NA,     "sigma_x"),
               free=c(TRUE, TRUE, FALSE,
                      TRUE, TRUE, FALSE,
                      FALSE,FALSE,TRUE),
               dimnames = list(c("X", "Y", "iX"),
                               c("X", "Y", "iX")))

C <-  mxMatrix(name = 'C', type='Symm',nrow=3, ncol = 3,byrow = TRUE,
               labels =c("cx2", "covC", NA,
                         "covC","cy2",  NA,
                         NA,    NA,     NA),
               free=c(TRUE, TRUE, FALSE,
                      TRUE, TRUE, FALSE,
                      FALSE,FALSE,FALSE),
               dimnames = list(c("X", "Y", "iX"),
                               c("X", "Y", "iX")))

E <-  mxMatrix(name = 'E', type='Symm', nrow=3, ncol = 3,byrow = TRUE,
               labels =c("ex2", "covE",NA,
                         "covE","ey2", NA,
                         NA,     NA,   NA),
               free= c(TRUE, FALSE,FALSE,
                       FALSE,TRUE, FALSE,
                       FALSE,FALSE,FALSE),
               dimnames = list(c("X", "Y", "iX"),
                               c("X", "Y", "iX")))

The filter matrix is used to remove the PRS from _T2 in MZs, if we kept it the matrix would become redundant resulting in the Hessian not positive

filter <-  mxMatrix(name = 'filter', type='Full', nrow=5, ncol=6, free=FALSE,
                    byrow = TRUE,
                    values=c(1,0,0,0,0,0,
                             0,1,0,0,0,0,
                             0,0,1,0,0,0,
                             0,0,0,1,0,0,
                             0,0,0,0,1,0),
                    dimnames = list(c("X_T1", "Y_T1", "iX_T1","X_T2", "Y_T2"),
                                    c("X_T1", "Y_T1", "iX_T1","X_T2", "Y_T2", "iX_T2")))

This lambda (LY) matrix is fixing total variances to 1 for PRSs and phenotypes

LY <-  mxMatrix(name = 'LY', type='Full',nrow=3, ncol = 3, free = FALSE,
                values = diag(3), labels = NA,
                dimnames = list(c("X", "Y", "iX"),
                                c("X", "Y", "iX")))

Means objects

mean_dz <-  mxMatrix(name = 'mean_dz', type='Full', nrow=1, ncol=6,
                     free=TRUE, values= 0, byrow = TRUE,
                     labels=c('mX1','mY2','miX1','mX1','mY2','miX1'))

mean_mz <-  mxAlgebra('mean_mz', expression = mean_dz%*%t(filter))

Identity matrix for the algebras

I <-  mxMatrix(name = 'I', type='Iden', nrow= 3,ncol= 3)

algebras <- list(
  mxAlgebra('A_'  , expression = LY %&% solve(I - BE)%&%A),
  mxAlgebra('C_'  , expression = LY %&% solve(I - BE)%&%C),
  mxAlgebra('E_'  , expression = LY %&% solve(I - BE)%&%E),
  mxAlgebra('SPh' , expression = A_ + C_ + E_),
  mxAlgebra('variance_mz_', expression = rbind(
    cbind(SPh, A_+C_),
    cbind(A_+C_, SPh))),
  mxAlgebra('variance_dz', expression= rbind(
    cbind(SPh, .5%x%A_+C_),
    cbind(.5%x%A_+C_, SPh))),
  mxAlgebra('variance_mz', expression= filter%&%variance_mz_))

top_mr1 <- mxModel("top", BE, A, C, E, filter, LY,  mean_dz, mean_mz, I, algebras)

Preparing the objects for the multiple groups (MZ, DZ) analysis

MZ_mr1 = mxModel("MZ",mxFitFunctionML(),
                 mxExpectationNormal(covariance = "top.variance_mz", means = "top.mean_mz",
                                     dimnames =  c("X_T1", "Y_T1", "iX_T1",
                                                   "X_T2", "Y_T2")))

DZ_mr1 = mxModel("DZ", mxFitFunctionML(),
                 mxExpectationNormal(covariance = "top.variance_dz", means = "top.mean_dz",
                                     dimnames =  c("X_T1", "Y_T1", "iX_T1",
                                                   "X_T2", "Y_T2", "iX_T2")))

Combining objects to generate the final model

mrdoc1 = mxModel("mrdoc1", top_mr1, MZ_mr1, DZ_mr1,
                 mxFitFunctionMultigroup(c("MZ","DZ") ) )

Scenario 1: no pleiotropy ————————————————— Generate simulated data for mrdoc1 Fit the model in unrelateds using a structural equation model Fit the model in twins - using the MR-doc Compare the NCPs of the models in unrelateds and twins Look at the 2 stage least squares results Let’s generate simulated data where b2 (the pleiotropic path) is zero, in other words, no pleiotropy present in the data. As such we need to set the true values, this is a way of doing this:

true_model <-  mrdoc1 |>
  omxSetParameters(labels =  c("g1","b1", "b2",
                               "ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
                               "covA", "covC", "covE","sigma_x"),
                   values = c(0.316, 0.316, 0,
                              0.424, 0.671, 0.671, 0.424, 0.519, 0.519,
                              0.411,0.221,0, 1)) |>
  omxSetParameters(labels = c("b2"), free = FALSE) # remember, no pleiotropy

Simulating data according to the exact covariance matrix from mrdoc1 (above), this is done using the switch empirical = T, this should be quick

sim_data <- mxGenerateData(true_model, nrows = 1000, empirical = TRUE)

The data does not comes with the instrument column for the second twin, we have to duplicate the column. In your analysis, this step will not happen, we are doing this because of how I coded the simulation for this workshop.

sim_data$MZ <- mutate(sim_data$MZ, iX_T2 = iX_T1)

dnpmz <- sim_data$MZ
dnpdz <- sim_data$DZ

dim(dnpdz) #

## [1] 1000    6

head(dnpdz)

##    X_T1  Y_T1 iX_T1   X_T2  Y_T2 iX_T2
## 1  1.17  1.63 -0.43  1.469  1.88 -0.28
## 2 -0.79 -0.79  0.39 -2.307 -2.29 -1.81
## 3  0.18  1.21  0.65  0.286  1.38  0.04
## 4 -1.74 -0.73  0.64 -0.908 -1.79  0.79
## 5  0.96  1.04 -0.36 -1.348  0.06 -1.50
## 6 -0.85 -0.56 -0.47 -0.074 -0.40 -1.21

summary(dnpdz)

##       X_T1           Y_T1          iX_T1           X_T2           Y_T2          iX_T2      
##  Min.   :-5.0   Min.   :-4.4   Min.   :-3.4   Min.   :-4.9   Min.   :-4.8   Min.   :-3.05  
##  1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.7   1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.64  
##  Median :-0.1   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.07  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.00  
##  3rd Qu.: 0.8   3rd Qu.: 1.0   3rd Qu.: 0.7   3rd Qu.: 0.9   3rd Qu.: 1.0   3rd Qu.: 0.64  
##  Max.   : 4.5   Max.   : 5.4   Max.   : 3.9   Max.   : 4.6   Max.   : 4.2   Max.   : 2.83

dim(dnpmz) #

## [1] 1000    6

head(dnpmz)

##     X_T1  Y_T1 iX_T1  X_T2   Y_T2 iX_T2
## 1 -0.064  1.70 -1.32 -0.39  1.160 -1.32
## 2 -1.285 -1.36 -0.60 -1.43  0.021 -0.60
## 3  0.299  0.44 -1.39  1.17  0.449 -1.39
## 4 -0.733 -2.93 -0.94 -0.61 -2.126 -0.94
## 5  1.016 -0.11  0.30  1.44  0.035  0.30
## 6 -1.071  0.18 -1.67 -1.21 -0.562 -1.67

summary(dnpmz)

##       X_T1           Y_T1          iX_T1           X_T2           Y_T2          iX_T2     
##  Min.   :-4.2   Min.   :-5.5   Min.   :-3.4   Min.   :-5.4   Min.   :-6.7   Min.   :-3.4  
##  1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.7   1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.7  
##  Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0  
##  3rd Qu.: 0.9   3rd Qu.: 1.0   3rd Qu.: 0.7   3rd Qu.: 0.8   3rd Qu.: 1.0   3rd Qu.: 0.7  
##  Max.   : 3.8   Max.   : 4.3   Max.   : 2.9   Max.   : 4.3   Max.   : 4.7   Max.   : 2.9

We now add the data to the model object created before. Notice that the plus sign is overloaded in OpenMx as it is, for example, in ggplot2. So you can combine objects with the syntax below instead of model = mxModel(model, mxData(data, type = "raw")) This syntax is optional, but helps reducing the lenght of the script

mrdoc1$MZ <- mrdoc1$MZ +  mxData(dnpmz, type = "raw")
mrdoc1$DZ <- mrdoc1$DZ +  mxData(dnpdz, type = "raw")

m1 <- mrdoc1 |>
  # Remember, no b2 in data, no b2 in the model
  omxSetParameters(labels = "b2", free = FALSE) |>
  # The poing of examining the data a few lines above is to set sensible
  # starting values for the model. Here, in the interest of brevity, we ask
  # OpenMx to pick starting values for us.
  mxAutoStart()

m1 <- mxRun(m1)

## Running mrdoc1 with 14 parameters

# In your analyses, make sure to test for local identification frequently:
# mxCheckIdentification(m1)$status
# Model is locally identified
# [1] TRUE

summary(m1)

## Summary of mrdoc1 
##  
## free parameters:
##       name      matrix row   col        Estimate Std.Error A
## 1       g1      top.BE   Y     X  0.316000000123     0.028  
## 2       b1      top.BE   X    iX  0.316000000040     0.021  
## 3      ax2       top.A   X     X  0.423575999118     0.077  
## 4     covA       top.A   X     Y  0.410588999655     0.064  
## 5      ay2       top.A   Y     Y  0.670328999037     0.089  
## 6  sigma_x       top.A  iX    iX  0.999000000056     0.026  
## 7      cx2       top.C   X     X  0.670329000971     0.074  
## 8     covC       top.C   X     Y  0.220779000217     0.054  
## 9      cy2       top.C   Y     Y  0.423576000776     0.080  
## 10     ex2       top.E   X     X  0.518481000185     0.023  
## 11     ey2       top.E   Y     Y  0.518481000190     0.023  
## 12     mX1 top.mean_dz   1  X_T1 -0.000000000056     0.026  
## 13     mY2 top.mean_dz   1  Y_T1 -0.000000000034     0.030  
## 14    miX1 top.mean_dz   1 iX_T1 -0.000000000032     0.021  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             14                  10986                 32383
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 2000/11000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:          10411                  32411                    32411
## BIC:         -51120                  32490                    32445
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:49 
## Wall clock time: 0.17 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

One way of assessing whether the causal path is significant is by dropping it

m2 <- omxSetParameters(m1, name = "nog1", labels="g1", free = FALSE,
                       values = 0)
m2 <- mxRun(m2)

## Running nog1 with 13 parameters

# Comparing the two models
mxCompare(m1, m2)

##     base comparison ep minus2LL    df   AIC diffLL diffdf                              p
## 1 mrdoc1       <NA> 14    32383 10986 32411     NA     NA                             NA
## 2 mrdoc1       nog1 13    32499 10987 32525    115      1 0.0000000000000000000000000066

Q1: We dropped the causal path, what is the interpretation for the above result? ANSWER: The causal path is significant, model m2 was worse. Can we specify a model for a 2-stage least squares test using SEM and OpenMX?

IVModel = mxModel("MR", type = "RAM", manifestVars = c("X", "Y", "I"),
                  latentVars = c("eX", "eY"),
                  # Path from instrument to exposure
                  mxPath(from = "I" , to = "X", arrows = 1, label = "b1"),
                  # Path from exposure to outcome, g1
                  mxPath(from = "X", to = "Y", label = "g1"),
                  # Latent error+ setting up variance and means for variables
                  mxPath(from = c("I"), arrows = 2, label = "vI"),
                  mxPath(from = c("eX", "eY"), to = c("X","Y"), value = 1, free = FALSE),
                  # Variance of residual errors
                  mxPath(from = c("eX", "eY"), arrows =  2, free = TRUE,
                         labels = c("vX", "vY")),
                  mxPath(connect = "unique.bivariate", from =  c("eX", "eY"),   arrows = 2,
                         values = 0.2, labels = "re"), # Correlation among residuals
                  mxPath("one", to = c("X","Y", "I"), labels = c("mX", "mY", "mI")))

# The above specification closely follows the slides in the presentation
# If you have umx installed you can look at it:
# library(umx)
# plot(IVModel)

Let’s generate a new data set by taking only twin 1 from mzs and dzs

dat1 <- rbind(dnpmz[,1:3],dnpdz[,1:3]) |>
  # we need to rename the variables to match the dimension names set in the
  # IVmodel above
  rename(X = X_T1,
         Y = Y_T1,
         I = iX_T1)

Adding the data to the model

unrel <- IVModel + mxData(dat1, type = "raw")

Know your data, check variable skewness, kurtosis and means, but in the interest of brevity, let’s autostart

unrel <- mxAutoStart(unrel)
unrel <- mxRun(unrel)

## Running MR with 9 parameters

summary(unrel)

## Summary of MR 
##  
## free parameters:
##   name matrix row col   Estimate Std.Error A
## 1   g1      A   Y   X  0.3159990     0.090  
## 2   b1      A   X   I  0.3159993     0.028  
## 3   vI      S   I   I  0.9990026     0.032  
## 4   vX      S  eX  eX  1.6123920     0.051 !
## 5   re      S  eX  eY  0.6313694     0.150  
## 6   vY      S  eY  eY  1.6123882     0.124  
## 7   mX      M   1   X  0.0000090     0.028 !
## 8   mY      M   1   Y -0.0000113     0.028 !
## 9   mI      M   1   I  0.0000033     0.022 !
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              9                   5991                 18603
##    Saturated:              9                   5991                    NA
## Independence:              6                   5994                    NA
## Number of observations/statistics: 2000/6000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:           6621                  18621                    18621
## BIC:         -26934                  18672                    18643
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:50 
## Wall clock time: 0.038 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

unrel2 <- omxSetParameters(name = "no_causal", unrel, label="g1", free = FALSE,
                           values = 0)
unrel2 <- mxRun(unrel2)

## Running no_causal with 8 parameters

mxCompare(unrel,unrel2)

##   base comparison ep minus2LL   df   AIC diffLL diffdf      p
## 1   MR       <NA>  9    18603 5991 18621     NA     NA     NA
## 2   MR  no_causal  8    18612 5992 18628    9.1      1 0.0025

Q2: The line above is a Likelihood ratio test with 1 degree of freedom, what is the meaning of a significant p-value in this case? ANSWER: The causal path is significant, dropping it made the model (no_causal) significantly worse

# Now let's compare with a typical 2sls test
TSLS1=lm(X~I,data=dat1)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat1)
summary(TSLS2)

## 
## Call:
## lm(formula = Y ~ Xhat, data = dat1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.477 -1.017  0.004  1.018  5.287 
## 
## Coefficients:
##                           Estimate             Std. Error t value Pr(>|t|)   
## (Intercept) 0.00000000000000000429 0.03297416855661412793    0.00   1.0000   
## Xhat        0.31599999999999939249 0.10440084815354150338    3.03   0.0025 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.5 on 1998 degrees of freedom
## Multiple R-squared:  0.00456,    Adjusted R-squared:  0.00407 
## F-statistic: 9.16 on 1 and 1998 DF,  p-value: 0.0025

# In the specialized ivreg package the syntax would be:
# library(ivreg)
# TSLS2=ivreg(Y~X|I,data=dat1)

Q3: Check out the estimate for Xhat in the previous summary and compare to unrel estimate. Did MR-DoC estimated same values as 2sls? ANSWER: Yes, the estimates are equal (0.316) Now let’s check the power to reject the hypothesis of g1=0 using the non-centrality parameter for related and unrelated individuals.

lambdam1=mxCompare(m1,m2)[2,7]
dfs=mxCompare(m1,m2)[2,8]
alpha=0.05
ca=qchisq(alpha,dfs,ncp=0,lower.tail=F)
powerm1=pchisq(ca,dfs,ncp=lambdam1,lower.tail=F)
powerm1

## [1] 1

lambdaunrel=mxCompare(unrel,unrel2)[2,7]
dfs=mxCompare(unrel,unrel2)[2,8]
alpha=0.05  # user specified: type I error prob.
ca=qchisq(alpha,dfs,ncp=0,lower.tail=F)
powerUnrel=pchisq(ca,dfs,ncp=lambdaunrel,lower.tail=F)
powerUnrel

## [1] 0.86

Q4: Which method had better power to detect the causal effect? ANSWER: MR-DoC has a power of 0.99, 2sls has a power of 0.86 Scenario 2: Pleiotropy ————————————————- Next consider the scenario with pleiotropy, assume re=0 and test g1 = 0 Check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1. Do we detect a causal effect if we don’t account for pleiotropy (SEM, 2stage least squares, 2-sample MR)?

sim_data2 <-  mrdoc1 |>
  omxSetParameters(labels =  c("g1","b1", "b2",
                               "ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
                               "covA", "covC", "covE","sigma_x"),
                   values = c(0.143, 0.316, 0.127,
                              0.424, 0.67, 0.670, 0.424, 0.519, 0.519,
                              0.411,0.221,0, 1)) |>
  mxGenerateData( nrows = 1000, empirical = TRUE)

sim_data2$MZ <- mutate(sim_data2$MZ, iX_T2 = iX_T1)

dwpmz <- sim_data2$MZ
dwpdz <- sim_data2$DZ

dim(dwpdz) #

## [1] 1000    6

head(dwpdz)

##     X_T1  Y_T1 iX_T1 X_T2   Y_T2 iX_T2
## 1 -0.035 -0.27  1.03 -1.1  1.461  1.16
## 2  0.481  1.56  1.29  1.5  0.156  2.26
## 3 -1.018 -1.55 -0.65 -1.4  0.011 -0.76
## 4  1.193 -0.93 -0.64  2.0 -0.079  0.68
## 5  1.600  2.11  0.62 -1.3 -0.314  1.37
## 6  0.050 -0.89 -0.50  1.4  0.766  0.80

summary(dwpdz)

##       X_T1           Y_T1          iX_T1            X_T2           Y_T2          iX_T2      
##  Min.   :-3.9   Min.   :-4.2   Min.   :-3.06   Min.   :-4.1   Min.   :-4.2   Min.   :-3.01  
##  1st Qu.:-0.9   1st Qu.:-0.9   1st Qu.:-0.68   1st Qu.:-0.9   1st Qu.:-0.9   1st Qu.:-0.66  
##  Median : 0.0   Median : 0.0   Median : 0.00   Median : 0.0   Median : 0.0   Median :-0.05  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.00   Mean   : 0.0   Mean   : 0.0   Mean   : 0.00  
##  3rd Qu.: 0.9   3rd Qu.: 1.0   3rd Qu.: 0.69   3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.68  
##  Max.   : 4.4   Max.   : 4.3   Max.   : 2.92   Max.   : 3.8   Max.   : 4.1   Max.   : 2.96

dim(dwpmz) #

## [1] 1000    6

head(dwpmz)

##    X_T1    Y_T1  iX_T1    X_T2   Y_T2  iX_T2
## 1 -0.89 -1.5548 -0.490 -0.1428 -3.586 -0.490
## 2 -0.14  1.3320 -0.553  0.7621  0.332 -0.553
## 3 -1.22 -0.0653 -0.034 -0.0076  1.245 -0.034
## 4  0.72 -0.0053  0.166  1.6579  0.067  0.166
## 5 -0.92 -0.8504  0.171 -1.3165 -1.570  0.171
## 6 -1.78  0.9096 -1.363 -2.7041 -1.469 -1.363

summary(dwpmz)

##       X_T1           Y_T1          iX_T1           X_T2           Y_T2          iX_T2     
##  Min.   :-3.7   Min.   :-4.6   Min.   :-3.0   Min.   :-4.0   Min.   :-4.1   Min.   :-3.0  
##  1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.6   1st Qu.:-0.9   1st Qu.:-1.0   1st Qu.:-0.6  
##  Median :-0.1   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0  
##  3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.6   3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.6  
##  Max.   : 4.6   Max.   : 4.4   Max.   : 3.7   Max.   : 4.7   Max.   : 4.2   Max.   : 3.7

pleio = mrdoc1
pleio$MZ <- mrdoc1$MZ + mxData(dwpmz, type = "raw")
pleio$DZ <- mrdoc1$DZ + mxData(dwpdz, type = "raw")

pleio <- mxRun(pleio)

## Running mrdoc1 with 15 parameters

summary(pleio)

## Summary of mrdoc1 
##  
## free parameters:
##       name      matrix row   col    Estimate Std.Error A
## 1       g1      top.BE   Y     X  0.14300289     0.031  
## 2       b1      top.BE   X    iX  0.31599910     0.022  
## 3       b2      top.BE   Y    iX  0.12700095     0.025  
## 4      ax2       top.A   X     X  0.42359100     0.077 !
## 5     covA       top.A   X     Y  0.41058422     0.067 !
## 6      ay2       top.A   Y     Y  0.66931525     0.090 !
## 7  sigma_x       top.A  iX    iX  0.99900149     0.026  
## 8      cx2       top.C   X     X  0.66932329     0.074 !
## 9     covC       top.C   X     Y  0.22077728     0.054 !
## 10     cy2       top.C   Y     Y  0.42358333     0.080 !
## 11     ex2       top.E   X     X  0.51847605     0.023  
## 12     ey2       top.E   Y     Y  0.51848682     0.023 !
## 13     mX1 top.mean_dz   1  X_T1  0.00000085     0.026  
## 14     mY2 top.mean_dz   1  Y_T1 -0.00000048     0.027  
## 15    miX1 top.mean_dz   1 iX_T1 -0.00000751     0.021 !
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             15                  10985                 32379
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 2000/11000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:          10409                  32409                    32409
## BIC:         -51117                  32493                    32445
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:51 
## Wall clock time: 0.14 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

parameters used for simulation g1 = 0.143 b1 = 0.316 b2 = 0.127 Q5: check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1? Hint: look at the true values used for simulation ANSWER: Yes. MR-DoC: Test g1=0 using a likelihood ratio test #########

pleio2 <- omxSetParameters(name = "no_causal", pleio, label="g1", free = FALSE,
                           values = 0)
pleio2 <- mxRun(pleio2)

## Running no_causal with 14 parameters

mxCompare(pleio, pleio2)

##     base comparison ep minus2LL    df   AIC diffLL diffdf         p
## 1 mrdoc1       <NA> 15    32379 10985 32409     NA     NA        NA
## 2 mrdoc1  no_causal 14    32400 10986 32428     21      1 0.0000043

Let’s run in unrelateds: —————————————————

# take twin 1 from mz and dz
dat2 <- rbind(dwpmz[,1:3],dwpdz[,1:3])|>
  rename(X = X_T1,
         Y = Y_T1,
         I = iX_T1)


unrel3 <- IVModel + mxData(dat2, type = "raw")
# in the interest of brevity, let's autostart the model
unrel3 <- mxAutoStart(unrel3)
unrel3 <- mxRun(unrel3)

## Running MR with 9 parameters

summary(unrel3)

## Summary of MR 
##  
## free parameters:
##   name matrix row col    Estimate Std.Error A
## 1   g1      A   Y   X  0.54484392     0.083 !
## 2   b1      A   X   I  0.31600496     0.028 !
## 3   vI      S   I   I  0.99903761     0.032 !
## 4   vX      S  eX  eX  1.61131910     0.051 !
## 5   re      S  eX  eY -0.01613821     0.137 !
## 6   vY      S  eY  eY  1.36412244     0.043 !
## 7   mX      M   1   X -0.00000021     0.028  
## 8   mY      M   1   Y -0.00000017     0.026  
## 9   mI      M   1   I  0.00000305     0.022 !
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              9                   5991                 18600
##    Saturated:              9                   5991                    NA
## Independence:              6                   5994                    NA
## Number of observations/statistics: 2000/6000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:           6618                  18618                    18618
## BIC:         -26937                  18669                    18640
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:52 
## Wall clock time: 0.053 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

parameters used for simulation g1 = 0.143 b1 = 0.316 b2 = 0.127 Q6: Does the 2sls model recover correctly the parameters used for simulation? ANSWER: No, it overestimated g1

unrel4 <- omxSetParameters(name = "no_causal", unrel3, label="g1", free = FALSE,
                           values = 0)
unrel4 <- mxRun(unrel4)

## Running no_causal with 8 parameters

mxCompare(unrel3,unrel4)

##   base comparison ep minus2LL   df   AIC diffLL diffdf           p
## 1   MR       <NA>  9    18600 5991 18618     NA     NA          NA
## 2   MR  no_causal  8    18633 5992 18649     32      1 0.000000014

## Which model has highest power? (look at chisq difference) -----------------

## MR-DoC in Twins
chisq_Twins=mxCompare(pleio,pleio2)[2,7]
chisq_Twins

## [1] 21

## MR-SEM in unrelateds
chisq_Unrel=mxCompare(unrel3, unrel4)[2,7]
chisq_Unrel

## [1] 32

Q7: Conclusion? Larger diff, higher power ANSWER: Power in the model with unrelated was higher.

##  Fit the model using two stage least squares     ####################
TSLS1=lm(X~I,data=dat2)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat2)
summary(TSLS2)

## 
## Call:
## lm(formula = Y ~ Xhat, data = dat2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.622 -0.922 -0.037  0.920  4.710 
## 
## Coefficients:
##                          Estimate            Std. Error t value    Pr(>|t|)    
## (Intercept) 0.0000000000000000135 0.0302219807176829502    0.00           1    
## Xhat        0.5448987341772134618 0.0956870349706873818    5.69 0.000000014 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.4 on 1998 degrees of freedom
## Multiple R-squared:  0.016,  Adjusted R-squared:  0.0155 
## F-statistic: 32.4 on 1 and 1998 DF,  p-value: 0.0000000142

Check above that the 2sls test using either glm or SEM still finds same estimates. Q8: Do we detect a causal effect if we account for pleiotropy (SEM, 2stage least squares)? ANSWER: The models using unrelated detects a causal effect but not accounting for pleiotropy the estimate was biased. Scenario 3. Bidirectional causation ——————————————- Specifying the mrdoc2 model. This is similar to mrdoc1 except larger matrices (one more instrument) Matrix for causal paths

BE <-  mxMatrix(name = "BE", type = "Full",nrow=4,  ncol = 4,byrow = TRUE,
                labels = c(NA,   "g2", "b1", "b4",
                           "g1", NA,   "b2", "b3",
                           NA,   NA,   NA,   NA,
                           NA,   NA,   NA,   NA),
                free = c(FALSE, TRUE, TRUE, FALSE,
                         TRUE, FALSE, FALSE, TRUE,
                         FALSE, FALSE, FALSE, FALSE,
                         FALSE, FALSE, FALSE, FALSE),
                dimnames = list(c("X", "Y", "iX", "iY"),
                                c("X", "Y", "iX", "iY")))

A, C and E decomposition

A <- mxMatrix(name = 'A', type='Symm', nrow=4, ncol = 4,byrow = TRUE,
              labels=c("ax2","covA",NA,NA,
                       "covA","ay2",NA,NA,
                       NA,NA,"x2"  ,"rf",
                       NA,NA,"rf","y2"),
              free=c(TRUE,TRUE,FALSE,FALSE,
                     TRUE,TRUE,FALSE,FALSE,
                     FALSE,FALSE,TRUE,TRUE,
                     FALSE,FALSE,TRUE,TRUE),
              dimnames = list(c("X", "Y", "iX", "iY"),
                              c("X", "Y", "iX", "iY")))

C <-  mxMatrix(name = 'C', type='Symm',nrow=4, ncol = 4,byrow = TRUE,
               labels =c("cx2", "covC",NA,NA,
                         "covC","cy2" ,NA,NA,
                         NA,    NA,    NA,NA,
                         NA,    NA,    NA,NA),
               free=c(TRUE,TRUE,  FALSE,FALSE,
                      TRUE,TRUE,  FALSE,FALSE,
                      FALSE,FALSE,FALSE,FALSE,
                      FALSE,FALSE,FALSE,FALSE),
               dimnames = list(c("X", "Y", "iX", "iY"),
                               c("X", "Y", "iX", "iY")))

E <-  mxMatrix(name = 'E', type='Symm', nrow=4, ncol = 4,byrow = TRUE,
               labels =c("ex2", "covE",NA,NA,
                         "covE","ey2" ,NA,NA,
                         NA,    NA,    NA,NA,
                         NA,    NA,    NA,NA),
               free= c(TRUE,TRUE,  FALSE,FALSE,
                       TRUE,TRUE,  FALSE,FALSE,
                       FALSE,FALSE,FALSE,FALSE,
                       FALSE,FALSE,FALSE,FALSE),
               dimnames = list(c("X", "Y", "iX", "iY"),
                               c("X", "Y", "iX", "iY")))

A filter matrix, as we need to remove the PRSs from twin 2

filter <-  mxMatrix(name = 'filter', type='Full', nrow=6, ncol=8, free=FALSE,
                    byrow = TRUE,
                    values=c(1,0,0,0,0,0,0,0,
                             0,1,0,0,0,0,0,0,
                             0,0,1,0,0,0,0,0,
                             0,0,0,1,0,0,0,0,
                             0,0,0,0,1,0,0,0,
                             0,0,0,0,0,1,0,0),
                    dimnames = list(c("X_T1", "Y_T1", "iX_T1","iY_T1",
                                      "X_T2", "Y_T2"),
                                    c("X_T1", "Y_T1", "iX_T1","iY_T1","X_T2",
                                      "Y_T2", "iX_T2","iY_T2")))

LY <-  mxMatrix(name = 'LY', type='Full',nrow=4, ncol = 4, free = FALSE,
                values = diag(4), labels = NA,
                dimnames = list(c("X", "Y", "iX", "iY"),
                                c("X", "Y", "iX", "iY")))

The object with the means

mean_dz <-  mxMatrix(name = 'mean_dz', type='Full', nrow=1, ncol=8, free=TRUE,
                     byrow = TRUE, values = 0, labels=c('mX1','mY2','miX1','miY2',
                                                        'mX1','mY2','miX1','miY2') )

Removing the PRS from the MZ means

algebras <- list(  mxAlgebra('mean_mz', expression = mean_dz %*% t(filter)),
                   # Identity matrix to algebra calculations
                   mxMatrix(name = 'I', type='Iden', nrow= 4,ncol= 4 ),
                   # The needed matrices for calculating the variances
                   mxAlgebra('A_'  , expression =  LY %&% solve(I - BE) %&% A),
                   mxAlgebra('C_'  , expression =  LY %&%solve(I - BE) %&% C),
                   mxAlgebra('E_'  , expression =  LY %&%solve(I - BE) %&% E),
                   mxAlgebra('full_variance' , expression= A_ + C_ + E_),
                   mxAlgebra('variance_mz_', expression=rbind(
                     cbind(full_variance, A_ + C_),
                     cbind(A_ + C_, full_variance))),
                   mxAlgebra('variance_dz', expression=rbind(
                     cbind(full_variance, 0.5%x%A_ + C_),
                     cbind(0.5%x%A_ + C_, full_variance))),
                   mxAlgebra('variance_mz', expression= filter%&%variance_mz_))

top_mr2 <- mxModel("top", BE, A, C, E, filter, LY, mean_dz,  algebras)

Preparing the objects for the multiple groups (MZ, DZ) analysis

MZ_mr2 = mxModel("MZ", mxFitFunctionML(),
                 mxExpectationNormal(covariance = "top.variance_mz",means = "top.mean_mz",
                                     dimnames =  c("X_T1", "Y_T1", "iX_T1", "iY_T1",
                                                   "X_T2", "Y_T2")))

DZ_mr2 = mxModel("DZ", mxFitFunctionML(),
                 mxExpectationNormal(covariance = "top.variance_dz", means = "top.mean_dz",
                                     dimnames =  c("X_T1", "Y_T1", "iX_T1", "iY_T1",
                                                   "X_T2", "Y_T2", "iX_T2", "iY_T2")))

Combining objects to generate the final model

mrdoc2 = mxModel("mrdoc2", top_mr2, MZ_mr2, DZ_mr2,
                 mxFitFunctionMultigroup(c("MZ","DZ") ) )

Simulating using mrdoc2, now we are simulating data so that there is an effect of X on Y and vice-versa. g1, g2 !=0

sim_data3 <- mrdoc2 |>
  omxSetParameters(labels =  c("g1","g2", "b1", "b3",
                               "ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
                               "covA", "covC", "covE","rf", "x2", "y2"),
                   values = c(0.184, 0.111, 0.411, 0.519,
                              0.424, 0.670, 0.670, 0.424, 0.519, 0.519,
                              0.411,0.221,0.221,0.111, 1,1)) |>
  mxGenerateData( nrows = 1000, empirical = TRUE)

Remember, we need to duplicate the columns for the second twin

sim_data3$MZ <- mutate(sim_data3$MZ, iX_T2 = iX_T1, iY_T2 = iY_T1)

dg2mz <- sim_data3$MZ
dg2dz <- sim_data3$DZ


dim(dg2dz) #

## [1] 1000    8

head(dg2dz)

##   X_T1  Y_T1 iX_T1 iY_T1  X_T2   Y_T2 iX_T2 iY_T2
## 1 1.09  0.83  0.49 -0.84  2.06  1.510  0.51  -1.2
## 2 0.75 -0.86  1.13 -1.56 -0.40 -1.701 -0.11  -1.6
## 3 1.24  1.41 -0.85 -0.65 -2.17  1.506 -1.24   1.1
## 4 2.16  2.24 -1.12  0.26  1.53  1.849 -0.55   1.2
## 5 0.29  0.40 -0.45  0.68 -0.65  0.032 -1.81   1.0
## 6 2.65  2.96 -0.69 -0.67  2.66  1.472  0.78  -1.6

summary(dg2dz)

##       X_T1           Y_T1          iX_T1          iY_T1           X_T2           Y_T2          iX_T2     
##  Min.   :-5.6   Min.   :-5.0   Min.   :-3.5   Min.   :-2.8   Min.   :-5.0   Min.   :-4.5   Min.   :-2.9  
##  1st Qu.:-1.0   1st Qu.:-1.0   1st Qu.:-0.6   1st Qu.:-0.7   1st Qu.:-1.0   1st Qu.:-1.1   1st Qu.:-0.7  
##  Median : 0.0   Median :-0.1   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0  
##  3rd Qu.: 1.0   3rd Qu.: 1.1   3rd Qu.: 0.7   3rd Qu.: 0.6   3rd Qu.: 0.9   3rd Qu.: 1.1   3rd Qu.: 0.7  
##  Max.   : 4.3   Max.   : 4.5   Max.   : 3.0   Max.   : 3.4   Max.   : 4.7   Max.   : 4.5   Max.   : 3.2  
##      iY_T2      
##  Min.   :-3.12  
##  1st Qu.:-0.64  
##  Median : 0.00  
##  Mean   : 0.00  
##  3rd Qu.: 0.67  
##  Max.   : 3.08

dim(dg2mz) #

## [1] 1000    8

head(dg2mz)

##    X_T1   Y_T1  iX_T1  iY_T1  X_T2   Y_T2  iX_T2  iY_T2
## 1  1.59  1.517 -0.498  0.330  2.19  1.234 -0.498  0.330
## 2  0.40 -0.608 -0.982 -0.358  0.71 -0.108 -0.982 -0.358
## 3 -0.39 -1.330  0.519 -0.828 -2.66 -3.425  0.519 -0.828
## 4  1.14  0.557  0.686  0.709 -0.17  0.517  0.686  0.709
## 5  0.60  0.081  2.273 -0.552  0.37  0.406  2.273 -0.552
## 6 -0.28 -1.163  0.085  0.033  0.17 -0.097  0.085  0.033

summary(dg2mz)

##       X_T1           Y_T1          iX_T1           iY_T1           X_T2           Y_T2          iX_T2      
##  Min.   :-4.9   Min.   :-5.3   Min.   :-2.72   Min.   :-4.0   Min.   :-4.6   Min.   :-5.2   Min.   :-2.72  
##  1st Qu.:-0.9   1st Qu.:-1.1   1st Qu.:-0.72   1st Qu.:-0.6   1st Qu.:-0.9   1st Qu.:-1.1   1st Qu.:-0.72  
##  Median : 0.0   Median : 0.0   Median :-0.04   Median : 0.0   Median : 0.0   Median : 0.0   Median :-0.04  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.00   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.00  
##  3rd Qu.: 1.0   3rd Qu.: 1.0   3rd Qu.: 0.70   3rd Qu.: 0.7   3rd Qu.: 1.0   3rd Qu.: 1.0   3rd Qu.: 0.70  
##  Max.   : 5.3   Max.   : 5.0   Max.   : 2.97   Max.   : 2.9   Max.   : 5.6   Max.   : 4.3   Max.   : 2.97  
##      iY_T2     
##  Min.   :-4.0  
##  1st Qu.:-0.6  
##  Median : 0.0  
##  Mean   : 0.0  
##  3rd Qu.: 0.7  
##  Max.   : 2.9

bidir <- mrdoc2
bidir$MZ <- bidir$MZ + mxData(dg2mz, type = "raw")
bidir$DZ <- bidir$DZ + mxData(dg2dz, type = "raw")

bidir <- mxAutoStart(bidir)
bidir <- mxRun(bidir)

## Running mrdoc2 with 20 parameters

You can check for local identification: (this is slow, do this at home) mxCheckIdentification(bidir)$status

summary(bidir)

## Summary of mrdoc2 
##  
## free parameters:
##    name      matrix row   col    Estimate Std.Error A
## 1    g1      top.BE   Y     X  0.18404298     0.054 !
## 2    g2      top.BE   X     Y  0.11099381     0.042 !
## 3    b1      top.BE   X    iX  0.41099670     0.023 !
## 4    b3      top.BE   Y    iY  0.51899828     0.023 !
## 5   ax2       top.A   X     X  0.42357732     0.088 !
## 6  covA       top.A   X     Y  0.41058220     0.076 !
## 7   ay2       top.A   Y     Y  0.66934882     0.101 !
## 8    x2       top.A  iX    iX  0.99894780     0.026 !
## 9    rf       top.A  iX    iY  0.11087160     0.018 !
## 10   y2       top.A  iY    iY  0.99894780     0.026 !
## 11  cx2       top.C   X     X  0.66931127     0.080 !
## 12 covC       top.C   X     Y  0.22074242     0.074 !
## 13  cy2       top.C   Y     Y  0.42355736     0.085 !
## 14  ex2       top.E   X     X  0.51847349     0.036 !
## 15 covE       top.E   X     Y  0.22076944     0.041  
## 16  ey2       top.E   Y     Y  0.51848643     0.038 !
## 17  mX1 top.mean_dz   1  X_T1 -0.00000019     0.029  
## 18  mY2 top.mean_dz   1  Y_T1  0.00000018     0.031  
## 19 miX1 top.mean_dz   1 iX_T1  0.00000012     0.021  
## 20 miY2 top.mean_dz   1 iY_T1 -0.00000015     0.021  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             20                  13980                 40029
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 2000/14000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:          12069                  40069                    40070
## BIC:         -66231                  40181                    40118
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:53 
## Wall clock time: 0.15 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

simulated parameters g1 = 0.184 g2 = 0.111 b1 = 0.411 b3 = 0.519 Q9: Check the results: does MR-DoC model recover correctly the parameters g1, g2, b1, and b3 ANSWER: Yes, the estimates matches the true values set for the data. MR-DoC: Test g1=0 using a likelihood ratio test

bidir2 <- omxSetParameters(name = "drop_g1g2", bidir, label=c("g1", "g2"),
                           free = FALSE, values = 0)
bidir2 <-  mxRun(bidir2)

## Running drop_g1g2 with 18 parameters

mxCompare(bidir, bidir2)

##     base comparison ep minus2LL    df   AIC diffLL diffdf       p
## 1 mrdoc2       <NA> 20    40029 13980 40069     NA     NA      NA
## 2 mrdoc2  drop_g1g2 18    40045 13982 40081     16      2 0.00032

Q10: How many degrees of freedom for this LRT? ANSWER: Two degrees of freedom, we dropped both g1 and g2. Q11: Are the causal paths significant? ANSWER: Dropping g1 and g2 resulted in a significantly worse model, therefore the bidirectional relationship is significant.

## Let's run in unrelateds X -> Y: ---------------------------------------------

We are proceeding to check if running 2sls in each direction matches the true values we set in the simulation. Remember the model includes indirect pleiotropy of type PS2 -> rf -> PS1 -> X

# take twin 1 from mz and dz
dat4=rbind(dg2mz[,1:4],dg2dz[,1:4]) |>
  rename(X = X_T1,
         Y = Y_T1,
         I = iX_T1)

unrel7 <- IVModel + mxData(dat4, type = "raw")
# in the interest of brevity, let's autostart the model
unrel7 <- mxAutoStart(unrel7)
unrel7 <- mxRun(unrel7)

## Running MR with 9 parameters

summary(unrel7)

## Summary of MR 
##  
## free parameters:
##   name matrix row col    Estimate Std.Error A
## 1   g1      A   Y   X  0.31920390     0.067  
## 2   b1      A   X   I  0.42609788     0.031  
## 3   vI      S   I   I  0.99899914     0.032  
## 4   vX      S  eX  eX  1.90053222     0.060  
## 5   re      S  eX  eY  0.82566402     0.134  
## 6   vY      S  eY  eY  1.61915635     0.122  
## 7   mX      M   1   X -0.00000031     0.031  
## 8   mY      M   1   Y -0.00000026     0.028  
## 9   mI      M   1   I -0.00000001     0.022  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              9                   5991                 18772
##    Saturated:              9                   5991                    NA
## Independence:              6                   5994                    NA
## Number of observations/statistics: 2000/6000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:           6790                  18790                    18791
## BIC:         -26765                  18841                    18812
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:54 
## Wall clock time: 0.039 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

## Let's run in unrelateds Y -> X: ---------------------------------------------

# take twin 1 from mz and dz
dat5=rbind(dg2mz[,1:4],dg2dz[,1:4]) |>
  rename(Y = X_T1,  # notice the inversion here
         X = Y_T1,  # notice the inversion here
         I = iY_T1)

The inversion above is only necessary so we don’t have to rewrite the model from scratch. Bear with me as I rename the parameters in the base model This will help interpreting results

IVModelYX = omxSetParameters(name = "IVModelYX", IVModel, label = c("g1","b1", "vX", "vY", "mX", "mY"),
                             newlabel = c("g2","b3", "vY", "vX", "mY", "mX"))

unrel8 <- IVModelYX + mxData(dat5, type = "raw")
# in the interest of brevity, let's autostart the model
unrel8 <- mxAutoStart(unrel8)
unrel8 <- mxRun(unrel8)

## Running IVModelYX with 9 parameters

summary(unrel8)

## Summary of IVModelYX 
##  
## free parameters:
##   name matrix row col     Estimate Std.Error A
## 1   g2      A   Y   X 0.1957359896     0.052  
## 2   b3      A   X   I 0.5383903493     0.032  
## 3   vI      S   I   I 0.9990000291     0.032  
## 4   vY      S  eX  eX 2.0688193334     0.065  
## 5   re      S  eX  eY 1.0285950507     0.118  
## 6   vX      S  eY  eY 1.5888873870     0.119  
## 7   mY      M   1   X 0.0000005475     0.032  
## 8   mX      M   1   Y 0.0000004254     0.028  
## 9   mI      M   1   I 0.0000000055     0.022  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              9                   5991                 18628
##    Saturated:              9                   5991                    NA
## Independence:              6                   5994                    NA
## Number of observations/statistics: 2000/6000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:           6646                  18646                    18647
## BIC:         -26909                  18697                    18668
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:54 
## Wall clock time: 0.04 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

The result of the summary immediately above is of the relationship of Y to X, in other words g2. simulated parameters g1 = 0.184 g2 = 0.111 b1 = 0.411 b3 = 0.519 Q12: Does MR-SEM models (unrel7 or X->Y, and model unrel8 Y <- X) recover correctly the parameters used for simulation? What happened to the estimates? ANSWER: Both g1 and g2 were overestimated in the 2sls solution. Scenario 4: Pleiotropy & no causal effect STRETCH GOAL! ——————- We are back to MR-DoC1 and we will be simulating data without the causal effect Compare the results obtained with MR-DoC, unrelateds SEM, 2SLS, 2-sample MR.

sim_data4 <-  mrdoc1 |>
  omxSetParameters(labels =  c("g1","b1", "b2",
                               "ax2", "ay2", "cx2", "cy2", "ex2", "ey2",
                               "covA", "covC", "covE","sigma_x"),
                   values = c(0, 0.316, 0.316,
                              0.424, 0.670, 0.670, 0.424, 0.519, 0.519,
                              0.411,0.221,0, 1)) |>
  mxGenerateData( nrows = 1000, empirical = TRUE)

sim_data4$MZ <-  mutate(sim_data4$MZ, iX_T2 = iX_T1)

dg1mz <- sim_data4$MZ
dg1dz <- sim_data4$DZ

dim(dg1dz) #

## [1] 1000    6

head(dg1dz)

##      X_T1  Y_T1 iX_T1  X_T2  Y_T2 iX_T2
## 1  1.4663  3.46 -0.58  0.94  1.08 -0.34
## 2  1.1069  0.95 -0.82  0.27 -0.72 -0.30
## 3  0.0035 -0.99 -1.19  0.13 -0.92 -0.77
## 4  1.2180  2.96  2.56  0.69 -0.87 -0.13
## 5  0.2032  1.14 -0.32  1.35  0.17 -0.18
## 6 -0.5218 -0.32  0.42 -1.66 -0.60 -1.00

summary(dg1dz)

##       X_T1           Y_T1          iX_T1            X_T2           Y_T2          iX_T2     
##  Min.   :-4.4   Min.   :-3.9   Min.   :-3.05   Min.   :-4.3   Min.   :-4.1   Min.   :-3.2  
##  1st Qu.:-0.9   1st Qu.:-0.9   1st Qu.:-0.69   1st Qu.:-0.9   1st Qu.:-0.9   1st Qu.:-0.7  
##  Median : 0.0   Median : 0.0   Median : 0.03   Median : 0.0   Median : 0.0   Median : 0.0  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.00   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0  
##  3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.68   3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.7  
##  Max.   : 5.0   Max.   : 3.5   Max.   : 2.79   Max.   : 4.0   Max.   : 4.1   Max.   : 3.2

dim(dg1mz) #

## [1] 1000    6

head(dg1mz)

##    X_T1  Y_T1 iX_T1  X_T2   Y_T2 iX_T2
## 1  1.22  0.95  0.12  1.50  1.355  0.12
## 2  1.12  1.18 -0.78  0.70  0.932 -0.78
## 3  0.68  0.56  0.32  1.88  0.046  0.32
## 4  0.23 -1.34  0.50  0.45 -1.950  0.50
## 5 -0.37 -0.20  0.33  1.70 -2.129  0.33
## 6 -0.23 -1.57  1.10 -0.43 -0.576  1.10

summary(dg1mz)

##       X_T1           Y_T1          iX_T1           X_T2           Y_T2          iX_T2     
##  Min.   :-4.5   Min.   :-4.7   Min.   :-3.1   Min.   :-3.9   Min.   :-4.4   Min.   :-3.1  
##  1st Qu.:-0.9   1st Qu.:-0.9   1st Qu.:-0.7   1st Qu.:-0.9   1st Qu.:-0.8   1st Qu.:-0.7  
##  Median : 0.1   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0   Median : 0.0  
##  Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0   Mean   : 0.0  
##  3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.7   3rd Qu.: 0.9   3rd Qu.: 0.9   3rd Qu.: 0.7  
##  Max.   : 4.1   Max.   : 4.0   Max.   : 3.5   Max.   : 4.6   Max.   : 4.2   Max.   : 3.5

no_causal <- mrdoc1
no_causal$MZ <- mrdoc1$MZ + mxData(dg1mz, type = "raw")
no_causal$DZ <- mrdoc1$DZ + mxData(dg1dz, type = "raw")

no_causal <- mxRun(no_causal)

## Running mrdoc1 with 15 parameters

summary(no_causal)

## Summary of mrdoc1 
##  
## free parameters:
##       name      matrix row   col    Estimate Std.Error A
## 1       g1      top.BE   Y     X -0.00000169     0.031  
## 2       b1      top.BE   X    iX  0.31600043     0.022  
## 3       b2      top.BE   Y    iX  0.31600035     0.025  
## 4      ax2       top.A   X     X  0.42356291     0.077  
## 5     covA       top.A   X     Y  0.41059142     0.067  
## 6      ay2       top.A   Y     Y  0.66932379     0.090 !
## 7  sigma_x       top.A  iX    iX  0.99899888     0.026  
## 8      cx2       top.C   X     X  0.66934112     0.074  
## 9     covC       top.C   X     Y  0.22078101     0.054  
## 10     cy2       top.C   Y     Y  0.42358392     0.080  
## 11     ex2       top.E   X     X  0.51848479     0.023  
## 12     ey2       top.E   Y     Y  0.51848004     0.023  
## 13     mX1 top.mean_dz   1  X_T1  0.00000296     0.026  
## 14     mY2 top.mean_dz   1  Y_T1  0.00000083     0.026  
## 15    miX1 top.mean_dz   1 iX_T1  0.00000136     0.021  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:             15                  10985                 32379
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 2000/11000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:          10409                  32409                    32409
## BIC:         -51117                  32493                    32445
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:55 
## Wall clock time: 0.15 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

Q13: Check the results: does MR-DoC model recover correctly the parameters b2, g1, and b1? ANSWER: Yes, it does.

##  MR-DoC: Test g1=0 using a likelihood ratio test   #########

no_causal2 <- omxSetParameters(name = "drop_g1", no_causal, label="g1",
                               free = FALSE, values = 0)
no_causal2 <- mxRun(no_causal2)

## Running drop_g1 with 14 parameters

mxCompare(no_causal, no_causal2)

##     base comparison ep minus2LL    df   AIC       diffLL diffdf  p
## 1 mrdoc1       <NA> 15    32379 10985 32409           NA     NA NA
## 2 mrdoc1    drop_g1 14    32379 10986 32407 -0.000000047      1  1

Q14: Do we detect a causal effect with a true g1 value set to zero? ANSWER: As expected, no.

## Let's run in unrelateds: ---------------------------------------------------

# take twin 1 from mz and dz
dat3 <- rbind(dg1mz[,1:3],dg1dz[,1:3])|>
  rename(X = X_T1,
         Y = Y_T1,
         I = iX_T1)

unrel5 <- IVModel + mxData(dat3, type = "raw")
# in the interest of brevity, let's autostart the model
unrel5 <- mxAutoStart(unrel5)
unrel5 <- mxRun(unrel5)

## Running MR with 9 parameters

parameters used for simulation g1 = 0 b1 = 0.316 b2 = 0.316

summary(unrel5)

## Summary of MR 
##  
## free parameters:
##   name matrix row col         Estimate Std.Error A
## 1   g1      A   Y   X  1.0000224830708     0.099  
## 2   b1      A   X   I  0.3160040126595     0.028 !
## 3   vI      S   I   I  0.9990013057179     0.032  
## 4   vX      S  eX  eX  1.6114040656051     0.051 !
## 5   re      S  eX  eY -0.9800670433984     0.166  
## 6   vY      S  eY  eY  1.9600849023070     0.204  
## 7   mX      M   1   X -0.0000046579304     0.028 !
## 8   mY      M   1   Y -0.0000042512894     0.031 !
## 9   mI      M   1   I -0.0000000000029     0.022  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              9                   5991                 18600
##    Saturated:              9                   5991                    NA
## Independence:              6                   5994                    NA
## Number of observations/statistics: 2000/6000
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:           6618                  18618                    18618
## BIC:         -26937                  18669                    18640
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2024-03-27 14:48:55 
## Wall clock time: 0.04 secs 
## optimizer:  SLSQP 
## OpenMx version number: 2.20 
## Need help?  See help(mxSummary)

Q15: Does MR-SEM model recover correctly the parameters used for simulation? ANSWER: No, g1 = 0 + pleiotropy severily biased the causal path in the models that used unrelated data.

unrel6 <- omxSetParameters(name = "no_causal", unrel5, label="g1", free = FALSE,
                           values = 0)
unrel6 <- mxRun(unrel6)

## Running no_causal with 8 parameters

mxCompare(unrel5,unrel6)

##   base comparison ep minus2LL   df   AIC diffLL diffdf                               p
## 1   MR       <NA>  9    18600 5991 18618     NA     NA                              NA
## 2   MR  no_causal  8    18720 5992 18736    120      1 0.00000000000000000000000000059

##  Fit the model using two stage least squares     ####################
TSLS1=lm(X~I,data=dat3)
Xhat=predict(TSLS1)
TSLS2=lm(Y~Xhat,data=dat3)
summary(TSLS2)

## 
## Call:
## lm(formula = Y ~ Xhat, data = dat3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.725 -0.880  0.026  0.831  3.737 
## 
## Coefficients:
##                           Estimate             Std. Error t value            Pr(>|t|)    
## (Intercept) 0.00000000000000000517 0.02839894364232584470     0.0                   1    
## Xhat        1.00000000000000177636 0.08991504358428256682    11.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.3 on 1998 degrees of freedom
## Multiple R-squared:  0.0583, Adjusted R-squared:  0.0578 
## F-statistic:  124 on 1 and 1998 DF,  p-value: <0.0000000000000002

The 2sls regression again recover the same biased estimates as MR-SEM. In the presence of pleiotropy MR-DoC outperforms the other methods.