Simulation of longitudinal datasets under different causal structures and analyses (network MR, multivariable MR and factorial MR)

The code in this repository accompanies an article about how network MR, multivariable MR and factorial perform when the longitudinal nature of data generation for Mendelian randomization studies is considered. Find more information on these methods in the following references:

Burgess S, Daniel RM, Butterworth AS, Thomspon SG, EPIC-InterAct Consortium. Network Mendelian randomization: Using genetic variants as instrumental variables to investigate mediation in causal pathways. Int J of Epidemiol. 2015;44(2):484-495.

Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol. 2018;1-25.

Burgess S, Thompson SG. Multivariable Mendelian randomization: The use of pleiotropic genetic variants to estimate causal effects. Am J of Epidemiol. 2015;181(4):251-260.

Rees JMB, Foley C, Burgess S. Factorial Mendelian randomization: using genetic variants to assess interactions. bioarXiv. 2019.

Appendix A - Simulations

All the code for the simulations and analyses in the text can be found at: https://github.com/jalabrecque/MR_multiple_exposures

Network MR simulation

The simulation was modeled on the simulations in Burgess et al 2015 (1).

\[\begin{align*} a_0 &= \alpha_{G_A}*g_A + u_1 + u_2 + \epsilon_{A_0} \\ b_0 &= \beta_{G_B}*g_B + u_1 + u_3 + \epsilon_{B_0} \\ a_1 &= \alpha_{A_0}*a_0 + \alpha_{B_0}*b_0 + u_1 + u_2 + \epsilon_{A_1} \\ b_1 &= \beta_{B_0}*b_0 + \beta_{A_0}*a_0 + \beta_{A_1}*a_1 + u_1 + u_3 + \epsilon_{B_1} \\ y_1 &= \gamma_{A_0}*a_0 + \gamma_{B_0}*b_0 + \gamma_{A_1}*a_1 + \gamma_{B_1}*b_1 + u_2 + u_3 + \epsilon_Y \\ g_A, g_B &\sim \textrm{Binomial}(2, 0.3) \textrm{ independently} \\ \alpha_{G_A}, \alpha_{G_B} u_1, u_2, u_3 &\sim \mathcal{N}(0,1) \textrm{ independently} \\ \epsilon_{A_0}, \epsilon_{A_1}, \epsilon_{B_0}, \epsilon_{B_1}, \epsilon_{Y_1} &\sim \mathcal{N}(0,1) \textrm{ independently} \end{align*}\]

where \(A_t\) represents variable \(A\) at time \(t\) and \(B_t\) represents variable \(M\) at time \(t\). \(G_A\) and \(G_B\) are modelled as biallelic genetic variants with a minor allele frequency of 0.3. \(U_1\) is a confounder of \(A\) and \(B\), \(U_2\) is a confounder of \(A\) and \(Y\) and \(U_3\) is a confounder of \(B\) and \(Y\). Both \(\alpha_{A_0}\) and \(\beta_{B_0}\) were set to 1 so the effect of the genetic variants did not vary with time. \(\alpha_{G_A}\) was set to 0.6 and \(\alpha_{G_B}\) was set to 1.2. These are higher values than in the original paper. This was done because the addition of a second time point introduced additional random variation which in turn reduced the F-statistics and explained variation. The higher \(\alpha_{G_A}\) and \(\alpha_{G_B}\) values restored the F-statistics to around 65 and the variance explained to 1.3% in each model as in the original paper. The four gray arrows in Figure which correspond to \(\beta_{A_0}\), \(\alpha_{B_0}\), \(\gamma_{A_0}\) and \(\gamma_{B_0}\) in the simulation were varied to determine whether they biased the network MR analysis. The effect of \(A_1\) was set to 1, The code for the simulations are available at https://github.com/jalabrecque/MR_multiple_exposures and the user has the ability to change all parameters in the model.

Multivariable MR and factorial MR simulations.

The simulation was modeled on the simulations in Sanderson et al 2019 (2). Although only the mediation scenario was presented in the text, the confounding, collider and pleiotropy scenarios were also simulated and the results included in Appendix A. For the multivariable MR simulations, all the interactions are set to 0.

\[\begin{align*} g_1, g_2, ..., g_{30} &\sim \textrm{Binomial}(2, 0.3) \textrm{independently} \\ \alpha_{G_{A_{1-10}}} &= (0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50) \\ \alpha_{G_{A_{11-20}}} &= (0.50,0.45,0.40,0.35,0.30,0.25,0.20,0.15,0.10,0.05) \\ \alpha_{G_{A_{21-30}}} &= 0 \\ \alpha_{G_{B_{1-10}}} &= 0 \\ \alpha_{G_{B_{11-20}}} &= (0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50) \\ \alpha_{G_{B_{21-30}}} &= (0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50) \\ u &\sim \mathcal{N}(0,1) \textrm{ independently} \end{align*}\]

\[\begin{align*} b_0 &= \sum_{i=1}^{30}\beta_{G_{B_i}}*g_i + u\\ a_0 &= \sum_{i=1}^{30}\alpha_{G_{A_i}}*g_i + \alpha_{B_0}*b_0 + u \\ b_1 &= \beta_{b_0}*b_0 + \beta_{A_0}*a_0 + u \\ a_1 &= \alpha_{A_0}*a_0 + \alpha_{B_0}*b_0 + \alpha_{B_1}*b_1 + u \\ y_1 &= \gamma_{A_0}*a_0 + \gamma_{b_0}*b_0 + \gamma_{a_0*b_0}*a_0*b_0 + \gamma_{A_1}*a_1 + \gamma_{b_1}*b_1 + \gamma_{a_1*b_1}*a_1*b_1 + u \end{align*}\]

\[\begin{align*} a_0 &= \sum_{i=1}^{30}\alpha_{G_{A_i}}*g_i + u \\ b_0 &= \sum_{i=1}^{30}\beta_{G_{B_i}}*g_i + \beta_{A_0}*a_0 u\\ a_1 &= \alpha_{A_0}*a_0 + \alpha_{B_0}*b_0 + u \\ y_1 &= \gamma_{A_0}*a_0 + \gamma_{B_0}*b_0 + \gamma_{a_0*b_0}*a_0*b_0 + \gamma_{A_1}*a_1 + u \\ b_1 &= \beta_{B_0}*b_0 + \beta_{A_0}*a_0 + \beta_{A_1}*a_1 + \beta_Y*y + u \end{align*}\]

\[\begin{align*} a_0 &= \sum_{i=1}^{30}\alpha_{G_{A_i}}*g_i + u \\ b_0 &= \sum_{i=1}^{30}\beta_{G_{B_i}}*g_i + u\\ a_1 &= \sum_{i=1}^{30}\alpha_{G_{A_i}}*g_i + \alpha_{B_0}*b_0 + u \\ b_1 &= \beta_{B_0}*b_0 + \beta_{A_0}*a_0 + u \\ y_1 &= \gamma_{A_0}*a_0 + \gamma_{B_0}*b_0 + \gamma_{a_0*b_0}*a_0*b_0 + \gamma_{A_1}*a_1 + \gamma_{B_1}*b_1 + \gamma_{a_1*b_1}*a_1*b_1 + u \end{align*}\]

\[\begin{align*} a_0 &= \sum_{i=1}^{30}\alpha_{G_{A_i}}*g_i + u \\ b_0 &= \sum_{i=1}^{30}\beta_{G_{B_i}}*g_i + \beta_{A_0}*a_0 + u\\ a_1 &= \alpha_{A_0}*a_0 + \alpha_{B_0}*b_0 + u \\ b_1 &= \beta_{B_0}*b_0 + \beta_{A_0}*a_0 + \beta_{A_1}*a_1 + u \\ y_1 &= \gamma_{A_0}*a_0 + \gamma_{B_0}*b_0 + \gamma_{a_0*b_0}*a_0*b_0 + \gamma_{A_1}*a_1 + \gamma_{B_1}*b_1 + \gamma_{a_1*b_1}*a_1*b_1 + u \end{align*}\]

Appendix B - Full results

Network MR

All results were included in the text.

Multivariable MR

Factorial MR

Appendix C: Sanderson et al 2019 example (2)

We assume the causal strucure below:

where \(EA\) is educational attainment, \(CA_{\textrm{pre-EA}}\) is cognitive ability before educational attainment is reached, \(CA_{\textrm{post-EA}}\) is cognitive ability after educational attainment is reached, \(G_{EA}\) are the genetic variants related to educational attainment, \(G_{CA}\) are the genetic variants related to cognitive ability and \(BMI\) is body mass index.The goal was not to numerically reproduce the example from Sanderson et al, but to determine whether this causal structure would produce biased estimates of the total effects of educational attainment and/or cognitive ability.

We found that the total effect of cognitive ability was estimated without bias but that the total effect of educational attainment is biased to the degree that \(CA_{\textrm{pre-EA}}\) has a direct effect on \(BMI\). Therefore, if \(CA_{\textrm{pre-EA}}\) has no direct effect on \(BMI\), both parameters are estimated without bias.

The code to run this example can be found at: https://github.com/jalabrecque/MR_multiple_exposures

Appendix references

1. Burgess S, Daniel RM, Butterworth AS, et al. Network Mendelian randomization: Using genetic variants as instrumental variables to investigate mediation in causal pathways. International Journal of Epidemiology [electronic article]. 2015;44(2):484–495. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4469795/pdf/dyu176.pdf)

2. Sanderson E, Davey Smith G, Windmeijer F, et al. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. International Journal of Epidemiology. 2018;1–25.

Appendix: Mendelian randomization with multiple exposures: The importance of thinking about time