Estimating and Testing Direct Effects in Directed Acyclic Graphs using Estimating Equations

Overview

In any association study, it is important to distinguish direct and indirect effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph (DAG) setting with an exposure variable, primary and intermediate outcome variables, and confounding factors. In order to make valid statistical inference on the direct effect of the exposure on the primary outcome, it is necessary to consider all potential effects in the graph, and we propose to use the estimating equation method with robust Huber-White sandwich standard errors. Then, a large-sample Wald-type test statistic is computed for testing the absence of the direct effect. In this package, the proposed causal inference method based on estimating equations (CIEE) is implemented for both the analysis of continuous and time-to-event primary outcomes subject to censoring for the model in Figure 1. Additionally, standard multiple regression, regression of residuals, and the structural equation approach are implemented for fitting the same model.

Results from simulation studies (Konigorski et al., 2018) showed that CIEE successfully removes the effect of intermediate outcomes from the primary outcome and is robust against measured and unmeasured confounding of the indirect effect through observed factors. Also, an application in a genetic association study in the same study showed that CIEE can identify genetic variants that would be missed by traditional regression methods. Both multiple regression methods and the structural equation method fail in some scenarios where their corresponding test statistics lead to inflated type I errors. An alternative approach for the analysis of continuous traits is the sequential G-estimation method (Vansteelandt et al., 2009).

In this package, CIEE is implemented for the model described in the DAG in Figure 1, which includes the direct effect α_XY of an exposure X on the primary outcome Y and an indirect effect of X on Y through a secondary outcome K. The model further includes measured and unmeasured factors L and U, respectively, which potentially confound the effect of K on Y. CIEE can also be applied to different models with different error distributions. The goal is to estimate and test the direct effect α_XY, while removing the indirect effect of X on Y through K, and with robustness against effects of L and U. Without restriction of generality, it is assumed that there aren’t any factors affecting X and that any such factors are included as covariates in the analysis or have been dealt with using other approaches. Also, we generally assume that either α_LY = 0 (L is a factor influencing K) or α_XL = 0 (L is a measured confounder of K → Y). Otherwise, the effect of L as intermediate outcome could be removed from Y in the analysis analogously to K.

Overview of the underlying directed acyclic graph considered in this study. It is assumed that α_LY = 0 so that L is a measured predictive factor of K, however, CIEE is also valid if L is a measured confounder of K → Y (i.e., α_LY ≠ 0 and α_XL = 0).

Alternative approaches

Two traditional methods for the aim to estimate and test α_XY are (i) to include the intermediate outcomes and factors as covariates in a multiple regression (MR) model of the primary outcome on the exposure, or (ii) to first regress the primary outcome on the intermediate outcome and factors, and then regress the extracted residuals on the exposure (regression of residuals, RR). In more detail, estimates of α_XY are obtained from fitting the following models in the quantitative outcome setting for a normally-distributed Y (GLM setting):

MR: Obtain the least squares (LS) estimate of α_XY by fitting Y_i = α₀ + α_XYx_i + α₁k_i + α₂l_i + ε_i, ε_i ∼ N(0, σ₁²).

RR: First, obtain residuals ϵ̂_1i = y_i − α̂₀ − α̂₁k_i − α̂₂l_i by fitting
Y_i = α₀ + α₁k_i + α₂l_i + ε_1i, ε_1i ∼ N(0, σ₁²) using the LS estimation. Second, obtain the LS estimate of α_XY by fitting ε̂_1i = α₃ + α_XYx_i + ε_2i, ε_2i ∼ N(0, σ₂²).

Then, H₀ : α_XY = 0 versus H_A : α_XY ≠ 0 is tested using the default t-test in the lm() function in R. For the analysis of a censored time-to-event primary trait Y (accelerated failure time setting; AFT), only the MR approach is implemented. Here, the equivalent censored log-linear regression model is fitted using the survreg() function in the survival R package to obtain the maximum likelihood estimate of α_XY, and a Wald-type test is performed for testing the null hypothesis H₀ : α_XY = 0. Both approaches are implemented in the functions mult_reg() and res_reg() and can be used as follows. For this illustration, data is first generated using the generate_data() function, which generates data for the quantitative outcomes Y and K, a genetic marker X (single nucleotide polymorphism, SNP, taking values 0, 1, 2) as exposure, and observed as well as unobserved confounders L, U.

dat <- generate_data(setting="GLM", n = 1000, maf = 0.2, cens = 0.3, a = NULL, b = NULL, 
                     aUL = 0, aXL = 0, aXK = 0.2, aLK = 0, aUY = 0, aKY = 0.3, aXY = 0.1,
                     aLY = 0, mu_U = 0, sd_U = 1, X_orth_U = TRUE, mu_X = NULL, 
                     sd_X = NULL, mu_L = 0, sd_L = 1, mu_K = 0, sd_K = 1, mu_Y = 0, 
                     sd_Y = 1)
head(dat)

##             Y           K X          L          U
## 1 -0.06508841  1.00858003 0  1.1963916 -0.8802954
## 2  0.50045235  1.28821933 1 -0.2773230  0.7882714
## 3 -0.97260046 -0.05791208 0  1.1255301  0.2473285
## 4 -0.79033076  0.07182765 0  1.1814006 -0.1917915
## 5  0.07154002  0.00654755 1 -1.4213233 -1.1622779
## 6 -0.73735859 -0.68479544 0  0.1285172 -0.3324481

mult_reg(setting = "GLM", Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)

## $point_estimates
##      alpha_0      alpha_1     alpha_XY      alpha_2 
##  0.023592870  0.330501723  0.098088457 -0.002362774 
## 
## $SE_estimates
##    alpha_0    alpha_1   alpha_XY    alpha_2 
## 0.03829659 0.03138484 0.05450608 0.02966551 
## 
## $pvalues
##      alpha_0      alpha_1     alpha_XY      alpha_2 
## 5.379979e-01 1.166395e-24 7.222837e-02 9.365339e-01

res_reg(Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)

## $point_estimates
##      alpha_0      alpha_1      alpha_2      alpha_3     alpha_XY 
##  0.064130641  0.333310232 -0.003808244 -0.040673133  0.097771955 
## 
## $SE_estimates
##    alpha_0    alpha_1    alpha_2    alpha_3   alpha_XY 
## 0.03100573 0.03138119 0.02968791 0.03820967 0.05436380 
## 
## $pvalues
##      alpha_0      alpha_1      alpha_2      alpha_3     alpha_XY 
## 3.886470e-02 4.871954e-25 8.979565e-01 2.873723e-01 7.240381e-02

As another approach for modeling DAGs, the structural equation modeling method (SEM; Bollen, 1989) can be used. Among others, it is implemented in the sem() function of the lavaan package (Rosseel, 2012). For a comparison of the results, the function sem_appl() applies the SEM method to the DAG in Figure 1 based on the following model equations:

L_i = α₀ + α₁x_i + ε_1i, ε_1i ∼ N(0, σ₁²) K_i = α₂ + α₃x_i + α₂l_i + ε_2i, ε_2i ∼ N(0, σ₂²) Y_i = α₅ + α₆k_i + α_XYx_i + ε_3i, ε_3i ∼ N(0, σ₃²)

sem_appl(Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)

## $point_estimates
##      alpha_1      alpha_3      alpha_4      alpha_6     alpha_XY 
## -0.050454318  0.086359088 -0.007274518  0.330520961  0.098206000 
## 
## $SE_estimates
##    alpha_1    alpha_3    alpha_4    alpha_6   alpha_XY 
## 0.05801018 0.05485136 0.02988953 0.03132118 0.05437719 
## 
## $pvalues
##      alpha_1      alpha_3      alpha_4      alpha_6     alpha_XY 
## 3.844373e-01 1.153904e-01 8.077109e-01 4.939260e-26 7.091605e-02

Further proposed approaches for fitting the model in Figure 1 include the two-stage sequential G-estimation method (Vansteelandt et al., 2009). It first removes the effect of K from the primary outcome Y, and then tests the association of X with the adjusted primary outcome.

In more detail, for the analyis of a quantitative outcome Y with n independent observations, in the first stage, the effect of K on Y, α₁, is estimated and α̂₁ is obtained using the LS estimation method under the model

Then, to block all indirect paths from X to Y, the adjusted outcome Ỹ is obtained by removing the effect of K on Y with

where $\bar{y} = \sum_{i=1}^n y_i$ and $\bar{k} = \sum_{i=1}^n k_i$. In the second stage, the significance of the direct effect of X on Y, α_XY, is tested under the model

using the proposed test statistic in Vansteelandt and colleagues (2009). The approach is implemented in the CGene package, which can be obtained from cran.r-project.org/src/contrib/Archive/CGene/.

Causal inference using estimating equations (CIEE)

CIEE follows the general idea of the two-stage sequential G-estimation method with the major difference that the approach is one-stage and obtains coefficient estimates of all parameters simultaneously by solving estimating equations. This also allows building on existing asymptotic properties of the estimator and obtaining robust sandwich standard error estimates considering the additional variability of the estimates from the outcome adjustment.

In more detail, for the analyis of a quantitative outcome Y, we formulate unbiased estimating equations U(θ) = 0 for a consistent estimation of the unknown parameter vector θ = (α₀, α₁, α₂, α₃, σ₁², α₄, α_XY, σ₂²)^T where

$$ U(\theta) = \left( \frac{\partial l_1(\theta)}{\partial \alpha_0}, \frac{\partial l_1(\theta)}{\partial \alpha_1}, \frac{\partial l_1(\theta)}{\partial \alpha_2}, \frac{\partial l_1(\theta)}{\partial \alpha_3}, \frac{\partial l_1(\theta)}{\partial \sigma_1^2}, \frac{\partial l_1(\theta)}{\partial \alpha_4}, \frac{\partial l_1(\theta)}{\partial \alpha_{XY}}, \frac{\partial l_1(\theta)}{\partial \sigma_2^2} \right)^T $$ with

$$ l_1(\theta) = \sum_{i=1}^n \left[ -log(\sigma_1) + log\left( \phi\left( \frac{y_i - \alpha_0 - \alpha_1 k_i - \alpha_2 x_i - \alpha_3 l_i}{\sigma_1} \right) \right) \right]$$

and

$$ l_2(\theta) = \sum_{i=1}^n \left[ -log(\sigma_2) + log\left( \phi\left( \frac{y_i - \bar{y} - \alpha_1 (k_i - \bar{k}) - \alpha_4 - \alpha_{XY} x_i}{\sigma_2} \right) \right) \right],$$

where ϕ is the probability density function of the standard normal distribution.

By solving the first five estimating equations based on l₁(θ), we are hence obtaining estimates of α₀, α₁, α₂, α₃, σ₁². Analogously, solving the last three estimating equations based on l₂(θ) yields estimates of α₄, α_XY, σ₂². To give an intuition on how these estimating equations are obtained, l₁(θ) is the log-likelihood function under the model in (1) and l₂(θ) is the log-likelihood function under the model in (3) given that α₁ is known. All parameters in θ are estimated simultaneously and the additional variability obtained in the outcome adjustment in (2) is considered by using the robust Huber-White sandwich estimator of the standard error of θ̂. Then, the large sample Wald-type test statistic $\hat{\theta}/\widehat{SE}(\hat{\theta})$, which has an asymptotic standard normal distribution, is computed to test the absence of the exposure effect α_XY.

For the analysis of a censored time-to-event primary outcome Y, the estimating equations can be constructed as described above for a quantitative primary phenotype (for the same parameters except for σ₁ instead of σ₁²), but in order to remove the effect of K from Y, the true underlying log survival times Y_est need to be estimated for censored survival times. For uncensored survival times, Y_est equals the observed log-survival time Y. Then, the estimating equations are constructed accordingly, the robust Huber-White sandwich estimator of the standard error is obtained and the large-sample Wald-type tests are computed. For more statistical details, see Konigorski et al. (2018).

In the implementation of CIEE in this package, the est_funct_expr() function contains the estimating equations as an expression.

estfunct <- est_funct_expr(setting="GLM")
estfunct

## $logL1
## expression(log((1/sqrt(sigma1sq)) * dnorm((y_i - alpha0 - alpha1 * 
##     k_i - alpha2 * x_i - alpha3 * l_i)/sqrt(sigma1sq), mean = 0, 
##     sd = 1)))
## 
## $logL2
## expression(log((1/sqrt(sigma2sq)) * dnorm((y_i - y_bar - alpha1 * 
##     (k_i - k_bar) - alpha4 - alphaXY * x_i)/sqrt(sigma2sq), mean = 0, 
##     sd = 1)))

est_funct_expr(setting = "AFT")

## $logL1
## expression(-c_i * log(sigma1) + c_i * log(dnorm((y_i - alpha0 - 
##     alpha1 * k_i - alpha2 * x_i - alpha3 * l_i)/sigma1, mean = 0, 
##     sd = 1)) + (1 - c_i) * log(1 - pnorm((y_i - alpha0 - alpha1 * 
##     k_i - alpha2 * x_i - alpha3 * l_i)/sigma1, mean = 0, sd = 1)))
## 
## $logL2
## expression(log((1/sqrt(sigma2sq)) * dnorm(((c_i * y_i + (1 - 
##     c_i) * ((alpha0 + alpha1 * k_i + alpha2 * x_i + alpha3 * 
##     l_i) + (sigma1 * dnorm((y_i - alpha0 - alpha1 * k_i - alpha2 * 
##     x_i - alpha3 * l_i)/sigma1, mean = 0, sd = 1)/(1 - pnorm((y_i - 
##     alpha0 - alpha1 * k_i - alpha2 * x_i - alpha3 * l_i)/sigma1, 
##     mean = 0, sd = 1))))) - y_adj_bar - alpha1 * (k_i - k_bar) - 
##     alpha4 - alphaXY * x_i)/sqrt(sigma2sq), mean = 0, sd = 1)))

The function get_estimates() obtains estimates of the parameters in the models (1)-(3) by using the lm() and survreg() functions for computational purposes. The estimates are identical to estimates obtained by solving the estimating equations.

estimates <- get_estimates(setting = "GLM", Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)
estimates

##      alpha_0      alpha_1      alpha_2      alpha_3   sigma_1_sq      alpha_4 
##  0.023592870  0.330501723  0.098088457 -0.002362774  0.946618319 -0.040854390 
##     alpha_XY   sigma_2_sq 
##  0.098207669  0.946624349

In order to compute the robust Huber-White sandwich estimator of the parameters, in a first step, the deriv_obj() function computes the expression of all first and second derivatives for the 8 parameters α₀, α₁, α₂, α₃, σ₁², α₄, α_XY, σ₂² by using the expressions from the est_funct_expr() function as input. Then, the numerical values of all first and second derivatives are obtained for the observed data and parameter point estimates for all observed individuals, using the scores() and hessian() functions.

derivobj <- deriv_obj(setting = "GLM", logL1 = estfunct$logL1, logL2 = estfunct$logL2, 
                      Y = dat$Y, X = dat$X, K = dat$K, L = dat$L, estimates = estimates)
names(derivobj)

## [1] "logL1_deriv" "logL2_deriv"

head(derivobj$logL1_deriv$gradient)

##           alpha0        alpha1      alpha2      alpha3     sigma1sq alpha4
## [1,] -0.44283097 -0.4466304736  0.00000000 -0.52979926 -0.430146355      0
## [2,] -0.05032961 -0.0648355792 -0.05032961  0.01395756 -0.526929455      0
## [3,] -1.02934192  0.0596113277  0.00000000 -1.15855527  0.001576406      0
## [4,] -0.88195146 -0.0633485029  0.00000000 -1.04193796 -0.139276799      0
## [5,] -0.05880252 -0.0003850125 -0.05880252  0.08357740 -0.526467121      0
## [6,] -0.56445319  0.3865349709  0.00000000 -0.07254196 -0.368892289      0
##      alphaXY sigma2sq
## [1,]       0        0
## [2,]       0        0
## [3,]       0        0
## [4,]       0        0
## [5,]       0        0
## [6,]       0        0

score_matrix <- scores(derivobj)
head(score_matrix)

##           alpha0        alpha1      alpha2      alpha3     sigma1sq      alpha4
## [1,] -0.44283097 -0.4466304736  0.00000000 -0.52979926 -0.430146355 -0.44576456
## [2,] -0.05032961 -0.0648355792 -0.05032961  0.01395756 -0.526929455 -0.04971324
## [3,] -1.02934192  0.0596113277  0.00000000 -1.15855527  0.001576406 -1.03209490
## [4,] -0.88195146 -0.0633485029  0.00000000 -1.04193796 -0.139276799 -0.88484483
## [5,] -0.05880252 -0.0003850125 -0.05880252  0.08357740 -0.526467121 -0.05533067
## [6,] -0.56445319  0.3865349709  0.00000000 -0.07254196 -0.368892289 -0.56472058
##          alphaXY     sigma2sq
## [1,]  0.00000000 -0.428839606
## [2,] -0.04971324 -0.526956922
## [3,]  0.00000000  0.004417316
## [4,]  0.00000000 -0.136717436
## [5,] -0.05533067 -0.526661884
## [6,]  0.00000000 -0.368737956

hessian_matrix <- hessian(derivobj)
str(hessian_matrix)

##  num [1:1000, 1:8, 1:8] -1.06 -1.06 -1.06 -1.06 -1.06 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ : NULL
##   ..$ : chr [1:8] "alpha0" "alpha1" "alpha2" "alpha3" ...
##   ..$ : chr [1:8] "alpha0" "alpha1" "alpha2" "alpha3" ...

The robust Huber-White sandwich estimator of the standard error can then be obtained using the sandwich_se() function:

sandwich_se(scores = score_matrix, hessian = hessian_matrix)

##    alpha_0    alpha_1    alpha_2    alpha_3 sigma_1_sq    alpha_4   alpha_XY 
## 0.03855204 0.03083597 0.05351610 0.02981008 0.04453341 0.03853616 0.05352831 
## sigma_2_sq 
## 0.04454114

Alternatively, bootstrap standard error estimates can be computed using the bootstrap_se() function. Also, for comparison, the function naive_se() computes naive standard error estimates of the parameter estimates of α₀, α₁, α₂, α₃, α₄, α_XY without accounting for the additional variability due to the two stages in the model in (1)-(3):

bootstrap_se(setting = "GLM", BS_rep = 1000, Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)

##    alpha_0    alpha_1    alpha_2    alpha_3 sigma_1_sq    alpha_4   alpha_XY 
## 0.03910938 0.03048720 0.05247363 0.02972857 0.04399445 0.02190997 0.05254197 
## sigma_2_sq 
## 0.04406866

naive_se(setting = "GLM", Y = dat$Y, X = dat$X, K = dat$K, L = dat$L)

##    alpha_0    alpha_1    alpha_2    alpha_3 sigma_1_sq    alpha_4   alpha_XY 
## 0.03829659 0.03138484 0.05450608 0.02966551         NA 0.03820959 0.05436369 
## sigma_2_sq 
##         NA

Finally, the functions ciee() and ciee_loop() allow an easy integrated use of all above functions and a simultaneous computation of the estimating equations approach using either standard error computation, the traditional regression-based approaches, and the SEM method. ciee() fits the model in equations (1)-(3) (e.g. the model in Figure 1) and yields parameter estimates, standard error estimates, and p-values for all parameters. ciee_loop() provides an extension of ciee() and allows the input of multiple exposure variables (e.g. multiple SNPs) which are tested sequentially. In the output of ciee_loop(), only the coefficient estimates, standard error estimates, and p-values with respect to the direct effect α_XY are provided.

results_ciee <- ciee(setting = "GLM", Y = dat$Y, X = dat$X, K = dat$K, L = dat$L,
                     estimates = c("ee", "mult_reg", "res_reg", "sem"),
                     ee_se = "sandwich")
results_ciee

## $results_ee
## $results_ee$point_estimates
##      alpha_0      alpha_1      alpha_2      alpha_3   sigma_1_sq      alpha_4 
##  0.023592870  0.330501723  0.098088457 -0.002362774  0.946618319 -0.040854390 
##     alpha_XY   sigma_2_sq 
##  0.098207669  0.946624349 
## 
## $results_ee$SE_estimates
##    alpha_0    alpha_1    alpha_2    alpha_3 sigma_1_sq    alpha_4   alpha_XY 
## 0.03855204 0.03083597 0.05351610 0.02981008 0.04453341 0.03853616 0.05352831 
## sigma_2_sq 
## 0.04454114 
## 
## $results_ee$wald_test_stat
##     alpha_0     alpha_1     alpha_2     alpha_3  sigma_1_sq     alpha_4 
##  0.61197465 10.71805736  1.83287752 -0.07926091 21.25636269 -1.06015717 
##    alpha_XY  sigma_2_sq 
##  1.83468667 21.25280787 
## 
## $results_ee$pvalues
##       alpha_0       alpha_1       alpha_2       alpha_3    sigma_1_sq 
##  5.405545e-01  8.374373e-27  6.682079e-02  9.368251e-01 2.878828e-100 
##       alpha_4      alpha_XY    sigma_2_sq 
##  2.890731e-01  6.655213e-02 3.105286e-100 
## 
## 
## $results_mult_reg
## $results_mult_reg$point_estimates
##      alpha_0      alpha_1     alpha_XY      alpha_2 
##  0.023592870  0.330501723  0.098088457 -0.002362774 
## 
## $results_mult_reg$SE_estimates
##    alpha_0    alpha_1   alpha_XY    alpha_2 
## 0.03829659 0.03138484 0.05450608 0.02966551 
## 
## $results_mult_reg$pvalues
##      alpha_0      alpha_1     alpha_XY      alpha_2 
## 5.379979e-01 1.166395e-24 7.222837e-02 9.365339e-01 
## 
## 
## $results_res_reg
## $results_res_reg$point_estimates
##      alpha_0      alpha_1      alpha_2      alpha_3     alpha_XY 
##  0.064130641  0.333310232 -0.003808244 -0.040673133  0.097771955 
## 
## $results_res_reg$SE_estimates
##    alpha_0    alpha_1    alpha_2    alpha_3   alpha_XY 
## 0.03100573 0.03138119 0.02968791 0.03820967 0.05436380 
## 
## $results_res_reg$pvalues
##      alpha_0      alpha_1      alpha_2      alpha_3     alpha_XY 
## 3.886470e-02 4.871954e-25 8.979565e-01 2.873723e-01 7.240381e-02 
## 
## 
## $results_sem
## $results_sem$point_estimates
##      alpha_1      alpha_3      alpha_4      alpha_6     alpha_XY 
## -0.050454318  0.086359088 -0.007274518  0.330520961  0.098206000 
## 
## $results_sem$SE_estimates
##    alpha_1    alpha_3    alpha_4    alpha_6   alpha_XY 
## 0.05801018 0.05485136 0.02988953 0.03132118 0.05437719 
## 
## $results_sem$pvalues
##      alpha_1      alpha_3      alpha_4      alpha_6     alpha_XY 
## 3.844373e-01 1.153904e-01 8.077109e-01 4.939260e-26 7.091605e-02 
## 
## 
## attr(,"class")
## [1] "ciee"

maf <- 0.2
n <- 1000
dat <- generate_data(n = n, maf = maf)
datX <- data.frame(X = dat$X)
names(datX)[1] <- "X1"
for(i in 2:10){
 X <- rbinom(n, size = 2, prob = maf)
 datX$X <- X
 names(datX)[i] <- paste("X", i, sep="")
}
results_ciee_loop <- ciee_loop(setting = "GLM", Y = dat$Y, X = datX, K = dat$K, L = dat$L)
results_ciee_loop

## $results_ee
## $results_ee$point_estimates
##           X1           X2           X3           X4           X5           X6 
##  0.181213142 -0.024869721  0.062522394  0.041141636  0.003915172  0.002290242 
##           X7           X8           X9          X10 
## -0.070948971  0.050690067  0.007584904 -0.034500169 
## 
## $results_ee$SE_estimates
##         X1         X2         X3         X4         X5         X6         X7 
## 0.05700344 0.05402628 0.05426550 0.06005578 0.05346202 0.05289612 0.05517172 
##         X8         X9        X10 
## 0.05768757 0.05407598 0.05408914 
## 
## $results_ee$wald_test_stat
##          X1          X2          X3          X4          X5          X6 
##  3.17898591 -0.46032632  1.15215741  0.68505706  0.07323276  0.04329697 
##          X7          X8          X9         X10 
## -1.28596639  0.87870006  0.14026382 -0.63783910 
## 
## $results_ee$pvalues
##          X1          X2          X3          X4          X5          X6 
## 0.001477913 0.645282010 0.249256396 0.493307916 0.941620896 0.965464805 
##          X7          X8          X9         X10 
## 0.198454792 0.379563928 0.888451547 0.523578421 
## 
## 
## $results_mult_reg
## $results_mult_reg$point_estimates
##           X1           X2           X3           X4           X5           X6 
##  0.181238579 -0.025218156  0.062409144  0.041189952  0.003889304  0.002328911 
##           X7           X8           X9          X10 
## -0.070885874  0.050848090  0.008036246 -0.034403570 
## 
## $results_mult_reg$SE_estimates
##         X1         X2         X3         X4         X5         X6         X7 
## 0.05749037 0.05547256 0.05513860 0.05930948 0.05424050 0.05497075 0.05683923 
##         X8         X9        X10 
## 0.05741851 0.05707235 0.05636120 
## 
## $results_mult_reg$pvalues
##          X1          X2          X3          X4          X5          X6 
## 0.001667166 0.649491807 0.257965958 0.487535766 0.942851230 0.966215125 
##          X7          X8          X9         X10 
## 0.212643252 0.376063185 0.888050070 0.541727950 
## 
## 
## $results_res_reg
## $results_res_reg$point_estimates
##           X1           X2           X3           X4           X5           X6 
##  0.178392733 -0.025145873  0.062300933  0.040831124  0.003887054  0.002328129 
##           X7           X8           X9          X10 
## -0.070552492  0.050792654  0.007985482 -0.034360331 
## 
## $results_res_reg$SE_estimates
##         X1         X2         X3         X4         X5         X6         X7 
## 0.05698451 0.05533749 0.05503561 0.05899150 0.05417045 0.05490642 0.05664878 
##         X8         X9        X10 
## 0.05732969 0.05683478 0.05626931 
## 
## $results_res_reg$pvalues
##          X1          X2          X3          X4          X5          X6 
## 0.001795528 0.649633015 0.257901578 0.489002488 0.942810472 0.966186902 
##          X7          X8          X9         X10 
## 0.213263412 0.375844610 0.888290586 0.541576536 
## 
## 
## $results_sem
## $results_sem$point_estimates
##           X1           X2           X3           X4           X5           X6 
##  0.181169060 -0.024866926  0.062519975  0.041113759  0.003909310  0.002294553 
##           X7           X8           X9          X10 
## -0.070947748  0.050697624  0.007577499 -0.034506385 
## 
## $results_sem$SE_estimates
##         X1         X2         X3         X4         X5         X6         X7 
## 0.05737312 0.05528461 0.05498902 0.05918660 0.05413176 0.05485979 0.05661273 
##         X8         X9        X10 
## 0.05728932 0.05679548 0.05623479 
## 
## $results_sem$pvalues
##          X1          X2          X3          X4          X5          X6 
## 0.001590005 0.652855856 0.255557629 0.487276908 0.942428083 0.966637592 
##          X7          X8          X9         X10 
## 0.210128563 0.376188829 0.893863381 0.539471251 
## 
## 
## attr(,"class")
## [1] "ciee"

Both ciee() and ciee_loop() return ciee objects as output, and the implemented summary.ciee() function can be used through the generic summary() to provide a reader-friendly formatted output of the results.

summary(results_ciee)

## [1] "Results based on estimating equations."
##                 point_estimates SE_estimates wald_test_stat       pvalues
## CIEE_alpha_0        0.023592870   0.03855204     0.61197465  5.405545e-01
## CIEE_alpha_1        0.330501723   0.03083597    10.71805736  8.374373e-27
## CIEE_alpha_2        0.098088457   0.05351610     1.83287752  6.682079e-02
## CIEE_alpha_3       -0.002362774   0.02981008    -0.07926091  9.368251e-01
## CIEE_sigma_1_sq     0.946618319   0.04453341    21.25636269 2.878828e-100
## CIEE_alpha_4       -0.040854390   0.03853616    -1.06015717  2.890731e-01
## CIEE_alpha_XY       0.098207669   0.05352831     1.83468667  6.655213e-02
## CIEE_sigma_2_sq     0.946624349   0.04454114    21.25280787 3.105286e-100
## [1] "Results based on traditional multiple regression."
##             point_estimates SE_estimates      pvalues
## MR_alpha_0      0.023592870   0.03829659 5.379979e-01
## MR_alpha_1      0.330501723   0.03138484 1.166395e-24
## MR_alpha_XY     0.098088457   0.05450608 7.222837e-02
## MR_alpha_2     -0.002362774   0.02966551 9.365339e-01
## [1] "Results based on traditional regression of residuals."
##             point_estimates SE_estimates      pvalues
## RR_alpha_0      0.064130641   0.03100573 3.886470e-02
## RR_alpha_1      0.333310232   0.03138119 4.871954e-25
## RR_alpha_2     -0.003808244   0.02968791 8.979565e-01
## RR_alpha_3     -0.040673133   0.03820967 2.873723e-01
## RR_alpha_XY     0.097771955   0.05436380 7.240381e-02
## [1] "Results based on structural equation modeling."
##              point_estimates SE_estimates      pvalues
## SEM_alpha_1     -0.050454318   0.05801018 3.844373e-01
## SEM_alpha_3      0.086359088   0.05485136 1.153904e-01
## SEM_alpha_4     -0.007274518   0.02988953 8.077109e-01
## SEM_alpha_6      0.330520961   0.03132118 4.939260e-26
## SEM_alpha_XY     0.098206000   0.05437719 7.091605e-02

summary(results_ciee_loop)

## [1] "Results based on estimating equations."
##          point_estimates SE_estimates wald_test_stat     pvalues
## CIEE_X1      0.181213142   0.05700344     3.17898591 0.001477913
## CIEE_X2     -0.024869721   0.05402628    -0.46032632 0.645282010
## CIEE_X3      0.062522394   0.05426550     1.15215741 0.249256396
## CIEE_X4      0.041141636   0.06005578     0.68505706 0.493307916
## CIEE_X5      0.003915172   0.05346202     0.07323276 0.941620896
## CIEE_X6      0.002290242   0.05289612     0.04329697 0.965464805
## CIEE_X7     -0.070948971   0.05517172    -1.28596639 0.198454792
## CIEE_X8      0.050690067   0.05768757     0.87870006 0.379563928
## CIEE_X9      0.007584904   0.05407598     0.14026382 0.888451547
## CIEE_X10    -0.034500169   0.05408914    -0.63783910 0.523578421
## [1] "Results based on traditional multiple regression."
##        point_estimates SE_estimates     pvalues
## MR_X1      0.181238579   0.05749037 0.001667166
## MR_X2     -0.025218156   0.05547256 0.649491807
## MR_X3      0.062409144   0.05513860 0.257965958
## MR_X4      0.041189952   0.05930948 0.487535766
## MR_X5      0.003889304   0.05424050 0.942851230
## MR_X6      0.002328911   0.05497075 0.966215125
## MR_X7     -0.070885874   0.05683923 0.212643252
## MR_X8      0.050848090   0.05741851 0.376063185
## MR_X9      0.008036246   0.05707235 0.888050070
## MR_X10    -0.034403570   0.05636120 0.541727950
## [1] "Results based on traditional regression of residuals."
##        point_estimates SE_estimates     pvalues
## RR_X1      0.178392733   0.05698451 0.001795528
## RR_X2     -0.025145873   0.05533749 0.649633015
## RR_X3      0.062300933   0.05503561 0.257901578
## RR_X4      0.040831124   0.05899150 0.489002488
## RR_X5      0.003887054   0.05417045 0.942810472
## RR_X6      0.002328129   0.05490642 0.966186902
## RR_X7     -0.070552492   0.05664878 0.213263412
## RR_X8      0.050792654   0.05732969 0.375844610
## RR_X9      0.007985482   0.05683478 0.888290586
## RR_X10    -0.034360331   0.05626931 0.541576536
## [1] "Results based on structural equation modeling."
##         point_estimates SE_estimates     pvalues
## SEM_X1      0.181169060   0.05737312 0.001590005
## SEM_X2     -0.024866926   0.05528461 0.652855856
## SEM_X3      0.062519975   0.05498902 0.255557629
## SEM_X4      0.041113759   0.05918660 0.487276908
## SEM_X5      0.003909310   0.05413176 0.942428083
## SEM_X6      0.002294553   0.05485979 0.966637592
## SEM_X7     -0.070947748   0.05661273 0.210128563
## SEM_X8      0.050697624   0.05728932 0.376188829
## SEM_X9      0.007577499   0.05679548 0.893863381
## SEM_X10    -0.034506385   0.05623479 0.539471251

References

Bollen KA (1989). Structural equations with latent variables. New York: John Wiley & Sons.

Konigorski S, Wang Y, Cigsar C, Yilmaz YE (2018). Estimating and testing direct genetic effects in directed acyclic graphs using estimating equations. Genetic Epidemiology, 42: 174-186.

Rosseel Y (2012). lavaan: an R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.

Vansteelandt S, Goetgeluk S, Lutz S, et al. (2009). On the adjustment for covariates in genetic association analysis: a novel, simple principle to infer direct causal effects. Genetic Epidemiology, 33, 394-405.