Main function to estimate marginal cumulative incidences and derived effect measures without matching
Source:R/nomatchVE.R
nomatchVE.Rd
nomatchVE()
estimates marginal cumulative incidences under
exposure and no exposure using a G-computation approach. The method fits two
conditional hazard models- one for unexposed and one for the exposed- and uses
these models to predict time- and covariate- specific cumulative incidences.
The predicted conditional risks are then marginalized to compute overall
(marginal) cumulative incidences. The resulting cumulative incidences can be
summarized as risk ratios (RR) or 1 - RR (vaccine effectiveness).
Usage
nomatchVE(
data,
outcome_time,
outcome_status,
exposure,
exposure_time,
covariates,
tau,
eval_times,
effect = c("vaccine_effectiveness", "risk_ratio"),
weights_source = c("observed", "custom"),
custom_weights = NULL,
ci_type = c("wald", "percentile", "both"),
boot_reps = 0,
alpha = 0.05,
keep_models = TRUE,
keep_boot_samples = TRUE,
n_cores = 1
)
Arguments
- data
A data frame with one row per individual containing the columns named in
outcome_time
,outcome_status
,exposure
,exposure_time
, and any variables listed incovariates
.- outcome_time
Name of the time-to-event/censoring variable. Time should be measured from a given time origin (e.g. study start, enrollment, or age) for all individuals.
- outcome_status
Name of the event indicator. The underlying column should be numeric (
1
= event,0
= censored).- exposure
Name of the exposure indicator. The underlying column should be numeric (
1
= exposed during follow-up,0
= never exposed during follow-up).- exposure_time
Name of the time to exposure, measured from the chosen time origin; use
NA
if not exposed. Time must be measured in the same units (e.g. days) as that used foroutcome_time
.- covariates
Character vector of covariates to adjust for when fitting the hazard models. These covariates should include all known confounders of exposure and censoring measured at the chosen time origin.
- tau
Non-negative numeric value specifying the time after exposure that should be excluded from the risk evaluation period. This argument is primarily intended for vaccination exposures, where it is common to exclude the time after vaccination when immunity is still building. Time must be measured in the same units as that used for
outcome_time
andexposure_time
and should reflect the biological understanding of when vaccine-induced immunity develops (usually 1-2 weeks). For non-vaccine exposures,tau
can be set to 0 (no delay period).- eval_times
Numeric vector specifying the timepoints at which to compute cumulative incidence and the derived effect measures. The timepoints should be expressed in terms of time since exposure. All values must be greater than
tau
and and should correspond to clinically meaningful follow-up durations, such as 30, 60, or 90 days after exposure. A fine grid of timepoints (e.g.,eval_times = (tau+1):100
) can be provided if cumulative incidence curves over time are desired.- effect
Character. Type of effect measure to compute and return, based on the estimated cumulative incidences. Either
"vaccine_effectiveness"
(default) or"risk_ratio"
.- weights_source
Character string specifying the type of marginalizing weights to use. Either:
"observed"
(default): set the marginalizing weights to the empirical distribution of exposure eval_times and covariates among the exposed. This provides close alignment with the weights implicitly used in matching."custom"
: use the user-specified weights provided in thecustom_weights
argument.
- custom_weights
a
list(g_weights, p_weights)
providing weights for marginalizing the time- and covariate-specific cumulative incidences. Must have the following format:g_weights
: data frame with columnsall variables in
covariates
exposure_time
(time of exposure),prob
(probability of exposure at the given time within the covariate-group; should sum to 1 within each covariate-group)
p_weights
: data frame with columnsall variables in
covariates
prob
(probability of covariate-group; should sum to 1 over all covariate groups.)
- ci_type
Method for constructing bootstrap confidence intervals. One of
"wald"
,"percentile"
, or"both"
."wald"
(default): Computes Wald-style intervals using bootstrap standard errors. See Confidence intervals section for details."percentile"
: Computes percentile bootstrap intervals."both"
: Computes and returns both sets of intervals.
- boot_reps
Number of bootstrap replicates for confidence intervals. Recommended to use at least 1000 for publication-quality results. Use smaller values (e.g., 10-100) for initial exploration. Default:
0
(no bootstrapping).- alpha
Significance level for confidence intervals (Confidence level = 100*(1-
alpha
)%). Default:0.05
.- keep_models
Logical; return the two fitted hazard models used to compute cumulative incidences? Default:
TRUE
.- keep_boot_samples
Logical; return bootstrap samples? Default:
TRUE
. Must be set toTRUE
if user plans to useadd_simultaneous_ci()
to obtain simultaneous confidence intervals.- n_cores
Integer; parallel cores for bootstrapping. Passed to
parallel::mclapply
asmc.cores
. On Unix-like OS only; not available on Windows. Default:1
.
Value
An object of class vefit
containing:
- estimates
List of matrices:
cuminc_0
marginal cumulative incidence under no exposure
cuminc_1
marginal cumulative incidence under exposure
<effect>
the selected effect measure under the same name
Each matrix has one row per value in
eval_times
and columns including the point estimate (estimate
) and, when requested, confidence limits of the form ({wald/percentile}_lower
,{wald/percentile}_upper
).- gp_list
List with dataframes
g_weights
,p_weights
specifying the marginalizing weights used for averaging over exposure eval_times and covariates.- model_0
Fitted hazard model for the unexposed group. See Modeling section for details.
- model_1
Fitted hazard model for the exposed group. See Modeling section for details.
- n_success_boot
Integer vector indicating the number of successful bootstrap replications per timepoint.
- boot_samples
(If
keep_boot_samples = TRUE
) List of bootstrap draws (stored as matrices) for each returned quantity with names mirroring those inestimates
(i.e.cuminc_0
,cuminc_1
,<effect
). Rows index bootstrap replicates and columns indexeval_times
.
The vefit
object has methods for print()
, summary()
, and plot()
.
Use add_simultaneous_ci()
to add simultaneous confidence intervals.
Details
Modeling. Two Cox proportional hazards models are fit to estimate
exposure-specific cumulative incidences. The first models the hazard of the
outcome over the chosen time scale among individuals who have not yet been
exposed, with exposed individuals censored at their time of exposure. The
second models the hazard over time since exposure among individuals who were
exposed and remain at risk tau
days after exposure. Both models adjust for
the specified covariates to help control for confounding. The second model
also flexibly adjusts for exposure time (as a natural cubic spline with 4
degrees of freedom) to capture time-varying background risk. Predicted risks
from both models are then marginalized over the specified covariate and
exposure-time distributions to obtain G-computation style cumulative
incidence estimates.
Marginalizing weights. When weights_source = "observed"
, the marginalizing weights
are the empirical distributions of exposure eval_times and covariates among the
exposed who remain at-risk tau
days after exposure. These weights are
returned in the vefit
object under gp_list
. They can also be obtained
prior to the call to nomatchVE()
by calling get_observed_weights()
.
Confidence intervals. Wald CIs are constructed on transformed scales: \(\text{logit}\) for cumulative incidence; \(\log{RR}\) for risk ratios, \(\log{1 - VE}\) for vaccine effectiveness, using bootstrap SEs. These are then back-transformed to the original scale.
Parallelization. Bootstraps can be parallelized on Unix via parallel::mclapply()
by providing n_cores
argument.
Examples
# Fit vaccine effectiveness model using simulated data
data(simdata)
fit <- nomatchVE(
data = simdata,
outcome_time = "Y",
outcome_status = "event",
exposure = "V",
exposure_time = "D_obs",
covariates = c("x1", "x2"),
eval_times = seq(30, 180, by = 30),
tau = 14,
boot_reps = 5,
n_cores = 2
)
#> Bootstrapping 5 samples...
#> Time difference of 1.371821 secs
# View basic results
fit$estimates
#> $cuminc_0
#> estimate wald_lower wald_upper wald_n
#> 30 0.01163891 0.01007963 0.01343613 5
#> 60 0.03790915 0.03331788 0.04310488 5
#> 90 0.05857293 0.05262835 0.06514281 5
#> 120 0.06691947 0.06032600 0.07417670 5
#> 150 0.07548793 0.06861566 0.08298717 5
#> 180 0.09576440 0.08852914 0.10352382 5
#>
#> $cuminc_1
#> estimate wald_lower wald_upper wald_n
#> 30 0.006221396 0.00490263 0.007892084 5
#> 60 0.022928272 0.02175395 0.024164415 5
#> 90 0.035306397 0.03194971 0.039001536 5
#> 120 0.044201057 0.04127583 0.047323360 5
#> 150 0.055121412 0.04940982 0.061450566 5
#> 180 0.099205626 0.09157224 0.107400100 5
#>
#> $vaccine_effectiveness
#> estimate wald_lower wald_upper wald_n
#> 30 0.46546592 0.3216548 0.57878865 5
#> 60 0.39517839 0.3201008 0.46196555 5
#> 90 0.39722334 0.2812300 0.49449794 5
#> 120 0.33948883 0.2407852 0.42536023 5
#> 150 0.26979833 0.1526535 0.37074799 5
#> 180 -0.03593427 -0.1511121 0.06771911 5
#>