Main function to estimate marginal cumulative incidences and derived effect measures without matching

nomatchVE() estimates marginal cumulative incidences under exposure and no exposure using a G-computation approach. The method fits two conditional hazard models- one for unexposed and one for the exposed- and uses these models to predict time- and covariate- specific cumulative incidences. The predicted conditional risks are then marginalized to compute overall (marginal) cumulative incidences. The resulting cumulative incidences can be summarized as risk ratios (RR) or 1 - RR (vaccine effectiveness).

Usage

nomatchVE(
  data,
  outcome_time,
  outcome_status,
  exposure,
  exposure_time,
  covariates,
  tau,
  eval_times,
  effect = c("vaccine_effectiveness", "risk_ratio"),
  weights_source = c("observed", "custom"),
  custom_weights = NULL,
  ci_type = c("wald", "percentile", "both"),
  boot_reps = 0,
  alpha = 0.05,
  keep_models = TRUE,
  keep_boot_samples = TRUE,
  n_cores = 1
)

Arguments

data

A data frame with one row per individual containing the columns named in outcome_time, outcome_status, exposure, exposure_time, and any variables listed in covariates.

outcome_time

Name of the time-to-event/censoring variable. Time should be measured from a given time origin (e.g. study start, enrollment, or age) for all individuals.

outcome_status

Name of the event indicator. The underlying column should be numeric (1 = event, 0 = censored).

exposure

Name of the exposure indicator. The underlying column should be numeric (1 = exposed during follow-up, 0 = never exposed during follow-up).

exposure_time

Name of the time to exposure, measured from the chosen time origin; use NA if not exposed. Time must be measured in the same units (e.g. days) as that used for outcome_time.

covariates

Character vector of covariates to adjust for when fitting the hazard models. These covariates should include all known confounders of exposure and censoring measured at the chosen time origin.

tau

Non-negative numeric value specifying the time after exposure that should be excluded from the risk evaluation period. This argument is primarily intended for vaccination exposures, where it is common to exclude the time after vaccination when immunity is still building. Time must be measured in the same units as that used for outcome_time and exposure_time and should reflect the biological understanding of when vaccine-induced immunity develops (usually 1-2 weeks). For non-vaccine exposures, tau can be set to 0 (no delay period).

eval_times

Numeric vector specifying the timepoints at which to compute cumulative incidence and the derived effect measures. The timepoints should be expressed in terms of time since exposure. All values must be greater than tau and and should correspond to clinically meaningful follow-up durations, such as 30, 60, or 90 days after exposure. A fine grid of timepoints (e.g., eval_times = (tau+1):100) can be provided if cumulative incidence curves over time are desired.

effect

Character. Type of effect measure to compute and return, based on the estimated cumulative incidences. Either "vaccine_effectiveness" (default) or "risk_ratio".

weights_source

Character string specifying the type of marginalizing weights to use. Either:

"observed" (default): set the marginalizing weights to the empirical distribution of exposure eval_times and covariates among the exposed. This provides close alignment with the weights implicitly used in matching.
"custom": use the user-specified weights provided in the custom_weights argument.

custom_weights

a list(g_weights, p_weights) providing weights for marginalizing the time- and covariate-specific cumulative incidences. Must have the following format:

g_weights: data frame with columns
- all variables in covariates
- exposure_time (time of exposure),
- prob (probability of exposure at the given time within the covariate-group; should sum to 1 within each covariate-group)
p_weights: data frame with columns
- all variables in covariates
- prob (probability of covariate-group; should sum to 1 over all covariate groups.)

ci_type

Method for constructing bootstrap confidence intervals. One of "wald", "percentile", or "both".

"wald" (default): Computes Wald-style intervals using bootstrap standard errors. See Confidence intervals section for details.
"percentile": Computes percentile bootstrap intervals.
"both": Computes and returns both sets of intervals.

boot_reps

Number of bootstrap replicates for confidence intervals. Recommended to use at least 1000 for publication-quality results. Use smaller values (e.g., 10-100) for initial exploration. Default: 0 (no bootstrapping).

alpha

Significance level for confidence intervals (Confidence level = 100*(1-alpha)%). Default: 0.05.

keep_models

Logical; return the two fitted hazard models used to compute cumulative incidences? Default: TRUE.

keep_boot_samples

Logical; return bootstrap samples? Default: TRUE. Must be set to TRUE if user plans to use add_simultaneous_ci() to obtain simultaneous confidence intervals.

n_cores

Integer; parallel cores for bootstrapping. Passed to parallel::mclapply as mc.cores. On Unix-like OS only; not available on Windows. Default: 1.

Value

An object of class vefit containing:

estimates

List of matrices:

cuminc_0: marginal cumulative incidence under no exposure
cuminc_1: marginal cumulative incidence under exposure
<effect>: the selected effect measure under the same name

Each matrix has one row per value in eval_times and columns including the point estimate (estimate) and, when requested, confidence limits of the form ({wald/percentile}_lower, {wald/percentile}_upper).

gp_list

List with dataframes g_weights, p_weights specifying the marginalizing weights used for averaging over exposure eval_times and covariates.

model_0

Fitted hazard model for the unexposed group. See Modeling section for details.

model_1

Fitted hazard model for the exposed group. See Modeling section for details.

n_success_boot

Integer vector indicating the number of successful bootstrap replications per timepoint.

boot_samples

(If keep_boot_samples = TRUE) List of bootstrap draws (stored as matrices) for each returned quantity with names mirroring those in estimates (i.e. cuminc_0, cuminc_1, <effect). Rows index bootstrap replicates and columns index eval_times.

The vefit object has methods for print(), summary(), and plot(). Use add_simultaneous_ci() to add simultaneous confidence intervals.

Details

Modeling. Two Cox proportional hazards models are fit to estimate exposure-specific cumulative incidences. The first models the hazard of the outcome over the chosen time scale among individuals who have not yet been exposed, with exposed individuals censored at their time of exposure. The second models the hazard over time since exposure among individuals who were exposed and remain at risk tau days after exposure. Both models adjust for the specified covariates to help control for confounding. The second model also flexibly adjusts for exposure time (as a natural cubic spline with 4 degrees of freedom) to capture time-varying background risk. Predicted risks from both models are then marginalized over the specified covariate and exposure-time distributions to obtain G-computation style cumulative incidence estimates.

Marginalizing weights. When weights_source = "observed", the marginalizing weights are the empirical distributions of exposure eval_times and covariates among the exposed who remain at-risk tau days after exposure. These weights are returned in the vefit object under gp_list. They can also be obtained prior to the call to nomatchVE() by calling get_observed_weights().

Confidence intervals. Wald CIs are constructed on transformed scales: \(\text{logit}\) for cumulative incidence; \(\log{RR}\) for risk ratios, \(\log{1 - VE}\) for vaccine effectiveness, using bootstrap SEs. These are then back-transformed to the original scale.

Parallelization. Bootstraps can be parallelized on Unix via parallel::mclapply() by providing n_cores argument.

Examples

# Fit vaccine effectiveness model using simulated data
data(simdata)

fit <- nomatchVE(
  data = simdata,
  outcome_time = "Y",
  outcome_status = "event",
  exposure = "V",
  exposure_time = "D_obs",
  covariates = c("x1", "x2"),
  eval_times = seq(30, 180, by = 30),
  tau = 14,
  boot_reps = 5,
  n_cores = 2
)
#> Bootstrapping 5 samples...
#> Time difference of 1.371821 secs

# View basic results
fit$estimates
#> $cuminc_0
#>       estimate wald_lower wald_upper wald_n
#> 30  0.01163891 0.01007963 0.01343613      5
#> 60  0.03790915 0.03331788 0.04310488      5
#> 90  0.05857293 0.05262835 0.06514281      5
#> 120 0.06691947 0.06032600 0.07417670      5
#> 150 0.07548793 0.06861566 0.08298717      5
#> 180 0.09576440 0.08852914 0.10352382      5
#> 
#> $cuminc_1
#>        estimate wald_lower  wald_upper wald_n
#> 30  0.006221396 0.00490263 0.007892084      5
#> 60  0.022928272 0.02175395 0.024164415      5
#> 90  0.035306397 0.03194971 0.039001536      5
#> 120 0.044201057 0.04127583 0.047323360      5
#> 150 0.055121412 0.04940982 0.061450566      5
#> 180 0.099205626 0.09157224 0.107400100      5
#> 
#> $vaccine_effectiveness
#>        estimate wald_lower wald_upper wald_n
#> 30   0.46546592  0.3216548 0.57878865      5
#> 60   0.39517839  0.3201008 0.46196555      5
#> 90   0.39722334  0.2812300 0.49449794      5
#> 120  0.33948883  0.2407852 0.42536023      5
#> 150  0.26979833  0.1526535 0.37074799      5
#> 180 -0.03593427 -0.1511121 0.06771911      5
#>