Title: | Weights to Correct for Outcome Dependent Sampling in Time to Event Data |
---|---|
Description: | A new inverse probability of selection weighted Cox model to deal with outcome-dependent sampling in survival analysis. |
Authors: | Vera Arntzen [aut, cre] |
Maintainer: | Vera Arntzen <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-02-13 04:43:16 UTC |
Source: | https://github.com/vharntzen/wcox |
This function calculates weights to correct for ascertainment bias in time-to-event data where clusters are outcome-dependently sampled, for example high-risk families in genetic epidemiological studies in cancer research.
Calculate_weights(dat)
dat |
Data.frame with one row per individual and the following columns: d, the non-censoring indicator; k, the interval of the (age) group; S_k, the population interval-based proportion of individuals experiencing the event in intervals later than k; S_k., the sample proportion of individuals experiencing the event in intervals later than k. |
Weights are based on a comparison of survival in the sample with survival in the population. Therefore, besides the sample data, the population incidence rate (per 100,000) is needed as input, as well as the cut-offs of the (age/time-to-event) groups for which it is available. The function provides two options for the latter: cut-offs can be provided manually, or the standard 5- or 10-year (age) categories can be used (0-4, 5-9, ... or 0-9, 10-19, ...). Note that the resulting intervals are of the form [xx, xx), i.e. closed on the left and open on the right.
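The interval convention described above can be illustrated with base R. This is a minimal sketch, not the package's internal code: the ages and the object names `age`, `breaks_5y` and `k` are made up for illustration. It shows how left-closed, right-open 5-year groups assign a boundary age such as 5 to the interval starting at 5.

```r
# Hypothetical ages at event or censoring (illustration only).
age <- c(0, 4, 5, 9.9, 10, 37)

# Standard 5-year cut-offs: 0, 5, 10, ..., 40.
breaks_5y <- seq(0, 40, by = 5)

# right = FALSE gives intervals of the form [a, b):
# the lower bound is included, the upper bound is not.
k <- cut(age, breaks = breaks_5y, right = FALSE)

# Ages 0 and 4 fall in [0,5); age 5 falls in [5,10), not in [0,5).
data.frame(age = age, interval = k, index = as.integer(k))
```

With `right = TRUE` (the default of `cut()`), age 5 would instead be placed in the interval ending at 5, so the choice matters for individuals observed exactly at a cut-off.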
Vector with weights.
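How the weight vector enters the Cox model is not shown in this section. The sketch below is one plausible way to use such weights, assuming the standard `coxph()` function from the survival package with a cluster term for families. The simulated data, the placeholder weights `w` and the column name `family_id` are hypothetical. In practice `w` would be the output of `Calculate_weights()`.

```r
library(survival)

set.seed(1)

# Simulated clustered data: 50 hypothetical families of 4 individuals each.
n_fam <- 50
dat <- data.frame(
  family_id = rep(seq_len(n_fam), each = 4),
  x = rbinom(4 * n_fam, size = 1, prob = 0.5)  # risk modifier
)
dat$y <- rexp(nrow(dat), rate = 0.05 * exp(0.5 * dat$x))  # event/censoring time
dat$d <- rbinom(nrow(dat), size = 1, prob = 0.7)          # 1 = event, 0 = censored

# Placeholder weights (assumption): stand-ins for the vector that
# Calculate_weights() would return.
dat$w <- runif(nrow(dat), min = 0.5, max = 1.5)

# Weighted Cox model; cluster(family_id) requests robust standard errors
# that account for correlation within families.
fit <- coxph(Surv(y, d) ~ x + cluster(family_id), data = dat, weights = w)
summary(fit)$coefficients
```

The robust variance is used here because weighted observations within the same family are not independent, so the naive standard errors would not be appropriate.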
This toy data set is simulated for educational purposes, to explain the package. It consists of families in which at least two individuals experienced the event of interest during follow-up. The covariate of interest is the risk modifier 'x'. The data set is inspired by data often seen in genetic epidemiological studies in cancer research.
data(fam_dat)
Data.frame.
Vera H. Arntzen, Marta Fiocco, Inge M.M. Lakeman, Maartje Nielsen, Mar Rodríguez-Girondo. A new inverse probability of selection weighted Cox model to deal with outcome-dependent sampling in survival analysis. doi: https://doi.org/10.1101/2023.02.07.527426 (preprint).
This function prepares the sample data for weight calculation using external information, i.e. the incidence in the population.
Prepare_data(dat, population_incidence, breaks)
dat |
Data.frame with one row per individual that includes at least a column d with the event indicator (1 for event, 0 for censored) and a column y with the event/censoring time. |
population_incidence |
A vector (in combination with breaks) or a data.frame with three columns (1: 'start age group', 2: 'end age group', 3: 'S_pop') containing the population incidence per 100,000 per interval k. |
breaks |
Cut-points for the (age/time) groups. Only needed when population_incidence is a vector. |
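The two accepted forms of `population_incidence` can be sketched in base R as follows. All numbers are invented for illustration. The column names `start` and `end` are hypothetical stand-ins for 'start age group' and 'end age group', and only `S_pop` is taken from the argument description. Whether the end of an age group is meant to be inclusive or exclusive in the data.frame form is not stated here, so the sketch simply reuses the cut-points.

```r
# Form 1: a vector of incidences per 100,000 plus separate cut-points.
# Hypothetical values for four 10-year age groups.
incidence <- c(5, 20, 80, 150)
breaks <- c(0, 10, 20, 30, 40)

# Form 2: the same information as a three-column data.frame.
# 'start' and 'end' are assumed names; 'S_pop' follows the description.
pop_df <- data.frame(
  start = head(breaks, -1),  # lower cut-point of each group
  end   = tail(breaks, -1),  # upper cut-point of each group
  S_pop = incidence          # incidence per 100,000 in that group
)
pop_df

# Toy sample data with the two columns required in 'dat'.
dat <- data.frame(
  y = c(12.5, 33.0, 8.2),  # event/censoring time
  d = c(1, 0, 1)           # 1 = event, 0 = censored
)
```

Either form would then be passed as the `population_incidence` argument, with `breaks` supplied only in the vector case.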
Weights are based on a comparison of survival in the sample with survival in the population. Therefore, besides the sample data, the population incidence rate (per 100,000) is needed as input, as well as the cut-offs of the (age/time-to-event) groups for which it is available, unless these are already provided in a data.frame.
Data.frame ready for weight calculation using the function 'Calculate_weights()': one row per individual and, among others, the columns id, a unique ID; d, the non-censoring indicator; k, the interval of the (age) group; S_k, the population interval-based proportion of individuals experiencing the event in intervals later than k; S_k., the sample proportion of individuals experiencing the event in intervals later than k.