Performs a binomial test to check whether missing values are most likely caused by values falling below the limit of detection, that is, whether they come from a left-censored distribution.
Usage
test_left_censorship(
  data_matrix,
  global_na = c(NA, Inf, 0),
  sample_classes = NULL
)
Details
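For each group of samples, every feature with missing values in that group is saved as a candidate to test. For each sample, we calculate the median across all features, with missing values removed. For each candidate feature, we then check whether its remaining non-missing values are below the sample median in the samples where the feature is present. A binomial test uses the total number of candidate feature instances (excluding missing values) as the number of trials, and the number of those instances below their sample medians as the number of successes. If missingness were unrelated to intensity, about half of these values would be expected to fall below the median, so the test is one-sided against a success probability of 0.5; a significantly larger proportion points to left-censorship.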
More detail is available in the vignette: vignette("testing-for-left-censorship", package = "ICIKendallTau")
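The counting procedure described above can be sketched in base R as follows. This is a minimal illustration, not the ICIKendallTau implementation: the function name `sketch_left_censorship` is hypothetical, it assumes features in rows and samples in columns, and it treats all samples as a single group (there is no `sample_classes` handling).

```r
# Hypothetical sketch of the left-censorship test described in Details.
# Assumes: numeric matrix, features in rows, samples in columns, one group.
# This is an illustration, not the ICIKendallTau implementation.
sketch_left_censorship = function(data_matrix, global_na = c(NA, Inf, 0)) {
  # treat any value listed in global_na as missing
  data_matrix[data_matrix %in% global_na] = NA

  # median of each sample (column), ignoring missing values
  sample_medians = apply(data_matrix, 2, median, na.rm = TRUE)

  # features with at least one missing value are the candidates to test
  has_missing = rowSums(is.na(data_matrix)) > 0
  candidates = data_matrix[has_missing, , drop = FALSE]

  # TRUE where a value is below its own sample's median, NA where missing
  below = sweep(candidates, 2, sample_medians, FUN = "<")

  # trials: non-missing candidate values; successes: those below the median
  trials = sum(!is.na(candidates))
  successes = sum(below, na.rm = TRUE)

  # one-sided test: are more than half of the values below the median?
  test = binom.test(successes, trials, p = 0.5, alternative = "greater")
  list(trials = trials, successes = successes, binomial_test = test)
}
```

For example, on `matrix(c(1, 2, NA, 5, 6, 7, 9, NA, 11, 3, 4, 5), nrow = 4, byrow = TRUE)` the sample medians are 4, 4 and 7; only rows 1 and 3 have missing values, giving 4 trials and 2 successes.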
Examples
# in this example, 80% of the missing values are due to left-censorship
data(missing_dataset)
missingness = test_left_censorship(missing_dataset)
missingness$values
#>   trials success class
#> 1   1900    1520     A
missingness$binomial_test
#>
#> 	Exact binomial test
#>
#> data:  total_success and total_trials
#> number of successes = 1520, number of trials = 1900, p-value < 2.2e-16
#> alternative hypothesis: true probability of success is greater than 0.5
#> 95 percent confidence interval:
#>  0.7843033 1.0000000
#> sample estimates:
#> probability of success
#>                    0.8
#>
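The `binomial_test` printout above has the form produced by base R's `binom.test()`. Assuming the summed counts are passed directly to it with `alternative = "greater"`, the same statistics can be recomputed from the totals alone, which makes the trials/successes framing concrete:

```r
# Recompute the example's binomial test from the reported totals.
# Assumes the package calls base R's binom.test() with alternative = "greater".
total_success = 1520
total_trials = 1900
check = binom.test(total_success, total_trials, p = 0.5, alternative = "greater")
check$estimate   # proportion of values below their sample medians
check$conf.int   # one-sided 95% interval; upper bound is fixed at 1
```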