Performs a binomial test to check whether missing values are most likely caused by values falling below the limit of detection, that is, whether they come from a left-censored distribution.

Usage

test_left_censorship(
  data_matrix,
  global_na = c(NA, Inf, 0),
  sample_classes = NULL
)

Arguments

data_matrix

matrix or data.frame of numeric data

global_na

values that represent zero or missing (by default NA, Inf, and 0)

sample_classes

which samples are in which class

Value

A list with two elements: values, a data.frame of trials and successes for each class, and binomial_test, the result of binom.test.

Details

Each feature that has a missing value in a group of samples is kept as a candidate for testing. For each sample, we calculate the median value with any missing values removed. For each candidate feature, we then check whether its remaining non-missing values are below the sample median in those samples where the feature is non-missing. The binomial test uses the total number of feature instances (excluding missing values) as the number of trials, and the number of feature instances below their sample medians as the number of successes.
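The procedure described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the function name sketch_left_censorship is hypothetical, and it assumes that rows are features and columns are samples, that all samples belong to a single class, and that missing values are already encoded as NA.

```r
# Sketch only, not the package source. Assumptions: rows are features,
# columns are samples, one sample class, missing values encoded as NA.
sketch_left_censorship <- function(data_matrix) {
  # Median of each sample (column), ignoring missing values
  sample_medians <- apply(data_matrix, 2, median, na.rm = TRUE)

  # Keep only features (rows) with at least one missing value
  has_missing <- rowSums(is.na(data_matrix)) > 0
  candidates <- data_matrix[has_missing, , drop = FALSE]

  # Compare each value to the median of its own sample (column);
  # comparisons involving NA stay NA and are dropped below
  below_median <- sweep(candidates, 2, sample_medians, FUN = "<")

  trials <- sum(!is.na(candidates))
  successes <- sum(below_median, na.rm = TRUE)

  list(
    values = data.frame(trials = trials, success = successes),
    binomial_test = binom.test(successes, trials, p = 0.5,
                               alternative = "greater")
  )
}

# Toy data: log-normal values with the lowest values set to missing
set.seed(1)
toy <- matrix(rlnorm(200), nrow = 20, ncol = 10)
toy[toy < 0.5] <- NA
result <- sketch_left_censorship(toy)
result$values
```

The one-sided alternative ("greater") mirrors the alternative hypothesis reported in the example output on this page; the null probability of 0.5 reflects that, without censoring, a value is equally likely to fall above or below its sample median.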

More detail is available in the vignette: vignette("testing-for-left-censorship", package = "ICIKendallTau")

See also

Examples

# this example has 80% missing due to left-censorship
data(missing_dataset)
missingness = test_left_censorship(missing_dataset)
missingness$values
#>   trials success class
#> 1   1900    1520     A
missingness$binomial_test
#> 
#> 	Exact binomial test
#> 
#> data:  total_success and total_trials
#> number of successes = 1520, number of trials = 1900, p-value < 2.2e-16
#> alternative hypothesis: true probability of success is greater than 0.5
#> 95 percent confidence interval:
#>  0.7843033 1.0000000
#> sample estimates:
#> probability of success 
#>                    0.8 
#>
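As a cross-check, the reported test can be reproduced directly from the counts in missingness$values with base R's binom.test. This is a minimal sketch that assumes the function uses a null probability of 0.5 and a one-sided alternative, as the printed output suggests; the variable names are illustrative.

```r
# Counts taken from the example output above
total_success <- 1520
total_trials <- 1900

# One-sided exact binomial test against a null probability of 0.5
check <- binom.test(total_success, total_trials, p = 0.5,
                    alternative = "greater")
check$estimate   # proportion of successes, 1520 / 1900 = 0.8
check$conf.int   # one-sided 95 percent confidence interval
```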