
Given a data matrix, computes the information-theoretic Kendall-tau-b correlation between all pairs of samples.


ici_kendalltau(
  data_matrix = NULL,
  global_na = c(NA, Inf, 0),
  perspective = "global",
  scale_max = TRUE,
  diag_good = TRUE,
  include_only = NULL,
  check_timing = FALSE,
  return_matrix = TRUE
)



  • data_matrix: matrix or data.frame of values; samples are columns, features are rows

  • global_na: numeric vector that defines, globally, which values should be treated as NA

  • perspective: character, how to treat missing data in the denominator and ties

  • scale_max: logical, should everything be scaled compared to the maximum correlation?

  • diag_good: logical, should the diagonal entries reflect how many entries in the sample were "good"?

  • include_only: only run the correlations that include the members (as a vector) or combinations (as a list or data.frame)

  • check_timing: logical, should we try to estimate the run time for the full dataset? (default is FALSE)

  • return_matrix: logical, should the data.frame or matrix result be returned?


list with cor, raw, pval, taumax


For more details, see the vignette: vignette("ici-kendalltau", package = "ICIKendallTau").

The global_na argument lists the values in the data that are replaced with NA for the Kendall-tau calculation. The default is global_na = c(NA, Inf, 0). If you want to replace something other than 0, for example -2, you might use global_na = c(NA, Inf, -2), and all values of -2 will be replaced instead of 0.
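As a rough illustration of the replacement described above, the following base R sketch turns every value listed in global_na into NA. The helper replace_global_na is hypothetical and is not part of ICIKendallTau; how the package performs the replacement internally may differ.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# replace every value found in `global_na` with NA.
replace_global_na = function(data_matrix, global_na = c(NA, Inf, 0)) {
  # %in% matches NA and Inf as ordinary values, so one test covers all cases
  data_matrix[data_matrix %in% global_na] = NA
  data_matrix
}

m = matrix(c(1, 0, Inf, NA, -2, 5), nrow = 3)

default_na = replace_global_na(m)                           # 0 and Inf become NA
minus2_na = replace_global_na(m, global_na = c(NA, Inf, -2)) # -2 becomes NA, 0 is kept
```

With the second call, zeros are left in place because 0 is no longer listed in global_na.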

When check_timing = TRUE, 5 random pairwise comparisons are run on a single core to generate timings, which are then used to estimate how long the full set of comparisons will take. The estimates are returned as a data.frame. They will be on the low side, but they should give you a good idea of how long your data will take.
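The idea behind this kind of estimate can be sketched in base R: time a handful of randomly chosen pairs, then scale the per-pair time up to the total number of pairs. This is only an illustration under stated assumptions. stats::cor() with method = "kendall" stands in for the package's own pairwise calculation, and the column names of the resulting data.frame are made up here; the data.frame returned by ici_kendalltau() may look different.

```r
# Illustrative sketch only: extrapolate run time from a few timed pairs.
set.seed(1)
x = matrix(rnorm(200 * 20), nrow = 200, ncol = 20)  # 20 samples as columns

all_pairs = combn(ncol(x), 2)   # each column holds one pair of sample indices
n_pairs = ncol(all_pairs)       # choose(20, 2) pairwise comparisons
timed = all_pairs[, sample(n_pairs, 5), drop = FALSE]

elapsed = system.time({
  for (i in seq_len(ncol(timed))) {
    # stand-in for the real pairwise calculation (an assumption)
    cor(x[, timed[1, i]], x[, timed[2, i]], method = "kendall")
  }
})[["elapsed"]]

# hypothetical column names, chosen for this sketch
estimate = data.frame(
  n_pairs = n_pairs,
  per_pair_seconds = elapsed / ncol(timed),
  total_seconds = elapsed / ncol(timed) * n_pairs
)
```

Because only a few pairs are timed and any setup or parallel overhead is ignored, an estimate built this way tends to undershoot, which is consistent with the note above.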

Returned is a list containing matrices with:

  • cor: scaled correlations

  • raw: raw Kendall-tau correlations

  • pval: p-values

  • taumax: the theoretical maximum Kendall-tau value possible

Eventually, we plan to provide two more parameters for replacing values, feature_na for feature specific NA values and sample_na for sample specific NA values.

If you want to know if the missing values in your data are possibly due to left-censorship, we recommend testing that hypothesis with test_left_censorship() first.
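To make the two forms of include_only more concrete, the base R sketch below lists which sample pairs each form might select. The helper sketch_pairs is hypothetical and not part of ICIKendallTau. It assumes that a character vector keeps every pair involving at least one named sample, and that a list or data.frame names explicit pairs row by row (with shorter entries recycled); the package's actual selection logic may differ.

```r
# Hypothetical helper (an assumption for illustration, not the package's code):
# return the sample pairs that a given `include_only` would keep.
sketch_pairs = function(sample_names, include_only) {
  all_pairs = as.data.frame(t(combn(sample_names, 2)), stringsAsFactors = FALSE)
  names(all_pairs) = c("a", "b")
  if (is.character(include_only)) {
    # vector form: keep any pair that involves one of the named samples
    keep = all_pairs$a %in% include_only | all_pairs$b %in% include_only
  } else {
    # list / data.frame form: each row names one explicit pair
    wanted = as.data.frame(include_only, stringsAsFactors = FALSE)
    # order-independent key so that (s2, s1) matches (s1, s2)
    pair_key = function(a, b) paste(pmin(a, b), pmax(a, b), sep = "|")
    keep = pair_key(all_pairs$a, all_pairs$b) %in% pair_key(wanted[[1]], wanted[[2]])
  }
  all_pairs[keep, , drop = FALSE]
}

samples = paste0("s", 1:4)
by_vector = sketch_pairs(samples, c("s1", "s3"))
by_list = sketch_pairs(samples, list(g1 = "s1", g2 = c("s2", "s3")))
```

Under these assumptions, the vector form selects many more pairs than the list form, which restricts the work to the pairs spelled out explicitly.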


if (FALSE) {
# not run
s1 = sort(rnorm(1000, mean = 100, sd = 10))
s2 = s1 + 10 

matrix_1 = cbind(s1, s2)

r_1 = ici_kendalltau(matrix_1)

#    s1 s2
# s1  1  1
# s2  1  1
# "cor", "raw", "pval", "taumax", "keep", "run_time"

s3 = s1
s3[sample(100, 50)] = NA

s4 = s2
s4[sample(100, 50)] = NA

matrix_2 = cbind(s3, s4)
r_2 = ici_kendalltau(matrix_2)
#           s3        s4
# s3 1.0000000 0.9944616
# s4 0.9944616 1.0000000

# using include_only
x = t(matrix(rnorm(5000), nrow = 100, ncol = 50))
# x has 50 rows (features) and 100 columns (samples), so name by ncol(x)
colnames(x) = paste0("s", seq(1, ncol(x)))

# only calculate correlations of other columns with "s1"
include_s1 = "s1"
s1_only = ici_kendalltau(x, include_only = include_s1)

# include all correlations that involve either s1 or s3
include_s1s3 = c("s1", "s3")
s1s3_only = ici_kendalltau(x, include_only = include_s1s3)

# only specify certain pairs either as a list
include_pairs = list(g1 = "s1", g2 = c("s2", "s3"))
s1_other = ici_kendalltau(x, include_only = include_pairs)

# or a data.frame
include_df = as.data.frame(list(g1 = "s1", g2 = c("s2", "s3")))
s1_df = ici_kendalltau(x, include_only = include_df)
}
