All the code for generating the manuscript:
Information-Content-Informed Kendall-tau Correlation Methodology: Interpreting Missing Values as Useful Information, Robert M Flight, Praneeth S Bhatt, Hunter NB Moseley biorxiv is contained within this repository.
The current versions can be obtained directly from this repository:
The contents of this work are licensed under a CC-BY license. If you use any content, you must give attribution to this original work.
The repository used R 4.3.0, and {renv} 1.0.0 For all of the other
packages needed, see the file renv.lock
.
To setup to be able to rerun everything here, you can clone the repo from github or download it from Zenodo, and then from within that folder:
# clone from github
git clone https://github.com/MoseleyBioinformaticsLab/manuscript.ICIKendallTau.git
# download from zenodo
wget 'https://zenodo.org/records/13881956/files/MoseleyBioinformaticsLab/manuscript.ICIKendallTau-draft_v5.zip?download=1' --output-document=manuscript.ICIKendallTau.zip
unzip manuscript.ICIKendallTau.zip
# make sure renv is installed
install.packages("renv")
# restore the packages
renv::restore()
The {ICIKendallTau} package on GitHub is now different than the one used for this manuscript, you should install the one archived on Zenodo (v 0.3.20).
wget 'https://zenodo.org/records/10580528/files/ICIKendallTau_0.3.20.tar.gz?download=1' --output-document=ICIKendallTau_0.3.20.tar.gz
tar -xzvf ICIKendallTau_0.3.20.tar.gz
cd ICIKendallTau
Then start an R session from within the directory, and install it.
remotes::install_local()
If you want to rerun everything from the beginning, you can do
tar_make()
on the project, and it should just run it.
However, some of the calculations require a lot of
compute resources, or will just take a long time, even with multiple
cores. I would definitely recommend at the very least 64 GB of RAM, and
you may need more depending on how many cores you are using. The compute
node we ran on had 80 cores, and 1 TB of RAM, and we regularly used 500
GB of RAM for some of the calculations.
targets::tar_make()
To make it easier to at least generate the manuscript, there is a
copy of the {targets} cache on Zenodo, in two parts here and here. Make
sure you have lots of room, there are 68 GB worth of files (and those
are already compressed R data files). This is in four parts, and can be
downloaded using wget
.
Having this cache, you have the state of the computations when we
submitted the manuscript, and can examine almost any of the outputs you
want by loading them using tar_load(object)
.
# make sure you are wherever you want the targets cache, like wherever
# you cloned the github repo to, at the top level of the directory.
wget 'https://zenodo.org/records/13873347/files/manuscript.ICIKendallTau.targets_cache.parts0.tar?download=1' --output-document=manuscript.ICIKendallTau.targets_cache.parts0.tar
wget 'https://zenodo.org/records/13873347/files/manuscript.ICIKendallTau.targets_cache.parts1.tar?download=1' --output-document=manuscript.ICIKendallTau.targets_cache.parts1.tar
wget 'https://zenodo.org/records/13874988/files/manuscript.ICIKendallTau.targets_cache.parts2.tar?download=1' --output-document=manuscript.ICIKendallTau.targets_cache.parts2.tar
wget 'https://zenodo.org/records/13874988/files/manuscript.ICIKendallTau.targets_cache.parts3.tar?download=1' --output-document=manuscript.ICIKendallTau.targets_cache.parts3.tar
# now untar them. Note the "k", this keeps the directory from getting overwritten by subsequent un-tar
tar -xkvf manuscript.ICIKendallTau.targets_cache.parts0.tar
tar -xkvf manuscript.ICIKendallTau.targets_cache.parts1.tar
tar -xkvf manuscript.ICIKendallTau.targets_cache.parts2.tar
tar -xkvf manuscript.ICIKendallTau.targets_cache.parts3.tar