Calculates the fraction of entries in each sample that are more than X
standard deviations from the trimmed mean. See Details.
Arguments
- data
the data matrix (samples are columns, rows are features)
- sample_classes
the sample classes
- n_trim
how many features to trim at each end (default is 3)
- n_sd
how many SD from the trimmed mean before a value is treated as an outlier (default is 5)
- remove_missing
which missing values should be removed before calculating (default is NA)
Details
Based on the Gerlinski paper (link).
For each feature (within a sample class), take the values across all the samples,
remove the n_trim lowest and highest values, and calculate the mean
and sd of the remaining values. The lower and upper limits are n_sd
standard deviations below and above that mean. For each sample and feature,
determine whether the value is within or outside those limits. The fraction
reported for a sample is the share of its features that fall outside the limits.
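The procedure described above can be sketched in a few lines of R. This is an illustrative sketch only, not the package's own code: the helper name `sketch_outlier_fraction` is hypothetical, it handles a single sample class, it assumes features in rows and samples in columns, and it assumes the fraction is taken over the non-missing features of each sample.

```r
# Hypothetical helper (not the package's implementation): outlier fraction
# for the samples of ONE class. `mat` has features in rows, samples in columns.
sketch_outlier_fraction <- function(mat, n_trim = 3, n_sd = 5) {
  is_outlier <- t(apply(mat, 1, function(x) {
    sorted <- sort(x)                       # sort() drops NA values by default
    n <- length(sorted)
    if (n < 2 * n_trim + 2) {
      stop("need at least 2 values per feature after trimming")
    }
    kept <- sorted[(n_trim + 1):(n - n_trim)]  # drop n_trim lowest and highest
    center <- mean(kept)                       # trimmed mean
    spread <- sd(kept)                         # sd of the trimmed values
    lower <- center - n_sd * spread
    upper <- center + n_sd * spread
    (x < lower) | (x > upper)                  # TRUE where a value is outside
  }))
  colnames(is_outlier) <- colnames(mat)
  # Share of features flagged per sample (assumed denominator: non-missing features)
  colMeans(is_outlier, na.rm = TRUE)
}

# Toy data: 4 features, 12 samples, every feature has the values 1..12
mat <- matrix(rep(1:12, each = 4), nrow = 4,
              dimnames = list(NULL, paste0("S", 1:12)))
mat[1:2, 1] <- 1000   # make 2 of the 4 features in sample S1 extreme
frac <- sketch_outlier_fraction(mat)
frac                  # S1 is 0.5 (2 of 4 features flagged), all others are 0
```

With several sample classes, the same helper would be applied to the columns of each class separately, so that the trimmed mean and sd come from that class only.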
Returns a data.frame with:
- sample_id
the sample id, rownames are used if available, otherwise this is an index
- sample_class
the class of the sample if sample_classes were provided, otherwise given a default of "C1"
- frac
the actual outlier fraction calculated for that sample