reorder by sample class — similarity_reorderbyclass • visualizationQualityControl

To avoid spurious patterns in a heatmap visualization, it is useful to reorder the samples within each sample class. This function uses hierarchical clustering and dendsort to sort the entries of a distance matrix.

Usage

similarity_reorderbyclass(
  similarity_matrix,
  sample_classes = NULL,
  transform = "none",
  hclust_method = "complete",
  dendsort_type = "min"
)

Arguments

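similarity_matrix: square matrix of similarities between objects, or a distance matrix of class dist
sample_classes: data.frame or factor denoting the class of each sample (default NULL)
transform: a transformation to apply to the data, for example "sub_1" for 1 - similarity (default "none")
hclust_method: which hierarchical clustering method should be used (default "complete")
dendsort_type: how dendsort should do the reordering (default "min")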

Value

a list containing the reordering of the matrix as a:

dendrogram
numeric vector of indices (accessed as $indices in the examples)
character vector of names (accessed as $names in the examples; will be NULL if rownames are not set on the matrix)

Details

The similarity_matrix should be either a square matrix of similarity values or a distance matrix of class dist. If your matrix does not encode a "true" distance, you can use a transform to turn it into one. For example, if you have correlations, a distance would be 1 - correlation, so pass "sub_1" as the transform argument.
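
The correlation case described above can be illustrated in base R. This is a minimal sketch of the arithmetic only, not the package's code: it assumes that "sub_1" amounts to an elementwise 1 - x, and the data and the names dat, cor_mat, dist_mat and cor_dist are hypothetical.

```r
# Hypothetical data: 20 observations of 5 variables named v1..v5
set.seed(1)
dat <- matrix(rnorm(100), nrow = 20, ncol = 5,
              dimnames = list(NULL, paste0("v", 1:5)))

# Correlation is a similarity: 1 means identical, lower means less alike
cor_mat <- cor(dat)

# Assumed manual equivalent of transform = "sub_1":
# subtract the similarity from 1 so identical objects sit at distance 0
dist_mat <- 1 - cor_mat

# Base R clustering functions such as hclust() expect class "dist"
cor_dist <- as.dist(dist_mat)
```

With this transform, perfectly anti-correlated variables end up at distance 2, so the result ranges from 0 to 2 rather than 0 to 1.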

The sample_classes argument should be either a data.frame or a factor. If a data.frame is passed, all of its columns are pasted together to create a single factor for splitting the data into groups. If the rownames of the data.frame do not correspond to the rownames or colnames of the matrix, the matrix and the data.frame are assumed to be in the same order.
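
The grouping described above, followed by clustering within each group, can be sketched in base R. This is an illustration under stated assumptions rather than the function's implementation: the "_" separator, the example data and every variable name are hypothetical, and the dendsort step is left out, so the leaf order comes straight from hclust.

```r
# Hypothetical inputs: a 6 x 6 distance matrix and two annotation columns
set.seed(1)
pts <- matrix(rnorm(12), nrow = 6, dimnames = list(letters[1:6], NULL))
dist_mat <- as.matrix(dist(pts))

sample_info <- data.frame(
  treatment = c("ctrl", "ctrl", "ctrl", "drug", "drug", "drug"),
  batch = c("b1", "b1", "b1", "b1", "b1", "b2"),
  row.names = letters[1:6],
  stringsAsFactors = FALSE
)

# Paste all columns together to form one class label per sample
# (the "_" separator is an assumption made for this sketch)
class_label <- factor(do.call(paste, c(sample_info, sep = "_")))

# Split the sample positions by class
idx_by_class <- split(seq_len(nrow(dist_mat)), class_label)

# A class with one member cannot be clustered, so drop it with a warning
singletons <- names(idx_by_class)[lengths(idx_by_class) < 2]
if (length(singletons) > 0) {
  warning("Removing groups: ", paste(singletons, collapse = ", "))
  idx_by_class <- idx_by_class[lengths(idx_by_class) >= 2]
}

# Cluster within each class and keep the resulting leaf order
order_in_class <- lapply(idx_by_class, function(idx) {
  hc <- hclust(as.dist(dist_mat[idx, idx]), method = "complete")
  idx[hc$order]
})

# Concatenate the per-class orders into one index vector and one name vector
new_indices <- unlist(order_in_class, use.names = FALSE)
new_names <- rownames(dist_mat)[new_indices]
```

Here sample f is the only member of its class, so it is dropped and new_indices holds five positions: a permutation of 1 to 3 followed by a permutation of 4 and 5.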

Examples

library(visualizationQualityControl)
set.seed(1234)
mat <- matrix(rnorm(100, 2, sd = 0.5), 10, 10)
rownames(mat) <- colnames(mat) <- letters[1:10]
neworder <- similarity_reorderbyclass(mat)
mat[neworder$indices, neworder$indices]
#>           a         d        g        b        c        f        e        h
#> a 1.3964671 2.5511488 2.328294 1.761404 2.067044 1.096984 2.724748 2.003446
#> d 0.8271511 1.7493710 1.665183 2.032229 2.229795 1.492519 1.859688 2.324143
#> g 1.7126300 0.9099802 1.430696 1.744495 2.287378 2.823909 1.446341 1.304650
#> b 2.1387146 1.7622035 3.274496 1.500807 1.754657 1.708962 1.465679 1.772266
#> c 2.5422206 1.6452800 1.982620 1.611873 1.779726 1.445555 1.572318 1.816738
#> f 2.2530279 1.4161904 2.888542 1.944857 1.275898 2.281528 1.515743 1.923301
#> e 2.2145623 1.1854533 1.996198 2.479747 1.653140 1.918845 1.502830 3.035135
#> h 1.7266841 1.3295034 2.683914 1.544402 1.488172 1.613323 1.374007 1.638209
#> i 1.7177740 1.8528531 2.664782 1.581414 1.992431 2.802955 1.738086 2.129131
#> j 1.5549811 1.7670512 2.168236 3.207918 1.532026 1.421096 1.751575 1.841470
#>          i        j
#> a 1.911105 1.973421
#> d 1.913106 2.500757
#> g 2.274999 1.432696
#> b 1.915003 2.127598
#> c 1.313849 2.852982
#> f 2.348804 2.177775
#> e 2.425116 1.752208
#> h 1.798634 2.439102
#> i 1.904203 2.486458
#> j 1.402736 3.060559

sample_class <- data.frame(grp = rep(c("grp1", "grp2"), each = 5), stringsAsFactors = FALSE)
rownames(sample_class) <- rownames(mat)
neworder2 <- similarity_reorderbyclass(mat, sample_class[, "grp", drop = FALSE])

# if there is a class with only one member, it is dropped, with a warning
sample_class[10, "grp"] <- "grp3"
neworder3 <- similarity_reorderbyclass(mat, sample_class[, "grp", drop = FALSE])
#> Warning: Removing groups: grp3
neworder3$indices # 10 should be missing
#> [1] 1 4 5 2 3 6 8 7 9

mat[neworder2$indices, neworder2$indices]
#>           a         d        e        b        c        i        j        g
#> a 1.3964671 2.5511488 2.724748 1.761404 2.067044 1.911105 1.973421 2.328294
#> d 0.8271511 1.7493710 1.859688 2.032229 2.229795 1.913106 2.500757 1.665183
#> e 2.2145623 1.1854533 1.502830 2.479747 1.653140 2.425116 1.752208 1.996198
#> b 2.1387146 1.7622035 1.465679 1.500807 1.754657 1.915003 2.127598 3.274496
#> c 2.5422206 1.6452800 1.572318 1.611873 1.779726 1.313849 2.852982 1.982620
#> i 1.7177740 1.8528531 1.738086 1.581414 1.992431 1.904203 2.486458 2.664782
#> j 1.5549811 1.7670512 1.751575 3.207918 1.532026 1.402736 3.060559 2.168236
#> g 1.7126300 0.9099802 1.446341 1.744495 2.287378 2.274999 1.432696 1.430696
#> f 2.2530279 1.4161904 1.515743 1.944857 1.275898 2.348804 2.177775 2.888542
#> h 1.7266841 1.3295034 1.374007 1.544402 1.488172 1.798634 2.439102 2.683914
#>          f        h
#> a 1.096984 2.003446
#> d 1.492519 2.324143
#> e 1.918845 3.035135
#> b 1.708962 1.772266
#> c 1.445555 1.816738
#> i 2.802955 2.129131
#> j 1.421096 1.841470
#> g 2.823909 1.304650
#> f 2.281528 1.923301
#> h 1.613323 1.638209
cbind(neworder$names, neworder2$names)
#>       [,1] [,2]
#>  [1,] "a"  "a" 
#>  [2,] "d"  "d" 
#>  [3,] "g"  "e" 
#>  [4,] "b"  "b" 
#>  [5,] "c"  "c" 
#>  [6,] "f"  "i" 
#>  [7,] "e"  "j" 
#>  [8,] "h"  "g" 
#>  [9,] "i"  "f" 
#> [10,] "j"  "h"