cola Report for GDS4238

Density distribution

The density distribution of each sample is shown as one column in the following heatmap. Samples are clustered using the Kolmogorov-Smirnov statistic between two distributions as the distance.
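The Kolmogorov-Smirnov distance mentioned above can be illustrated with a minimal sketch. It is not the code used to build the heatmap; the function name `ks_statistic` and the two toy samples are assumptions made for illustration only.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a and b (hypothetical helper, not part of cola)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    max_gap = 0.0
    # The largest gap between two step functions is reached at a data point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Toy expression values for two samples (made up for illustration).
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(sample_1, sample_1))  # identical distributions
print(ks_statistic(sample_1, sample_2))  # shifted distribution
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means they do not overlap at all, so the statistic can serve directly as a distance for clustering samples.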

Suggest the best k

The following table shows the best k (number of partitions) for each combination of top-value method and partition method. Clicking a method name in the table jumps to the section for that combination of methods.

The cola vignette explains the definition of the metrics used for determining the best number of partitions.

CDF of consensus matrices

Consensus heatmap

Membership heatmap

Signature heatmap

Statistics table

Method	The best k	1-PAC	Mean silhouette	Concordance		Optional k
SD:NMF	2	1.000	0.969	0.988	**
MAD:kmeans	2	1.000	0.996	0.998	**
ATC:kmeans	2	1.000	1.000	1.000	**
ATC:pam	2	1.000	0.997	0.998	**
ATC:mclust	3	1.000	0.973	0.986	**	2
ATC:skmeans	4	0.964	0.934	0.963	**	2,3
MAD:NMF	3	0.958	0.947	0.977	**	2
MAD:skmeans	6	0.955	0.912	0.920	**	2,3,4,5
ATC:NMF	3	0.953	0.936	0.971	**	2
SD:skmeans	6	0.949	0.882	0.898	*	2,5
CV:skmeans	6	0.941	0.916	0.924	*	2,5
SD:pam	5	0.928	0.887	0.953	*	2,4
CV:NMF	3	0.921	0.916	0.965	*	2
MAD:pam	5	0.919	0.846	0.938	*	2
CV:pam	5	0.918	0.899	0.955	*	2,3,4
MAD:hclust	3	0.910	0.887	0.961	*	2
ATC:hclust	6	0.900	0.879	0.912	*	2,3,4
MAD:mclust	2	0.896	0.966	0.984
SD:hclust	5	0.764	0.837	0.889
CV:mclust	3	0.740	0.872	0.919
CV:hclust	4	0.715	0.708	0.836
SD:mclust	2	0.673	0.912	0.939
SD:kmeans	2	0.548	0.947	0.939
CV:kmeans	2	0.511	0.914	0.903

The table lists the statistics used to measure the stability of consensus partitioning. Their definitions are given in the cola vignette.

The following heatmap plots the partition for each combination of methods; the lightness corresponds to the silhouette scores of the samples in each method. The consensus subgroup shown on top is inferred from all methods, using the mean silhouette scores as weights.
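The idea of weighting methods by their mean silhouette score can be sketched as a weighted vote for a single sample. This is a simplified illustration, not cola's procedure: it assumes the class labels have already been aligned across methods, the function name `consensus_label` is hypothetical, and the per-sample labels are made up. Only the weights are taken from the mean silhouette column of the statistics table.

```python
from collections import defaultdict

def consensus_label(labels_by_method, weight_by_method):
    """Weighted vote for one sample: every method votes for its label,
    and the vote counts as much as the method's weight (hypothetical helper)."""
    votes = defaultdict(float)
    for method, label in labels_by_method.items():
        votes[label] += weight_by_method[method]
    return max(votes, key=votes.get)

# Mean silhouette scores from the statistics table, used as weights.
weights = {"ATC:kmeans": 1.000, "SD:kmeans": 0.947, "CV:hclust": 0.708}

# Made-up labels of one sample under three methods (assumed to be aligned).
labels = {"ATC:kmeans": 1, "SD:kmeans": 2, "CV:hclust": 2}
print(consensus_label(labels, weights))
```

Methods with a low mean silhouette score contribute less to the consensus, so an unstable partition cannot easily override several stable ones.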

Partition from all methods

Top rows overlap

The correspondence of rankings between different top-value methods is also visualized:

Test to known annotations

The correlation between subgroups and known annotations is tested. If the known annotation is numeric, a one-way ANOVA test is applied; if it is discrete, a chi-squared contingency table test is applied.
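For the discrete case, the chi-squared statistic of a subgroup-by-annotation contingency table can be sketched as follows. The function name `chi_squared_statistic` and the toy counts are assumptions for illustration; the sketch applies no continuity correction and stops at the statistic and its degrees of freedom rather than computing a p-value.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if subgroup and annotation were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Rows: two subgroups; columns: two annotation levels (made-up counts).
associated = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(chi_squared_statistic(associated))
print(chi_squared_statistic(independent))
```

A large statistic relative to its degrees of freedom indicates that the subgroups are associated with the annotation.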

Results for each method

SD:hclust

The object with results only for a single top-value method and a single partition method can be extracted as:

The collect_plots() function collects all the plots made from res for all k (number of partitions) on a single page, which allows an easy and fast comparison between different k.

All the plots in panels can be made by individual functions and they are plotted later in this section.

select_partition_number() produces several plots showing different statistics for choosing the “optimized” k. The statistics are as follows:

The detailed explanations of these statistics can be found in the cola vignette.

Generally speaking, a lower PAC score, a higher mean silhouette score or a higher concordance corresponds to a better partition. The Rand index and the Jaccard index measure how similar the current partition is to the partition with k-1 groups. If the two partitions are too similar, k is not accepted as better than k-1.
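Two of these statistics can be sketched on toy data. The sketch uses the common definition of PAC (proportion of ambiguous clustering) as the fraction of sample pairs whose consensus value lies strictly between two cutoffs, and the pair-counting form of the Jaccard index. The cutoffs 0.1 and 0.9, the function names and the toy inputs are assumptions; the exact definitions used in this report are in the cola vignette.

```python
from itertools import combinations

def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of sample pairs whose consensus value is ambiguous,
    i.e. strictly between lower and upper (assumed cutoffs)."""
    pairs = list(combinations(range(len(consensus)), 2))
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] < upper)
    return ambiguous / len(pairs)

def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same samples."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0

# Toy consensus matrix for four samples: one pair is ambiguous (0.5).
consensus = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(pac_score(consensus))

# Toy partitions with k-1 = 2 and k = 3 groups.
print(jaccard_index([1, 1, 2, 2], [1, 1, 2, 3]))
```

A PAC of 0 means every pair of samples is either always or never grouped together, and a Jaccard index close to 1 means that moving from k-1 to k groups barely changes the partition.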

suggest_best_k() suggests the best k based on these statistics. The rules are as follows:

The following table shows the partitions. The membership matrix (columns named p*) is inferred by the clue::cl_consensus() function with the SE method. Each value in the membership matrix represents the probability that an item belongs to a certain group. The final class label of an item is the group to which it belongs with the highest probability.
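Deriving a class label from one row of the membership matrix amounts to taking the group with the largest probability. A minimal sketch, assuming groups are numbered from 1 and that ties go to the lowest-numbered group; the function name `class_label` and the example rows are hypothetical:

```python
def class_label(membership_row):
    """Return the 1-based index of the group with the highest
    membership probability (hypothetical helper, ties go to the first group)."""
    best = max(range(len(membership_row)), key=lambda g: membership_row[g])
    return best + 1

# Made-up membership rows (columns p1, p2, ...) for two items.
print(class_label([0.1, 0.9]))
print(class_label([0.6, 0.3, 0.1]))
```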

In the get_classes() function, the entropy is calculated from the membership matrix and the silhouette score is calculated from the consensus matrix.
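The entropy of a membership row can be sketched with the Shannon formula. Base-2 logarithms and the function name `membership_entropy` are assumptions; cola may scale or define the value differently, so this only illustrates how entropy reflects the certainty of an assignment.

```python
import math

def membership_entropy(membership_row):
    """Shannon entropy (base 2, assumed) of one row of the membership
    matrix; zero probabilities contribute nothing."""
    return sum(-p * math.log2(p) for p in membership_row if p > 0)

# Made-up membership rows for a confidently and an ambiguously assigned item.
print(membership_entropy([1.0, 0.0]))
print(membership_entropy([0.5, 0.5]))
```

An entropy of 0 means the item belongs to one group with certainty, and larger values indicate that the membership is spread across groups.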
