Consensus Clustering

The 'ConsensusClustering' package, version 1.5.0, provides tools for consensus clustering in bioinformatics to identify groups of similar biological data points. It includes various functions for converting adjacency matrices, counting clusters based on stability scores, and relabeling clusters based on similarities. The package is maintained by Behnam Yousefi and is available on CRAN under the GPL license.

Package ‘ConsensusClustering’

July 30, 2024

Type Package
Title Consensus Clustering
Version 1.5.0
Description Clustering, or cluster analysis, is a widely used technique in bioinformatics to
identify groups of similar biological data points. Consensus clustering is an extension to
clustering algorithms that aims to construct a robust result from those clustering features
that are invariant under different sources of variation. For reference, please cite the
following paper: Yousefi, Melograna, et al. (2023) <doi:10.3389/fmicb.2023.1170391>.
License GPL (>= 3)
Encoding UTF-8
RoxygenNote 7.3.2
Imports assertthat, dplyr, igraph, cluster, mvtnorm, utils, graphics,
stats
NeedsCompilation no
Author Behnam Yousefi [aut, cre, cph]
Maintainer Behnam Yousefi <yousefi.bme@gmail.com>
Repository CRAN
Date/Publication 2024-07-30 08:00:02 UTC

Contents
adj_conv
adj_mat
cc_cluster_count
cluster_relabel
coCluster_matrix
connectivity_matrix
consensus_matrix
consensus_matrix_data_prtrb
consensus_matrix_multiview
gaussian_clusters
gaussian_clusters_with_param
gaussian_mixture_clusters
generate_data_prtrb
generate_gaussian_data
generate_method_prtrb
generate_multiview
hir_clust_from_adj_mat
indicator_matrix
label_similarity
Logit
majority_voting
multiview_clusters
multiview_cluster_gen
multiview_kmeans_gen
multiview_pam_gen
multi_cluster_gen
multi_kmeans_gen
multi_pam_gen
pam_clust_from_adj_mat
spect_clust_from_adj_mat

adj_conv Convert an adjacency matrix to an affinity matrix

Description
Convert an adjacency matrix to an affinity matrix

Usage
adj_conv(adj.mat, alpha = 1)

Arguments
adj.mat Adjacency matrix. The elements must be within [-1, 1].
alpha soft threshold value (see details).

Details
The affinity is computed element-wise as adj = exp(-(1-adj)^2/(2*alpha^2)).
Ref: von Luxburg (2007), "A tutorial on spectral clustering", Stat Comput.

Value
The matrix of affinity values.

Examples
Adj_mat = rbind(c(0.0,0.9,0.0),
c(0.9,0.0,0.2),
c(0.0,0.2,0.0))
adj_conv(Adj_mat)
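
The soft-threshold formula given under Details can be reproduced in a few lines of base R. This is a minimal sketch of the formula only, not the package source: the name soft_threshold is hypothetical, and how adj_conv() treats the diagonal is not stated in this manual.

```r
# Sketch of the Details formula: affinity = exp(-(1 - adj)^2 / (2 * alpha^2)).
# soft_threshold is a hypothetical helper, not part of ConsensusClustering.
soft_threshold <- function(adj, alpha = 1) {
  stopifnot(is.matrix(adj), all(adj >= -1 & adj <= 1), alpha > 0)
  exp(-(1 - adj)^2 / (2 * alpha^2))
}

Adj_mat <- rbind(c(0.0, 0.9, 0.0),
                 c(0.9, 0.0, 0.2),
                 c(0.0, 0.2, 0.0))
Aff <- soft_threshold(Adj_mat)
```

Larger adjacency values map to affinities closer to 1, and a smaller alpha makes the decay sharper.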

adj_mat Convert data matrix to adjacency matrix

Description
Convert data matrix to adjacency matrix

Usage
adj_mat(X, method = "euclidian")

Arguments
X a matrix of samples by features.
method method for distance calculation: "euclidian", "cosine", "maximum", "manhattan",
"canberra", "binary", "minkowski".

Value
calculated adjacency matrix from the data matrix using the specified methods

Examples
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")

cc_cluster_count Count the number of clusters based on stability score.

Description
Count the number of clusters based on stability score.

Usage
cc_cluster_count(CM, plot.cdf = TRUE, plot.logit = FALSE)

Arguments
CM list of consensus matrices, each for a specific number of clusters. It can be the
output of the consensus_matrix() and consensus_matrix_multiview() functions.
plot.cdf binary value to plot the cumulative distribution functions of CM (default TRUE).
plot.logit binary value to plot the logit model of cumulative distribution functions of CM
(default FALSE).

Details
Counts the number of clusters given a list of consensus matrices, each for a specific number of
clusters, using four methods: "LogitScore", "PAC", "deltaA", "CMavg".

Value
results as a list: "LogitScore", "PAC", "deltaA", "CMavg", "Kopt_LogitScore", "Kopt_PAC",
"Kopt_deltaA", "Kopt_CMavg"

Examples
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix(Adj, max.cluster=3, max.itter=10)
Result = cc_cluster_count(CM, plot.cdf=FALSE)
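
Of the scores listed under Details, PAC (proportion of ambiguous clustering) is the simplest to illustrate: it is the fraction of sample pairs whose consensus value is neither close to 0 nor close to 1. The sketch below assumes thresholds of 0.1 and 0.9, which this manual does not confirm for cc_cluster_count(); pac_sketch is a hypothetical helper, not a package function.

```r
# Hypothetical helper: fraction of sample pairs with an ambiguous consensus value.
# The thresholds lower = 0.1 and upper = 0.9 are assumptions.
pac_sketch <- function(cm, lower = 0.1, upper = 0.9) {
  v <- cm[lower.tri(cm)]            # each sample pair once, diagonal excluded
  mean(v > lower & v < upper)
}

# A perfectly stable two-cluster consensus matrix
cm_clean <- rbind(c(1, 1, 0, 0),
                  c(1, 1, 0, 0),
                  c(0, 0, 1, 1),
                  c(0, 0, 1, 1))
# A fully ambiguous consensus matrix
cm_fuzzy <- matrix(0.5, 4, 4)
diag(cm_fuzzy) <- 1

pac_clean <- pac_sketch(cm_clean)
pac_fuzzy <- pac_sketch(cm_fuzzy)
```

Under this definition a lower PAC indicates a more stable clustering, so the number of clusters with the lowest PAC would be the natural candidate.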

cluster_relabel Relabeling clusters based on cluster similarities

Description
Relabeling clusters based on cluster similarities

Usage
cluster_relabel(x1, x2)

Arguments
x1 clustering vector 1. Zero elements are considered as unclustered samples.
x2 clustering vector 2. Zero elements are considered as unclustered samples.

Details
When performing several clusterings, the cluster labels may not match each other.
To perform majority voting, the clusterings need to be relabeled based on label similarities.

Value
dataframe of relabeled clusters

Examples
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
clusters = cluster_relabel(x1, x2)
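
The idea of relabeling by label similarity can be illustrated with a contingency table. The sketch below is a simplified stand-in, not the method used by cluster_relabel(): it greedily maps each label of x2 to the label of x1 it overlaps most, does not enforce a one-to-one mapping, and assumes there are no zero (unclustered) entries.

```r
# Two clusterings of the same six samples with swapped label names
x1 <- c(1, 1, 1, 2, 2, 2)
x2 <- c(2, 2, 2, 1, 1, 2)

# Rows are labels of x1, columns are labels of x2, entries are shared sample counts
overlap <- table(x1, x2)

# For each x2 label (column), pick the x1 label (row) with the largest overlap
mapping <- apply(overlap, 2, which.max)

# Rewrite x2 using the label names of x1
x2_relabeled <- as.integer(rownames(overlap))[mapping[as.character(x2)]]
```

After relabeling, x2 agrees with x1 on five of the six samples, which makes the two vectors directly comparable for voting.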

coCluster_matrix Calculate the Co-cluster matrix for a given set of clustering results.

Description
Calculate the Co-cluster matrix for a given set of clustering results.

Usage
coCluster_matrix(X, verbos = TRUE)

Arguments
X clustering matrix of Nsamples x Nclusterings. Zero elements are considered
as unclustered samples.
verbos binary value for verbosity (default = TRUE)

Details
The co-cluster matrix, or consensus matrix (CM), is a consensus mechanism explained in
Monti et al. (2003).

Value
The normalized matrix of Co-cluster frequency of any pairs of samples (Nsamples x Nsamples)

Examples
Clustering = cbind(c(1,1,1,2,2,2),
c(1,1,2,1,2,2))
coCluster_matrix(Clustering, verbos = FALSE)
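
A co-cluster frequency in the sense of Monti et al. (2003) is the number of clusterings in which two samples share a cluster, divided by the number of clusterings in which both samples were present. The base R sketch below follows that definition; cocluster_sketch is a hypothetical name and the code is not taken from the package, so details such as the diagonal may differ from coCluster_matrix().

```r
# Hypothetical helper: normalized co-cluster frequency for every pair of samples.
# X is Nsamples x Nclusterings; zeros mark samples absent from a clustering.
cocluster_sketch <- function(X) {
  n <- nrow(X)
  together <- matrix(0, n, n)   # times a pair shared a cluster
  present  <- matrix(0, n, n)   # times a pair was clustered at all
  for (j in seq_len(ncol(X))) {
    in_run <- X[, j] != 0
    both   <- outer(in_run, in_run, "&")
    together <- together + (outer(X[, j], X[, j], "==") & both)
    present  <- present + both
  }
  together / pmax(present, 1)   # pairs never seen together get 0
}

Clustering <- cbind(c(1, 1, 1, 2, 2, 2),
                    c(1, 1, 2, 1, 2, 2))
CM_sketch <- cocluster_sketch(Clustering)
```

Samples 1 and 2 share a cluster in both columns and get frequency 1, samples 1 and 3 share a cluster in one of two columns and get 0.5.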

connectivity_matrix Build connectivity matrix

Description

Build connectivity matrix

Usage

connectivity_matrix(clusters)

Arguments

clusters a vector of clusterings. Zero elements mean that the sample was absent during
clustering

Details

Connectivity matrix (M) is a binary N-by-N matrix: M[i,j] = 1 if samples i and j are in the same
cluster. Ref: Monti et al. (2003), "Consensus Clustering: A Resampling-Based Method for Class
Discovery and Visualization of Gene Expression Microarray Data", Machine Learning.

Value

Connectivity matrix

Examples

con_mat = connectivity_matrix(c(1,1,1,2,2,2))
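
The definition under Details translates directly into an outer comparison of the label vector. This is a sketch under stated assumptions, not the package code: connectivity_sketch is a hypothetical name, absent samples (label 0) are given no connections, and the diagonal is set to 1 for present samples.

```r
# Hypothetical helper: M[i, j] = 1 if samples i and j carry the same non-zero label.
connectivity_sketch <- function(clusters) {
  in_run <- clusters != 0
  same   <- outer(clusters, clusters, "==")
  (same & outer(in_run, in_run, "&")) * 1   # logical to 0/1
}

M_full    <- connectivity_sketch(c(1, 1, 1, 2, 2, 2))
M_partial <- connectivity_sketch(c(1, 0, 1))   # sample 2 was absent
```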

consensus_matrix Calculate consensus matrix for data perturbation consensus clustering

Description

Calculate consensus matrix for data perturbation consensus clustering

Usage

consensus_matrix(
X,
max.cluster = 5,
resample.ratio = 0.7,
max.itter = 100,
clustering.method = "hclust",
adj.conv = TRUE,
verbos = TRUE
)

Arguments

X adjacency matrix of size Nsample x Nsample

max.cluster maximum number of clusters
resample.ratio the data ratio to use at each iteration.
max.itter maximum number of iterations for each number of clusters
clustering.method
base clustering method: c("hclust", "spectral", "pam")
adj.conv binary value to apply soft thresholding (default=TRUE)
verbos binary value for verbosity (default=TRUE)

Details

Performs data perturbation consensus clustering and obtains the consensus matrix, following the
consensus clustering algorithm of Monti et al. (2003). This function will be removed in a future
release and is replaced by consensus_matrix_data_prtrb().

Value

list of consensus matrices for each k

Examples

X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix(Adj, max.cluster=3, max.itter=10, verbos = FALSE)

consensus_matrix_data_prtrb
Calculate consensus matrix for data perturbation consensus clustering

Description
Calculate consensus matrix for data perturbation consensus clustering

Usage
consensus_matrix_data_prtrb(
X,
max.cluster = 5,
resample.ratio = 0.7,
max.itter = 100,
clustering.method = "hclust",
adj.conv = TRUE,
verbos = TRUE
)

Arguments
X adjacency matrix of size Nsample x Nsample
max.cluster maximum number of clusters
resample.ratio the data ratio to use at each iteration.
max.itter maximum number of iterations for each number of clusters
clustering.method
base clustering method: c("hclust", "spectral", "pam")
adj.conv binary value to apply soft thresholding (default=TRUE)
verbos binary value for verbosity (default=TRUE)

Details
Performs data perturbation consensus clustering and obtains the consensus matrix, following the
consensus clustering algorithm of Monti et al. (2003).

Value
list of consensus matrices for each k

Examples
X = gaussian_clusters()$X
Adj = adj_mat(X, method = "euclidian")
CM = consensus_matrix_data_prtrb(Adj, max.cluster=3, max.itter=10, verbos = FALSE)

consensus_matrix_multiview
Calculate consensus matrix for multi-data consensus clustering

Description
Calculate consensus matrix for multi-data consensus clustering

Usage
consensus_matrix_multiview(
X,
max.cluster = 5,
sample.set = NA,
clustering.method = "hclust",
adj.conv = TRUE,
verbos = TRUE
)

Arguments
X list of adjacency matrices for different cohorts (or views).
max.cluster maximum number of clusters
sample.set vector of samples the clustering is being applied on. sample.set can be names
or indices. If sample.set is NA, all the datasets are assumed to have the same
samples in the same order.
clustering.method
base clustering method: c("hclust", "spectral", "pam")
adj.conv binary value to apply soft threshold (default=TRUE)
verbos binary value for verbosity (default=TRUE)

Details
Performs multi-data consensus clustering and obtains the consensus matrix, following the
consensus clustering algorithm of Monti et al. (2003).

Value
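list of consensus matrices for each k

Examples
# X_observation: list of views, built as in the multiview_kmeans_gen example
data = multiview_clusters (n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2),
sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Adj = list()
for (i in 1:length(X_observation))
  Adj[[i]] = adj_mat(X_observation[[i]], method = "euclidian")
CM = consensus_matrix_multiview(Adj, max.cluster = 4, verbos = FALSE)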

gaussian_clusters Generate clusters of data points from Gaussian distribution with
randomly generated parameters

Description
Generate clusters of data points from Gaussian distribution with randomly generated parameters

Usage
gaussian_clusters(
n = c(50, 50),
dim = 2,
sd.max = 0.1,
sd.noise = 0.01,
r.range = c(0.1, 1)
)

Arguments
n vector of the number of data points in each cluster. The length of n should be equal
to the number of clusters.
dim number of dimensions
sd.max maximum standard deviation of clusters
sd.noise standard deviation of the added noise
r.range the range (min, max) of distance of cluster centers from the origin

Value
a list of data points (X) and cluster labels (class)

Examples
data = gaussian_clusters()
X = data$X
y = data$class

gaussian_clusters_with_param
Generate clusters of data points from Gaussian distribution with given
parameters

Description
Generate clusters of data points from Gaussian distribution with given parameters

Usage
gaussian_clusters_with_param(n, center, sigma)

Arguments
n vector of the number of data points in each cluster. The length of n should be equal
to the number of clusters.
center matrix of centers Ncluster x dim
sigma list of covariance matrices dim X dim. The length of sigma should be equal to
the number of clusters.

Value
matrix of Nsamples x (dim + 1). The last column is cluster labels.

Examples
center = rbind(c(0,0),
c(1,1))
sigma = list(diag(c(1,1)),
diag(2,2))
gaussian_clusters_with_param(c(10, 10), center, sigma)
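
To make the roles of n, center and sigma concrete, the following base R sketch draws each cluster from a multivariate Gaussian using a Cholesky factor of its covariance matrix. It is an illustration only: gaussian_cluster_sketch is a hypothetical name, and the package (which imports mvtnorm) may sample differently.

```r
# Hypothetical helper: one Gaussian cloud per cluster, labels in the last column.
gaussian_cluster_sketch <- function(n, center, sigma) {
  d <- ncol(center)
  parts <- lapply(seq_along(n), function(i) {
    Z  <- matrix(rnorm(n[i] * d), nrow = n[i], ncol = d)   # standard normal draws
    R  <- chol(sigma[[i]])                                 # t(R) %*% R == sigma[[i]]
    Xi <- sweep(Z %*% R, 2, center[i, ], "+")              # shift to the cluster center
    cbind(Xi, i)                                           # append the cluster label
  })
  do.call(rbind, parts)
}

set.seed(1)
center <- rbind(c(0, 0),
                c(1, 1))
sigma <- list(diag(c(1, 1)),
              diag(2, 2))
out <- gaussian_cluster_sketch(c(10, 10), center, sigma)
```

The result has the shape described under Value: Nsamples rows and dim + 1 columns, with the cluster label last.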

gaussian_mixture_clusters
Generate clusters of data points from Gaussian-mixture-model distributions
with randomly generated parameters

Description
Generate clusters of data points from Gaussian-mixture-model distributions with randomly
generated parameters

Usage

gaussian_mixture_clusters(
n = c(50, 50),
dim = 2,
sd.max = 0.1,
sd.noise = 0.01,
r.range = c(0.1, 1),
mixture.range = c(1, 4),
mixture.sep = 0.5
)

Arguments

n vector of the number of data points in each cluster. The length of n should be equal
to the number of clusters.
dim number of dimensions
sd.max maximum standard deviation of clusters
sd.noise standard deviation of the added noise
r.range the range (min, max) of distance of cluster centers from the origin
mixture.range range (min, max) of the number of Gaussian mixtures.
mixture.sep scalar indicating the separability between the mixtures.

Value

a list of data points (X) and cluster labels (class)

Examples

data = gaussian_mixture_clusters()
X = data$X
y = data$class

generate_data_prtrb Generation mechanism for data perturbation consensus clustering

Description

Generation mechanism for data perturbation consensus clustering

Usage

generate_data_prtrb(
X,
cluster.method = "pam",
k = 3,
resample.ratio = 0.7,
rep = 10,
distance.method = "euclidian",
adj.conv = TRUE,
func
)

Arguments

X input data Nsample x Nfeatures

cluster.method base clustering method: c("hclust", "spectral", "pam", "custom")
k number of clusters
resample.ratio the data ratio to use at each iteration.
rep number of repeats (resampling iterations)
distance.method
method for distance calculation: "euclidian", "cosine", "maximum", "manhattan",
"canberra", "binary", "minkowski".
adj.conv binary value to apply soft thresholding (default=TRUE)
func user-defined function, required if cluster.method = "custom". The function
must take two inputs, X and k.

Details

Performs clustering on the perturbed sample set, following the consensus clustering algorithm of
Monti et al. (2003).

Value

matrix of clusterings Nsample x Nrepeat

Examples

X = gaussian_clusters()$X
Clusters = generate_data_prtrb(X)
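
The data perturbation scheme can be sketched in base R as repeated subsampling followed by clustering of each subsample. The code below is an assumption-laden illustration rather than the package implementation: perturb_sketch is a hypothetical name, hierarchical clustering on Euclidean distances stands in for the base method, and unsampled rows are marked with 0, matching the convention used elsewhere in this manual.

```r
# Toy data: two well-separated groups of 20 points each
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 1, sd = 0.1), ncol = 2))

# Hypothetical helper: cluster a random subset of rows at each repeat.
perturb_sketch <- function(X, k = 2, resample.ratio = 0.7, rep = 10) {
  n   <- nrow(X)
  out <- matrix(0L, nrow = n, ncol = rep)   # 0 = sample not drawn in this repeat
  for (r in seq_len(rep)) {
    idx <- sample(n, size = round(resample.ratio * n))
    out[idx, r] <- cutree(hclust(dist(X[idx, , drop = FALSE])), k = k)
  }
  out
}

Clusters_sketch <- perturb_sketch(X)
```

Each column is one clustering of a 70% subsample, so the matrix has the Nsample x Nrepeat shape described under Value.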

generate_gaussian_data
Generate a set of data points from Gaussian distribution

Description
Generate a set of data points from Gaussian distribution

Usage
generate_gaussian_data(n, center = 0, sigma = 1, label = NA)

Arguments
n number of generated data points
center data center of desired dimension
sigma covariance matrix
label cluster label

Value
Generated data points from Gaussian distribution with given parameters

Examples
generate_gaussian_data(10, center=c(0,0), sigma=diag(c(1,1)), label=1)

generate_method_prtrb Multiple method generation

Description
Multiple method generation

Usage
generate_method_prtrb(
X,
cluster.method = "pam",
range.k = c(2, 5),
sample.k.method = "random",
rep = 10,
distance.method = "euclidian",
func
)

Arguments
X input data Nsample x Nfeatures
cluster.method base clustering method: c("kmeans", "pam", "custom")
range.k vector of minimum and maximum values for k c(min, max)
sample.k.method
method for the choice of k at each repeat c("random", "silhouette")
rep number of repeats
distance.method
method for distance calculation: "euclidian", "maximum", "manhattan", "canberra",
"binary", "minkowski".
func user-defined function, required if cluster.method = "custom". The function
must take two inputs, X and k.

Details
At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then clustering is applied and
the result is returned.

Value
matrix of clusterings Nsample x Nrepeat

Examples
X = gaussian_clusters()$X
Clusters = generate_method_prtrb(X)
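
The "random" choice of k described under Details can be sketched as a draw from the integers between range.k[1] and range.k[2], followed by one clustering run per repeat. The following is a base R illustration with k-means only; method_perturb_sketch is a hypothetical name, and the silhouette-based choice of k is not covered.

```r
# Toy data: two well-separated groups of 20 points each
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 1, sd = 0.1), ncol = 2))

# Hypothetical helper: one k-means run per repeat, with k drawn uniformly at random.
method_perturb_sketch <- function(X, range.k = c(2, 5), rep = 10) {
  k_values <- seq(range.k[1], range.k[2])
  sapply(seq_len(rep), function(r) {
    k <- k_values[sample.int(length(k_values), 1)]   # discrete uniform draw
    kmeans(X, centers = k, nstart = 5)$cluster
  })
}

Clusters_sketch <- method_perturb_sketch(X)
```

Indexing with sample.int() avoids the surprise that sample(x, 1) treats a single number x as the range 1:x, which matters when range.k[1] equals range.k[2].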

generate_multiview Multiview generation

Description
Multiview generation

Usage
generate_multiview(
X,
cluster.method = "pam",
range.k = c(2, 5),
sample.k.method = "random",
rep = 10,
distance.method = "euclidian",
sample.set = NA,
func
)

Arguments

X list of input data matrices of Sample x feature or distance matrices. The length
of X is equal to Nviews
cluster.method base clustering method: c("kmeans", "pam", "custom")
range.k vector of minimum and maximum values for k c(min, max)
sample.k.method
method for the choice of k at each repeat c("random", "silhouette")
rep number of repeats
distance.method
method for distance calculation: "euclidian", "maximum", "manhattan", "canberra",
"binary", "minkowski".
sample.set vector of samples the clustering is being applied on. It can be names or indices. If
sample.set is NA, all the datasets are assumed to have the same samples in the
same order.
func user-defined function, required if cluster.method = "custom". The function
must take two inputs, X and k.

Details

At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then clustering is applied and
the result is returned.

Value

matrix of clusterings Nsample x Nrepeat

hir_clust_from_adj_mat
Hierarchical clustering from adjacency matrix

Description

Hierarchical clustering from adjacency matrix

Usage
hir_clust_from_adj_mat(
adj.mat,
k = 2,
alpha = 1,
adj.conv = TRUE,
method = "ward.D"
)

Arguments
adj.mat adjacency matrix
k number of clusters (default=2)
alpha soft threshold (considered if adj.conv = TRUE) (default=1)
adj.conv binary value to apply soft thresholding (default=TRUE)
method agglomeration method used for hierarchical clustering (default: "ward.D")

Details
Applies hierarchical clustering to the adjacency matrix.

Value
vector of clusters

Examples
Adj_mat = rbind(c(0.0,0.9,0.0),
c(0.9,0.0,0.2),
c(0.0,0.2,0.0))
hir_clust_from_adj_mat(Adj_mat)
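
One plausible way to cluster hierarchically from an adjacency matrix is to soft-threshold it into affinities, turn those into distances, and cut the resulting tree at k clusters. The sketch below makes that pipeline explicit in base R. It is an assumption about the general approach, not the package source: the name hclust_from_adj_sketch and the choice of distance = 1 - affinity are both hypothetical.

```r
# Hypothetical helper: affinity -> distance -> hclust -> cutree.
hclust_from_adj_sketch <- function(adj.mat, k = 2, alpha = 1, method = "ward.D") {
  aff  <- exp(-(1 - adj.mat)^2 / (2 * alpha^2))   # soft threshold from adj_conv Details
  dmat <- as.dist(1 - aff)                        # assumed conversion to distances
  cutree(hclust(dmat, method = method), k = k)
}

Adj_mat <- rbind(c(0.0, 0.9, 0.0),
                 c(0.9, 0.0, 0.2),
                 c(0.0, 0.2, 0.0))
labels_sketch <- hclust_from_adj_sketch(Adj_mat)
```

With this toy matrix, samples 1 and 2 are strongly connected and end up together, while sample 3 forms its own cluster.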

indicator_matrix Build indicator matrix

Description
Build indicator matrix

Usage
indicator_matrix(clusters)

Arguments
clusters a vector of clusterings. Zero elements mean that the sample was absent during
clustering

Details
Indicator matrix (I) is a binary N-by-N matrix: I[i,j] = 1 if samples i and j co-exist for
clustering. Ref: Monti et al. (2003), "Consensus Clustering: A Resampling-Based Method for Class
Discovery and Visualization of Gene Expression Microarray Data", Machine Learning.

Value
Indicator matrix

Examples
ind_mat = indicator_matrix(c(1,1,1,0,0,1))
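
The indicator matrix from Details only records presence, so it can be written as the outer product of a 0/1 presence vector with itself. This is a minimal base R sketch with a hypothetical name (indicator_sketch), not the package code.

```r
# Hypothetical helper: I[i, j] = 1 if both sample i and sample j were clustered.
indicator_sketch <- function(clusters) {
  in_run <- as.numeric(clusters != 0)   # 1 = present, 0 = absent
  outer(in_run, in_run)
}

I_sketch <- indicator_sketch(c(1, 1, 1, 0, 0, 1))
```

Four of the six samples are present here, so the matrix contains 4 x 4 = 16 ones, and every row and column belonging to an absent sample is zero.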

label_similarity Similarity between different clusters

Description
Similarity between different clusters

Usage
label_similarity(x1, x2)

Arguments
x1 clustering vector 1. Zero elements are considered as unclustered samples.
x2 clustering vector 2. Zero elements are considered as unclustered samples.

Details
When performing several clusterings, the cluster labels may not match each other. To find
correspondences between clusters, the similarity between different labels is calculated.

Value
matrix of similarities between clustering labels

Examples
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
Sim = label_similarity(x1, x2)

Logit Logit function

Description
Logit function

Usage
Logit(x)

Arguments
x numerical scalar input

Value
Logit(x) = log(1*x/(1-x))

Examples
y = Logit(0.5)
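
The formula under Value is the standard logit, the inverse of the logistic function. A one-line base R sketch (logit_sketch is a hypothetical name) can be checked against stats::qlogis and stats::plogis:

```r
# Hypothetical helper mirroring the formula under Value: log(x / (1 - x)).
logit_sketch <- function(x) log(x / (1 - x))

y_half <- logit_sketch(0.5)            # log(1) = 0
y_back <- plogis(logit_sketch(0.8))    # the logistic function inverts the logit
```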

majority_voting Consensus mechanism based on majority voting

Description
Consensus mechanism based on majority voting

Usage
majority_voting(X)

Arguments
X clustering matrix of Nsamples x Nclusterings. Zero elements are considered
as unclustered samples.

Details
Perform majority voting as a consensus mechanism.

Value
the vector of consensus clustering result

Examples
X = gaussian_clusters()$X
x1 = kmeans(X, 5)$cluster
x2 = kmeans(X, 5)$cluster
x3 = kmeans(X, 5)$cluster
clusters = majority_voting(cbind(x1,x2,x3))
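
At its core, majority voting assigns each sample the label that occurs most often across the clusterings. The sketch below shows only that counting step in base R. It assumes the columns have already been relabeled so that equal numbers mean equal clusters, ignores zero (unclustered) entries, and breaks ties in favour of the smallest label; majority_vote_sketch is a hypothetical name and the package function may differ on all of these points.

```r
# Hypothetical helper: most frequent non-zero label per row.
majority_vote_sketch <- function(X) {
  apply(X, 1, function(row) {
    votes <- row[row != 0]
    if (length(votes) == 0) return(0)            # sample never clustered
    as.integer(names(which.max(table(votes))))   # ties go to the smallest label
  })
}

votes_mat <- cbind(c(1, 1, 2, 2),
                   c(1, 2, 2, 2),
                   c(1, 1, 2, 0))
consensus_sketch <- majority_vote_sketch(votes_mat)
```

The second sample is labeled 1 by two of three clusterings and the fourth sample has one missing vote, yet both receive a clear consensus label.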

multiview_clusters Generate multiview clusters from Gaussian distributions with
randomly generated parameters

Description
Generate multiview clusters from Gaussian distributions with randomly generated parameters

Usage
multiview_clusters(
n = c(50, 50),
hidden.dim = 2,
observed.dim = c(2, 2, 3),
sd.max = 0.1,
sd.noise = 0.01,
hidden.r.range = c(0.1, 1)
)

Arguments
n vector of the number of data points in each cluster. The length of n should be equal
to the number of clusters.
hidden.dim scalar value of dimensions of the hidden state
observed.dim vector of the number of dimensions of each generated view. The length of
observed.dim should be equal to the number of views.
sd.max maximum standard deviation of clusters
sd.noise standard deviation of the added noise
hidden.r.range the range (min, max) of distance of cluster centers from the origin in the hidden
space.

Value
a list of data points (X) and cluster labels (class)

Examples
data = multiview_clusters()

multiview_cluster_gen Multiview cluster generation

Description
Multiview cluster generation

Usage
multiview_cluster_gen(
X,
func,
rep = 10,
param,
is.distance = FALSE,
sample.set = NA
)

Arguments
X List of input data matrices of Sample x feature or distance matrices. The length
of X is equal to Nviews
func custom function that accepts X and a parameter and returns a vector of clusterings:
cluster_func <- function(X, param)
rep number of repeats
param vector of parameters
is.distance binary value indicating whether each input X[i] is a distance matrix
sample.set vector of samples the clustering is being applied on. It can be names or indices. If
sample.set is NA, all the datasets are assumed to have the same samples in the
same order.

Value
matrix of clusterings Nsample x (Nrepeat x Nviews)

Examples
data = multiview_clusters (n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2),
sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
cluster_func = function(X,rep,param){return(multi_kmeans_gen(X,rep=rep,range.k=param))}
Clusters = multiview_cluster_gen(X_observation, func = cluster_func, rep = 10, param = c(2,4))

multiview_kmeans_gen Multiview K-means generation

Description

Multiview K-means generation

Usage

multiview_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")

Arguments

X List of input data matrices of Sample x feature. The length of X is equal to Nviews
rep number of repeats
range.k vector of minimum and maximum values for k c(min, max)
method method for the choice of k at each repeat c("random", "silhouette")

Details

At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then k-means clustering is
applied and the result is returned.

Value

matrix of clusterings Nsample x (Nrepeat x Nviews)

Examples

data = multiview_clusters (n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2),
sd.max = .1, sd.noise = 0, hidden.r.range = c(.5,1))
X_observation = data[["observation"]]
Clusters = multiview_kmeans_gen(X_observation)

multiview_pam_gen Multiview PAM (K-medoids) generation

Description
Multiview PAM (K-medoids) generation

Usage
multiview_pam_gen(
X,
rep = 10,
range.k = c(2, 5),
is.distance = FALSE,
method = "random",
sample.set = NA
)

Arguments
X List of input data matrices of Sample x feature or distance matrices. The length
of X is equal to Nviews
rep number of repeats
range.k vector of minimum and maximum values for k c(min, max)
is.distance binary value indicating whether the input X is a distance matrix
method method for the choice of k at each repeat c("random", "silhouette")
sample.set vector of samples the clustering is being applied on. It can be names or indices. If
sample.set is NA, all the datasets are assumed to have the same samples in the
same order.

Details
At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then PAM clustering is applied
and the result is returned.

Value
matrix of clusterings Nsample x (Nrepeat x Nviews)

multi_cluster_gen Multiple cluster generation

Description

Multiple cluster generation

Usage

multi_cluster_gen(X, func, rep = 10, param, method = "random")

Arguments

X input data Nsample x Nfeatures or a distance matrix

func custom function that accepts X and a parameter and returns a vector of clusterings:
cluster_func <- function(X, param)
rep number of repeats
param vector of parameters
method method for the choice of k at each repeat c("random", "silhouette")

Details

At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then clustering is applied and
the result is returned.

Value

matrix of clusterings Nsample x Nrepeat

Examples

X = gaussian_clusters()$X
cluster_func = function(X, k){return(stats::kmeans(X, k)$cluster)}
Clusters = multi_cluster_gen(X, cluster_func, param = c(2,3))

multi_kmeans_gen Multiple K-means generation

Description

Multiple K-means generation

Usage

multi_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")

Arguments

X input data Nsample x Nfeatures

rep number of repeats
range.k vector of minimum and maximum values for k c(min, max)
method method for the choice of k at each repeat c("random", "silhouette")

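Details

At each repeat, k is selected either randomly from a discrete uniform distribution between
range.k[1] and range.k[2], or based on the best silhouette width. Then k-means clustering is
applied and the result is returned.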

Value

matrix of clusterings Nsample x Nrepeat

Examples
X = gaussian_clusters()$X
Clusters = multi_kmeans_gen(X)

multi_pam_gen Multiple PAM (K-medoids) generation

Description

Multiple PAM (K-medoids) generation

Usage
multi_pam_gen(
X,
rep = 10,
range.k = c(2, 5),
is.distance = FALSE,
method = "random"
)

Arguments
X input data Nsample x Nfeatures or distance matrix.
rep number of repeats
range.k vector of minimum and maximum values for k c(min, max)
is.distance binary value indicating whether the input X is a distance matrix
method method for the choice of k at each repeat c("random", "silhouette")

Value
matrix of clusterings Nsample x Nrepeat

Examples
X = gaussian_clusters()$X
Clusters = multi_pam_gen(X)

pam_clust_from_adj_mat
PAM (k-medoids) clustering from adjacency matrix

Description
PAM (k-medoids) clustering from adjacency matrix

Usage
pam_clust_from_adj_mat(adj.mat, k = 2, alpha = 1, adj.conv = TRUE)

Arguments

adj.mat adjacency matrix

k number of clusters (default=2)
alpha soft threshold (considered if adj.conv = TRUE) (default=1)
adj.conv binary value to apply soft thresholding (default=TRUE)

Details

apply PAM (k-medoids) clustering on the adjacency matrix

Value

vector of clusters

Examples

Adj_mat = rbind(c(0.0,0.9,0.0),
c(0.9,0.0,0.2),
c(0.0,0.2,0.0))
pam_clust_from_adj_mat(Adj_mat)

spect_clust_from_adj_mat
Spectral clustering from adjacency matrix

Description

Spectral clustering from adjacency matrix

Usage

spect_clust_from_adj_mat(
adj.mat,
k = 2,
max.eig = 10,
alpha = 1,
adj.conv = TRUE,
do.plot = FALSE
)

Arguments
adj.mat adjacency matrix
k number of clusters (default=2)
max.eig maximum number of eigenvectors in use (default = 10).
alpha soft threshold (considered if adj.conv = TRUE) (default = 1)
adj.conv binary value to apply soft thresholding (default = TRUE)
do.plot binary value to do plot (default = FALSE)

Details
Applies spectral clustering to the adjacency matrix.

Value
vector of clusters

Examples
Adj_mat = rbind(c(0.0,0.9,0.0),
c(0.9,0.0,0.2),
c(0.0,0.2,0.0))
hir_clust_from_adj_mat(Adj_mat)
9 pages
Artificial Intelligence and Machine Learning in Healthcare
No ratings yet
Artificial Intelligence and Machine Learning in Healthcare
4 pages
Wireguard
No ratings yet
Wireguard
38 pages
Auto GMS: An Automated Greenhouse Monitoring System of Abiotic Factors For Leafy Vegetables Production
No ratings yet
Auto GMS: An Automated Greenhouse Monitoring System of Abiotic Factors For Leafy Vegetables Production
26 pages
Fuzz Face Circuit Analysis Guide
No ratings yet
Fuzz Face Circuit Analysis Guide
5 pages
Exchange API Best Practices March 2022
No ratings yet
Exchange API Best Practices March 2022
3 pages
Ijirt156711 Paper
No ratings yet
Ijirt156711 Paper
3 pages
Footwear Outsoles Needle Tear Test
No ratings yet
Footwear Outsoles Needle Tear Test
10 pages
Prisma VENT30 Prisma VENT30-C Prisma VENT40 Prisma VENT50 Prisma VENT50-C
No ratings yet
Prisma VENT30 Prisma VENT30-C Prisma VENT40 Prisma VENT50 Prisma VENT50-C
64 pages
Neuro-Symbolic AI in 2024 A Systematic Review
100% (1)
Neuro-Symbolic AI in 2024 A Systematic Review
19 pages
Operation Manuel ClipX BM40
No ratings yet
Operation Manuel ClipX BM40
498 pages
Sample Report - Bug Bounty Program
No ratings yet
Sample Report - Bug Bounty Program
41 pages
Darshan Institute of Engineering & Technology For Diploma Studies
No ratings yet
Darshan Institute of Engineering & Technology For Diploma Studies
1 page
Pushdown Automata: Introduction To Formal Languages and Automata
No ratings yet
Pushdown Automata: Introduction To Formal Languages and Automata
102 pages
16 EasyIO FG - FS Series Email Service v2.2
No ratings yet
16 EasyIO FG - FS Series Email Service v2.2
14 pages
Investigating Mechanical Properties of Animal Bone Powder Partially Replaced Cement in Concrete Production
No ratings yet
Investigating Mechanical Properties of Animal Bone Powder Partially Replaced Cement in Concrete Production
14 pages
Math Contest Problems 2012
No ratings yet
Math Contest Problems 2012
2 pages
VUS Software User's Manual
No ratings yet
VUS Software User's Manual
25 pages
1761-Article Text-4342-1-10-20230217
No ratings yet
1761-Article Text-4342-1-10-20230217
7 pages
Computer Graphics Interaction Guide
No ratings yet
Computer Graphics Interaction Guide
15 pages
Mini Series
100% (1)
Mini Series
66 pages
Procedure and Work Instrunction Manual (PAWIM)
No ratings yet
Procedure and Work Instrunction Manual (PAWIM)
219 pages
GUI Lab Programs 1 To 4
No ratings yet
GUI Lab Programs 1 To 4
33 pages
Volume 1
No ratings yet
Volume 1
343 pages
Reimagining The Teacher and The Learner in The Time of COVID-19
No ratings yet
Reimagining The Teacher and The Learner in The Time of COVID-19
45 pages
Xssss
No ratings yet
Xssss
23 pages
Plasma Arc Welding: Unit-Iii
No ratings yet
Plasma Arc Welding: Unit-Iii
32 pages
Class XII Holiday Homework Guide
No ratings yet
Class XII Holiday Homework Guide
37 pages
Basic Computer Keyboard
No ratings yet
Basic Computer Keyboard
4 pages

Package ‘ConsensusClustering’

July 30, 2024

adj_conv Convert adjacency function to the affinity matrix

adj_mat Convert data matrix to adjacency matrix

cc_cluster_count Count the number of clusters based on stability score.

cluster_relabel Relabeling clusters based on cluster similarities

connectivity_matrix Build connectivity matrix
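As an illustration of what a connectivity matrix usually means in consensus clustering, the following is a minimal base-R sketch. It assumes the common definition (entry [i, j] is 1 when samples i and j carry the same cluster label, 0 otherwise); the helper name connectivity_sketch is hypothetical and is not the package's connectivity_matrix implementation.

```r
# Hypothetical helper, not part of the package: builds a binary
# Nsample x Nsample matrix from one vector of cluster labels.
connectivity_sketch <- function(labels) {
  # outer() compares every pair of labels; multiplying by 1 turns
  # the logical result into a 0/1 numeric matrix.
  outer(labels, labels, "==") * 1
}

# Three samples: the first two share a cluster, the third is alone.
M <- connectivity_sketch(c(1, 1, 2))
print(M)
```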

consensus_matrix Calculate consensus matrix for data perturbation consensus clustering

X adjacency matrix, Nsample x Nsample

list of consensus matrices for each k
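The idea of a data perturbation consensus matrix can be sketched in base R as follows. This is only an illustration under stated assumptions: it takes a distance matrix rather than an adjacency matrix, uses average-linkage hierarchical clustering, handles a single k, and the function name consensus_sketch and its arguments n_iter and ratio are hypothetical. It is not the package's consensus_matrix code.

```r
# Hypothetical sketch: consensus matrix for one fixed k by subsampling.
consensus_sketch <- function(D, k, n_iter = 50, ratio = 0.7) {
  D <- as.matrix(D)
  n <- nrow(D)
  M_sum <- matrix(0, n, n)  # how often each pair was clustered together
  I_sum <- matrix(0, n, n)  # how often each pair was sampled together
  for (b in seq_len(n_iter)) {
    # Draw a random subset of samples (the data perturbation step).
    idx <- sort(sample.int(n, size = floor(ratio * n)))
    hc <- stats::hclust(stats::as.dist(D[idx, idx]), method = "average")
    lab <- stats::cutree(hc, k = k)
    M_sum[idx, idx] <- M_sum[idx, idx] + outer(lab, lab, "==")
    I_sum[idx, idx] <- I_sum[idx, idx] + 1
  }
  # Fraction of co-sampled runs in which a pair shared a cluster;
  # pairs never sampled together are left at 0.
  C <- M_sum / pmax(I_sum, 1)
  diag(C) <- 1
  C
}

# Usage on two well-separated synthetic groups of 20 points each.
set.seed(1)
X <- rbind(matrix(rnorm(40, 0, 0.1), 20, 2),
           matrix(rnorm(40, 10, 0.1), 20, 2))
C <- consensus_sketch(stats::dist(X), k = 2)
```

Repeating the call over a range of k values and collecting the results in a list would give one consensus matrix per k, which matches the shape of the return value described above.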

a list of data points (X) and cluster labels (class)

generate_data_prtrb Generation mechanism for data perturbation consensus clustering

X input data Nsample x Nfeatures

matrix of clusterings Nsample x Nrepeat

generate_method_prtrb Multiple method generation

generate_multiview Multiview generation

matrix of clusterings Nsample x Nrepeat

Hierarchical clustering from adjacency matrix

indicator_matrix Build indicator matrix

label_similarity Similarity between different clusters

Logit Logit function

majority_voting Consensus mechanism based on majority voting

multiview_clusters Generate multiview clusters from Gaussian distributions with ran-

multiview_cluster_gen Multiview cluster generation

multiview_kmeans_gen Multiview K-means generation

multiview_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")

X List of input data matrices of Sample x feature. The length of X is equal to

matrix of clusterings Nsample x (Nrepeat x Nviews)

data = multiview_clusters (n = c(40,40,40), hidden.dim = 2, observed.dim = c(2,2,2),

multiview_pam_gen Multiview PAM (K-medoids) generation

multi_cluster_gen Multiple cluster generation

multi_cluster_gen(X, func, rep = 10, param, method = "random")

X input data Nsample x Nfeatures or a distance matrix

matrix of clusterings Nsample x Nrepeat

multi_kmeans_gen Multiple K-means generation

multi_kmeans_gen(X, rep = 10, range.k = c(2, 5), method = "random")

X input data Nsample x Nfeatures

matrix of clusterings Nsample x Nrepeat
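To make the Nsample x Nrepeat return shape concrete, here is a minimal base-R sketch of repeated K-means generation. It assumes that k is drawn uniformly at random between range.k[1] and range.k[2] at each repeat, which is one plausible reading of method = "random". The function name multi_kmeans_sketch is hypothetical, and the code does not reproduce the package's multi_kmeans_gen.

```r
# Hypothetical sketch: one K-means run per repeat, each with a random k.
multi_kmeans_sketch <- function(X, rep = 10, range.k = c(2, 5)) {
  X <- as.matrix(X)
  ks <- seq(range.k[1], range.k[2])
  clusterings <- matrix(0L, nrow = nrow(X), ncol = rep)
  for (r in seq_len(rep)) {
    # Index into ks so a single-valued range is handled correctly.
    k <- ks[sample.int(length(ks), 1)]
    clusterings[, r] <- stats::kmeans(X, centers = k)$cluster
  }
  clusterings  # one column of cluster labels per repeat
}

# Usage: 30 samples with 2 features, 4 repeats.
set.seed(1)
X <- matrix(rnorm(60), nrow = 30, ncol = 2)
Clusters <- multi_kmeans_sketch(X, rep = 4, range.k = c(2, 3))
dim(Clusters)
```

Labels are not aligned across columns, so cluster 1 in one repeat need not correspond to cluster 1 in another.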

multi_pam_gen Multiple PAM (K-medoids) generation

adj.mat adjacency matrix

apply PAM (k-medoids) clustering on the adjacency matrix

Spectral clustering from adjacency matrix
