
Most basic dendrogram for clustering with R



Clustering groups samples by similarity, and its result can be visualized as a dendrogram. This post describes basic usage of the hclust() function and builds a dendrogram from its output.


Most basic dendrogram with R


→ The input dataset is a matrix where each row is a sample and each column is a variable. Keep in mind that you can transpose a matrix using the t() function if needed.

→ Clustering is performed on a square matrix (sample x sample) that provides the distance between samples. It can be computed using the dist() or the cor() function, depending on the question you are asking. Note that cor() returns a similarity, so it has to be converted to a distance first (for example with 1 - correlation).

→ The hclust() function is used to perform the hierarchical clustering.

→ Its output can be visualized directly with the plot() function. See possible customization.
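The steps above can be sketched as follows. This is a minimal example under stated assumptions: the dataset is simulated with rnorm(), and the names `data`, `d` and `hc` are placeholders chosen for illustration, not taken from the original post.

```r
# Simulated input (an assumption for illustration): 8 samples x 5 variables.
# Each row is a sample, each column is a variable.
set.seed(42)
data <- matrix(rnorm(40), nrow = 8, ncol = 5)
rownames(data) <- paste0("sample_", 1:8)

# Square sample x sample distance matrix (Euclidean distance by default)
d <- dist(data)

# Correlation-based alternative: cor() works on columns, so transpose first,
# then turn the similarity into a distance
d_cor <- as.dist(1 - cor(t(data)))

# Hierarchical clustering, using the default linkage of hclust()
hc <- hclust(d)

# Draw the dendrogram
plot(hc, main = "Basic dendrogram", xlab = "", sub = "")
```

Swapping `d` for `d_cor` in the hclust() call groups samples by the shape of their profiles rather than by their absolute values.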



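Hierarchical clustering principle:

Each sample starts as its own cluster, and the two closest clusters are merged repeatedly until a single tree remains. There are several ways to calculate the distance between 2 clusters: using the max between 2 points of the clusters (complete linkage, the default in hclust()), the mean (average linkage), the min (single linkage), or Ward's criterion.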
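To see how the choice of linkage changes the tree, the sketch below runs hclust() with four values of its `method` argument on the same distance matrix. The simulated data and the object names (`methods`, `trees`) are assumptions made for illustration.

```r
# Simulated input (an assumption for illustration): 12 samples x 5 variables
set.seed(1)
data <- matrix(rnorm(60), nrow = 12)
rownames(data) <- paste0("s", 1:12)
d <- dist(data)

# max -> "complete", mean -> "average", min -> "single", Ward -> "ward.D2"
methods <- c("complete", "average", "single", "ward.D2")
trees <- lapply(methods, function(m) hclust(d, method = m))
names(trees) <- methods

# One panel per linkage method
op <- par(mfrow = c(2, 2))
for (m in methods) {
  plot(trees[[m]], main = m, xlab = "", sub = "")
}
par(op)
```

The leaves are the same in every panel; only the merge order and the merge heights differ.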

Zoom on a group


It is possible to zoom in on a specific part of the tree. Convert the hclust() output to a dendrogram object with as.dendrogram(), then select the group of interest using the [[..]] operator.
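A minimal sketch of zooming on a branch, assuming simulated data and placeholder names (`dhc`, `sizes`, `sub_tree`). Here the larger of the two top-level branches is selected so the zoomed plot always contains several leaves; in practice you would pick the index of the group you are interested in.

```r
# Simulated input (an assumption for illustration): 20 samples x 5 variables
set.seed(42)
data <- matrix(rnorm(100), nrow = 20)
rownames(data) <- paste0("sample_", 1:20)
hc <- hclust(dist(data))

# [[ ]] selects branches on a dendrogram object, not on the raw hclust output
dhc <- as.dendrogram(hc)

# Number of leaves under each of the two branches below the root
sizes <- sapply(1:2, function(i) attr(dhc[[i]], "members"))

# Keep the larger branch as the group of interest
sub_tree <- dhc[[which.max(sizes)]]

# Full tree on the left, zoomed branch on the right
op <- par(mfrow = c(1, 2))
plot(dhc, main = "Full tree")
plot(sub_tree, main = "Zoom on one branch")
par(op)
```

The operator can be chained to go deeper, as long as the selected node is not a leaf.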





