Chemical Space Analysis

Updated 10/12/2025

Visualize and analyze molecular datasets using dimensionality reduction and clustering techniques. Import molecules, explore chemical space, identify clusters, and discover patterns in your molecular data.

Table of Contents (Estimated reading time: 10-12 minutes)

  1. Dimensionality Reduction
  2. Importing your molecules
  3. Selecting features
  4. Choosing dimensionality reduction model
  5. Choosing model hyperparameters
  6. Acquiring the projection
  7. Navigating the projection
  8. Customizing the projection
  9. Highlighting specific molecules
  10. Clusterization
  11. Select clustering algorithm
  12. The clustered projection
  13. Silhouette scores
  14. Silhouette plot
  15. Molecules in each cluster
  16. Cluster Medoids
  17. Medoid structures
  18. Navigation and customization

Dimensionality Reduction

Chemical space analysis begins with dimensionality reduction to visualize high-dimensional molecular data in 2D or 3D space, making it easier to identify patterns, clusters, and relationships between molecules.
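To make the idea of projecting high-dimensional features onto two axes concrete, here is a minimal sketch using principal component analysis (PCA) written with the Python standard library only. PCA is chosen purely for illustration; this guide does not state which models Chemetrian offers or how they are implemented, and the feature values below are made-up placeholders.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    return [[row[j] - means[j] for j in range(dim)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]

def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    dim = len(matrix)
    start = [float(k + 1) for k in range(dim)]  # arbitrary non-zero starting vector
    start_norm = math.sqrt(sum(x * x for x in start))
    vector = [x / start_norm for x in start]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # nothing left to find (all remaining variance is zero)
        vector = [x / norm for x in product]
    value = sum(vector[i] * matrix[i][j] * vector[j]
                for i in range(dim) for j in range(dim))
    return value, vector

def pca_project(rows, n_components=2):
    """Project rows onto their first n_components principal axes."""
    centered = center(rows)
    matrix = covariance(centered)
    dim = len(matrix)
    axes = []
    for _ in range(n_components):
        value, axis = dominant_eigenpair(matrix)
        axes.append(axis)
        # Deflation: remove the axis just found so the next pass finds the following one.
        matrix = [[matrix[i][j] - value * axis[i] * axis[j] for j in range(dim)]
                  for i in range(dim)]
    return [[sum(row[j] * axis[j] for j in range(dim)) for axis in axes]
            for row in centered]

# Placeholder 3-feature data for five hypothetical molecules.
features = [[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0],
            [0.0, -1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
coords = pca_project(features)  # one (x, y) pair per molecule
```

Each molecule ends up with two coordinates that can be drawn as a point on a 2D map. Non-linear methods produce coordinates of the same shape but preserve different aspects of the original distances.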

Dimensionality Reduction Interface

Importing your molecules

Import molecules as SMILES by hovering over Import SMILES and choosing your preferred upload method. To load a dataset that you uploaded previously, press Load File instead.

Import SMILES Interface

Selecting features

Choose whether Chemetrian computes features for you (fingerprints, RDKit descriptors, etc.), or select "Use features from file" if your CSV already contains features associated with your SMILES strings.
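As a rough illustration of what a file with precomputed features could look like, the sketch below parses CSV text with one SMILES column followed by numeric feature columns. The column names (`smiles`, `feature_1`, `feature_2`), the layout, and the values are assumptions made for this example; the exact file format Chemetrian expects is not specified in this guide.

```python
import csv
import io

# Hypothetical file contents; names, layout, and values are placeholders.
EXAMPLE_CSV = """smiles,feature_1,feature_2
CCO,0.5,1.0
c1ccccc1,1.5,0.0
CC(=O)O,0.25,2.0
"""

def split_smiles_and_features(csv_text, smiles_column="smiles"):
    """Return (smiles, feature_names, feature_rows) parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_names = [name for name in reader.fieldnames if name != smiles_column]
    smiles, rows = [], []
    for record in reader:
        smiles.append(record[smiles_column])
        # Every non-SMILES column is treated as a numeric feature.
        rows.append([float(record[name]) for name in feature_names])
    return smiles, feature_names, rows

smiles, feature_names, features = split_smiles_and_features(EXAMPLE_CSV)
```

Keeping one row per molecule, with the SMILES string and its features on the same line, makes it unambiguous which feature vector belongs to which structure.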

Feature Selection Interface

Choosing dimensionality reduction model

Decide on a dimensionality reduction model and a distance metric. Hover over a section header or an option to see additional information about it.
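To show what a distance metric does with molecular fingerprints, here is a small sketch of the Tanimoto (Jaccard) distance, which is widely used for binary fingerprints. It is given only as an example; this guide does not list which metrics Chemetrian provides. Fingerprints are represented here as sets of "on" bit indices, and the two sets are made up.

```python
def tanimoto_distance(bits_a, bits_b):
    """Tanimoto (Jaccard) distance between two sets of 'on' bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints count as identical
    return 1.0 - len(bits_a & bits_b) / union

# Made-up fingerprints for two hypothetical molecules.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
distance = tanimoto_distance(fp_a, fp_b)  # 2 shared bits out of 5 distinct bits
```

A distance of 0 means the fingerprints are identical and a distance of 1 means they share no bits, so the choice of metric directly shapes which molecules land near each other in the projection.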

Model Selection Interface

Choosing model hyperparameters

Select hyperparameter values if the model requires them.

Hyperparameters Interface

Acquiring the projection

Press the Run Projection button. The projection can finish in a few seconds, but run time grows with the number of molecules uploaded.

Navigating the projection

Hovering over a data point shows the molecule's structure, and scrolling zooms in or out of the map for more granular analysis of chemical space. Clicking a data point keeps that molecule selected; click it again to deselect it. To copy the molecule's SMILES string to the clipboard, press the copy button next to the SMILES string.

Run Projection Interface

Customizing the projection

Click the gear symbol in the top right corner to open the customization options.

Projection Navigation Interface

Highlighting specific molecules

To highlight specific molecules on the projection, paste a SMILES string or a list of SMILES strings into the "search SMILES" box and press Enter. Press "Display searched SMILES" to see the list of highlighted SMILES; molecules can be unhighlighted from this list. To change the highlight color, press the colored bar and select another color.

Customization Interface

Clusterization

Change the mode from "Projection" to "Clusterization".

Molecule Highlighting Interface

Select clustering algorithm

Select a clustering algorithm and, if the algorithm requires it, the number of clusters. Then press "Run Clusterization", located underneath the projection.

Clusterization Mode Interface

The clustered projection

The clustered projection will appear. Each color represents a different cluster.

Clustering Algorithm Interface

Silhouette scores

The silhouette score is a commonly used metric for evaluating cluster quality. It ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. Press "View Silhouette Score" to see a plot of scores across different numbers of clusters.
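For readers who want to see how the score is calculated, the following is a minimal sketch of the mean silhouette coefficient using Euclidean distance and the Python standard library. It follows the standard textbook definition and is not taken from Chemetrian's implementation; the points and labels are invented for the example.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_distance(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_distance(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_distance(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        scale = max(a, b)
        if scale > 0.0:
            total += (b - a) / scale
    return total / len(points)

# Four invented 2D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the geometry
```

Running the same calculation for several candidate cluster counts and comparing the results is the idea behind a plot of scores across different numbers of clusters.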

Clustered Projection Interface

Silhouette plot

A plot like the one below will appear.

Silhouette Score Interface

Molecules in each cluster

To obtain a list of which molecules belong to which clusters, press "Download Clusters".

Silhouette Plot

Cluster Medoids

The medoid of a cluster is the molecule with the smallest total distance to all other molecules in that cluster, so it sits closest to the cluster's center. It can therefore be treated as representative of that cluster within the dataset. Press "Download Medoids" for a list of medoid SMILES strings, or press "View Structures" to see the chemical structures of the medoids.
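The sketch below shows one way a medoid can be found: for each cluster, pick the member whose summed distance to the other members is smallest. It uses Euclidean distance on invented 2D coordinates; whether Chemetrian computes medoids in the projected space or in the original feature space, and with which metric, is not stated in this guide.

```python
import math

def medoid_index(points, members):
    """Return the index in `members` with the smallest summed distance to the others."""
    return min(members,
               key=lambda i: sum(math.dist(points[i], points[j]) for j in members))

def cluster_medoids(points, labels):
    """Map each cluster label to the index of its medoid point."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return {label: medoid_index(points, members)
            for label, members in clusters.items()}

# Invented coordinates and cluster labels for six hypothetical molecules.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (10.0, 14.0)]
labels = [0, 0, 0, 1, 1, 1]
medoids = cluster_medoids(points, labels)
```

Unlike a centroid, which is an averaged position, a medoid is always an actual member of the dataset, which is why it can be displayed as a real chemical structure.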

Download Clusters Interface

Medoid structures

View the chemical structures of the representative medoid molecules for each cluster.

Cluster Medoids Interface

Navigation and customization

The same navigation, customization, and highlighting tools available for the dimensionality reduction projection are also available for the clusterization plot.