Chemical Space Analysis
Updated 10/12/2025
Visualize and analyze molecular datasets using dimensionality reduction and clusterization techniques. Import molecules, explore chemical space, identify clusters, and discover patterns in your molecular data.
Table of Contents (Estimated reading time: 10-12 minutes)
- Dimensionality Reduction
  - Importing your molecules
  - Selecting features
  - Choosing dimensionality reduction model
  - Choosing model hyperparameters
  - Acquiring the projection
  - Navigating the projection
  - Customizing the projection
  - Highlighting specific molecules
- Clusterization
  - Select clustering algorithm
  - The clustered projection
  - Silhouette scores
  - Silhouette plot
  - Molecules in each cluster
  - Cluster Medoids
  - Medoid structures
  - Navigation and customization
Dimensionality Reduction
Chemical space analysis begins with dimensionality reduction, which projects high-dimensional molecular data into 2D or 3D space so that patterns, clusters, and relationships between molecules are easier to identify.
Importing your molecules
Import molecules as SMILES by hovering over Import SMILES and choosing your preferred upload method. To load a dataset that you uploaded previously, press Load File instead.
Selecting features
Choose whether you want Chemetrian to compute features for you (fingerprints, RDKit descriptors, etc.), or choose "Use features from file" if your CSV already contains features associated with your SMILES strings.
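For reference, a file suitable for "Use features from file" might look like the sketch below. The column names (smiles, feat_1, feat_2), the placeholder values, and the load_features helper are assumptions made for illustration; the exact layout Chemetrian expects is not specified here.

```python
import csv
import io

# Hypothetical CSV: one SMILES column followed by numeric feature columns.
# The numbers are placeholders, not real descriptor values.
EXAMPLE_CSV = """smiles,feat_1,feat_2
CCO,0.1,2.0
c1ccccc1,0.5,3.0
"""


def load_features(text, smiles_column="smiles"):
    """Split CSV text into a list of SMILES and a matching feature matrix."""
    reader = csv.DictReader(io.StringIO(text))
    smiles, features = [], []
    for row in reader:
        smiles.append(row[smiles_column])
        # Every column other than the SMILES column is treated as a numeric feature.
        features.append([float(v) for k, v in row.items() if k != smiles_column])
    return smiles, features


smiles, features = load_features(EXAMPLE_CSV)
```

Each row pairs one SMILES string with its feature values, so the feature matrix has one row per molecule.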
Choosing dimensionality reduction model
Decide on a dimensionality reduction model and a distance metric. Hover over a section header or an option to see additional information about it.
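As an illustration of what a distance metric measures, the sketch below computes the Tanimoto (Jaccard) distance, which is commonly used to compare molecular fingerprints. Whether Chemetrian offers this particular metric is an assumption; the function name and the representation of a fingerprint as a set of "on" bit positions are hypothetical.

```python
def tanimoto_distance(bits_a, bits_b):
    """Tanimoto distance between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 0.0
    # Similarity is shared bits over all set bits; distance is its complement.
    return 1.0 - len(a & b) / len(union)


# Two hypothetical fingerprints sharing 2 of 4 distinct on-bits.
example_distance = tanimoto_distance({1, 2, 3}, {2, 3, 4})
```

A distance of 0 means the two fingerprints set exactly the same bits, and a distance of 1 means they share none.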
Choosing model hyperparameters
Select hyperparameter values if the model requires them.
Acquiring the projection
Press the Run Projection button. The projection can finish in as little as a few seconds, but the run time increases with the number of molecules uploaded.
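To give a sense of what a projection run computes, here is a minimal sketch of one linear method, principal component analysis (PCA), written in plain Python. It is only an illustration: the models Chemetrian offers and how it implements them are not described here, and the function name pca_2d and its parameters are hypothetical.

```python
import math
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit(v):
    """Return v scaled to length 1, or None if v is (numerically) zero."""
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 1e-12 else None


def _remove(v, components):
    """Subtract from v its projections onto already-found components."""
    for c in components:
        scale = _dot(v, c)
        v = [x - scale * y for x, y in zip(v, c)]
    return v


def pca_2d(rows, iterations=300, seed=0):
    """Project rows (at least 2 rows, at least 2 features) onto 2 principal axes."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _axis in range(2):
        # Random start vector, kept orthogonal to components found so far.
        v = _unit(_remove([rng.uniform(-1, 1) for _k in range(d)], components))
        # Power iteration: repeated multiplication converges to the dominant axis.
        for _step in range(iterations):
            w = _unit(_remove([_dot(row, v) for row in cov], components))
            if w is None:  # no variance left in this direction
                break
            v = w
        components.append(v)
    return [(_dot(r, components[0]), _dot(r, components[1])) for r in centered]
```

Each molecule's feature vector becomes one (x, y) point, with the first coordinate capturing the direction of greatest variance in the data.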
Navigating the projection
Hovering over a datapoint shows the molecule's structure, and scrolling zooms in or out of the map for more granular analysis of chemical space. Clicking a datapoint keeps that molecule selected; click it again to deselect it. To copy the molecule's SMILES string to the clipboard, press the copy button next to the SMILES string.
Customizing the projection
Click the gear symbol in the top right corner to open the projection's customization options.
Highlighting specific molecules
To highlight specific molecules on the projection, paste a SMILES string or a list of SMILES strings into the "search SMILES" box and press Enter. Pressing "Display searched SMILES" shows a list of the highlighted SMILES, where they can also be unhighlighted. To change the highlight color, press the colored bar and select another color.
Clusterization
Change the mode from "Projection" to "Clusterization".
Select clustering algorithm
Select a clustering algorithm and, if the algorithm requires it, the number of clusters. Then press "Run Clusterization", located underneath the projection.
The clustered projection
The clustered projection will appear. Each color represents a different cluster.
Silhouette scores
The silhouette score is a commonly used metric for evaluating cluster quality. It ranges from -1 to 1, with higher values indicating tighter, better-separated clusters. Press "View Silhouette Score" to see a plot of scores across different numbers of clusters.
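For readers who want to see how the metric is defined, the sketch below computes the mean silhouette score from its standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The use of Euclidean distance on 2D points and the function names are assumptions for illustration, not a description of how Chemetrian computes it.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical 2D coordinates forming two well-separated pairs.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the pairs
bad = silhouette_score(points, [0, 1, 0, 1])   # labels split each pair
```

Labels that match the natural grouping give a score close to 1, while labels that cut across it give a negative score.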
Silhouette plot
A plot of silhouette score against the number of clusters will appear.
Molecules in each cluster
To obtain a list of which molecules belong to which clusters, press "Download Clusters".
Cluster Medoids
The medoid of a cluster is the molecule closest to the cluster's center, so it can be interpreted as a representative of that cluster within the dataset. Press "Download Medoids" for a list of medoid SMILES strings, or press "View Structures" to see the chemical structures of the medoids.
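One common way to make "closest to the middle" precise is to pick the cluster member with the smallest total distance to all other members of the same cluster. The sketch below follows that definition; the coordinates, the use of Euclidean distance, and the cluster_medoids helper are hypothetical, and Chemetrian's own calculation may differ.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_medoids(smiles, points, labels):
    """Return {cluster label: SMILES of that cluster's medoid}."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    medoids = {}
    for label, members in clusters.items():
        # The medoid minimizes the summed distance to every member of its cluster.
        best = min(
            members,
            key=lambda i: sum(euclidean(points[i], points[j]) for j in members),
        )
        medoids[label] = smiles[best]
    return medoids


# Hypothetical projected coordinates and cluster labels for six molecules.
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "Cc1ccccc1", "CCc1ccccc1"]
points = [(0, 0), (1, 0), (3, 0), (10, 10), (11, 10), (13, 10)]
labels = [0, 0, 0, 1, 1, 1]
medoids = cluster_medoids(smiles, points, labels)
```

Unlike a centroid, which is an averaged position, a medoid is always an actual molecule from the dataset, which is why its structure can be displayed.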
Medoid structures
View the chemical structures of the representative medoid molecules for each cluster.
Navigation and customization
The same navigation, customization, and highlighting tools available for the dimensionality reduction projection are also available for the clusterization plot.