
Descriptor Calculation Workflow for Molecular Libraries

Updated 04/20/2026

This guide describes how to upload molecular libraries, configure and launch a DFT-based descriptor pipeline, and review and download results — including conformers and per-atom/bond features.

Table of Contents (Estimated reading time: 10-12 minutes)

  1. Upload a Molecular Library
  2. Load the Desired Library
  3. Explore Conformers and Descriptor Outputs
  4. Create a New DFT Pipeline
  5. Configure Conformer Search and Geometry Optimization
  6. Select DFT Level of Theory and Review Pipeline Summary
  7. Configure Solvent, Features, and Advanced Settings
  8. Review and Create the Pipeline
  9. Monitor and Analyze Pipeline Runs

Step 1: Upload a Molecular Library

Here you can upload molecular libraries in several formats, including SMILES, CDXML, and 3D formats such as MOL2 and XYZ. In CDXML, XYZ, and MOL2 files, you can label atoms and bonds; these labels are later used to generate DFT-level atom and bond features.

Upload format options
Molecular library upload interface
Library upload confirmation
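Before uploading a multi-structure XYZ file, it can help to check that it is well-formed. The sketch below is not part of the platform; `parse_multi_xyz` is a hypothetical helper that only understands the standard XYZ layout (an atom-count line, a title line, then one `element x y z` line per atom, repeated for each structure). It does not read or validate the platform's atom and bond labels, whose encoding is not described here.

```python
def parse_multi_xyz(text: str) -> list[tuple[str, list[tuple[str, float, float, float]]]]:
    """Parse one or more concatenated XYZ structures.

    Each structure is an atom-count line, a title line, then one
    "element x y z" line per atom. Returns (title, atoms) per structure and
    raises ValueError on an atom-count mismatch or a malformed atom line.
    """
    lines = text.splitlines()
    structures = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between structures
            continue
        n_atoms = int(lines[i].split()[0])
        title = lines[i + 1].strip() if i + 1 < len(lines) else ""
        atom_lines = lines[i + 2 : i + 2 + n_atoms]
        if len(atom_lines) != n_atoms:
            raise ValueError(
                f"structure at line {i + 1}: expected {n_atoms} atoms, found {len(atom_lines)}"
            )
        atoms = []
        for line in atom_lines:
            fields = line.split()
            if len(fields) < 4:
                raise ValueError(f"malformed atom line: {line!r}")
            atoms.append((fields[0], float(fields[1]), float(fields[2]), float(fields[3])))
        structures.append((title, atoms))
        i += 2 + n_atoms
    return structures
```

If the call raises `ValueError`, fix the reported structure locally before uploading rather than debugging a failed upload.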

Step 2: Load the Desired Library

Locate the target library (for example, "Denmark breakthrough catalysts," which contains forty-three molecules) and load it into the workspace. Once it is loaded, the molecules appear in the 3D viewer, where you can inspect their structures.

Library loaded in 3D viewer

Step 3: Explore Conformers and Descriptor Outputs

If you have previously run any DFT pipelines on this library, you can view the conformers discovered during conformer search and generation.

Conformer list panel

You will see a list of conformers ranked by energy, starting with the lowest-energy conformer. You can cycle through these conformational states and inspect each one visually using the results of the DFT run.

Conformers ranked by energy
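Energy rankings are usually easiest to read as energies relative to the lowest conformer. The sketch below assumes you have exported absolute energies in Hartree keyed by a conformer ID (the platform's export format and display units are not specified here); `rank_conformers` is a hypothetical helper that sorts them and converts the differences to kcal/mol.

```python
HARTREE_TO_KCAL_MOL = 627.509474  # 1 Hartree expressed in kcal/mol

def rank_conformers(energies_hartree: dict[str, float]) -> list[tuple[str, float]]:
    """Return (conformer_id, relative energy in kcal/mol), lowest energy first."""
    e_min = min(energies_hartree.values())
    ranked = sorted(energies_hartree.items(), key=lambda item: item[1])
    # The lowest conformer gets exactly 0.0; the others are offsets above it.
    return [(cid, (e - e_min) * HARTREE_TO_KCAL_MOL) for cid, e in ranked]
```

For example, two conformers 0.001 Hartree apart differ by about 0.63 kcal/mol.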

At any time, you can download the optimized conformer, as well as the original XYZ or CDXML files.

In the same view, you can access RDKit features (with descriptions), results from physics-based descriptor runs such as r2SCAN, and all associated atom- and bond-level features. You can also download the results of a given run for your entire library and view more detailed information about that run.

RDKit and physics-based descriptor outputs
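For regression or machine learning, per-atom features are typically reshaped into one row per molecule. The sketch below assumes a hypothetical long-format CSV export with the columns `molecule`, `atom_label`, `feature`, and `value`; the platform's actual download layout may differ. Keying each column by the atom label you assigned at upload, rather than by atom index, keeps features aligned across molecules whose atom numbering differs.

```python
import csv
import io
from collections import defaultdict

def pivot_atom_features(csv_text: str) -> dict[str, dict[str, float]]:
    """Turn long-format rows (molecule, atom_label, feature, value) into one
    flat feature dict per molecule, keyed as '<atom_label>_<feature>'."""
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = f"{row['atom_label']}_{row['feature']}"
        table[row["molecule"]][key] = float(row["value"])
    return dict(table)
```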

Step 4: Create a New DFT Pipeline

To set up a new DFT calculation, start by creating a new pipeline.

Create new pipeline button

Select an existing library, such as "Denmark breakthrough catalysts," either from the list or by using the search bar.

Library selection screen

After selecting the library, click Next to proceed.

Library selected, proceed to next step

Next, choose the level-of-theory family. Semi-empirical methods such as GFN2-xTB and GFN-xTB are available; in this example, select density functional theory (DFT).

Level of theory selection

Step 5: Configure Conformer Search and Geometry Optimization

Set up the conformer search method. The platform automatically selects a recommended method based on the molecule types it detects. For example, if all forty-three molecules are heterocycles, it may recommend CREST with the GFN-FF force field (CREST GFNff) as the default. If needed, you can override this choice for individual molecule files and use methods such as NVMolKit or GFN2-xTB instead.

Conformer search method configuration
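The "one default plus per-file overrides" pattern can be modeled as below. This is a hypothetical sketch for reasoning about the settings, not the platform's configuration API; the class, the method strings, and the file names are all illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class ConformerSearchConfig:
    """Hypothetical model of the conformer-search step: one library-wide
    default method plus optional per-file overrides."""
    default_method: str = "CREST (GFN-FF)"
    overrides: dict[str, str] = field(default_factory=dict)

    def method_for(self, filename: str) -> str:
        # A per-file override wins; otherwise fall back to the default.
        return self.overrides.get(filename, self.default_method)

# Usage with hypothetical file names: only cat_07.xyz deviates from the default.
cfg = ConformerSearchConfig(overrides={"cat_07.xyz": "GFN2-xTB"})
for name in ["cat_01.xyz", "cat_07.xyz"]:
    print(name, cfg.method_for(name))
```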

Scroll down to configure geometry optimization. By default, conformers found in the search are optimized with Meta's UMA neural network potential (Meta-UMA). The workflow typically finds multiple conformers, optimizes a set of the lowest-energy ones with UMA, and then passes the lowest-energy optimized conformer to the DFT descriptor calculation.

Geometry optimization settings
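The selection funnel described above (search, shortlist, re-optimize, pick the lowest) can be sketched as follows. `select_for_dft` is a hypothetical helper, `optimize` is a stand-in for an ML-potential optimizer such as UMA, and the shortlist size `n_optimize=5` is an assumption; the platform's actual count is not stated.

```python
from typing import Callable, TypeVar

G = TypeVar("G")  # geometry representation; any object works for this sketch

def select_for_dft(
    conformers: list[tuple[G, float]],
    optimize: Callable[[G], tuple[G, float]],
    n_optimize: int = 5,
) -> tuple[G, float]:
    """Pick a single conformer for the DFT step.

    conformers: (geometry, search-level energy) pairs from the conformer search.
    optimize:   stand-in optimizer returning (optimized geometry, energy).
    """
    # 1. Keep only the n_optimize lowest-energy conformers from the search.
    shortlist = sorted(conformers, key=lambda c: c[1])[:n_optimize]
    # 2. Re-optimize each shortlisted geometry with the more accurate potential.
    optimized = [optimize(geom) for geom, _ in shortlist]
    # 3. Re-rank on the optimized energies and return the lowest one.
    return min(optimized, key=lambda c: c[1])
```

The shortlist size is a trade-off: a conformer that ranks poorly at the search level but would become the lowest after optimization is missed if it falls outside the shortlist, while a larger shortlist costs more optimization time.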

Click Next to move on to DFT theory selection.

Step 6: Select DFT Level of Theory and Review Pipeline Summary

Choose the DFT functional and basis set. Available functionals include r2SCAN, PBE, PBE0, and others, along with several basis set options. The interface also provides four recommended level-of-theory bundles (Default, Fastest, Balanced, and Highest Accuracy), each with preconfigured settings. For machine learning or regression workflows, Default, Fastest, or Balanced is typically recommended.

On the same screen, you can review the pipeline summary, which consolidates your key selections and settings.

DFT functional and basis set selection with pipeline summary

Click Next to continue.

Step 7: Configure Solvent, Features, and Advanced Settings

Select the solvent for your DFT calculations. Several solvents are available; for this example, select gas phase.

Solvent selection

Configure descriptor outputs by selecting or deselecting bond-level and atom-level features.

Bond-level and atom-level feature selection

You can also inspect CDXML-derived information, including bond labels, atom labels, and dihedral configurations.

CDXML-derived bond and atom labels

Set global defaults for charge and multiplicity, and override them for individual molecules where necessary.

Charge and multiplicity settings
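A common cause of failed DFT jobs is a charge/multiplicity pair that is impossible for the molecule. Because multiplicity is 2S + 1, an even electron count requires an odd multiplicity and an odd electron count an even one. The sketch below is a hypothetical pre-flight check, not a platform feature; the atomic-number table is deliberately partial, and `settings_for` mirrors the "global default plus per-molecule override" idea with illustrative names.

```python
# Partial table of atomic numbers; extend it for other elements in your library.
ATOMIC_NUMBERS = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
                  "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53}

def is_consistent(elements: list[str], charge: int, multiplicity: int) -> bool:
    """Check that a (charge, multiplicity) pair is possible for these atoms."""
    n_electrons = sum(ATOMIC_NUMBERS[el] for el in elements) - charge
    # At most n_electrons unpaired electrons, so multiplicity <= n_electrons + 1.
    if n_electrons < 0 or multiplicity < 1 or multiplicity > n_electrons + 1:
        return False
    # Even electron count -> odd multiplicity (singlet, triplet, ...) and vice versa.
    return n_electrons % 2 != multiplicity % 2

def settings_for(name: str, default: tuple[int, int],
                 overrides: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Per-molecule (charge, multiplicity) override, else the global default."""
    return overrides.get(name, default)
```

For example, water (10 electrons) is consistent as a neutral singlet but not as a neutral doublet, while the neutral OH radical (9 electrons) needs a doublet.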

Optionally, configure advanced settings such as the number of molecules each machine processes in the DFT stage and the CREST energy window used during the conformer search.

Advanced pipeline settings
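Both advanced settings have a simple arithmetic effect, sketched below under stated assumptions. The energy window keeps only conformers within a given number of kcal/mol of the lowest one; the default of 6.0 here mirrors CREST's command-line default and may not match the platform's. For batching, the sketch assumes molecules are split into consecutive chunks, which is a hypothetical distribution scheme. With 43 molecules and a hypothetical setting of 9 per machine, this yields five batches.

```python
def within_energy_window(energies_kcal: list[float], ewin: float = 6.0) -> list[int]:
    """Indices of conformers no more than ewin kcal/mol above the lowest one."""
    e_min = min(energies_kcal)
    return [i for i, e in enumerate(energies_kcal) if e - e_min <= ewin]

def batch_molecules(names: list[str], per_machine: int) -> list[list[str]]:
    """Split a library into consecutive batches of at most per_machine molecules."""
    if per_machine < 1:
        raise ValueError("per_machine must be at least 1")
    return [names[i : i + per_machine] for i in range(0, len(names), per_machine)]
```

A wider energy window passes more conformers to later stages, and fewer molecules per machine means more machines; both increase cost.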

Step 8: Review and Create the Pipeline

Open the pipeline review screen to verify all details before launching. Here you can confirm the selected library, estimated runtime, chosen settings, and the full list of descriptors that will be generated. Once satisfied, create the pipeline to start processing.

Pipeline review and creation screen

You will be redirected to the pipelines page, where you can see all running, finished, canceled, or failed pipelines, and access machine logs for troubleshooting.

Step 9: Monitor and Analyze Pipeline Runs

From the pipelines page, locate and open a past run, such as the "Denmark breakthrough catalysts" pipeline executed a day ago.

Pipelines page with past runs

You can see how many molecules were processed and how many machines were used (for example, five H100 GPU clusters).

Pipeline machine and molecule summary

The interface shows which molecules each machine handled. Scroll through the pipeline logs to inspect each stage: conformer search, geometry optimization, and DFT calculations.

If any molecules or pipeline stages fail, you can quickly identify the problem, adjust your settings, and launch a new run after resolving the issues. Additional run metadata is available, including total duration, compute time, job status, and download links for artifacts such as conformers.

You can cancel a running pipeline at any time or view the exact settings under which it was executed.

Pipeline logs and run details
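When several molecules fail, grouping them by the stage where they first failed helps decide which setting to revisit before relaunching. The sketch below assumes you have transcribed per-molecule stage statuses into a dictionary; the stage names, status strings, and molecule names are hypothetical and not a platform export format.

```python
# Hypothetical stage names, in pipeline order.
STAGES = ("conformer_search", "geometry_optimization", "dft")

def first_failed_stage(records: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map each failed molecule to the first stage marked 'failed'."""
    failed = {}
    for molecule, statuses in records.items():
        for stage in STAGES:
            if statuses.get(stage) == "failed":
                failed[molecule] = stage
                break  # later stages never ran or are irrelevant
    return failed
```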