Descriptor Calculation Workflow for Molecular Libraries
Updated 04/20/2026
This guide describes how to upload molecular libraries, configure and launch a DFT-based descriptor pipeline, and review and download results — including conformers and per-atom/bond features.
Table of Contents (Estimated reading time: 10-12 minutes)
- Upload a Molecular Library
- Load the Desired Library
- Explore Conformers and Descriptor Outputs
- Create a New DFT Pipeline
- Configure Conformer Search and Geometry Optimization
- Select DFT Level of Theory and Review Pipeline Summary
- Configure Solvent, Features, and Advanced Settings
- Review and Create the Pipeline
- Monitor and Analyze Pipeline Runs
Step 1: Upload a Molecular Library
Start by uploading your molecules as a library. Supported formats include SMILES, CDXML, and 3D formats such as MOL2 and XYZ. CDXML, MOL2, and XYZ files also support bond and atom labeling. These labels are later used to generate DFT-level atom and bond features.
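Because labels on 3D files feed directly into the DFT features, it can be worth sanity-checking XYZ files locally before uploading them. The sketch below assumes the common single-structure XYZ layout: an atom-count line, a free-text comment line, then one `symbol x y z` line per atom. `parse_xyz` and `XYZMolecule` are hypothetical helpers, not part of the platform, and the platform's own parser may accept more variants.

```python
from dataclasses import dataclass


@dataclass
class XYZMolecule:
    comment: str
    symbols: list[str]
    coords: list[tuple[float, float, float]]  # in the file's units (usually Å)


def parse_xyz(text: str) -> XYZMolecule:
    """Parse one structure in standard XYZ layout and check that the atom count matches."""
    lines = text.strip("\n").splitlines()
    if len(lines) < 2:
        raise ValueError("expected an atom-count line and a comment line")
    try:
        n_atoms = int(lines[0].strip())
    except ValueError:
        raise ValueError(f"first line must be an integer atom count, got {lines[0]!r}") from None
    atom_lines = lines[2:2 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError(f"expected {n_atoms} atom lines, found {len(atom_lines)}")
    symbols, coords = [], []
    for i, line in enumerate(atom_lines, start=1):
        parts = line.split()
        if len(parts) < 4:
            raise ValueError(f"atom line {i} is malformed: {line!r}")
        symbols.append(parts[0])
        x, y, z = (float(v) for v in parts[1:4])
        coords.append((x, y, z))
    return XYZMolecule(comment=lines[1], symbols=symbols, coords=coords)


water = """3
water
O   0.000   0.000   0.117
H   0.000   0.757  -0.467
H   0.000  -0.757  -0.467
"""
mol = parse_xyz(water)
print(mol.symbols, len(mol.coords))  # ['O', 'H', 'H'] 3
```

A file whose atom-count line disagrees with the number of coordinate lines raises a `ValueError` instead of being silently truncated.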
Step 2: Load the Desired Library
Locate the target library — for example, "Denmark breakthrough catalysts," containing forty-three molecules — and load it into the workspace. Once loaded, the molecules will appear in the 3D viewer, where you can inspect their structures.
Step 3: Explore Conformers and Descriptor Outputs
If you have previously run any DFT pipelines on this library, you can view the conformers discovered during conformer search and generation.
The conformers are listed by energy, starting with the lowest-energy conformer. You can cycle through them and visually inspect each geometry from the DFT run results.
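The energy ordering in this list amounts to sorting conformers by total energy and reporting each one relative to the lowest. A minimal sketch, assuming total energies in Hartree; the conformer IDs and energies are made up, and the units the platform actually displays are not specified here.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509  # 1 Hartree is about 627.509 kcal/mol


def rank_conformers(energies_hartree: dict[str, float]) -> list[tuple[str, float]]:
    """Sort conformers from lowest to highest energy and report each relative to the minimum, in kcal/mol."""
    if not energies_hartree:
        return []
    e_min = min(energies_hartree.values())
    ranked = sorted(energies_hartree.items(), key=lambda item: item[1])
    return [(cid, (e - e_min) * HARTREE_TO_KCAL_PER_MOL) for cid, e in ranked]


# Made-up conformer IDs and total energies, for illustration only.
energies = {"conf_3": -1234.5670, "conf_1": -1234.5700, "conf_2": -1234.5650}
for cid, rel in rank_conformers(energies):
    print(f"{cid}: {rel:.2f} kcal/mol")
```

Relative energies make the list easier to read than raw totals, since differences of a few kcal/mol are what separate the conformers you are cycling through.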
At any time, you can download the optimized conformer, as well as the original XYZ or CDXML files.
In the same view, you can access RDKit features (with descriptions), physics-based descriptors from DFT runs (for example, r2SCAN runs), and all associated bond- and atom-level features. You can also download all results across your entire library for a given run and view more detailed information about that run.
Step 4: Create a New DFT Pipeline
To configure a DFT pipeline, start by creating a new pipeline.
Select an existing library, such as the "Denmark breakthrough catalysts," from the list or by using the search bar.
After selecting the library, click Next to proceed.
Choose the level of theory family. Semi-empirical options like GFN2-xTB and GFN-xTB are available, but in this example you will select density functional theory (DFT).
Step 5: Configure Conformer Search and Geometry Optimization
Set up the conformer search method. The platform automatically selects a recommended method based on the molecule types it detects. For example, if all forty-three molecules are heterocycles, it may recommend CREST with the GFN-FF force field as the default. You can override this choice for individual molecule files and use another method, such as NVMolKit or GFN2-xTB, instead.
Scroll down to configure geometry optimization. By default, the Meta-UMA neural network potential is used to optimize geometries after conformers are found. The workflow typically finds multiple conformers, selects a set of low-energy conformers for optimization with UMA, and then chooses the lowest-energy conformer for DFT descriptor calculation.
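The guide does not specify how many conformers are passed to UMA or exactly how they are chosen. This sketch assumes an energy cutoff (similar in spirit to the CREST energy window described in Step 7) plus a cap on the count; `select_for_dft` is a hypothetical name and `optimize` is a hypothetical stand-in for the UMA optimization step. It illustrates why the conformer sent to DFT can differ from the one that ranked lowest in the initial search.

```python
from typing import Callable


def select_for_dft(
    search_energies: dict[str, float],  # relative energies from conformer search, kcal/mol
    energy_window: float,               # keep conformers within this many kcal/mol of the minimum
    max_to_optimize: int,               # cap on how many conformers are re-optimized
    optimize: Callable[[str], float],   # hypothetical stand-in for the UMA step; returns the optimized energy
) -> str:
    """Filter by energy window, optimize the lowest few, and return the ID of the lowest after optimization."""
    e_min = min(search_energies.values())
    in_window = [cid for cid, e in search_energies.items() if e - e_min <= energy_window]
    candidates = sorted(in_window, key=search_energies.__getitem__)[:max_to_optimize]
    optimized = {cid: optimize(cid) for cid in candidates}
    return min(optimized, key=optimized.__getitem__)


# Made-up numbers: c4 falls outside the window, c3 is cut by the cap,
# and optimization swaps the order of c1 and c2.
search = {"c1": 0.0, "c2": 0.8, "c3": 2.5, "c4": 9.0}
after_uma = {"c1": 0.4, "c2": 0.0, "c3": 1.9}
calls = []


def fake_optimize(cid: str) -> float:
    calls.append(cid)
    return after_uma[cid]


best = select_for_dft(search, energy_window=6.0, max_to_optimize=2, optimize=fake_optimize)
print(best)  # c2
```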
Click Next to move on to DFT theory selection.
Step 6: Select DFT Level of Theory and Review Pipeline Summary
Choose the DFT functional and basis set. Available functionals include r2SCAN, PBE, PBE0, and others, and several basis sets are offered. The interface also provides four recommended level-of-theory bundles (Default, Fastest, Balanced, and Highest Accuracy), each preconfigured with appropriate settings. For machine learning or regression workflows, Default, Fastest, or Balanced is typically recommended.
On the same screen, you can review the pipeline summary, which consolidates your key selections and settings.
Click Next to continue.
Step 7: Configure Solvent, Features, and Advanced Settings
Select the solvent environment for the DFT calculations. Several solvents are available; for this example, select gas phase (no solvent).
Configure descriptor outputs by selecting or deselecting bond-level and atom-level features.
You can also inspect CDXML-derived information, including bond labels, atom labels, and dihedral configurations.
Set global defaults for charge and multiplicity, and override them for individual molecules where necessary.
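Resolving global defaults against per-molecule overrides is a simple lookup, and an electron-count parity check catches a common class of charge or multiplicity mistakes before compute is spent. A minimal sketch: the function name, override format, and molecule names are assumptions for illustration, and the guide does not say whether the platform runs such a check itself.

```python
from typing import Optional

# Atomic numbers for a small subset of elements, enough for this example.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "P": 15, "S": 16, "Cl": 17, "Br": 35}


def resolve_charge_multiplicity(
    molecules: dict[str, list[str]],  # molecule name -> element symbols
    global_charge: int = 0,
    global_multiplicity: int = 1,
    overrides: Optional[dict[str, dict[str, int]]] = None,
) -> dict[str, tuple[int, int]]:
    """Apply per-molecule overrides on top of global defaults, then check electron-count parity."""
    overrides = overrides or {}
    resolved = {}
    for name, symbols in molecules.items():
        override = overrides.get(name, {})
        charge = override.get("charge", global_charge)
        multiplicity = override.get("multiplicity", global_multiplicity)
        n_electrons = sum(ATOMIC_NUMBERS[s] for s in symbols) - charge
        # Multiplicity is 2S + 1: it must be odd for an even electron count and even for an odd one.
        if n_electrons % 2 == multiplicity % 2:
            raise ValueError(f"{name}: {n_electrons} electrons cannot have multiplicity {multiplicity}")
        resolved[name] = (charge, multiplicity)
    return resolved


molecules = {
    "pyridine": ["C"] * 5 + ["H"] * 5 + ["N"],    # C5H5N, 42 electrons
    "pyridinium": ["C"] * 5 + ["H"] * 6 + ["N"],  # C5H6N+, 43 - 1 = 42 electrons
    "methyl_radical": ["C"] + ["H"] * 3,          # CH3, 9 electrons
}
settings = resolve_charge_multiplicity(
    molecules,
    overrides={"pyridinium": {"charge": 1}, "methyl_radical": {"multiplicity": 2}},
)
print(settings)
```

Without the two overrides, the neutral singlet defaults would be rejected for both pyridinium and the methyl radical, because each would then have an odd electron count paired with an odd multiplicity.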
Optionally, configure advanced settings such as the number of DFT molecules processed per machine and the CREST energy window used during conformer search.
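If molecules are split into fixed-size batches, the molecules-per-machine setting determines the machine count by ceiling division. That batching model is an assumption for illustration; the platform's scheduler is not described here, and `plan_batches` is a hypothetical helper. For the forty-three-molecule library, nine molecules per machine would need five machines.

```python
import math


def plan_batches(molecule_ids: list[str], molecules_per_machine: int) -> list[list[str]]:
    """Split molecules into consecutive fixed-size batches, one batch per machine."""
    if molecules_per_machine < 1:
        raise ValueError("molecules_per_machine must be at least 1")
    return [
        molecule_ids[i:i + molecules_per_machine]
        for i in range(0, len(molecule_ids), molecules_per_machine)
    ]


library = [f"mol_{i:02d}" for i in range(43)]  # forty-three hypothetical molecule IDs
batches = plan_batches(library, molecules_per_machine=9)
machines_needed = math.ceil(len(library) / 9)
print(machines_needed, len(batches), [len(b) for b in batches])  # 5 5 [9, 9, 9, 9, 7]
```

Fewer molecules per machine spreads the work over more machines; more per machine uses fewer machines but lengthens each one's queue.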
Step 8: Review and Create the Pipeline
Open the pipeline review screen to verify all details before launching. Here you can confirm the selected library, estimated runtime, chosen settings, and the full list of descriptors that will be generated. Once satisfied, create the pipeline to start processing.
You will be redirected to the pipelines page, where you can see all running, finished, canceled, or failed pipelines, and access machine logs for troubleshooting.
Step 9: Monitor and Analyze Pipeline Runs
From the pipelines page, locate and open a past run, such as the "Denmark breakthrough catalysts" pipeline executed a day ago.
You can see how many molecules were processed and how many machines were used (for example, five machines with H100 GPUs).
The interface shows which molecules each machine handled. Scroll through the pipeline logs to inspect each stage: conformer search, geometry optimization, and DFT calculations.
If any molecules or pipeline stages fail, you can quickly identify the problem, adjust your settings, and launch a new run after resolving the issues. Additional run metadata is available, including total duration, compute time, job status, and download links for artifacts such as conformers.
You can cancel a running pipeline at any time or view the exact settings under which it was executed.