Physics-based Descriptor Calculation

Updated 10/12/2025

Calculate physics-based molecular descriptors using computational chemistry methods. Upload SMILES for global properties or CDXML files for atom/bond-level analysis including atomic charges, steric parameters, and dihedral angles.

Table of Contents (Estimated reading time: 12-15 minutes)

Descriptor calculation overview
Calculating global molecular properties via SMILES upload
Building a molecule library
Library naming and description
Upload SMILES
Explore library
Calculating physics-based descriptors
View descriptors
Calculating atom/bond-level and global molecular properties via CDXML upload
Preparing the CDXML file
Upload the file
CDXML calculations
CDXML results
Feature formats

Descriptor calculation overview

There are two main pipelines: one calculates global molecular properties and accepts SMILES upload; the other calculates global plus atom/bond-level properties (atomic charges, steric parameters) and requires a .cdxml file upload. This document covers global calculations first, then atom/bond-level calculations via .cdxml upload.

Calculating global molecular properties via SMILES upload

If you only want global molecular properties, you can upload molecules in SMILES format.

Building a molecule library

Start by pressing "new library". You will be prompted to proceed via SMILES or ChemDraw file upload. We will start with SMILES upload.

Library naming and description

You will be prompted to name the library and provide a description.

Upload SMILES

Next, either upload a CSV containing SMILES or copy and paste them in. To paste, copy the whole row of SMILES from a CSV, or a comma-separated list of SMILES, and paste it in. Press "upload SMILES". The SMILES formats are validated, and the SMILES are automatically saved as a CSV for your future reference. In this example, we uploaded 286 SMILES strings.
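If your SMILES come from mixed sources, it can help to tidy them into a clean comma-separated list before pasting. The sketch below is one possible local pre-processing step, not part of the platform: the function names and the "smiles" column header are assumptions chosen for illustration, and it only splits and trims text without checking chemical validity.

```python
import csv
import io
import re

def parse_pasted_smiles(text):
    """Split pasted text on commas and whitespace, dropping empty entries."""
    # SMILES strings contain neither commas nor whitespace, so both are safe separators.
    return [token for token in re.split(r"[,\s]+", text) if token]

def smiles_to_csv(smiles, column="smiles"):
    """Return a one-column CSV string; the column name is a hypothetical choice."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([s] for s in smiles)
    return buffer.getvalue()

pasted = "CCO, c1ccccc1,\nCC(=O)O,,"
cleaned = parse_pasted_smiles(pasted)
csv_text = smiles_to_csv(cleaned)
paste_ready = ",".join(cleaned)  # a single comma-separated line for pasting
print(paste_ready)
```

For a chemistry-aware check before upload, a toolkit such as RDKit can be used in addition; its `Chem.MolFromSmiles` returns `None` for strings it cannot parse.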

Explore library

You can now see that your library has been created in the left panel. Click the library to load it.

Press the "molecules" tab to explore individual molecules in the library. Structures are automatically converted to 3D via optimization with MMFF, and RDKit 2D descriptors are automatically computed and displayed in the panel on the right.

Calculating physics-based descriptors

To compute 3D, physics-based descriptors, press "calculate descriptors" or "pipeline view". Here you can select which molecules from the library to submit for calculation. Pressing the topmost check box selects all of them.

Then, choose computational chemistry parameters. In the beta version, these parameters are quite limited, but conformational analysis and higher levels of theory will be available soon. Press "calculate" under the "calculate features" header to run the pipeline. Run time varies with the number, size, and type of molecules and with the level of theory selected. When the run is finished, the job table will show "complete".

View descriptors

Press "view descriptors" or "library view" to reopen the molecule library. Now, if you click the "3D" tab in the rightmost box, you can view the descriptors that were calculated via the computational chemistry pipeline for each molecule. Pressing the download icon provides an organized CSV of descriptors (physics-based and/or RDKit 2D) that can be used for predictive modeling (after adding data labels, e.g. yields, selectivity, etc.), chemical space analysis, or other tasks.
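Adding data labels to the downloaded CSV can be done in a spreadsheet or with a few lines of code. The following is a minimal sketch using only the Python standard library. The column names ("smiles", "descriptor_a", "descriptor_b", "yield") and all values are made-up placeholders, since the actual layout of the downloaded file is not described here; adjust them to match your file.

```python
import csv
import io

def add_labels(descriptor_csv, labels, key="smiles", label_name="yield"):
    """Append a label column to descriptor rows, matching on the key column."""
    reader = csv.DictReader(io.StringIO(descriptor_csv))
    fieldnames = list(reader.fieldnames) + [label_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep only molecules that have a measured label.
        if row[key] in labels:
            row[label_name] = labels[row[key]]
            writer.writerow(row)
    return out.getvalue()

# Toy stand-in for a downloaded descriptor file (hypothetical columns and values).
descriptors = (
    "smiles,descriptor_a,descriptor_b\n"
    "CCO,1.0,2.0\n"
    "CC(=O)O,3.0,4.0\n"
    "c1ccccc1,5.0,6.0\n"
)
# Hypothetical experimental labels keyed by SMILES.
yields = {"CCO": 72.0, "c1ccccc1": 35.5}

labeled = add_labels(descriptors, yields)
print(labeled)
```

Matching on the SMILES string assumes the labels use exactly the same SMILES as the descriptor file; if they might differ, matching on a molecule ID or name is more robust.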

Calculating atom/bond-level and global molecular properties via CDXML upload

This process gives the same whole-molecule properties as above. If the molecules in the ChemDraw file are annotated, it also provides atomic charges, Sterimol B1, B5, and L parameters for defined bonds, and dihedral angles.
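For readers less familiar with dihedral angles, the sketch below shows how one is commonly defined from the 3D coordinates of four atoms, as the angle between the plane through the first three atoms and the plane through the last three. It is a generic geometric illustration with made-up coordinates and a hypothetical function name, not necessarily how the platform computes the value.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of atoms 0, 1, 2
    n2 = _cross(b2, b3)  # normal to the plane of atoms 1, 2, 3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates: four atoms arranged cis, trans, and perpendicular.
cis = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
perpendicular = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(cis, trans, perpendicular)
```

A planar cis arrangement gives 0 degrees and a planar trans arrangement gives 180 degrees in magnitude. The sign of intermediate angles depends on the convention used, so compare values from different tools with care.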

Preparing the CDXML file

First, you will need to prepare a file in ChemDraw. If you want atom/bond-level properties, you must label the atoms of the core conserved structure of your molecules. For example, I have a group of about 29 BOX ligands I want to featurize. You can see how I labeled the core structures:

To add atom labels, we recommend giving them "letter" assignments. You can do this by hovering over an atom and pressing the apostrophe (') key. ChemDraw will assign a default label, which you can edit in the text box to a letter, as in the example. We also recommend putting a text box below each chemical structure with a molecule number, ID, or name. This will be saved with the molecule when the ChemDraw file is parsed. See the sample CDXML file:

Upload the file

After pressing "new library", choose CDXML upload.

Now, you will be asked to name the library and provide a description, and the contents of the file will be shown. This includes the SMILES of the molecules, as well as the conserved atom labels that you provided. The bonds and dihedrals will also be parsed.
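If the parsed contents do not match what you expect, a quick local look at the file can help locate the problem. The sketch below is a rough check, not the platform's parser. It assumes that the CDXML stores each drawn structure as a `fragment` element and each free-standing text box as a `t` element with `s` text runs, both directly under a `page` element. Files with grouped objects or other layouts may not follow this pattern, and the inline sample is a heavily simplified stand-in for a real file.

```python
import xml.etree.ElementTree as ET

def summarize_cdxml(root):
    """Count top-level structures and collect free-standing caption text."""
    n_structures = 0
    captions = []
    for page in root.iter("page"):
        # Only direct children are counted; nested or grouped objects are ignored.
        n_structures += len(page.findall("fragment"))
        for text_box in page.findall("t"):
            # Caption text may be split across several <s> style runs.
            caption = "".join(run.text or "" for run in text_box.findall("s")).strip()
            if caption:
                captions.append(caption)
    return n_structures, captions

# Simplified, hypothetical sample: two structures, each with a caption below it.
SAMPLE = """<CDXML><page>
<fragment><n id="1"/><n id="2"/><b B="1" E="2"/></fragment>
<t><s>L1</s></t>
<fragment><n id="3"/><n id="4"/><b B="3" E="4"/></fragment>
<t><s>L2</s></t>
</page></CDXML>"""

count, names = summarize_cdxml(ET.fromstring(SAMPLE))
print(count, names)
```

For a file on disk, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()`, where `path` points to your own .cdxml file. A mismatch between the number of structures and the number of captions suggests that a molecule is missing its name or ID text box.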

CDXML calculations

Now, navigate to the new library that you created. 3D structures and 2D descriptors will be computed automatically, as in the SMILES upload workflow. The difference comes when you go to calculate descriptors. After you select the molecules you want descriptors for, an option to turn on atom- and bond-level features will appear in the pipeline. If you want these descriptors, turn the toggle on, as in the example below. Then, run the calculation.

CDXML results

Now, you will see a descriptor library on the right side under the "3D" tab that contains the atom- and bond-level features along with whole-molecule features. These features can all be downloaded as a CSV in an organized, ML-ready format by pressing the download button.