molprop-calc-v5

Extended calculator that includes everything in v4 plus solubility, permeability, PK heuristics, lead metrics, and optional conformer-based 3D shape descriptors (3D_*). It also emits developability indices (Dev_*).

Typical usage

Output format is inferred from the filename extension: .csv, .tsv, or .parquet.

# CSV
molprop-calc-v5 input.smi -o results.csv

# Parquet
molprop-calc-v5 input.smi -o results.parquet

# equivalent (script form)
python calculators/mpo_v5.py input.smi -o results.csv
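
The extension-based format inference described above can be pictured with a small Python sketch. This is an illustration, not the calculator's actual code: the function name, the mapping, and the case-insensitive matching are all assumptions; only the three extensions come from this page.

```python
from pathlib import Path

# Hypothetical mapping; only the three extensions are taken from the docs.
SUPPORTED_FORMATS = {".csv": "csv", ".tsv": "tsv", ".parquet": "parquet"}


def infer_output_format(filename: str) -> str:
    """Return the output format implied by the filename extension."""
    # Lowercasing the suffix is an assumption about how the tool behaves.
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported output extension: {suffix!r}") from None
```

Under these assumptions, an unrecognized extension such as .xlsx raises a ValueError instead of silently falling back to CSV.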

Optional 3D mode

Adds 3D_* columns computed from RDKit ETKDG conformers. Treat these as qualitative ranking features.

molprop-calc-v5 input.smi -o results.csv --3d --3d-num-confs 10 --3d-minimize mmff

Ligand preparation (recommended sequences)

See also: Ligand preparation guide

The calculators can work directly from the input SMILES, but most downstream comparisons only make sense if you standardize structures consistently. In MolProp Toolkit, “preparation” is a chain of steps that produces traceable SMILES columns: Input_Canonical_SMILES (canonicalized input), Canonical_SMILES (standardized parent), Calc_Base_SMILES (post tautomer/stereo selection, pre-protomer), and Calc_Canonical_SMILES (the structure actually used for descriptor calculation). Choose one sequence and keep it stable across a project.
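
Because the four SMILES columns form a chain, a results file can be audited by checking where consecutive columns differ. The following is a minimal sketch using only the Python standard library. It assumes a CSV output that contains all four columns named above; the helper names and the demo row are made up for illustration and are not real tool output.

```python
import csv
import io

# Column names come from the paragraph above, in preparation order.
SMILES_CHAIN = [
    "Input_Canonical_SMILES",
    "Canonical_SMILES",
    "Calc_Base_SMILES",
    "Calc_Canonical_SMILES",
]


def changed_steps(row):
    """Return (from_column, to_column) pairs where the SMILES changed."""
    pairs = zip(SMILES_CHAIN, SMILES_CHAIN[1:])
    return [(a, b) for a, b in pairs if row.get(a) != row.get(b)]


def audit(handle):
    """Yield (row_number, changed_steps) for rows altered along the chain."""
    for number, row in enumerate(csv.DictReader(handle), start=1):
        steps = changed_steps(row)
        if steps:
            yield number, steps


if __name__ == "__main__":
    # Made-up rows for illustration only (not real calculator output).
    demo = io.StringIO(
        ",".join(SMILES_CHAIN) + "\n"
        "CCO,CCO,CCO,CCO\n"
        "CC(=O)[O-].[Na+],CC(=O)O,CC(=O)O,CC(=O)O\n"
    )
    for number, steps in audit(demo):
        print(number, steps)
```

For a real file, pass an open handle instead, for example open("results.csv", newline=""). Rows that the audit never reports passed through preparation unchanged.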

The default “medchem triage” sequence normalizes salts and common charge forms, canonicalizes a single tautomer, keeps stereochemistry as provided, and adds pH-aware ionization features without enumerating explicit protonation states. This yields stable 2D descriptors and keeps the table interpretable.

Internally, this follows RDKit MolStandardize Cleanup → FragmentParent → Uncharger → Reionizer → Canonicalize tautomer, then assigns stereochemistry for auditing. The structure used for calculation is recorded in Calc_Canonical_SMILES.

molprop-calc-v5 input.smi -o results.csv \
  --ph 7.4 \
  --ionization heuristic \
  --tautomer-mode prep-canonical \
  --stereo-mode keep

Use the enumeration sequence when you want explicit visibility into ambiguity. The calculator enumerates plausible tautomers and unresolved stereocenters (bounded by max limits), selects a representative, and records the enumerated sets in Tautomer_* and Stereo_* columns. This is useful when you are cleaning vendor data, merging sources, or preparing a modeling-ready table where you need to understand what the pipeline decided.

molprop-calc-v5 input.smi -o results.csv \
  --tautomer-mode enumerate --tautomer-max 64 --tautomer-topk 5 \
  --stereo-mode enumerate --stereo-max 32 --stereo-topk 5 --stereo-select canonical

Enumerating tautomers and stereoisomers does not switch the calculation to a pH-selected protomer; that requires protomer enumeration. The focus here is on making tautomer and stereo assumptions explicit.
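
The "enumerate up to a limit, pick one representative, keep the top few for the record" pattern can be sketched without a chemistry toolkit. This is a toy model, not the calculator's implementation: tuples of R/S labels stand in for stereoisomers, and "canonical" selection is assumed to mean "first in sorted order", which may differ from what --stereo-select canonical actually does. All names are hypothetical.

```python
from itertools import islice, product
from math import prod


def enumerate_and_select(options_per_center, max_candidates, topk):
    """Enumerate label combinations up to a cap, then pick a representative.

    options_per_center: one tuple of possible labels per unresolved center.
    max_candidates: hard cap on enumerated combinations (like a *-max flag).
    topk: how many ranked candidates to keep for auditing (like *-topk).
    """
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    # Stop enumerating once the cap is reached.
    candidates = list(islice(product(*options_per_center), max_candidates))
    total = prod(len(options) for options in options_per_center)
    # Assumption: a deterministic sort order stands in for "canonical".
    ranked = sorted(candidates)
    return {
        "selected": ranked[0],
        "topk": ranked[:topk],
        "n_enumerated": len(candidates),
        "truncated": total > max_candidates,
    }
```

Recording whether the cap truncated the enumeration is a design choice of this sketch: it shows whether the representative was chosen from the full set or only from the part that fit under the limit.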

If you plan to compute conformer-based 3D descriptors (--3d) or use 3D fingerprints for clustering (--fp usr or --fp usrcat in molprop-series), decide stereochemistry and protonation state up front. 3D geometry depends on both; leaving them ambiguous can make “the same compound” behave like a different molecule from one run or dataset to the next.

A practical, reproducible option is to enumerate protomers at a defined pH window, select one protomer, compute descriptors on that protomer, and then generate 3D conformers with an explicit seed. The chosen structure is recorded in Calc_Canonical_SMILES, and the protomer choice is recorded in Protomer_* columns.

# optional dependency
pip install dimorphite_dl

molprop-calc-v5 input.smi -o results.csv \
  --ph 7.4 --ionization dimorphite --calc-on-protomer \
  --protomer-select closest-charge \
  --tautomer-mode prep-canonical \
  --stereo-mode keep \
  --3d --3d-num-confs 10 --3d-minimize mmff --3d-seed 0

If stereochemistry is frequently unspecified in your inputs and 3D features matter, pair this sequence with --stereo-mode enumerate.

Skipping preparation (--no-prep) is recommended only if you have already standardized structures elsewhere and want the calculator to treat the input as authoritative. It disables the RDKit MolStandardize preparation step. Properties are still computed, but salt forms, charge variants, and inconsistent tautomer forms will propagate into the table.

molprop-calc-v5 input.smi -o results.csv --no-prep --tautomer-mode none --ionization none

If you later run molprop-series, clustering will reflect whatever structural diversity is present in the raw inputs, including salts and alternate tautomers.
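
One way to keep a single sequence stable across a project is to pin its flags in one place and build every command from that record. The sketch below does this in Python. The flag sets are copied from the example commands in this section; the preset names and the helper function are hypothetical labels, not options of molprop-calc-v5.

```python
import shlex

# Flags copied from the example commands above; preset names are made up.
PRESETS = {
    "triage_2d": [
        "--ph", "7.4", "--ionization", "heuristic",
        "--tautomer-mode", "prep-canonical", "--stereo-mode", "keep",
    ],
    "enumerate": [
        "--tautomer-mode", "enumerate", "--tautomer-max", "64",
        "--tautomer-topk", "5",
        "--stereo-mode", "enumerate", "--stereo-max", "32",
        "--stereo-topk", "5", "--stereo-select", "canonical",
    ],
    "protomer_3d": [
        "--ph", "7.4", "--ionization", "dimorphite", "--calc-on-protomer",
        "--protomer-select", "closest-charge",
        "--tautomer-mode", "prep-canonical", "--stereo-mode", "keep",
        "--3d", "--3d-num-confs", "10", "--3d-minimize", "mmff",
        "--3d-seed", "0",
    ],
    "no_prep": ["--no-prep", "--tautomer-mode", "none", "--ionization", "none"],
}


def build_command(preset, input_path, output_path):
    """Return the argument list for one run using a pinned preset."""
    try:
        flags = PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown preset: {preset!r}") from None
    return ["molprop-calc-v5", input_path, "-o", output_path, *flags]


if __name__ == "__main__":
    # Print the shell-quoted command so it can be logged next to the results.
    print(shlex.join(build_command("no_prep", "input.smi", "results.csv")))
```

The returned list can be passed to subprocess.run(..., check=True). Logging the shell-quoted string alongside each results file keeps a record of which sequence produced it.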

Common options

Run molprop-calc-v5 --help for the complete list. Common knobs:

# pH / ionization
molprop-calc-v5 input.smi -o results.csv --ph 7.4 --ionization heuristic

# protomer enumeration (optional dependency)
pip install dimorphite_dl
molprop-calc-v5 input.smi -o results.csv --ionization dimorphite --ph 7.4

# compute descriptors on the selected protomer
molprop-calc-v5 input.smi -o results.csv --ionization dimorphite --ph 7.4 --calc-on-protomer

# stereo and tautomer handling
molprop-calc-v5 input.smi -o results.csv --stereo-mode enumerate --stereo-max 32
molprop-calc-v5 input.smi -o results.csv --tautomer-mode enumerate

Downstream usage

Category summaries

molprop-analyze results.csv --list
molprop-analyze results.csv --category developability
molprop-analyze results.csv --category cns_mpo
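
Before summarizing, it can help to see which column families a results file actually contains, since 3D_*, Tautomer_*, Stereo_*, and Protomer_* columns depend on the options used. The following sketch groups a CSV header by the prefixes mentioned on this page. It is independent of molprop-analyze, and the function names are hypothetical.

```python
import csv
from collections import defaultdict

# Prefixes mentioned on this page; any other column is grouped as "other".
PREFIXES = ("Dev_", "3D_", "Tautomer_", "Stereo_", "Protomer_")


def group_columns(header):
    """Group column names by known prefix, preserving their order."""
    groups = defaultdict(list)
    for name in header:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            # No known prefix matched this column.
            groups["other"].append(name)
    return dict(groups)


def read_header(path):
    """Return the first row of a CSV file, or an empty list if it is empty."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])
```

For example, group_columns(read_header("results.csv")).get("3D_", []) returns an empty list when the run did not use --3d, assuming 3D_* columns are only written in 3D mode.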

Report bundle

molprop-report results.csv