molprop-calc-v5
Extended calculator that includes everything in v4 plus solubility, permeability, PK heuristics, lead metrics,
and optional conformer-based 3D shape descriptors (3D_*). It also emits developability indices (Dev_*).
Typical usage
Output format is inferred from the filename extension: .csv, .tsv, or .parquet.
# CSV
molprop-calc-v5 input.smi -o results.csv
# Parquet
molprop-calc-v5 input.smi -o results.parquet
# equivalent (script form)
python calculators/mpo_v5.py input.smi -o results.csv
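The extension-based format selection described above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual code: the function name infer_output_format, the case-insensitive match, and the error raised for unknown extensions are all assumptions.

```python
from pathlib import Path

# Extensions listed in the text above. The lookup table itself is an
# assumption made for illustration, not the tool's implementation.
SUPPORTED_FORMATS = {".csv": "csv", ".tsv": "tsv", ".parquet": "parquet"}


def infer_output_format(filename: str) -> str:
    """Return the output format implied by a filename's extension."""
    suffix = Path(filename).suffix.lower()  # assumption: case-insensitive match
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported output extension: {suffix!r}")
    return SUPPORTED_FORMATS[suffix]


if __name__ == "__main__":
    for name in ("results.csv", "results.tsv", "results.parquet"):
        print(name, "->", infer_output_format(name))
```

Failing loudly on an unrecognized extension is a design choice in this sketch; check molprop-calc-v5 --help for how the real tool behaves.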
Optional 3D mode
Adds 3D_* columns computed from RDKit ETKDG conformers. Treat these as qualitative ranking features.
molprop-calc-v5 input.smi -o results.csv --3d --3d-num-confs 10 --3d-minimize mmff
Ligand preparation (recommended sequences)
See also: Ligand preparation guide
The calculators can work directly from the input SMILES, but most downstream comparisons only make sense if you
standardize structures consistently. In MolProp Toolkit, “preparation” is a chain of steps that produces traceable
SMILES columns: Input_Canonical_SMILES (canonicalized input), Canonical_SMILES (standardized parent),
Calc_Base_SMILES (post tautomer/stereo selection, pre-protomer), and Calc_Canonical_SMILES (the structure
actually used for descriptor calculation). Choose one sequence and keep it stable across a project.
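Because each preparation stage is written to its own SMILES column, a results table can be audited to see where a structure changed. The sketch below assumes a CSV output containing the four columns named above and compares them as plain strings, which is only meaningful if every column was canonicalized by the same toolkit. The helper names and the sample rows are hypothetical and are not real tool output.

```python
import csv
import io

# Column names taken from the text above, in pipeline order.
SMILES_CHAIN = [
    "Input_Canonical_SMILES",
    "Canonical_SMILES",
    "Calc_Base_SMILES",
    "Calc_Canonical_SMILES",
]


def changed_stages(row):
    """Return (from_column, to_column) pairs where the SMILES string differs."""
    changes = []
    for prev_col, next_col in zip(SMILES_CHAIN, SMILES_CHAIN[1:]):
        before, after = row.get(prev_col), row.get(next_col)
        if before and after and before != after:
            changes.append((prev_col, next_col))
    return changes


def audit_csv(handle):
    """Apply changed_stages to every row of an open CSV file handle."""
    return [changed_stages(row) for row in csv.DictReader(handle)]


# Illustrative rows only (hand-written, not produced by molprop-calc-v5).
SAMPLE = (
    "Input_Canonical_SMILES,Canonical_SMILES,Calc_Base_SMILES,Calc_Canonical_SMILES\n"
    "CC(=O)[O-].[Na+],CC(=O)O,CC(=O)O,CC(=O)O\n"
    "CCO,CCO,CCO,CCO\n"
)

if __name__ == "__main__":
    for row_changes in audit_csv(io.StringIO(SAMPLE)):
        print(row_changes or "unchanged")
```

For a real table, pass open("results.csv", newline="") to audit_csv instead of the in-memory sample.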
The default sequence is a “medchem triage” posture: normalize salts and common charge forms, canonicalize to a single tautomer, keep stereochemistry as provided, and add pH-aware ionization features without enumerating explicit protonation states. This yields stable 2D descriptors and keeps the table interpretable.
Internally, this follows RDKit MolStandardize Cleanup → FragmentParent → Uncharger → Reionizer → Canonicalize tautomer,
then assigns stereochemistry for auditing. The structure used for calculation is recorded in Calc_Canonical_SMILES.
molprop-calc-v5 input.smi -o results.csv \
--ph 7.4 \
--ionization heuristic \
--tautomer-mode prep-canonical \
--stereo-mode keep
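The standardization chain named above (Cleanup → FragmentParent → Uncharger → Reionizer → canonical tautomer, then stereo assignment) can be reproduced directly with RDKit for spot checks. This is a minimal sketch that assumes default settings for every step; the calculator's own parameters and error handling may differ, and the function name standardize_smiles is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_smiles(smiles: str):
    """Return a standardized parent SMILES, or None if parsing fails.

    Assumption: RDKit defaults are used at every step, which may not match
    the settings used inside molprop-calc-v5.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.Cleanup(mol)                # sanitize and normalize groups
    mol = rdMolStandardize.FragmentParent(mol)         # keep the largest fragment
    mol = rdMolStandardize.Uncharger().uncharge(mol)   # neutralize where possible
    mol = rdMolStandardize.Reionizer().reionize(mol)   # fix charge placement
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    # Re-perceive stereo labels so the output can be audited.
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    return Chem.MolToSmiles(mol)


if __name__ == "__main__":
    print(standardize_smiles("CC(=O)[O-].[Na+]"))
```

Comparing this output against Canonical_SMILES for a handful of compounds is a quick way to confirm that an external pipeline and the calculator agree on the parent structure.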
Use the enumeration sequence when you want explicit visibility into ambiguity. The calculator enumerates plausible
tautomers and unresolved stereocenters (bounded by the --tautomer-max and --stereo-max limits), selects a
representative, and records the enumerated sets in Tautomer_* and Stereo_* columns. This is useful when you are
cleaning vendor data, merging sources, or preparing a modeling-ready table where you need to understand what the pipeline “decided”.
molprop-calc-v5 input.smi -o results.csv \
--tautomer-mode enumerate --tautomer-max 64 --tautomer-topk 5 \
--stereo-mode enumerate --stereo-max 32 --stereo-topk 5 --stereo-select canonical
This still does not change the structure to a pH-selected protomer unless you enable protomer enumeration. It focuses on making tautomer/stereo assumptions explicit.
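To see why the enumeration limits matter, note that n unresolved stereocenters give up to 2^n candidate assignments. The toy sketch below enumerates R/S label tuples and stops at a cap, in the spirit of --stereo-max. It is not the calculator's algorithm: it works on labels rather than molecules, ignores double-bond stereo and symmetry (meso forms), and the function name is hypothetical.

```python
from itertools import islice, product


def enumerate_stereo_labels(n_unresolved: int, max_isomers: int = 32):
    """Return (assignments, truncated) for n unresolved stereocenters.

    Each assignment is a tuple of "R"/"S" labels. Enumeration stops once
    max_isomers assignments have been produced; truncated reports whether
    the full set was larger than the cap.
    """
    total = 2 ** n_unresolved
    assignments = list(islice(product("RS", repeat=n_unresolved), max_isomers))
    return assignments, total > max_isomers


if __name__ == "__main__":
    for n in (3, 6):
        found, truncated = enumerate_stereo_labels(n)
        print(n, "centers:", len(found), "assignments, truncated =", truncated)
```

With the default cap of 32 in this sketch, six unresolved centers already exceed the limit, which is the situation where recording the truncation in the output table becomes important.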
If you will compute conformer-based 3D descriptors (--3d) or use 3D fingerprints for clustering
(--fp usr/--fp usrcat in molprop-series), you should decide stereochemistry and protonation
state up front. 3D geometry depends on both; leaving them ambiguous can make “the same compound” behave like a
different molecule from one run or dataset to the next.
A practical reproducible option is to enumerate protomers at a defined pH window, select one protomer, compute
descriptors on that protomer, and then generate 3D conformers with an explicit seed. The chosen structure is
recorded in Calc_Canonical_SMILES, and protomer choice is recorded in Protomer_* columns.
# optional dependency
pip install dimorphite_dl
molprop-calc-v5 input.smi -o results.csv \
--ph 7.4 --ionization dimorphite --calc-on-protomer \
--protomer-select closest-charge \
--tautomer-mode prep-canonical \
--stereo-mode keep \
--3d --3d-num-confs 10 --3d-minimize mmff --3d-seed 0
If stereochemistry is frequently missing from your inputs and 3D features matter, pair this sequence with --stereo-mode enumerate.
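One way to keep a sequence stable across a project is to store its options in one place and build the command line from them. The sketch below uses only the flags shown in the command above; the dictionary layout and the build_command helper are hypothetical conveniences, not part of MolProp Toolkit.

```python
import shlex

# Flags copied from the 3D-ready command above. None marks a flag that
# takes no value. The dict-based layout is an illustrative assumption.
PREP_3D_OPTIONS = {
    "--ph": "7.4",
    "--ionization": "dimorphite",
    "--calc-on-protomer": None,
    "--protomer-select": "closest-charge",
    "--tautomer-mode": "prep-canonical",
    "--stereo-mode": "keep",
    "--3d": None,
    "--3d-num-confs": "10",
    "--3d-minimize": "mmff",
    "--3d-seed": "0",
}


def build_command(input_path, output_path, options):
    """Return an argv list for molprop-calc-v5 with the given options."""
    argv = ["molprop-calc-v5", input_path, "-o", output_path]
    for flag, value in options.items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed or logged.
    print(shlex.join(build_command("input.smi", "results.csv", PREP_3D_OPTIONS)))
```

The returned list can be passed to subprocess.run, and committing the options dictionary alongside the data records exactly which seed and protomer settings produced a given table.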
The no-preparation mode (--no-prep) is recommended only if you have already standardized structures elsewhere and want the calculator to treat the input as authoritative. It disables the RDKit MolStandardize preparation step. You can still compute properties, but salt forms, charge variants, and inconsistent tautomer forms will propagate into the table.
molprop-calc-v5 input.smi -o results.csv --no-prep --tautomer-mode none --ionization none
If you later run molprop-series, clustering will reflect whatever structural diversity is present in the
raw inputs, including salts and alternate tautomers.
Common options
Run molprop-calc-v5 --help for the complete list. Common knobs:
# pH / ionization
molprop-calc-v5 input.smi -o results.csv --ph 7.4 --ionization heuristic
# protomer enumeration (optional dependency)
pip install dimorphite_dl
molprop-calc-v5 input.smi -o results.csv --ionization dimorphite --ph 7.4
# compute descriptors on the selected protomer
molprop-calc-v5 input.smi -o results.csv --ionization dimorphite --ph 7.4 --calc-on-protomer
# stereo and tautomer handling
molprop-calc-v5 input.smi -o results.csv --stereo-mode enumerate --stereo-max 32
molprop-calc-v5 input.smi -o results.csv --tautomer-mode enumerate
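For intuition about what a pH-aware ionization feature can look like, the Henderson-Hasselbalch relation gives the ionized fraction of a single acidic or basic site at a given pH. This page does not document how --ionization heuristic is actually computed, so treat the sketch below as a textbook illustration only; the function name and the pKa values in the demo are assumptions.

```python
def fraction_ionized(pka: float, ph: float, kind: str) -> float:
    """Ionized fraction of one site from the Henderson-Hasselbalch relation.

    kind is "acid" (ionized form is the deprotonated anion) or "base"
    (ionized form is the protonated cation).
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError(f"kind must be 'acid' or 'base', got {kind!r}")


if __name__ == "__main__":
    # Illustrative pKa values, chosen only to show the trend at pH 7.4.
    for label, pka, kind in (("acid, pKa 4.4", 4.4, "acid"), ("base, pKa 9.4", 9.4, "base")):
        print(label, round(fraction_ionized(pka, 7.4, kind), 4))
```

A site whose pKa equals the pH is half ionized, and each pKa unit away from the pH shifts the ratio by a factor of ten, which is why small changes to --ph can matter for compounds with pKa values near 7.4.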
Downstream usage
Category summaries
molprop-analyze results.csv --list
molprop-analyze results.csv --category developability
molprop-analyze results.csv --category cns_mpo
Report bundle
molprop-report results.csv
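Outside molprop-analyze, the column-family prefixes used on this page (3D_*, Dev_*, Tautomer_*, Stereo_*, Protomer_*) make it straightforward to slice a results table by hand. The sketch below groups header names by prefix. The helper name is hypothetical, the column names in the demo are placeholders rather than real output columns, and the grouping is unrelated to the categories that molprop-analyze --category uses.

```python
from collections import defaultdict

# Prefixes mentioned in this document.
KNOWN_PREFIXES = ("3D_", "Dev_", "Tautomer_", "Stereo_", "Protomer_")


def group_columns(header):
    """Group column names by known prefix; everything else goes to 'other'."""
    groups = defaultdict(list)
    for name in header:
        for prefix in KNOWN_PREFIXES:
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            # No known prefix matched this column.
            groups["other"].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder column names for illustration only.
    demo_header = ["Calc_Canonical_SMILES", "Dev_a", "Dev_b", "3D_a", "Protomer_a"]
    for family, columns in group_columns(demo_header).items():
        print(family, columns)
```

For a real table, the header can be read with csv.reader and next() from results.csv before passing it to group_columns.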