molprop-similarity

Fingerprint-based similarity search against a SMILES file or a MolProp results table (CSV/Parquet), plus optional pairwise matrices, diversity picking, and simple similarity clustering.

Typical usage

# Search library for hits similar to a query
molprop-similarity "CCO" library.smi --top 25 -o hits.csv

# Use a MolProp results table (auto-detects Calc_Canonical_SMILES when present)
molprop-similarity "CCO" results.parquet --top 25 -o hits.csv

Common modes

# Pairwise similarity matrix (small sets only; the output grows as N x N)
molprop-similarity --pairwise library.smi -o matrix.csv

# Diversity picking (MaxMin)
molprop-similarity --diversity library.smi --pick 100 -o diverse.csv

# Leader-follower clustering
molprop-similarity --cluster library.smi --threshold 0.7 -o clusters.csv
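
For readers new to the two selection modes, here is a minimal Python sketch of the general MaxMin and leader-follower ideas. It is not the tool's actual implementation. Fingerprints are modelled as plain sets of on-bit indices, and the names `tanimoto`, `maxmin_pick`, and `leader_cluster` are hypothetical. How molprop-similarity seeds the MaxMin pick or orders molecules for clustering is not stated here, so the sketch assumes the first item is the seed and that input order is preserved.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def maxmin_pick(fps, n_pick):
    """Greedy MaxMin: repeatedly add the item farthest from everything picked so far."""
    if not fps or n_pick <= 0:
        return []
    picked = [0]  # assumption: seed with the first item
    # min_dist[i] is the distance from item i to its nearest picked item
    min_dist = [1.0 - tanimoto(fps[0], fp) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fps[nxt], fp))
    return picked


def leader_cluster(fps, threshold):
    """Leader-follower: join the first leader at or above threshold, else start a new cluster."""
    leaders = []  # index of the leader molecule for each cluster
    labels = []   # cluster id assigned to each input item
    for idx, fp in enumerate(fps):
        for cluster_id, leader_idx in enumerate(leaders):
            if tanimoto(fps[leader_idx], fp) >= threshold:
                labels.append(cluster_id)
                break
        else:
            leaders.append(idx)
            labels.append(len(leaders) - 1)
    return labels


# Toy fingerprints (sets of on-bit indices), not real molecules
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
print(maxmin_pick(fps, 3))       # [0, 2, 4]
print(leader_cluster(fps, 0.5))  # [0, 0, 1, 1, 2]
```

In this sketch a higher threshold produces more, tighter clusters, and the leader-follower result depends on input order, which is a general property of the method.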

Fingerprint & metric

# Default: Morgan + Tanimoto
molprop-similarity "c1ccccc1" library.csv --threshold 0.7 --top 20

# Alternatives
molprop-similarity "c1ccccc1" library.csv --fp maccs --metric dice --top 50
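
The choice of metric changes which hits pass a given threshold. The Python sketch below illustrates this with the standard Tanimoto and Dice formulas on toy fingerprints, modelled as sets of on-bit indices. The `search` helper and the molecule names are hypothetical. The sketch also assumes that a threshold is applied as "greater than or equal" before the top-N cut, which may differ from what the tool does.

```python
def tanimoto(a, b):
    """|A & B| / |A | B| for two sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def dice(a, b):
    """2 * |A & B| / (|A| + |B|) for two sets of on-bit indices."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def search(query, library, metric=tanimoto, threshold=0.0, top=None):
    """Score each entry, keep scores >= threshold (assumed), return best first."""
    scored = [(metric(query, fp), name) for name, fp in library.items()]
    kept = [(score, name) for score, name in scored if score >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return kept if top is None else kept[:top]


# Toy data, not real molecules
query = {1, 2, 3, 4}
library = {"mol_a": {1, 2, 3, 4}, "mol_b": {1, 2, 3, 5}, "mol_c": {1, 9}}

print(search(query, library, metric=tanimoto, threshold=0.7))
# [(1.0, 'mol_a')]
print(search(query, library, metric=dice, threshold=0.7))
# [(1.0, 'mol_a'), (0.75, 'mol_b')]
```

For the same pair of fingerprints, Dice is never lower than Tanimoto, so a threshold tuned for one metric usually needs adjusting for the other.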

Tip

On MolProp tables, the tool follows the toolkit “structure-of-record” rule and will prefer, in order: Calc_Canonical_SMILES, Calc_Base_SMILES, Canonical_SMILES, SMILES. Override with --smiles-col when you need a different representation.
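
The column lookup can be pictured as a first-match scan over a priority list. The Python sketch below assumes the four column names in the tip form an ordered preference list. The function `resolve_smiles_column` and its error behaviour are hypothetical, not taken from the toolkit.

```python
# Assumed priority order, highest preference first
SMILES_PRIORITY = [
    "Calc_Canonical_SMILES",
    "Calc_Base_SMILES",
    "Canonical_SMILES",
    "SMILES",
]


def resolve_smiles_column(columns, override=None):
    """Return the structure column to use, honouring an explicit override."""
    if override is not None:
        if override not in columns:
            raise KeyError(f"column {override!r} not found in table")
        return override
    for name in SMILES_PRIORITY:
        if name in columns:
            return name
    raise KeyError("no recognised SMILES column found")


columns = ["Compound_ID", "SMILES", "Calc_Base_SMILES", "MolWt"]
print(resolve_smiles_column(columns))                     # Calc_Base_SMILES
print(resolve_smiles_column(columns, override="SMILES"))  # SMILES
```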