Molecule PolyGraphDiscrepancy
MoleculePGD is a PolyGraphDiscrepancy metric based on the following molecule descriptors:

- TopoChemicalDescriptor: Topological features based on bond structure
- FingerprintDescriptor: Molecular fingerprints
- LipinskiDescriptor: Physico-chemical properties
- ChemNetDescriptor: Random projection of ChemNet embeddings, based on SMILES strings
- MolCLRDescriptor: Random projection of MolCLR embeddings from a GNN
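To illustrate what "random projection of embeddings" means for the last two descriptors, here is a minimal NumPy sketch. The function name `random_projection`, the output dimension, and the Gaussian projection matrix are assumptions made for illustration; the source does not specify how the library builds its projection.

```python
import numpy as np


def random_projection(embeddings, out_dim, seed=0):
    """Project embeddings of shape (n, d) to (n, out_dim) with a fixed random matrix.

    Hypothetical helper for illustration; not part of the polygraph API.
    """
    x = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    # Gaussian entries scaled by 1/sqrt(out_dim), so that squared norms
    # are preserved in expectation.
    projection = rng.normal(size=(x.shape[1], out_dim)) / np.sqrt(out_dim)
    return x @ projection


# Stand-in for learned embeddings: 5 molecules, 512 dimensions each.
fake_embeddings = np.random.default_rng(1).normal(size=(5, 512))
features = random_projection(fake_embeddings, out_dim=8)
print(features.shape)
```

Fixing the seed keeps the projection identical for the reference and the generated molecules, which is needed for the two sets of features to be comparable.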
By default, we use TabPFN as a binary classifier and evaluate it by data log-likelihood. The resulting PolyGraphDiscrepancy is an estimated lower bound on the Jensen-Shannon distance between the generated and true graph distributions.
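The link between classifier log-likelihood and the Jensen-Shannon distance can be sketched with the standard variational bound: for any classifier output D(x) in (0, 1) and balanced classes, JSD(P, Q) >= 1 + 0.5 E_P[log2 D(x)] + 0.5 E_Q[log2(1 - D(x))], measured in bits. The code below is a sketch under that assumption, not the library's implementation; the function name and its inputs (held-out classifier probabilities) are hypothetical.

```python
import numpy as np


def js_distance_lower_bound(probs_ref, probs_gen, eps=1e-12):
    """Estimate a lower bound on the Jensen-Shannon distance (base 2, in [0, 1]).

    probs_ref: predicted probability of the "reference" class on held-out
               reference samples.
    probs_gen: predicted probability of the "reference" class on held-out
               generated samples.
    Hypothetical helper for illustration; not part of the polygraph API.
    """
    p_ref = np.clip(np.asarray(probs_ref, dtype=float), eps, 1.0 - eps)
    p_gen = np.clip(np.asarray(probs_gen, dtype=float), eps, 1.0 - eps)
    # Mean log-likelihood of the correct label, shifted by log2(2) = 1.
    js_divergence = (
        1.0
        + 0.5 * np.mean(np.log2(p_ref))
        + 0.5 * np.mean(np.log2(1.0 - p_gen))
    )
    # A poorly calibrated classifier can push the estimate below zero; clamp it.
    return float(np.sqrt(max(js_divergence, 0.0)))


# An uninformative classifier (always 0.5) gives a bound of zero.
uninformative = js_distance_lower_bound([0.5, 0.5], [0.5, 0.5])
# A classifier that separates the two sets perfectly gives a bound close to one.
separable = js_distance_lower_bound([1.0, 1.0], [0.0, 0.0])
print(uninformative, separable)
```

Evaluating on held-out samples matters here: a log-likelihood computed on the data the classifier was fitted on would overstate how separable the two distributions are.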
```python
import rdkit.Chem
from polygraph.metrics.molecule_pgd import MoleculePGD

# First set of molecules, given as SMILES strings
smiles_a = [
    "CC(=O)Oc1ccccc1C(=O)O",
    "CC(=O)Nc1ccc(O)cc1",
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
    "CC1(C)SC2C(NC(=O)C2=O)C1(C)C(=O)N",
    "C1C(=O)N(C2=CC=CC=C12)C3=CC=C(C=C3)C(F)(F)F",
    "CCCCCCOc1ccc(C(=O)C=Cc2c(C=Cc3ccc(OC)cc3)cc(OC)cc2OC)cc1",
    "O=C(Nc1nc(-c2ccc(Cl)s2)cs1)c1ccncc1",
    "COc1nc(N(C)C)ncc1-n1nc2c(c1C(C)C)C(c1ccc(C#N)c(F)c1)N(c1c[nH]c(=O)c(Cl)c1)C2=O",
]

# Second set of molecules, given as SMILES strings
smiles_b = [
    "CC1=C(C=CC=C1)NC2=NC=CC(=N2)NC3=CC=CC=C3C(=O)NC4=CC=CC=N4",
    "CN1CCN(C2=CC3=C(C=C2)N=CN3C)C4=CC=CC=C14",
    "CN(C)CCCN1C2=CC=CC=C2SC3=CC=CC=C31",
    "CC(C)C(C(=O)NCC(C)C)NC(=O)C1=CC=CC=C1C(C)C(C)NC(=O)C2=CN=CC=C2",
    "CN1C(=O)CN=C(C2=CC=CC=C12)C3=CC=CC=C3Cl",
    "O=C(c1cc(-c2ccc(Cl)cc2Cl)n[nH]1)N1CCCC1",
    "COc1cccc(OC)c1C=CC(=O)NC1CCCCC1",
    "O=C1NC(O)CCN1C1OC(CO)C(O)C1O",
]

# Parse the SMILES strings into RDKit Mol objects
mols_a = [rdkit.Chem.MolFromSmiles(smiles) for smiles in smiles_a]
mols_b = [rdkit.Chem.MolFromSmiles(smiles) for smiles in smiles_b]

# Build the metric from the first set, then compare the second set against it
metric = MoleculePGD(mols_a)
print(metric.compute(mols_b))
```
MoleculePGD
polygraph.metrics.molecule_pgd.MoleculePGD
Bases: PolyGraphDiscrepancy[Mol]
Compares molecule distributions by combining several molecule descriptors.
MoleculePGDInterval
polygraph.metrics.molecule_pgd.MoleculePGDInterval
Bases: PolyGraphDiscrepancyInterval[Mol]
Uncertainty quantification for MoleculePGD.