Python API

API reference

class peppr.Evaluator(metrics: Iterable[Metric], tolerate_exceptions: bool = False, min_sequence_identity: float = 0.95)

This class represents the core of peppr. Systems are fed via feed() into the Evaluator. Finally, the evaluation is reported via tabulate_metrics(), which gives a scalar metric value for each fed system, or via summarize_metrics(), which aggregates the metrics over all systems.

Parameters:
metrics : Iterable of Metric

The metrics to evaluate the poses against. These will make up the columns of the resulting dataframe from tabulate_metrics().

tolerate_exceptions : bool, optional

If set to true, exceptions during Metric.evaluate() are not propagated. Instead, a warning is issued and the result is set to None.

min_sequence_identity : float

The minimum sequence identity for two chains to be considered the same entity.

Attributes:
metrics : tuple of Metric

The metrics to evaluate the poses against.

system_ids : tuple of str

The IDs of the systems that were fed into the evaluator.
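The effect of tolerate_exceptions can be pictured with a small, library-independent sketch. The helper evaluate_tolerantly and the two toy metric functions below are hypothetical and not part of peppr; they only illustrate the pattern of turning an exception into a warning plus a None result.

```python
import warnings


def evaluate_tolerantly(metric_fn, reference, pose, tolerate_exceptions=False):
    """Hypothetical helper: call a metric and optionally swallow its errors."""
    try:
        return metric_fn(reference, pose)
    except Exception as exc:
        if not tolerate_exceptions:
            # Default behaviour: the exception reaches the caller
            raise
        # Tolerant behaviour: warn and record a missing value instead
        warnings.warn(f"Metric failed: {exc}", stacklevel=2)
        return None


def broken_metric(reference, pose):
    """Toy metric that always fails, standing in for a real evaluation error."""
    raise ValueError("no matching atoms")


def length_difference(reference, pose):
    """Toy metric that always succeeds."""
    return abs(len(reference) - len(pose))


# Collect the warnings so that they can be inspected afterwards
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tolerant_result = evaluate_tolerantly(
        broken_metric, "ref", "pose", tolerate_exceptions=True
    )
```

In this sketch tolerant_result is None and one warning is recorded, whereas the same call without tolerate_exceptions=True would re-raise the ValueError.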

feed(system_id: str, reference: AtomArray, poses: Sequence[AtomArray] | AtomArrayStack | AtomArray) -> None

Evaluate the poses of a system against the reference structure for all metrics.

Parameters:
system_id : str

The ID of the system that was evaluated.

reference : AtomArray

The reference structure of the system. Each separate instance/molecule must have a distinct chain_id.

poses : AtomArrayStack or list of AtomArray or AtomArray

The pose(s) to evaluate. The poses are expected to be sorted from highest to lowest confidence (relevant for Selector instances).

Notes

reference and poses must fulfill the following requirements:

  • The system must have an associated biotite.structure.BondList, i.e. the bonds attribute must not be None.

  • Each molecule in the system must have a distinct chain_id.

  • Chains where the hetero annotation is True are always interpreted as small molecules. Conversely, chains where the hetero annotation is False are always interpreted as protein or nucleic acid chains.

The optimal chain mapping and atom mapping in symmetric small molecules is handled automatically.

get_results() -> list[list[ndarray]]

Return the raw results of the evaluation.

This includes each metric evaluated on each pose of each system.

Returns:
list of list of np.ndarray

The raw results of the evaluation. The outer list iterates over the metrics, the inner list iterates over the systems and the array represents the values for each pose.
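The nesting order (metrics outermost, then systems, then poses) is easy to mix up. The following sketch uses plain Python lists with made-up values as a stand-in for the returned arrays. The names metric_names and system_ids are illustrative, and it is an assumption here that the nesting follows the order of the metrics and system_ids attributes.

```python
# Made-up stand-in for the return value of get_results():
# results[metric_index][system_index][pose_index]
metric_names = ["CA-RMSD", "lDDT"]
system_ids = ["system_a", "system_b"]
results = [
    # CA-RMSD: three poses for system_a, two poses for system_b
    [[2.5, 3.0, 7.5], [10.0, 12.0]],
    # lDDT: same systems and poses
    [[0.75, 0.5, 0.25], [0.5, 0.25]],
]

# Flatten the nested structure into (metric, system, pose index, value) records
records = [
    (metric, system, pose_index, value)
    for metric, per_system in zip(metric_names, results)
    for system, per_pose in zip(system_ids, per_system)
    for pose_index, value in enumerate(per_pose)
]

# Value of the first (most confident) pose of each system for the first metric
first_pose_rmsd = [per_pose[0] for per_pose in results[0]]
```

Note that the number of poses may differ between systems, which is why the innermost level is one array per system rather than a single rectangular array.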

summarize_metrics(selectors: Iterable[Selector] | None = None) -> dict[str, float]

Condense the system-wise evaluation to scalar values for each metric.

For each metric,

  • the mean value

  • the median value

  • and the percentage of systems within each threshold

is computed.

Parameters:
selectors : list of Selector, optional

The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:
dict (str -> float)

A dictionary mapping the summarized metric name to the scalar value. The summarized metric name contains

  • the metric name (e.g. DockQ)

  • the selector name, if a selector was used (e.g. Oracle)

  • the threshold, if a threshold was used (e.g. % acceptable)

Examples

>>> import pprint
>>> pprint.pprint(evaluator.summarize_metrics())
{'CA-RMSD <5.0': 0.3,
 'CA-RMSD >5.0': 0.7,
 'CA-RMSD mean': 12.159182685504375,
 'TM-score mean': 0.6235582438144873,
 'lDDT mean': 0.5880769924413414}
>>> pprint.pprint(evaluator.summarize_metrics([MeanSelector(), OracleSelector()]))
{'CA-RMSD <5.0 (Oracle)': 0.3,
 'CA-RMSD <5.0 (mean)': 0.3,
 'CA-RMSD >5.0 (Oracle)': 0.7,
 'CA-RMSD >5.0 (mean)': 0.7,
 'CA-RMSD mean (Oracle)': 12.159182685504375,
 'CA-RMSD mean (mean)': 12.159182685504375,
 'TM-score mean (Oracle)': 0.6235582438144873,
 'TM-score mean (mean)': 0.6235582438144873,
 'lDDT mean (Oracle)': 0.5880769924413414,
 'lDDT mean (mean)': 0.5880769924413414}

tabulate_metrics(selectors: Iterable[Selector] | None = None) -> DataFrame

Create a table listing the value for each metric and system.

Parameters:
selectors : list of Selector, optional

The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:
pandas.DataFrame

A table listing the value for each metric and system. The index is the system ID.

Examples

>>> print(evaluator.tabulate_metrics())
                                   RMSD      lDDT  TM-score
8ji2__1__1.B__1.J_1.K          2.987937  0.674205  0.883589
7t4w__1__1.A__1.C             16.762669  0.693087  0.380107
8jp0__1__1.A__1.B             26.281593  0.510061  0.316204
7yn2__1__1.A_1.B__1.C          6.657655  0.567117  0.725322
8oxu__2__1.C__1.E             14.977116  0.339707  0.296535
7ydq__1__1.A__1.B             26.111820  0.383841  0.360584
7wuy__1__1.B__1.HA_1.IA_1.OA  16.494774  0.665949  0.666633
7xh4__1__1.A__1.B_1.C          1.787062  0.748987  0.915388
7v34__1__1.A__1.C_1.D_1.G      4.472874  0.567491  0.822537
8jmr__1__1.A_1.B__1.C_1.D      5.058327  0.730324  0.868684
>>> print(evaluator.tabulate_metrics([OracleSelector()]))
                              CA-RMSD (Oracle)  lDDT (Oracle)  TM-score (Oracle)
8ji2__1__1.B__1.J_1.K                 2.987937       0.674205           0.883589
7t4w__1__1.A__1.C                    16.762669       0.693087           0.380107
8jp0__1__1.A__1.B                    26.281593       0.510061           0.316204
7yn2__1__1.A_1.B__1.C                 6.657655       0.567117           0.725322
8oxu__2__1.C__1.E                    14.977116       0.339707           0.296535
7ydq__1__1.A__1.B                    26.111820       0.383841           0.360584
7wuy__1__1.B__1.HA_1.IA_1.OA         16.494774       0.665949           0.666633
7xh4__1__1.A__1.B_1.C                 1.787062       0.748987           0.915388
7v34__1__1.A__1.C_1.D_1.G             4.472874       0.567491           0.822537
8jmr__1__1.A_1.B__1.C_1.D             5.058327       0.730324           0.868684
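The two reports are consistent with each other: the unselected CA-RMSD entries in the summarize_metrics() example can be recomputed by hand from the RMSD column of the tables above. The sketch below does so with the standard library only. It illustrates the arithmetic and is not peppr's implementation; the key 'CA-RMSD median' is an assumed name, as the example output does not show it.

```python
from statistics import mean, median

# RMSD column copied from the tabulate_metrics() example
rmsd = [
    2.987937, 16.762669, 26.281593, 6.657655, 14.977116,
    26.111820, 16.494774, 1.787062, 4.472874, 5.058327,
]
threshold = 5.0

summary = {
    "CA-RMSD mean": mean(rmsd),
    # Assumed key name, not shown in the example output
    "CA-RMSD median": median(rmsd),
    # Fraction of systems falling into each bin around the threshold
    "CA-RMSD <5.0": sum(value < threshold for value in rmsd) / len(rmsd),
    "CA-RMSD >5.0": sum(value >= threshold for value in rmsd) / len(rmsd),
}
```

Three of the ten systems lie below 5.0, which gives the 0.3 and 0.7 shown in the example output; the bins are reported there as fractions rather than as percentages.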

Metrics

The metrics for pose evaluation.

Metric()

The base class for all evaluation metrics.

MonomerRMSD(threshold[, ca_only])

Compute the root mean squared deviation (RMSD) between each peptide chain in the reference and the pose and take the mean weighted by the number of heavy atoms.

MonomerTMScore()

Compute the TM-score for each monomer and take the mean weighted by the number of atoms.

MonomerLDDTScore()

Compute the local Distance Difference Test (lDDT) score for each monomer and take the mean weighted by the number of atoms.

IntraLigandLDDTScore()

Compute the local Distance Difference Test (lDDT) score for contacts within each small molecule.

LDDTPLIScore()

Compute the CASP LDDT-PLI score, i.e. the lDDT for protein-ligand interactions as defined by [Rfb947263ee55-1].

LDDTPPIScore()

Compute the lDDT for protein-protein interactions, i.e. intra-chain contacts are not included.

GlobalLDDTScore([backbone_only])

Compute the lDDT score for all contacts in the system, i.e. both intra- and inter-chain contacts.

DockQScore([include_pli])

Compute the DockQ score for the given complex as defined in [R19e7a98e93e0-1].

LigandRMSD()

Compute the Ligand RMSD for the given protein complex as defined in [R84484f61b266-1].

InterfaceRMSD()

Compute the Interface RMSD for the given protein complex as defined in [Rf63c12e6cf3e-1].

ContactFraction()

Compute the fraction of correctly predicted reference contacts (Fnat) as defined in [Rdd00b73bb0c4-1].

PocketAlignedLigandRMSD()

Compute the pocket-aligned ligand RMSD for the given PLI complex as defined in [R77e8bc3bd056-1].

BiSyRMSD(threshold[, inclusion_radius, ...])

Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex.

BondLengthViolations([tolerance, ...])

Check for unusual bond lengths in the structure by comparing against reference values.

ClashCount()

Count the number of clashes between atoms in the pose.
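Several of the monomer metrics above reduce per-chain values to one number per system through a mean weighted by atom count. A minimal sketch of that aggregation for RMSD follows, assuming the chains are already superimposed and given as plain coordinate tuples. The helpers rmsd and weighted_mean are hypothetical and this is not peppr's implementation.

```python
import math


def rmsd(reference_coord, pose_coord):
    """RMSD between two equally ordered, already superimposed coordinate lists."""
    squared = [math.dist(r, p) ** 2 for r, p in zip(reference_coord, pose_coord)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_mean(values, weights):
    """Mean of the values, each counted proportionally to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two toy chains: chain A has two atoms, chain B has one atom
reference_chains = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "B": [(5.0, 0.0, 0.0)],
}
pose_chains = {
    # Every atom of chain A is displaced by 1 Å along z
    "A": [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    # The single atom of chain B is displaced by 4 Å along y
    "B": [(5.0, 4.0, 0.0)],
}

chain_rmsd = [rmsd(reference_chains[c], pose_chains[c]) for c in reference_chains]
atom_counts = [len(reference_chains[c]) for c in reference_chains]
system_rmsd = weighted_mean(chain_rmsd, atom_counts)
```

Weighting by atom count keeps a short, poorly placed chain from dominating the system value: here the per-chain RMSDs of 1 Å and 4 Å combine to 2 Å rather than to the unweighted 2.5 Å.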

Selectors

Selection of the desired metric result from multiple poses.

Selector()

The base class for all pose selectors.

MeanSelector()

Selector that computes the mean of the values.

MedianSelector()

Selector that computes the median of the values.

OracleSelector()

Selector that returns the best value.

TopSelector(k)

Selector that returns the best value from the k values with highest confidence.

RandomSelector(k[, seed])

Selector that returns the best value from k randomly chosen values.
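The selectors differ only in how they reduce the per-pose values of one system to a single number. The functions below are a hypothetical, pure-Python sketch of those reductions, not peppr's classes. The sketch assumes that the values are ordered from highest to lowest confidence, that the caller states whether smaller values are better (as for RMSD) or larger ones (as for lDDT), and that random selection draws without replacement.

```python
import random
from statistics import mean, median


def best(values, smaller_is_better):
    """Return the most favourable value."""
    return min(values) if smaller_is_better else max(values)


def select_oracle(values, smaller_is_better):
    """Best value over all poses."""
    return best(values, smaller_is_better)


def select_top(values, k, smaller_is_better):
    """Best value among the k most confident poses (the first k entries)."""
    return best(values[:k], smaller_is_better)


def select_random(values, k, smaller_is_better, seed=0):
    """Best value among k poses drawn at random (assumed without replacement)."""
    rng = random.Random(seed)
    return best(rng.sample(values, min(k, len(values))), smaller_is_better)


# RMSD-like values of five poses, most confident pose first
pose_rmsd = [4.0, 2.5, 6.0, 1.5, 3.0]

selected = {
    "mean": mean(pose_rmsd),
    "median": median(pose_rmsd),
    "oracle": select_oracle(pose_rmsd, smaller_is_better=True),
    "top 1": select_top(pose_rmsd, 1, smaller_is_better=True),
    "top 3": select_top(pose_rmsd, 3, smaller_is_better=True),
    "random 2": select_random(pose_rmsd, 2, smaller_is_better=True),
}
```

The oracle value is a lower bound for what any confidence-based selection can reach on an RMSD-like metric, so the gap between "top 1" and "oracle" indicates how much a better ranking of the poses could gain.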

Analysis functions

Underlying functions used by the Metric classes to compute the metric values that are not directly implemented in biotite.structure.

bisy_rmsd(reference, pose[, ...])

Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex as defined in [R50064b0d173e-1].

find_clashes(atoms[, vdw_scaling])

Find atom clashes in the given structure.

dockq(reference_receptor, reference_ligand, ...)

Compute DockQ for a single pair of receptor and ligand in both the pose and the reference structure.

pocket_aligned_lrmsd(reference_receptor, ...)

Compute the pocket-aligned RMSD part of the DockQ score for small molecules.

lrmsd(reference_receptor, reference_ligand, ...)

Compute the ligand RMSD part of the DockQ score.

irmsd(reference_receptor, reference_ligand, ...)

Compute the interface RMSD part of the DockQ score.

fnat(reference_receptor, reference_ligand, ...)

Compute the fnat and fnonnat part of the DockQ score.

DockQ(fnat, fnonnat, irmsd, lrmsd[, ...])

Result of a DockQ calculation.
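The components listed above combine into the DockQ score. As a rough sketch, assuming contacts are represented as sets of (receptor residue ID, ligand residue ID) tuples and using the published DockQ formula with its scaling constants of 1.5 Å for the interface RMSD and 8.5 Å for the ligand RMSD, the calculation looks as follows. The function names are hypothetical and the inputs are made-up numbers; peppr's own functions take receptor and ligand arguments instead of precomputed contact sets.

```python
def contact_fractions(reference_contacts, pose_contacts):
    """Return (fnat, fnonnat) for two sets of residue-pair contacts."""
    native = reference_contacts & pose_contacts
    # fnat: fraction of reference contacts that the pose reproduces
    fnat = len(native) / len(reference_contacts)
    # fnonnat: fraction of pose contacts that are absent from the reference
    fnonnat = len(pose_contacts - reference_contacts) / len(pose_contacts)
    return fnat, fnonnat


def scaled_rmsd(rmsd, scale):
    """Map an RMSD in [0, inf) to (0, 1], where 1 means a perfect fit."""
    return 1.0 / (1.0 + (rmsd / scale) ** 2)


def dockq_score(fnat, irmsd, lrmsd):
    """Average of fnat and the scaled interface and ligand RMSD."""
    return (fnat + scaled_rmsd(irmsd, 1.5) + scaled_rmsd(lrmsd, 8.5)) / 3.0


# Made-up contacts as (receptor residue ID, ligand residue ID) tuples
reference_contacts = {(10, 5), (11, 5), (12, 7), (15, 9)}
pose_contacts = {(10, 5), (11, 5), (12, 8), (15, 9), (20, 9)}

fnat, fnonnat = contact_fractions(reference_contacts, pose_contacts)
score = dockq_score(fnat, irmsd=1.5, lrmsd=8.5)
```

If either contact set is empty, the fractions are undefined and this sketch raises a ZeroDivisionError.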

Miscellaneous

MatchWarning

This warning is raised if the Evaluator fails to match atoms between the reference and pose structures.

GraphMatchWarning

This warning is raised if the RDKit-based molecule matching fails.

EvaluationWarning

This warning is raised if a Metric fails to evaluate a pose.

NoContactError

sanitize(mol[, max_fix_iterations])

Fix small issues with RDKit SanitizeMol and sanitize molecule.

standardize(system)

Standardize the given system.

find_matching_atoms(reference, pose[, ...])

Find the optimal atom order for each pose that minimizes the RMSD to the reference.

is_small_molecule(chain)

Check whether the given chain is a small molecule.

get_contact_residues(receptor, ligand, cutoff)

Get a set of tuples containing the residue IDs for each contact between receptor and ligand.