Python API

API reference

class peppr.Evaluator(metrics: Iterable[Metric], match_method: MatchMethod = MatchMethod.HEURISTIC, max_matches: int | None = None, tolerate_exceptions: bool = False, min_sequence_identity: float = 0.95, allow_unmatched_entities: bool = False)

This class represents the core of peppr. Systems are fed into the Evaluator via feed(). The evaluation is then reported either via tabulate_metrics(), which gives a scalar metric value for each fed system, or via summarize_metrics(), which aggregates the metrics over all systems.

Parameters:
metrics : Iterable of Metric

The metrics to evaluate the poses against. These will make up the columns of the resulting dataframe from tabulate_metrics().

match_method : MatchMethod, optional

The strategy to use for finding atom matches between the reference and pose. This can be used to trade off speed and accuracy.

max_matches : int, optional

The maximum number of atom matches to try, if the match_method is set to EXHAUSTIVE or INDIVIDUAL.

tolerate_exceptions : bool, optional

If set to True, exceptions raised during Metric.evaluate() are not propagated. Instead, a warning is raised and the result is set to None.

min_sequence_identity : float, optional

The minimum sequence identity for two chains to be considered the same entity.

allow_unmatched_entities : bool, optional

If set to True, allow entire entities to be unmatched. This is useful if a pose is compared to a reference which may contain different molecules.

Attributes:
metrics : tuple of Metric

The metrics to evaluate the poses against.

system_ids : tuple of str

The IDs of the systems that were fed into the evaluator.
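The tolerate_exceptions behavior described above can be sketched as a small wrapper. This is a minimal illustration, not peppr's implementation: evaluate_safely is a hypothetical helper, and EvaluationWarning here is a local stand-in for peppr's warning class of the same name.

```python
import warnings
from typing import Callable, Optional


class EvaluationWarning(UserWarning):
    """Local stand-in for peppr's EvaluationWarning (illustration only)."""


def evaluate_safely(
    metric: Callable[[object, object], float],
    reference: object,
    pose: object,
    tolerate_exceptions: bool,
) -> Optional[float]:
    """Evaluate `metric`, optionally turning exceptions into warnings."""
    try:
        return metric(reference, pose)
    except Exception as exc:
        if not tolerate_exceptions:
            # Default behavior: let the exception propagate to the caller
            raise
        # Tolerant behavior: warn and record a missing value instead
        warnings.warn(
            f"Failed to evaluate {getattr(metric, '__name__', metric)}: {exc}",
            EvaluationWarning,
        )
        return None


def failing_metric(reference: object, pose: object) -> float:
    # Hypothetical metric that always fails, e.g. because no contacts exist
    raise ValueError("no contacts")
```

With tolerate_exceptions=False a single failing pose aborts the whole evaluation, which surfaces problems early; with True, a large benchmark can finish and the failures show up as missing values.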

class MatchMethod(value)

Method for finding atom matches between the fed reference and pose. These methods represent a tradeoff between speed and accuracy.

  • HEURISTIC: Use a fast heuristic [1] that matches the reference and pose by minimizing the RMSD between the centroids of each chain. This method is fast and scales linearly with the number of chains, but it is not guaranteed to find the optimal match in all cases, especially when the pose and reference are quite distant from each other.

  • EXHAUSTIVE: Exhaustively iterate through all valid atom mappings between the reference and pose and select the one that gives the lowest all-atom RMSD. This method is slower and prone to combinatorial explosion, but it finds better matches in edge cases.

  • INDIVIDUAL: Like EXHAUSTIVE, but instead of using the RMSD as the optimization criterion, each individual Metric is used. As this requires exhaustive iteration over all mappings and computing each metric for all of them, this method is slower than EXHAUSTIVE. However, it is guaranteed to find the optimal match for each metric.

  • NONE: Skip atom matching entirely and evaluate metrics on the structures as provided. This is useful when the reference and pose are already properly aligned, or when using metrics that don’t require matching (e.g., bond-length violations, clash counts).
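The core idea of EXHAUSTIVE, and why it is prone to combinatorial explosion, can be illustrated at the chain level. The sketch below is a simplification under stated assumptions: all chains are copies of the same entity with identical atom order, the structures are already superimposed, and atom-level symmetry (e.g. equivalent atoms in small molecules) is ignored. exhaustive_chain_mapping is a hypothetical name, not a peppr function.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally long coordinate lists (no superposition)."""
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def exhaustive_chain_mapping(
    reference: Dict[str, List[Coord]], pose: Dict[str, List[Coord]]
) -> Tuple[Dict[str, str], float]:
    """Try every assignment of pose chains to reference chains and keep the
    one with the lowest all-atom RMSD.

    Assumes all chains are copies of the same entity, i.e. they have the
    same number of atoms in the same order.
    """
    ref_ids = sorted(reference)
    ref_coords = [c for cid in ref_ids for c in reference[cid]]
    best_mapping: Dict[str, str] = {}
    best_rmsd = math.inf
    # n!/(n-k)! candidate assignments: this is the combinatorial explosion
    for perm in itertools.permutations(sorted(pose), len(ref_ids)):
        pose_coords = [c for cid in perm for c in pose[cid]]
        value = rmsd(ref_coords, pose_coords)
        if value < best_rmsd:
            best_mapping, best_rmsd = dict(zip(ref_ids, perm)), value
    return best_mapping, best_rmsd
```

For a homodimer there are only two assignments, but a homo-12-mer already has 12! (about 479 million) chain assignments, which is why the faster HEURISTIC method is the default.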

References

[1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034

static combine(evaluators: Iterable[Evaluator]) → Evaluator

Combine multiple Evaluator instances into a single one, preserving the systems fed to each instance.

Parameters:
evaluators : Iterable of Evaluator

The evaluators to combine. The metrics, tolerate_exceptions and min_sequence_identity must be the same for all evaluators.

Returns:
Evaluator

The evaluator combining the systems of all input evaluators in the order of the input.

feed(system_id: str, reference: AtomArray, poses: Sequence[AtomArray] | AtomArrayStack | AtomArray) → None

Evaluate the poses of a system against the reference structure for all metrics.

Parameters:
system_id : str

The ID of the system that was evaluated.

reference : AtomArray

The reference structure of the system. Each separate instance/molecule must have a distinct chain_id.

poses : AtomArrayStack or list of AtomArray or AtomArray

The pose(s) to evaluate. The poses are expected to be sorted from highest to lowest confidence (relevant for Selector instances).

Notes

reference and poses must fulfill the following requirements:

  • The system must have an associated biotite.structure.BondList, i.e. the bonds attribute must not be None.

  • Each molecule in the system must have a distinct chain_id.

  • Chains where the hetero annotation is True are always interpreted as small molecules. Conversely, chains where the hetero annotation is False are always interpreted as protein or nucleic acid chains.

The optimal atom matching is handled automatically based on the MatchMethod.

get_results() → list[list[ndarray]]

Return the raw results of the evaluation.

This includes each metric evaluated on each pose of each system.

Returns:
list of list of np.ndarray

The raw results of the evaluation. The outer list iterates over the metrics, the inner list iterates over the systems and the array represents the values for each pose.

summarize_metrics(selectors: Iterable[Selector] | None = None) → dict[str, float]

Condense the system-wise evaluation to scalar values for each metric.

For each metric,

  • the mean value

  • the median value

  • and the percentage of systems within each threshold

is computed.

Parameters:
selectors : list of Selector, optional

The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:
dict (str -> float)

A dictionary mapping the summarized metric name to the scalar value. The summarized metric name contains

  • the metric name (e.g. DockQ)

  • the selector name, if a selector was used (e.g. Oracle)

  • the threshold (if a threshold was used) (e.g. % acceptable)
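The summary statistics listed above can be sketched for a single metric as follows. This is an illustration, not peppr's code: summarize is a hypothetical function, the key format only imitates the naming scheme described above, and treating the threshold bins as mutually exclusive is an assumption. The example thresholds are the standard DockQ quality classes (0.23 acceptable, 0.49 medium, 0.80 high).

```python
import statistics
from typing import Dict, Sequence


def summarize(
    name: str, values: Sequence[float], bins: Dict[str, float]
) -> Dict[str, float]:
    """Condense per-system values into mean, median and threshold percentages.

    `bins` maps a bin name to its lower threshold; each value is assigned to
    the bin with the highest threshold it reaches (assumed exclusive bins).
    """
    summary = {
        f"{name} mean": statistics.mean(values),
        f"{name} median": statistics.median(values),
    }
    # Sort bins by threshold so each value ends up in exactly one bin
    ordered = sorted(bins.items(), key=lambda item: item[1])
    counts = {bin_name: 0 for bin_name, _ in ordered}
    for value in values:
        assigned = None
        for bin_name, threshold in ordered:
            if value >= threshold:
                assigned = bin_name
        if assigned is not None:
            counts[assigned] += 1
    for bin_name, count in counts.items():
        summary[f"{name} % {bin_name}"] = 100 * count / len(values)
    return summary


dockq_bins = {"incorrect": 0.0, "acceptable": 0.23, "medium": 0.49, "high": 0.80}
```

For multi-pose predictions, the values passed in would first be reduced to one value per system by a Selector.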

tabulate_metrics(selectors: Iterable[Selector] | None = None) → DataFrame

Create a table listing the value for each metric and system.

Parameters:
selectors : list of Selector, optional

The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:
pandas.DataFrame

A table listing the value for each metric and system. The index is the system ID.
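The relation between the raw layout returned by get_results() (outer list over metrics, inner list over systems, one value per pose) and a per-system table can be sketched with plain dictionaries. tabulate and the system IDs used below are hypothetical; the `select` callable stands in for a Selector.

```python
from typing import Callable, Dict, List, Sequence


def tabulate(
    metric_names: Sequence[str],
    system_ids: Sequence[str],
    raw_results: List[List[Sequence[float]]],
    select: Callable[[Sequence[float]], float],
) -> Dict[str, Dict[str, float]]:
    """Turn raw results laid out as results[metric][system][pose] into one
    row per system, reducing the poses of each system with `select`."""
    table: Dict[str, Dict[str, float]] = {sid: {} for sid in system_ids}
    for metric_name, per_system in zip(metric_names, raw_results):
        for sid, pose_values in zip(system_ids, per_system):
            table[sid][metric_name] = select(pose_values)
    return table
```

Passing the resulting dictionary to pandas.DataFrame.from_dict(table, orient="index") yields a table indexed by system ID, matching the shape described above.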

Metrics

The metrics for pose evaluation.

Metric()

The base class for all evaluation metrics.

MonomerRMSD(threshold[, ca_only])

Compute the root mean squared deviation (RMSD) between each peptide chain in the reference and the pose and take the mean weighted by the number of heavy atoms.

MonomerTMScore()

Compute the TM-score for each monomer and take the mean weighted by the number of atoms.

MonomerLDDTScore()

Compute the local Distance Difference Test (lDDT) score for each monomer and take the mean weighted by the number of atoms.

IntraLigandLDDTScore()

Compute the local Distance Difference Test (lDDT) score for contacts within each small molecule.

LDDTPLIScore()

Compute the CASP LDDT-PLI score, i.e. the lDDT for protein-ligand interactions as defined by [Rfb947263ee55-1].

LDDTPPIScore()

Compute the lDDT for protein-protein interactions, i.e. intra-chain contacts are not included.

GlobalLDDTScore([backbone_only])

Compute the lDDT score for all contacts in the system, i.e. both intra- and inter-chain contacts.

DockQScore([include_pli])

Compute the DockQ score for the given complex as defined in [R19e7a98e93e0-1].

LigandRMSD()

Compute the Ligand RMSD for the given protein complex as defined in [R84484f61b266-1].

InterfaceRMSD()

Compute the Interface RMSD for the given protein complex as defined in [Rf63c12e6cf3e-1].

ContactFraction()

Compute the fraction of correctly predicted reference contacts (Fnat) as defined in [Rdd00b73bb0c4-1].

PocketAlignedLigandRMSD()

Compute the Pocket aligned ligand RMSD for the given PLI complex as defined in [R77e8bc3bd056-1].

BiSyRMSD(threshold[, inclusion_radius, ...])

Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex.

BondLengthViolations([tolerance, ...])

Check for unusual bond lengths in the structure by comparing against reference values.

BondAngleViolations([tolerance])

Check for unusual bond angles in the structure by comparing against idealized bond geometry.

ChiralityViolations()

Check for differences in the chirality of the reference and pose.

ClashCount()

Count the number of clashes between atoms in the pose.

PLIFRecovery([ph, binding_site_cutoff, ...])

Calculates the Protein-Ligand Interaction Fingerprint (PLIF) recovery rate.

PocketDistance([use_pose_centroids])

Calculates the distance between the centroid of the reference ligand (i.e. the pocket center) and the ligand in the pose.

PocketVolumeOverlap([voxel_size])

Calculates the discretized volume overlap (DVO) between the reference and pose ligand.

RotamerViolations()

Check for the fraction of improbable amino acid rotamer angles, based on known crystal structures in the Top8000 dataset [R6ab4a0e68d50-1].

RamachandranViolations()

Check for the fraction of improbable φ/ψ angles, based on known crystal structures in the Top8000 dataset [Ra373abf4b613-1].
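Several of the metrics above (MonomerLDDTScore, IntraLigandLDDTScore, LDDTPLIScore, LDDTPPIScore, GlobalLDDTScore) are lDDT variants that differ mainly in which contacts are scored. The sketch below shows the core lDDT computation with the commonly used 15 Å inclusion radius and 0.5/1/2/4 Å tolerance thresholds. It is a simplified, atom-level version: it does not exclude pairs within the same residue and skips stereochemistry checks, and peppr's own parameters may differ. simple_lddt is a hypothetical name.

```python
import itertools
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def simple_lddt(
    reference: Sequence[Coord],
    pose: Sequence[Coord],
    inclusion_radius: float = 15.0,
    thresholds: Sequence[float] = (0.5, 1.0, 2.0, 4.0),
) -> float:
    """Atom-level lDDT without residue-level exclusions (simplified)."""
    preserved = [0] * len(thresholds)
    n_pairs = 0
    for i, j in itertools.combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= inclusion_radius:
            # Only pairs that are in contact in the reference are scored
            continue
        n_pairs += 1
        deviation = abs(d_ref - math.dist(pose[i], pose[j]))
        for k, threshold in enumerate(thresholds):
            if deviation < threshold:
                preserved[k] += 1
    if n_pairs == 0:
        raise ValueError("No atom pairs within the inclusion radius")
    # Average the fraction of preserved distances over all thresholds
    return sum(count / n_pairs for count in preserved) / len(thresholds)
```

Because lDDT only compares internal distances, it needs no superposition. Restricting the scored pairs, e.g. to inter-chain pairs only, gives interface variants such as LDDTPPIScore.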

Selectors

Selection of the desired metric result from multiple poses.

Selector()

The base class for all pose selectors.

MeanSelector()

Selector that computes the mean of the values.

MedianSelector()

Selector that computes the median of the values.

OracleSelector()

Selector that returns the best value.

TopSelector(k)

Selector that returns the best value from the k values with highest confidence.

RandomSelector(k[, seed])

Selector that returns the best value from k randomly chosen values.
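The selection strategies listed above can be sketched as plain functions over the per-pose values of one system. These are illustrations, not peppr's Selector classes: the function names and the smaller_is_better flag are hypothetical (how peppr determines whether lower or higher values are better is not shown here). As stated for feed(), the poses are assumed to be ordered from highest to lowest confidence.

```python
import random
import statistics
from typing import Optional, Sequence


def best(values: Sequence[float], smaller_is_better: bool) -> float:
    """Return the best value according to the metric's direction."""
    return min(values) if smaller_is_better else max(values)


def mean_select(values: Sequence[float]) -> float:
    return statistics.mean(values)


def median_select(values: Sequence[float]) -> float:
    return statistics.median(values)


def oracle_select(values: Sequence[float], smaller_is_better: bool) -> float:
    # Best value among all poses, regardless of confidence
    return best(values, smaller_is_better)


def top_select(values: Sequence[float], k: int, smaller_is_better: bool) -> float:
    # Poses are assumed to be ordered from highest to lowest confidence
    return best(values[:k], smaller_is_better)


def random_select(
    values: Sequence[float],
    k: int,
    smaller_is_better: bool,
    seed: Optional[int] = None,
) -> float:
    # Draw k poses without replacement, then take the best among them
    rng = random.Random(seed)
    chosen = rng.sample(list(values), min(k, len(values)))
    return best(chosen, smaller_is_better)
```

Comparing top-k against random-k with the same k indicates whether the model's confidence ranking carries useful information, while the oracle gives an upper bound on what any ranking could achieve.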

Analysis functions

Underlying functions used by the Metric classes to compute metric values that are not directly implemented in biotite.structure.

bisy_rmsd(reference, pose[, ...])

Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex as defined in [R50064b0d173e-1].

find_clashes(atoms[, vdw_scaling])

Find atom clashes in the given structure.

dockq(reference_receptor, reference_ligand, ...)

Compute DockQ for a single pair of receptor and ligand in both the pose and the reference structure.

pocket_aligned_lrmsd(reference_receptor, ...)

Compute the pocket-aligned RMSD part of the DockQ score for small molecules.

lrmsd(reference_receptor, reference_ligand, ...)

Compute the ligand RMSD part of the DockQ score.

irmsd(reference_receptor, reference_ligand, ...)

Compute the interface RMSD part of the DockQ score.

fnat(reference_receptor, reference_ligand, ...)

Compute the fnat and fnonnat part of the DockQ score.

DockQ(fnat, fnonnat, irmsd, lrmsd[, ...])

Result of a DockQ calculation.

ContactMeasurement(receptor, ligand, cutoff)

This class allows measurements of receptor-ligand contacts of specific types (e.g. hydrogen bonds) by using SMARTS patterns.

volume(molecule[, voxel_size])

Calculate the volume of the given molecule.

volume_overlap(molecules[, voxel_size])

Calculate the volume of the given molecules and how their volumes overlap (i.e. their intersection and union).

RotamerScore(rotamer_scores)

Rotamer score for a given protein structure.

RamaScore(rama_scores)

Ramachandran score for a given protein structure.

get_fraction_of_rotamer_outliers(atom_array)

Compute the fraction of rotamer outliers for the given structure.

get_fraction_of_rama_outliers(atom_array)

Compute the fraction of Ramachandran outliers for the given structure.
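The DockQ result above bundles the individual components (fnat, fnonnat, iRMSD, LRMSD). For protein-protein interfaces, the original DockQ definition combines fnat, iRMSD and LRMSD into one score, rescaling each RMSD with a constant of 1.5 Å (iRMSD) or 8.5 Å (LRMSD). The sketch below shows only this combination step; dockq_score is a hypothetical name, and other interface types (e.g. protein-small molecule) may use different settings in peppr.

```python
def dockq_score(fnat: float, irmsd: float, lrmsd: float) -> float:
    """Combine DockQ components into the final score.

    Uses the scaling constants of the original protein-protein DockQ
    definition: 1.5 Å for iRMSD and 8.5 Å for LRMSD.
    """

    def scaled(rmsd: float, d: float) -> float:
        # Maps an RMSD of 0 to 1 and an RMSD equal to d to 0.5
        return 1.0 / (1.0 + (rmsd / d) ** 2)

    return (fnat + scaled(irmsd, 1.5) + scaled(lrmsd, 8.5)) / 3.0
```

Note that fnonnat is reported alongside the other components but does not enter the score itself.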

Atom Matching

find_optimal_match(reference, pose[, ...])

Match the atoms from the given reference and pose structure so that the RMSD between them is minimized.

find_all_matches(reference, pose[, ...])

Find all possible atom mappings between the reference and the pose.

find_matching_centroids(reference_centroids, ...)

Greedily find pairs of chains (each represented by its centroid) between the reference and the pose that are closest to each other.

filter_matched(reference, pose[, prefilter])

Filter the matched atoms from the reference and pose, i.e. where their matched annotation is True.

GraphMatchWarning

This warning is raised if the RDKit based molecule matching fails.

UnmappableEntityError

This exception is raised if the reference and pose structure contain entities that cannot be mapped to each other.

StructureMismatchError

This exception is raised if the reference and pose structure filtered to matched atoms do not actually match.
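The greedy pairing described for find_matching_centroids() can be sketched as below. This version repeatedly takes the globally closest still-unpaired (reference, pose) chain pair. Whether peppr uses this exact order, restricts candidates to chains of the same entity, or superimposes the structures first is not shown here, and greedy_centroid_matching is a hypothetical name.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def greedy_centroid_matching(
    reference_centroids: Sequence[Coord], pose_centroids: Sequence[Coord]
) -> List[Tuple[int, int]]:
    """Greedily pair reference and pose chains by centroid distance.

    Returns (reference_index, pose_index) pairs sorted by reference index.
    """
    # All candidate pairs, closest first
    candidates = sorted(
        (math.dist(r, p), i, j)
        for i, r in enumerate(reference_centroids)
        for j, p in enumerate(pose_centroids)
    )
    used_ref: Set[int] = set()
    used_pose: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    for _, i, j in candidates:
        if i in used_ref or j in used_pose:
            # Each chain may only be paired once
            continue
        pairs.append((i, j))
        used_ref.add(i)
        used_pose.add(j)
    return sorted(pairs)
```

Sorting all pairs costs O(n² log n) for n chains, far less than enumerating all n! assignments, but an early greedy choice can lock in a suboptimal overall mapping when chains are far from their reference positions.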

Miscellaneous

sanitize(mol[, max_fix_iterations])

Fix small issues in the molecule that RDKit's SanitizeMol would otherwise reject, then sanitize it.

standardize(system)

Standardize the given system.

is_small_molecule(chain)

Check whether the given chain is a small molecule.

get_contact_residues(receptor, ligand, cutoff)

Get a set of tuples containing the residue IDs for each contact between receptor and ligand.

find_atoms_by_pattern(mol, pattern)

Find atoms that fulfill the given SMARTS pattern.

estimate_formal_charges(atoms[, ph])

Determine the formal charge of each atom in the structure.

MatchWarning

This warning is raised if the Evaluator fails to match atoms between the reference and pose structures.

EvaluationWarning

This warning is raised if a Metric fails to evaluate a pose.

NoContactError