Python API

API reference

class peppr.Evaluator(metrics: Iterable[Metric], match_method: MatchMethod = MatchMethod.HEURISTIC, max_matches: int | None = None, tolerate_exceptions: bool = False, min_sequence_identity: float = 0.95)

This class represents the core of peppr. Systems are fed into the Evaluator via feed(). The evaluation is then reported either via tabulate_metrics(), which gives a scalar metric value for each fed system, or via summarize_metrics(), which aggregates the metrics over all systems.

Parameters:

metrics (Iterable of Metric): The metrics to evaluate the poses against. These make up the columns of the dataframe returned by tabulate_metrics().
match_method (MatchMethod, optional): The strategy used to find atom matches between the reference and the pose. This can be used to trade off speed and accuracy.
max_matches (int, optional): The maximum number of atom matches to try if match_method is set to EXHAUSTIVE or INDIVIDUAL.
tolerate_exceptions (bool, optional): If set to True, exceptions raised during Metric.evaluate() are not propagated. Instead, a warning is raised and the result is set to None.
min_sequence_identity (float, optional): The minimum sequence identity for two chains to be considered the same entity.

Attributes:

metrics (tuple of Metric): The metrics to evaluate the poses against.
system_ids (tuple of str): The IDs of the systems that were fed into the evaluator.

class MatchMethod(value)

Method for finding atom matches between the fed reference and pose. These methods represent a tradeoff between speed and accuracy.

HEURISTIC: Use a fast heuristic [1] that matches the reference and pose by minimizing the RMSD between the centroids of each chain. This method is fast and scales linearly with the number of chains, but it is not guaranteed to find the optimal match in all cases, especially when the pose and reference are quite distant from each other.
EXHAUSTIVE: Exhaustively iterate through all valid atom mappings between the reference and pose and select the one that gives the lowest all-atom RMSD. This method is slower and prone to combinatorial explosion, but it finds better matches in edge cases.
INDIVIDUAL: Like EXHAUSTIVE, but instead of using the RMSD as the optimization criterion, each individual Metric is used. As this requires exhaustive iteration over all mappings and computing each metric for all of them, this method is slower than EXHAUSTIVE. However, it is guaranteed to find the optimal match for each metric.
NONE: Skip atom matching entirely and evaluate metrics on the structures as provided. This is useful when the reference and pose are already properly aligned, or when using metrics that don’t require matching (e.g., bond-length violations, clash counts).

References

[1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034
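The speed/accuracy tradeoff between HEURISTIC and EXHAUSTIVE can be illustrated with a deliberately simplified sketch. This is not peppr's implementation: it assumes chain centroids that are already superimposed, replaces the AlphaFold-Multimer heuristic with plain greedy nearest-centroid assignment, and scores mappings by centroid RMSD instead of all-atom RMSD. All function names are hypothetical.

```python
import itertools
import math

Point = tuple[float, float, float]


def centroid_rmsd(ref: list[Point], pose: list[Point], order: tuple[int, ...]) -> float:
    """RMSD between reference centroids and the pose centroids reordered by `order`."""
    squared = [math.dist(ref[i], pose[j]) ** 2 for i, j in enumerate(order)]
    return math.sqrt(sum(squared) / len(squared))


def greedy_match(ref: list[Point], pose: list[Point]) -> tuple[int, ...]:
    """Heuristic-style matching: each reference chain takes the nearest unused pose chain.

    Cheap (n^2 distance evaluations here), but early choices are never revisited.
    """
    unused = set(range(len(pose)))
    order = []
    for r in ref:
        best = min(unused, key=lambda j: math.dist(r, pose[j]))
        unused.remove(best)
        order.append(best)
    return tuple(order)


def exhaustive_match(ref: list[Point], pose: list[Point]) -> tuple[int, ...]:
    """Exhaustive-style matching: score all n! chain permutations and keep the best."""
    return min(
        itertools.permutations(range(len(pose))),
        key=lambda order: centroid_rmsd(ref, pose, order),
    )


# Two chains where the greedy choice for the first chain is locally good but globally bad
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.6, 0.0, 0.0), (-5.0, 0.0, 0.0)]
print(greedy_match(ref, pose))      # (0, 1)
print(exhaustive_match(ref, pose))  # (1, 0)
```

Here the exhaustive search finds the lower-RMSD assignment, but the number of permutations grows factorially with the number of equivalent chains, which is the combinatorial explosion mentioned for EXHAUSTIVE.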

static combine(evaluators: Iterable[Evaluator]) → Evaluator

Combine multiple Evaluator instances into a single one, preserving the systems fed to each instance.

Parameters:

evaluators (Iterable of Evaluator): The evaluators to combine. The metrics, tolerate_exceptions and min_sequence_identity must be the same for all evaluators.

Returns:

Evaluator: The evaluator combining the systems of all input evaluators in the order of the input.
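The contract of combine() (identical settings, systems concatenated in input order) can be mirrored in a few lines of plain Python. The ToyEvaluator class and combine_toy() function below are hypothetical stand-ins for illustration only; they do not reflect how peppr stores its state internally.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEvaluator:
    """Hypothetical stand-in holding only the state relevant to combining."""

    metric_names: tuple[str, ...]
    tolerate_exceptions: bool = False
    min_sequence_identity: float = 0.95
    system_ids: list[str] = field(default_factory=list)
    # results[m][s] holds the per-pose values of metric m for system s
    results: list[list[list[float]]] = field(default_factory=list)


def combine_toy(evaluators):
    evaluators = list(evaluators)
    if not evaluators:
        raise ValueError("At least one evaluator is required")
    first = evaluators[0]
    settings = (first.metric_names, first.tolerate_exceptions, first.min_sequence_identity)
    for ev in evaluators[1:]:
        if (ev.metric_names, ev.tolerate_exceptions, ev.min_sequence_identity) != settings:
            raise ValueError(
                "metrics, tolerate_exceptions and min_sequence_identity must match"
            )
    combined = ToyEvaluator(*settings)
    combined.results = [[] for _ in first.metric_names]
    # Input order determines the order of the systems in the combined evaluator
    for ev in evaluators:
        combined.system_ids.extend(ev.system_ids)
        for m, per_system in enumerate(ev.results):
            combined.results[m].extend(per_system)
    return combined


a = ToyEvaluator(("RMSD",), system_ids=["sys1"], results=[[[1.2, 3.4]]])
b = ToyEvaluator(("RMSD",), system_ids=["sys2"], results=[[[0.8]]])
c = combine_toy([a, b])
print(c.system_ids)  # ['sys1', 'sys2']
print(c.results)     # [[[1.2, 3.4], [0.8]]]
```

One plausible use of this pattern is feeding disjoint subsets of systems in separate processes and merging the evaluators afterwards.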

feed(system_id: str, reference: AtomArray, poses: Sequence[AtomArray] | AtomArrayStack | AtomArray) → None

Evaluate the poses of a system against the reference structure for all metrics.

Parameters:

system_id (str): The ID of the system that was evaluated.
reference (AtomArray): The reference structure of the system. Each separate instance/molecule must have a distinct chain_id.
poses (AtomArrayStack or list of AtomArray or AtomArray): The pose(s) to evaluate. The poses are expected to be sorted from highest to lowest confidence (relevant for Selector instances).

Notes

reference and poses must fulfill the following requirements:

The system must have an associated biotite.structure.BondList, i.e. the bonds attribute must not be None.
Each molecule in the system must have a distinct chain_id.
Chains where the hetero annotation is True are always interpreted as small molecules. Conversely, chains where the hetero annotation is False are always interpreted as protein or nucleic acid chains.

The optimal atom matching is handled automatically based on the MatchMethod.
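The requirements above can be checked up front before calling feed(). The sketch below works on plain Python lists instead of a biotite AtomArray so that it runs on its own; with biotite, the inputs would correspond to the bonds attribute and the chain_id and hetero annotations of the structure. The function name and the rule that a chain must not mix hetero and non-hetero atoms are assumptions of this sketch, not documented peppr behavior.

```python
def check_system(chain_ids, hetero, bonds):
    """Hypothetical pre-flight check for the requirements of Evaluator.feed().

    chain_ids: per-atom chain IDs
    hetero: per-atom hetero flags
    bonds: any bond container, or None if no bonds are available
    Returns a mapping from chain ID to how the chain will be interpreted.
    """
    if bonds is None:
        raise ValueError("The system must have an associated BondList")
    if len(chain_ids) != len(hetero):
        raise ValueError("chain_ids and hetero must have one entry per atom")
    kinds = {}
    for chain_id, is_hetero in zip(chain_ids, hetero):
        kind = "small molecule" if is_hetero else "polymer"
        # Assumption of this sketch: all atoms of a chain share the same hetero flag
        if kinds.setdefault(chain_id, kind) != kind:
            raise ValueError(f"Chain {chain_id!r} mixes hetero and non-hetero atoms")
    return kinds


print(check_system(["A", "A", "B"], [False, False, True], bonds=[(0, 1)]))
# {'A': 'polymer', 'B': 'small molecule'}
```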

get_results() → list[list[ndarray]]

Return the raw results of the evaluation.

This includes each metric evaluated on each pose of each system.

Returns:

list of list of np.ndarray: The raw results of the evaluation. The outer list iterates over the metrics, the inner list iterates over the systems and the array represents the values for each pose.
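The nesting of the raw results (metric, then system, then pose) is easy to get backwards. The sketch below uses plain lists in place of the per-pose np.ndarray objects and made-up values; it pivots the raw results into one row per system by keeping only the first pose, which is the highest-confidence pose under the sorting convention of feed(). The function name and data are hypothetical.

```python
# Hypothetical raw results shaped like get_results(): results[metric][system][pose]
metric_names = ["RMSD", "DockQ"]
system_ids = ["sys1", "sys2"]
results = [
    [[1.2, 3.4], [0.8]],     # RMSD: sys1 has two poses, sys2 has one
    [[0.71, 0.40], [0.85]],  # DockQ for the same systems and poses
]


def first_pose_table(results, metric_names, system_ids):
    """Pivot results[metric][system][pose] into {system_id: {metric_name: value}}."""
    table = {system_id: {} for system_id in system_ids}
    for metric_name, per_system in zip(metric_names, results):
        for system_id, pose_values in zip(system_ids, per_system):
            # Poses are sorted by decreasing confidence, so index 0 is the top-ranked pose
            table[system_id][metric_name] = pose_values[0]
    return table


print(results[1][0])  # DockQ values of all poses of sys1: [0.71, 0.4]
print(first_pose_table(results, metric_names, system_ids))
# {'sys1': {'RMSD': 1.2, 'DockQ': 0.71}, 'sys2': {'RMSD': 0.8, 'DockQ': 0.85}}
```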

summarize_metrics(selectors: Iterable[Selector] | None = None) → dict[str, float]

Condense the system-wise evaluation to scalar values for each metric.

For each metric, the following values are computed:

the mean value
the median value
the percentage of systems within each threshold

Parameters:

selectors (list of Selector, optional): The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:

dict (str -> float)

A dictionary mapping the summarized metric name to the scalar value. The summarized metric name contains

the metric name (e.g. DockQ)
the selector name, if a selector was used (e.g. Oracle)
the threshold, if a threshold was used (e.g. % acceptable)
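The aggregation described above can be sketched for one metric as follows. The key format, the bin names and the binning rule (each value is counted in the highest bin whose lower bound it reaches) are assumptions chosen for illustration, not peppr's exact output. The bounds 0.23, 0.49 and 0.80 are the commonly used DockQ quality classes (acceptable, medium, high).

```python
import statistics


def summarize(metric_name, values, bins, selector_name=None):
    """Condense per-system values of one metric into scalar summary values.

    bins maps a bin name to its lower bound; every value is counted in the
    highest bin whose lower bound it reaches (values below all bounds are not counted).
    """
    prefix = metric_name if selector_name is None else f"{metric_name} ({selector_name})"
    summary = {
        f"{prefix} mean": statistics.mean(values),
        f"{prefix} median": statistics.median(values),
    }
    ordered = sorted(bins.items(), key=lambda item: item[1])
    counts = {name: 0 for name, _ in ordered}
    for value in values:
        reached = [name for name, lower in ordered if value >= lower]
        if reached:
            counts[reached[-1]] += 1
    for name, count in counts.items():
        summary[f"{prefix} % {name}"] = 100 * count / len(values)
    return summary


dockq_bins = {"incorrect": 0.0, "acceptable": 0.23, "medium": 0.49, "high": 0.80}
print(summarize("DockQ", [0.1, 0.3, 0.5, 0.9], dockq_bins, selector_name="Oracle"))
```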

tabulate_metrics(selectors: Iterable[Selector] | None = None) → DataFrame

Create a table listing the value for each metric and system.

Parameters:

selectors (list of Selector, optional): The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.

Returns:

pandas.DataFrame: A table listing the value for each metric and system. The index is the system ID.

Metrics

The metrics for pose evaluation.

`Metric`()	The base class for all evaluation metrics.
`MonomerRMSD`(threshold[, ca_only])	Compute the root mean squared deviation (RMSD) between each peptide chain in the reference and the pose and take the mean weighted by the number of heavy atoms.
`MonomerTMScore`()	Compute the TM-score for each monomer and take the mean weighted by the number of atoms.
`MonomerLDDTScore`()	Compute the local Distance Difference Test (lDDT) score for each monomer and take the mean weighted by the number of atoms.
`IntraLigandLDDTScore`()	Compute the local Distance Difference Test (lDDT) score for contacts within each small molecule.
`LDDTPLIScore`()	Compute the CASP LDDT-PLI score, i.e. the lDDT for protein-ligand interactions as defined by [Rfb947263ee55-1].
`LDDTPPIScore`()	Compute the lDDT for protein-protein interactions, i.e. intra-chain contacts are not included.
`GlobalLDDTScore`([backbone_only])	Compute the lDDT score for all contacts in the system, i.e. both intra- and inter-chain contacts.
`DockQScore`([include_pli])	Compute the DockQ score for the given complex as defined in [R19e7a98e93e0-1].
`LigandRMSD`()	Compute the Ligand RMSD for the given protein complex as defined in [R84484f61b266-1].
`InterfaceRMSD`()	Compute the Interface RMSD for the given protein complex as defined in [Rf63c12e6cf3e-1].
`ContactFraction`()	Compute the fraction of correctly predicted reference contacts (Fnat) as defined in [Rdd00b73bb0c4-1].
`PocketAlignedLigandRMSD`()	Compute the Pocket aligned ligand RMSD for the given PLI complex as defined in [R77e8bc3bd056-1].
`BiSyRMSD`(threshold[, inclusion_radius, ...])	Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex.
`BondLengthViolations`([tolerance, ...])	Check for unusual bond lengths in the structure by comparing against reference values.
`BondAngleViolations`([tolerance])	Check for unusual bond angles in the structure by comparing against idealized bond geometry.
`ClashCount`()	Count the number of clashes between atoms in the pose.
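Several of the monomer metrics above compute a per-chain score and then take a mean weighted by the number of atoms. The sketch below illustrates both ingredients with a simplified, atom-level lDDT: atom pairs closer than the 15 Å inclusion radius in the reference are scored against the standard 0.5, 1, 2 and 4 Å tolerances. It omits the per-residue grouping and stereochemistry checks of the full lDDT definition, assumes coordinates are given as tuples in Å, and is not peppr's implementation; the function names are hypothetical.

```python
import itertools
import math

TOLERANCES = (0.5, 1.0, 2.0, 4.0)  # Å, the distance tolerances used by lDDT


def simple_lddt(reference, pose, inclusion_radius=15.0):
    """Fraction of preserved reference distances, averaged over the four tolerances."""
    preserved = 0
    checked = 0
    for i, j in itertools.combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= inclusion_radius:
            continue  # only pairs in contact in the reference are scored
        deviation = abs(d_ref - math.dist(pose[i], pose[j]))
        preserved += sum(deviation < tol for tol in TOLERANCES)
        checked += len(TOLERANCES)
    return preserved / checked if checked else float("nan")


def weighted_mean(scores, weights):
    """Mean of per-chain scores weighted by, e.g., the number of atoms per chain."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
print(simple_lddt(ref, ref))              # 1.0
print(simple_lddt(ref, moved))            # 7/12, about 0.583
print(weighted_mean([0.5, 1.0], [1, 3]))  # 0.875
```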

Selectors

Selection of the desired metric result from multiple poses.

`Selector`()	The base class for all pose selectors.
`MeanSelector`()	Selector that computes the mean of the values.
`MedianSelector`()	Selector that computes the median of the values.
`OracleSelector`()	Selector that returns the best value.
`TopSelector`(k)	Selector that returns the best value from the k values with highest confidence.
`RandomSelector`(k[, seed])	Selector that returns the best value from k randomly chosen values.
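The selection rules above reduce the per-pose values of one system to a single value. A minimal sketch in plain Python, assuming the values are sorted from highest to lowest confidence (as feed() expects) and that a flag tells whether larger values are better (true for lDDT, false for RMSD); the function names and the higher_is_better flag are hypothetical.

```python
import random
import statistics


def mean_select(values):
    return statistics.mean(values)


def median_select(values):
    return statistics.median(values)


def oracle_select(values, higher_is_better=True):
    """Best value among all poses."""
    return max(values) if higher_is_better else min(values)


def top_select(values, k, higher_is_better=True):
    """Best value among the k poses with highest confidence (the first k entries)."""
    return oracle_select(values[:k], higher_is_better)


def random_select(values, k, seed=None, higher_is_better=True):
    """Best value among k randomly drawn poses; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    return oracle_select(rng.sample(values, min(k, len(values))), higher_is_better)


lddt_per_pose = [0.3, 0.9, 0.5]  # sorted from highest to lowest confidence
print(top_select(lddt_per_pose, k=1))  # 0.3
print(top_select(lddt_per_pose, k=2))  # 0.9
print(oracle_select([2.1, 0.7, 1.5], higher_is_better=False))  # 0.7 (RMSD: lower is better)
```

The gap between oracle_select() and top_select(values, k=1) indicates how much could be gained by a better confidence ranking of the poses.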

Analysis functions

Underlying functions used by the Metric classes to compute metric values that are not directly implemented in biotite.structure.

`bisy_rmsd`(reference, pose[, ...])	Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex as defined in [R50064b0d173e-1].
`find_clashes`(atoms[, vdw_scaling])	Find atom clashes in the given structure.
`dockq`(reference_receptor, reference_ligand, ...)	Compute DockQ for a single pair of receptor and ligand in both the pose and the reference structure.
`pocket_aligned_lrmsd`(reference_receptor, ...)	Compute the pocket-aligned RMSD part of the DockQ score for small molecules.
`lrmsd`(reference_receptor, reference_ligand, ...)	Compute the ligand RMSD part of the DockQ score.
`irmsd`(reference_receptor, reference_ligand, ...)	Compute the interface RMSD part of the DockQ score.
`fnat`(reference_receptor, reference_ligand, ...)	Compute the fnat and fnonnat part of the DockQ score.
`DockQ`(fnat, fnonnat, irmsd, lrmsd[, ...])	Result of a DockQ calculation.
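DockQ combines its components as DockQ = (fnat + 1/(1 + (iRMSD/1.5)²) + 1/(1 + (LRMSD/8.5)²)) / 3, with distances in Å, as defined in the original DockQ publication. The sketch below implements this formula together with fnat and fnonnat computed from sets of residue contacts. It covers only the protein-protein case; how peppr treats small molecules (e.g. via pocket_aligned_lrmsd) is not modeled, and the function names are hypothetical.

```python
def fnat_fnonnat(reference_contacts, pose_contacts):
    """Fraction of native contacts recovered and fraction of pose contacts that are non-native.

    Contacts are sets of (receptor residue ID, ligand residue ID) tuples.
    """
    if not reference_contacts:
        raise ValueError("fnat is undefined without reference contacts")
    fnat = len(reference_contacts & pose_contacts) / len(reference_contacts)
    # Assumption of this sketch: a pose without contacts has no non-native contacts
    if pose_contacts:
        fnonnat = len(pose_contacts - reference_contacts) / len(pose_contacts)
    else:
        fnonnat = 0.0
    return fnat, fnonnat


def scaled_rmsd(rmsd, d):
    """Map an RMSD in Å onto (0, 1]; d is the RMSD at which the term drops to 0.5."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq_score(fnat, irmsd, lrmsd):
    """DockQ as the mean of fnat and the scaled interface and ligand RMSDs."""
    return (fnat + scaled_rmsd(irmsd, 1.5) + scaled_rmsd(lrmsd, 8.5)) / 3


fnat, fnonnat = fnat_fnonnat({(1, 10), (2, 11)}, {(1, 10), (3, 12)})
print(fnat, fnonnat)                # 0.5 0.5
print(dockq_score(fnat, 1.5, 8.5))  # 0.5
```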

Atom Matching

`find_optimal_match`(reference, pose[, ...])	Find the atom indices for the given reference and pose structure that bring these structures into a corresponding order that minimizes the RMSD between them.
`find_all_matches`(reference, pose[, ...])	Find all possible atom mappings between the reference and the pose.
`GraphMatchWarning`	This warning is raised if the RDKit-based molecule matching fails.
`UnmappableEntityError`	This exception is raised if the reference and pose structure contain entities that cannot be mapped to each other.
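Finding all matches and picking the optimal one can be illustrated on a small molecule with chemically equivalent atoms, such as the two oxygens of a carboxylate group. The sketch below takes the groups of interchangeable atoms as given (in peppr this information comes from RDKit-based molecule matching, which is not modeled here), enumerates every mapping, and keeps the one with the lowest RMSD without superimposing the structures. The function names are hypothetical.

```python
import itertools
import math


def all_mappings(n_atoms, equivalent_groups):
    """Yield every atom order obtained by permuting atoms within each equivalence group.

    order[i] = j means reference atom i is matched to pose atom j.
    """
    per_group = [list(itertools.permutations(group)) for group in equivalent_groups]
    for choice in itertools.product(*per_group):
        order = list(range(n_atoms))
        for group, permuted in zip(equivalent_groups, choice):
            for src, dst in zip(group, permuted):
                order[src] = dst
        yield order


def mapped_rmsd(reference, pose, order):
    """RMSD for a given mapping, without superimposing the structures."""
    squared = [math.dist(reference[i], pose[j]) ** 2 for i, j in enumerate(order)]
    return math.sqrt(sum(squared) / len(squared))


def best_mapping(reference, pose, equivalent_groups):
    """Return the mapping with the lowest RMSD among all enumerated mappings."""
    return min(
        all_mappings(len(reference), equivalent_groups),
        key=lambda order: mapped_rmsd(reference, pose, order),
    )


# Carboxylate-like fragment: atom 0 is the carbon, atoms 1 and 2 are equivalent oxygens
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
pose = [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]  # oxygens listed in swapped order
print(mapped_rmsd(reference, pose, [0, 1, 2]))  # about 1.633 with the naive atom order
print(best_mapping(reference, pose, [(1, 2)]))  # [0, 2, 1]
```

The number of mappings is the product of the factorials of the group sizes, e.g. 3! × 3! = 36 for two groups of three atoms, which is why exhaustive matching can become expensive.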

Miscellaneous

`sanitize`(mol[, max_fix_iterations])	Fix small issues in the molecule and sanitize it with RDKit's SanitizeMol.
`standardize`(system)	Standardize the given system.
`is_small_molecule`(chain)	Check whether the given chain is a small molecule.
`get_contact_residues`(receptor, ligand, cutoff)	Get a set of tuples containing the residue IDs for each contact between receptor and ligand.
`MatchWarning`	This warning is raised if the `Evaluator` fails to match atoms between the reference and pose structures.
`EvaluationWarning`	This warning is raised if a `Metric` fails to evaluate a pose.
`NoContactError`
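Residue contacts of the kind returned by get_contact_residues() can be sketched with a brute-force distance check. The per-atom (residue ID, coordinates) layout, the inclusive cutoff and the function name are assumptions of this sketch; for large structures, a spatial index such as biotite.structure.CellList would avoid comparing every atom pair.

```python
import itertools
import math


def contact_residues(receptor, ligand, cutoff):
    """Return {(receptor_res_id, ligand_res_id)} for atom pairs within `cutoff` Å.

    receptor, ligand: lists of (residue ID, (x, y, z)) entries, one per atom.
    """
    contacts = set()
    for (r_res, r_xyz), (l_res, l_xyz) in itertools.product(receptor, ligand):
        if math.dist(r_xyz, l_xyz) <= cutoff:
            contacts.add((r_res, l_res))
    return contacts


receptor = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0))]
ligand = [(101, (0.0, 0.0, 3.0)), (102, (10.0, 0.0, 6.0))]
print(contact_residues(receptor, ligand, cutoff=5.0))  # {(1, 101)}
```

Sets like this can be intersected directly to compute the fraction of recovered reference contacts (Fnat).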
