Python API
API reference
- class peppr.Evaluator(metrics: Iterable[Metric], match_method: MatchMethod = MatchMethod.HEURISTIC, max_matches: int | None = None, tolerate_exceptions: bool = False, min_sequence_identity: float = 0.95)
This class represents the core of peppr. Systems are fed via feed() into the Evaluator. Finally, the evaluation is reported via tabulate_metrics(), which gives a scalar metric value for each fed system, or via summarize_metrics(), which aggregates the metrics over all systems.
- Parameters:
- metrics : Iterable of Metric
The metrics to evaluate the poses against. These will make up the columns of the resulting dataframe from tabulate_metrics().
- match_method : MatchMethod, optional
The strategy to use for finding atom matches between the reference and pose. This can be used to trade off speed and accuracy.
- max_matches : int, optional
The maximum number of atom matches to try, if the match_method is set to EXHAUSTIVE or INDIVIDUAL.
- tolerate_exceptions : bool, optional
If set to True, exceptions during Metric.evaluate() are not propagated. Instead, a warning is issued and the result is set to None.
- min_sequence_identity : float, optional
The minimum sequence identity for two chains to be considered the same entity.
- Attributes:
- metrics : tuple of Metric
The metrics to evaluate the poses against.
- system_ids : tuple of str
The IDs of the systems that were fed into the evaluator.
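The behavior described for tolerate_exceptions can be pictured with the following minimal sketch. It is not peppr's implementation: evaluate_tolerantly and failing_metric are hypothetical names, and the metric is modeled as a plain function taking a reference and a pose.

```python
import warnings


def evaluate_tolerantly(metric_fn, reference, pose, tolerate_exceptions=False):
    """Call a metric function; optionally turn exceptions into a warning."""
    try:
        return metric_fn(reference, pose)
    except Exception as exc:
        if not tolerate_exceptions:
            # Strict mode: let the exception propagate to the caller.
            raise
        # Tolerant mode: report the problem and record a missing value.
        warnings.warn(f"Metric evaluation failed: {exc}")
        return None


def failing_metric(reference, pose):
    # Hypothetical metric that always fails, for demonstration only.
    raise ValueError("cannot evaluate this pose")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = evaluate_tolerantly(
        failing_metric, None, None, tolerate_exceptions=True
    )
```

In tolerant mode a single broken system does not abort a long evaluation run, at the cost of missing values that have to be handled when the results are aggregated.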
- class MatchMethod(value)
Method for finding atom matches between the fed reference and pose. These methods represent a tradeoff between speed and accuracy.
- HEURISTIC: Use a fast heuristic [1] that matches the reference and pose by minimizing the RMSD between the centroids of each chain. This method scales linearly with the number of chains, but it is not guaranteed to find the optimal match in all cases, especially when the pose and reference are quite distant from each other.
- EXHAUSTIVE: Exhaustively iterate through all valid atom mappings between the reference and pose and select the one that gives the lowest all-atom RMSD. This method is slower and prone to combinatorial explosion, but it finds better matches in edge cases.
- INDIVIDUAL: Like EXHAUSTIVE, but instead of using the RMSD as the optimization criterion, each individual Metric is used. As this requires exhaustive iteration over all mappings and computing each metric for all of them, this method is slower than EXHAUSTIVE. However, it is guaranteed to find the optimal match for each metric.
- NONE: Skip atom matching entirely and evaluate metrics on the structures as provided. This is useful when the reference and pose are already properly aligned, or when using metrics that don’t require matching (e.g., bond-length violations, clash counts).
References
[1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034
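To illustrate why exhaustive matching is prone to combinatorial explosion, the following sketch tries every chain permutation of a pose and keeps the one with the lowest centroid RMSD. It is a simplified illustration, not peppr's algorithm: it works on whole chains rather than atoms, performs no superimposition, and assumes every chain may be swapped with every other chain. All function names are hypothetical.

```python
import itertools
import math


def centroid(coords):
    """Return the mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_rmsd(ref_centroids, pose_centroids):
    """RMSD between two equally long lists of centroids."""
    squared = [math.dist(r, p) ** 2 for r, p in zip(ref_centroids, pose_centroids)]
    return math.sqrt(sum(squared) / len(squared))


def best_chain_order(ref_chains, pose_chains):
    """Try every pose chain permutation (n! candidates) and keep the best."""
    ref_centroids = [centroid(c) for c in ref_chains]
    pose_centroids = [centroid(c) for c in pose_chains]
    best = None
    for order in itertools.permutations(range(len(pose_chains))):
        rmsd = centroid_rmsd(ref_centroids, [pose_centroids[i] for i in order])
        if best is None or rmsd < best[1]:
            best = (order, rmsd)
    return best


# Two chains whose order is swapped in the pose.
ref_chains = [[(0, 0, 0), (2, 0, 0)], [(10, 0, 0), (12, 0, 0)]]
pose_chains = [[(10, 0, 0), (12, 0, 0)], [(0, 0, 0), (2, 0, 0)]]
order, rmsd = best_chain_order(ref_chains, pose_chains)
```

The number of candidates grows factorially with the number of interchangeable chains, which is the situation a cap such as max_matches is meant to bound.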
- static combine(evaluators: Iterable[Evaluator]) -> Evaluator
Combine multiple Evaluator instances into a single one, preserving the systems fed to each instance.
- Parameters:
- evaluators : Iterable of Evaluator
The evaluators to combine. The metrics, tolerate_exceptions and min_sequence_identity must be the same for all evaluators.
- Returns:
- Evaluator
The evaluator combining the systems of all input evaluators in the order of the input.
- feed(system_id: str, reference: AtomArray, poses: Sequence[AtomArray] | AtomArrayStack | AtomArray) -> None
Evaluate the poses of a system against the reference structure for all metrics.
- Parameters:
- system_id : str
The ID of the system that was evaluated.
- reference : AtomArray
The reference structure of the system. Each separate instance/molecule must have a distinct chain_id.
- poses : AtomArrayStack or list of AtomArray or AtomArray
The pose(s) to evaluate. The poses are expected to be sorted from highest to lowest confidence (relevant for Selector instances).
Notes
reference and poses must fulfill the following requirements:
- The system must have an associated biotite.structure.BondList, i.e. the bonds attribute must not be None.
- Each molecule in the system must have a distinct chain_id.
- Chains where the hetero annotation is True are always interpreted as small molecules. Conversely, chains where the hetero annotation is False are always interpreted as protein or nucleic acid chains.
The optimal atom matching is handled automatically based on the MatchMethod.
- get_results() -> list[list[ndarray]]
Return the raw results of the evaluation.
This includes each metric evaluated on each pose of each system.
- Returns:
- list of list of np.ndarray
The raw results of the evaluation. The outer list iterates over the metrics, the inner list iterates over the systems and the array represents the values for each pose.
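The nesting described above (metrics, then systems, then poses) can be flattened into rows as in the following sketch. The data is a made-up stand-in that uses plain lists in place of np.ndarray, and the metric names and system IDs are hypothetical.

```python
# Hypothetical stand-in for the nested structure described above:
# two metrics, three systems, a varying number of poses per system.
results = [
    [[1.2, 2.5], [0.8], [3.1, 2.9, 4.0]],   # values of metric 0
    [[0.9, 0.7], [0.95], [0.4, 0.5, 0.3]],  # values of metric 1
]
metric_names = ["rmsd", "lddt"]
system_ids = ["sys_a", "sys_b", "sys_c"]

# Flatten into (metric, system, pose index, value) rows.
rows = []
for name, per_system in zip(metric_names, results):
    for system_id, pose_values in zip(system_ids, per_system):
        for pose_index, value in enumerate(pose_values):
            rows.append((name, system_id, pose_index, value))
```

Because the number of poses may differ between systems, the innermost level cannot in general be stacked into one rectangular array.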
- summarize_metrics(selectors: Iterable[Selector] | None = None) -> dict[str, float]
Condense the system-wise evaluation to scalar values for each metric.
For each metric,
- the mean value,
- the median value,
- and the percentage of systems within each threshold
are computed.
- Parameters:
- selectors : list of Selector, optional
The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.
- Returns:
- dict (str -> float)
A dictionary mapping the summarized metric name to the scalar value. The summarized metric name contains
- the metric name (e.g. DockQ),
- the selector name, if a selector was used (e.g. Oracle),
- the threshold, if a threshold was used (e.g. % acceptable).
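The three kinds of summary values can be sketched as follows. This is an illustration under assumptions, not peppr's code: the summarize function, the key format, and the interpretation of thresholds as lower edges of consecutive bins are all hypothetical. The example bin edges are the commonly used DockQ quality classes.

```python
import statistics


def summarize(metric_name, values, bin_edges):
    """Summarize one value per system as mean, median and % per bin.

    bin_edges maps a label to the lower edge of its bin (assumed semantics).
    """
    values = [v for v in values if v is not None]
    summary = {
        f"{metric_name} mean": statistics.mean(values),
        f"{metric_name} median": statistics.median(values),
    }
    # Sort bins by lower edge; each bin ends where the next one begins.
    bins = sorted(bin_edges.items(), key=lambda item: item[1])
    for i, (label, lower) in enumerate(bins):
        upper = bins[i + 1][1] if i + 1 < len(bins) else float("inf")
        count = sum(lower <= v < upper for v in values)
        summary[f"{metric_name} % {label}"] = 100 * count / len(values)
    return summary


dockq_values = [0.10, 0.30, 0.55, 0.85]
dockq_bins = {"incorrect": 0.0, "acceptable": 0.23, "medium": 0.49, "high": 0.80}
summary = summarize("DockQ", dockq_values, dockq_bins)
```

With one system per bin in this toy input, each percentage comes out as a quarter of the systems.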
- tabulate_metrics(selectors: Iterable[Selector] | None = None) -> DataFrame
Create a table listing the value for each metric and system.
- Parameters:
- selectors : list of Selector, optional
The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.
- Returns:
- pandas.DataFrame
A table listing the value for each metric and system. The index is the system ID.
Metrics
The metrics for pose evaluation.
- The base class for all evaluation metrics.
- Compute the root mean squared deviation (RMSD) between each peptide chain in the reference and the pose and take the mean weighted by the number of heavy atoms.
- Compute the TM-score for each monomer and take the mean weighted by the number of atoms.
- Compute the local Distance Difference Test (lDDT) score for each monomer and take the mean weighted by the number of atoms.
- Compute the local Distance Difference Test (lDDT) score for contacts within each small molecule.
- Compute the CASP LDDT-PLI score, i.e. the lDDT for protein-ligand interactions as defined by [Rfb947263ee55-1].
- Compute the lDDT for protein-protein interactions, i.e. intra-chain contacts are not included.
- Compute the lDDT score for all contacts in the system, i.e. both intra- and inter-chain contacts.
- Compute the DockQ score for the given complex as defined in [R19e7a98e93e0-1].
- Compute the Ligand RMSD for the given protein complex as defined in [R84484f61b266-1].
- Compute the Interface RMSD for the given protein complex as defined in [Rf63c12e6cf3e-1].
- Compute the fraction of correctly predicted reference contacts (Fnat) as defined in [Rdd00b73bb0c4-1].
- Compute the Pocket aligned ligand RMSD for the given PLI complex as defined in [R77e8bc3bd056-1].
- Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex.
- Check for unusual bond lengths in the structure by comparing against reference values.
- Check for unusual bond angles in the structure by comparing against idealized bond geometry.
- Count the number of clashes between atoms in the pose.
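Several of the metrics above average a per-chain value weighted by the number of atoms. The following sketch shows that weighting for a per-chain RMSD. It is only an illustration with hypothetical function names: coordinates are assumed to be already superimposed, matched atom by atom, and restricted to heavy atoms.

```python
import math


def rmsd(ref_coords, pose_coords):
    """RMSD between two equally long lists of (x, y, z) tuples."""
    squared = [math.dist(r, p) ** 2 for r, p in zip(ref_coords, pose_coords)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_mean_rmsd(ref_chains, pose_chains):
    """Mean of the per-chain RMSD, weighted by the atom count of each chain."""
    total_atoms = sum(len(chain) for chain in ref_chains)
    weighted_sum = sum(
        len(ref) * rmsd(ref, pose) for ref, pose in zip(ref_chains, pose_chains)
    )
    return weighted_sum / total_atoms


# Chain A: two atoms, each displaced by 1; chain B: one atom displaced by 4.
ref_chains = [[(0, 0, 0), (5, 0, 0)], [(0, 9, 0)]]
pose_chains = [[(1, 0, 0), (6, 0, 0)], [(4, 9, 0)]]
value = weighted_mean_rmsd(ref_chains, pose_chains)
```

The weighting keeps a short, badly placed chain from dominating the score of a system that is otherwise well predicted.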
Selectors
Selection of the desired metric result from multiple poses.
- The base class for all pose selectors.
- Selector that computes the mean of the values.
- Selector that computes the median of the values.
- Selector that returns the best value.
- Selector that returns the best value from the k values with highest confidence.
- Selector that returns the best value from k randomly chosen values.
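The selection strategies listed above can be sketched as plain functions over the per-pose values of one system. The function names and signatures are hypothetical and do not mirror peppr's Selector classes. The values are assumed to be sorted from highest to lowest confidence, and whether a smaller or larger value counts as better is passed in explicitly.

```python
import random
import statistics


def select_mean(values):
    return statistics.mean(values)


def select_median(values):
    return statistics.median(values)


def select_best(values, smaller_is_better):
    """Best value over all poses, regardless of confidence."""
    return min(values) if smaller_is_better else max(values)


def select_top_k(values, k, smaller_is_better):
    """Best value among the k most confident poses (the first k entries)."""
    return select_best(values[:k], smaller_is_better)


def select_random_k(values, k, smaller_is_better, seed=0):
    """Best value among k poses drawn at random without replacement."""
    rng = random.Random(seed)
    return select_best(rng.sample(values, k), smaller_is_better)


# RMSD-like values (smaller is better), most confident pose first.
pose_rmsds = [3.0, 1.5, 2.0, 0.5]
best = select_best(pose_rmsds, smaller_is_better=True)
top2 = select_top_k(pose_rmsds, k=2, smaller_is_better=True)
```

Here the best pose is the least confident one, so restricting the selection to the two most confident poses gives a worse value than the selection over all poses.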
Analysis functions
Underlying functions used by the Metric classes to compute the metric values that are not directly implemented in biotite.structure.
- Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex as defined in [R50064b0d173e-1].
- Find atom clashes in the given structure.
- Compute DockQ for a single pair of receptor and ligand in both the pose and the reference structure.
- Compute the pocket-aligned RMSD part of the DockQ score for small molecules.
- Compute the ligand RMSD part of the DockQ score.
- Compute the interface RMSD part of the DockQ score.
- Compute the fnat and fnonnat part of the DockQ score.
- Result of a DockQ calculation.
Atom Matching
- Find the atom indices for the given reference and pose structure that bring these structures into a corresponding order that minimizes the RMSD between them.
- Find all possible atom mappings between the reference and the pose.
- This warning is raised if the RDKit-based molecule matching fails.
- This exception is raised if the reference and pose structure contain entities that cannot be mapped to each other.
Miscellaneous
- Fix small issues with RDKit SanitizeMol and sanitize molecule.
- Standardize the given system.
- Check whether the given chain is a small molecule.
- Get a set of tuples containing the residue IDs for each contact between receptor and ligand.
- This warning is raised, if a the
- This warning is raised, if a