Python API
API reference
- class peppr.Evaluator(metrics: Iterable[Metric], tolerate_exceptions: bool = False, min_sequence_identity: float = 0.95)
This class represents the core of peppr. Systems are fed via feed() into the Evaluator. Finally, the evaluation is reported via tabulate_metrics(), which gives a scalar metric value for each fed system, or via summarize_metrics(), which aggregates the metrics over all systems.
- Parameters:
- metrics : Iterable of Metric
The metrics to evaluate the poses against. These will make up the columns of the resulting dataframe from tabulate_metrics().
- tolerate_exceptions : bool, optional
If set to true, exceptions during Metric.evaluate() are not propagated. Instead a warning is raised and the result is set to None.
- min_sequence_identity : float
The minimum sequence identity for two chains to be considered the same entity.
- Attributes:
- metrics : tuple of Metric
The metrics to evaluate the poses against.
- system_ids : tuple of str
The IDs of the systems that were fed into the evaluator.
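The effect of tolerate_exceptions described above can be pictured with a small stand-alone sketch. It does not call peppr: `evaluate_tolerantly` and `failing_metric` are hypothetical names, and the warning class and message that peppr actually emits are not specified here, so a plain `UserWarning` is assumed.

```python
import warnings


def evaluate_tolerantly(metric_func, reference, pose, tolerate_exceptions=False):
    """Call a metric function; optionally turn an exception into a warning and None."""
    try:
        return metric_func(reference, pose)
    except Exception as exc:
        if not tolerate_exceptions:
            # Default behavior: the exception propagates to the caller
            raise
        # Tolerant behavior: warn and record a missing result instead
        warnings.warn(f"Metric evaluation failed: {exc}")
        return None


def failing_metric(reference, pose):
    # Hypothetical metric that always fails, for demonstration only
    raise ValueError("no matching atoms")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = evaluate_tolerantly(
        failing_metric, None, None, tolerate_exceptions=True
    )
```

With `tolerate_exceptions=True` the failure ends up as a `None` entry plus one recorded warning, so a single problematic system does not abort a long evaluation run.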
- feed(system_id: str, reference: AtomArray, poses: Sequence[AtomArray] | AtomArrayStack | AtomArray) -> None
Evaluate the poses of a system against the reference structure for all metrics.
- Parameters:
- system_id : str
The ID of the system that was evaluated.
- reference : AtomArray
The reference structure of the system. Each separate instance/molecule must have a distinct chain_id.
- poses : AtomArrayStack or list of AtomArray or AtomArray
The pose(s) to evaluate. It is expected that the poses are sorted from highest to lowest confidence (relevant for Selector instances).
Notes
reference and poses must fulfill the following requirements:
- The system must have an associated biotite.structure.BondList, i.e. the bonds attribute must not be None.
- Each molecule in the system must have a distinct chain_id.
- Chains where the hetero annotation is True are always interpreted as small molecules. Conversely, chains where the hetero annotation is False are always interpreted as protein or nucleic acid chains.
The optimal chain mapping and the atom mapping in symmetric small molecules are handled automatically.
- get_results() -> list[list[ndarray]]
Return the raw results of the evaluation.
This includes each metric evaluated on each pose of each system.
- Returns:
- list of list of np.ndarray
The raw results of the evaluation. The outer list iterates over the metrics, the inner list iterates over the systems and the array represents the values for each pose.
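The nesting described for get_results() (metrics, then systems, then per-pose values) can be unpacked as in the following sketch. It uses made-up numbers and plain lists in place of np.ndarray so that it runs without peppr. That the outer and inner orders match the metrics and system_ids attributes is an assumption, and `metric_names`, `system_ids` and `by_name` are illustrative names only.

```python
# Stand-in data: 2 metrics x 3 systems, each system with its own number of poses.
# Plain lists replace the np.ndarray objects that get_results() returns.
metric_names = ["RMSD", "lDDT"]            # hypothetical metric order
system_ids = ["sys_a", "sys_b", "sys_c"]   # hypothetical system IDs
results = [
    [[2.1, 3.4], [7.9], [1.2, 1.5, 4.0]],        # RMSD: one entry per system
    [[0.81, 0.77], [0.52], [0.90, 0.88, 0.61]],  # lDDT: one entry per system
]

# Re-key the nested lists as {metric name: {system ID: per-pose values}}
by_name = {
    metric: dict(zip(system_ids, per_system))
    for metric, per_system in zip(metric_names, results)
}

# Per-pose lDDT values of the third system
lddt_sys_c = by_name["lDDT"]["sys_c"]
```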
- summarize_metrics(selectors: Iterable[Selector] | None = None) -> dict[str, float]
Condense the system-wise evaluation to scalar values for each metric.
For each metric, the following values are computed:
- the mean value
- the median value
- the percentage of systems within each threshold
- Parameters:
- selectors : list of Selector, optional
The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.
- Returns:
- dict (str -> float)
A dictionary mapping the summarized metric name to the scalar value. The summarized metric name contains
  - the metric name (e.g. DockQ)
  - the selector name, if a selector was used (e.g. Oracle)
  - the threshold, if a threshold was used (e.g. % acceptable)
Examples
>>> import pprint
>>> pprint.pprint(evaluator.summarize_metrics())
{'CA-RMSD <5.0': 0.3,
 'CA-RMSD >5.0': 0.7,
 'CA-RMSD mean': 12.159182685504375,
 'TM-score mean': 0.6235582438144873,
 'lDDT mean': 0.5880769924413414}
>>> pprint.pprint(evaluator.summarize_metrics([MeanSelector(), OracleSelector()]))
{'CA-RMSD <5.0 (Oracle)': 0.3,
 'CA-RMSD <5.0 (mean)': 0.3,
 'CA-RMSD >5.0 (Oracle)': 0.7,
 'CA-RMSD >5.0 (mean)': 0.7,
 'CA-RMSD mean (Oracle)': 12.159182685504375,
 'CA-RMSD mean (mean)': 12.159182685504375,
 'TM-score mean (Oracle)': 0.6235582438144873,
 'TM-score mean (mean)': 0.6235582438144873,
 'lDDT mean (Oracle)': 0.5880769924413414,
 'lDDT mean (mean)': 0.5880769924413414}
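The aggregation that summarize_metrics() performs for a single metric can be approximated in plain Python. This is a minimal sketch under stated assumptions, not peppr's implementation: `summarize` is a hypothetical helper, the input values are made up, the bins are reported as fractions in the range 0 to 1 (as in the example output above), and a value exactly equal to the threshold is placed in the upper bin here, which may differ from peppr's boundary handling.

```python
import statistics


def summarize(values, threshold):
    """Aggregate one metric over all systems (one scalar value per system)."""
    n = len(values)
    below = sum(1 for v in values if v < threshold)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Fraction of systems in each bin defined by the threshold
        f"<{threshold}": below / n,
        f">{threshold}": (n - below) / n,
    }


# Made-up CA-RMSD values, one per system
ca_rmsd = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
summary = summarize(ca_rmsd, 5.0)
```

Here two of the ten systems fall below the 5.0 threshold, so the `<5.0` entry is 0.2 and the `>5.0` entry is 0.8.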
- tabulate_metrics(selectors: Iterable[Selector] | None = None) -> DataFrame
Create a table listing the value for each metric and system.
- Parameters:
- selectors : list of Selector, optional
The selectors to use for selecting the best pose of a multi-pose prediction. This parameter is not necessary if only single-pose predictions were fed into the Evaluator.
- Returns:
- pandas.DataFrame
A table listing the value for each metric and system. The index is the system ID.
Examples
>>> print(evaluator.tabulate_metrics())
                                   RMSD      lDDT  TM-score
8ji2__1__1.B__1.J_1.K          2.987937  0.674205  0.883589
7t4w__1__1.A__1.C             16.762669  0.693087  0.380107
8jp0__1__1.A__1.B             26.281593  0.510061  0.316204
7yn2__1__1.A_1.B__1.C          6.657655  0.567117  0.725322
8oxu__2__1.C__1.E             14.977116  0.339707  0.296535
7ydq__1__1.A__1.B             26.111820  0.383841  0.360584
7wuy__1__1.B__1.HA_1.IA_1.OA  16.494774  0.665949  0.666633
7xh4__1__1.A__1.B_1.C          1.787062  0.748987  0.915388
7v34__1__1.A__1.C_1.D_1.G      4.472874  0.567491  0.822537
8jmr__1__1.A_1.B__1.C_1.D      5.058327  0.730324  0.868684
>>> print(evaluator.tabulate_metrics(OracleSelector()))
                              CA-RMSD (Oracle)  lDDT (Oracle)  TM-score (Oracle)
8ji2__1__1.B__1.J_1.K                 2.987937       0.674205           0.883589
7t4w__1__1.A__1.C                    16.762669       0.693087           0.380107
8jp0__1__1.A__1.B                    26.281593       0.510061           0.316204
7yn2__1__1.A_1.B__1.C                 6.657655       0.567117           0.725322
8oxu__2__1.C__1.E                    14.977116       0.339707           0.296535
7ydq__1__1.A__1.B                    26.111820       0.383841           0.360584
7wuy__1__1.B__1.HA_1.IA_1.OA         16.494774       0.665949           0.666633
7xh4__1__1.A__1.B_1.C                 1.787062       0.748987           0.915388
7v34__1__1.A__1.C_1.D_1.G             4.472874       0.567491           0.822537
8jmr__1__1.A_1.B__1.C_1.D             5.058327       0.730324           0.868684
Metrics
The metrics for pose evaluation.
- The base class for all evaluation metrics.
- Compute the root mean squared deviation (RMSD) between each peptide chain in the reference and the pose and take the mean weighted by the number of heavy atoms.
- Compute the TM-score for each monomer and take the mean weighted by the number of atoms.
- Compute the local Distance Difference Test (lDDT) score for each monomer and take the mean weighted by the number of atoms.
- Compute the local Distance Difference Test (lDDT) score for contacts within each small molecule.
- Compute the CASP LDDT-PLI score, i.e. the lDDT for protein-ligand interactions as defined by [Rfb947263ee55-1].
- Compute the lDDT for protein-protein interactions, i.e. intra-chain contacts are not included.
- Compute the lDDT score for all contacts in the system, i.e. both intra- and inter-chain contacts.
- Compute the DockQ score for the given complex as defined in [R19e7a98e93e0-1].
- Compute the Ligand RMSD for the given protein complex as defined in [R84484f61b266-1].
- Compute the Interface RMSD for the given protein complex as defined in [Rf63c12e6cf3e-1].
- Compute the fraction of correctly predicted reference contacts (Fnat) as defined in [Rdd00b73bb0c4-1].
- Compute the pocket-aligned ligand RMSD for the given PLI complex as defined in [R77e8bc3bd056-1].
- Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex.
- Check for unusual bond lengths in the structure by comparing against reference values.
- Count the number of clashes between atoms in the pose.
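Several of the metrics above report a mean over chains that is weighted by the number of (heavy) atoms. The arithmetic is sketched below with made-up numbers. `weighted_mean` is a hypothetical helper written for illustration and is not part of peppr.

```python
def weighted_mean(values, weights):
    """Mean of per-chain values, weighted by e.g. the atom count of each chain."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


chain_rmsd = [1.0, 3.0]    # made-up per-chain RMSD values
heavy_atoms = [300, 100]   # made-up heavy-atom counts per chain
overall = weighted_mean(chain_rmsd, heavy_atoms)
```

The larger chain dominates the result: the weighted mean is 1.5, whereas the unweighted mean of the two chains would be 2.0.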
Selectors
Selection of the desired metric result from multiple poses.
- The base class for all pose selectors.
- Selector that computes the mean of the values.
- Selector that computes the median of the values.
- Selector that returns the best value.
- Selector that returns the best value from the k values with highest confidence.
- Selector that returns the best value from k randomly chosen values.
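The selection strategies listed above can be mimicked on a plain list of per-pose values. The functions below are hypothetical stand-ins, not peppr's Selector classes. They assume that the values are ordered from highest to lowest confidence, that the caller states whether smaller values are better (as for an RMSD) or larger ones (as for lDDT), and that random selection draws without replacement.

```python
import random
import statistics


def select_mean(values):
    return statistics.mean(values)


def select_median(values):
    return statistics.median(values)


def select_best(values, smaller_is_better):
    # "Best" depends on the metric: low for RMSD-like, high for score-like values
    return min(values) if smaller_is_better else max(values)


def select_top_k(values, k, smaller_is_better):
    # Values are assumed to be sorted from highest to lowest confidence
    return select_best(values[:k], smaller_is_better)


def select_random_k(values, k, smaller_is_better, seed=0):
    # Draw k distinct poses at random, then keep the best of them
    rng = random.Random(seed)
    chosen = rng.sample(values, min(k, len(values)))
    return select_best(chosen, smaller_is_better)


rmsd_per_pose = [4.2, 1.8, 6.5, 3.0]  # made-up values, most confident pose first
top1 = select_top_k(rmsd_per_pose, 1, smaller_is_better=True)
best = select_best(rmsd_per_pose, smaller_is_better=True)
```

In this example the most confident pose (`top1`, 4.2) is not the most accurate one (`best`, 1.8), which is the gap that comparing a top-k selector against the best-value selector makes visible.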
Analysis functions
Underlying functions used by the Metric classes to compute the metric values, which are not directly implemented in biotite.structure.
- Compute the Binding-Site Superposed, Symmetry-Corrected Pose RMSD (BiSyRMSD) for the given PLI complex as defined in [R50064b0d173e-1].
- Find atom clashes in the given structure.
- Compute DockQ for a single pair of receptor and ligand in both the pose and the reference structure.
- Compute the pocket-aligned RMSD part of the DockQ score for small molecules.
- Compute the ligand RMSD part of the DockQ score.
- Compute the interface RMSD part of the DockQ score.
- Compute the fnat and fnonnat part of the DockQ score.
- Result of a DockQ calculation.
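The functions above compute the individual parts of the DockQ score (fnat, interface RMSD and ligand RMSD). How these parts are combined in the published DockQ definition is sketched below. The scaling constants 1.5 Å and 8.5 Å come from that definition. That peppr combines its parts in exactly this way is an assumption, and `scaled_rmsd` and `dockq` are illustrative names, not peppr functions.

```python
def scaled_rmsd(rmsd, scale):
    """Map an RMSD in Angstrom to the range (0, 1]; 1 means a perfect match."""
    return 1.0 / (1.0 + (rmsd / scale) ** 2)


def dockq(fnat, irmsd, lrmsd):
    """Average of fnat and the scaled interface and ligand RMSD."""
    return (fnat + scaled_rmsd(irmsd, 1.5) + scaled_rmsd(lrmsd, 8.5)) / 3.0


perfect = dockq(fnat=1.0, irmsd=0.0, lrmsd=0.0)
halfway = dockq(fnat=0.5, irmsd=1.5, lrmsd=8.5)
```

A pose identical to the reference scores 1.0. An RMSD equal to its scaling constant contributes 0.5, so the second call yields 0.5 overall.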
Miscellaneous
- This warning is raised, if a the
- This warning is raised, if the RDKit based molecule matching fails.
- This warning is raised, if a
- Fix small issues with RDKit SanitizeMol and sanitize molecule.
- Standardize the given system.
- Find the optimal atom order for each pose that minimizes the RMSD to the reference.
- Check whether the given chain is a small molecule.
- Get a set of tuples containing the residue IDs for each contact between receptor and ligand.
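Contact sets of the kind described in the last entry lend themselves to computing the fnat and fnonnat fractions with set operations. The sketch below uses made-up residue ID pairs and the common definitions (fnat: share of reference contacts recovered in the pose; fnonnat: share of pose contacts absent from the reference). `contact_fractions` is a hypothetical helper, and returning NaN for an empty contact set is a choice made for this sketch only.

```python
def contact_fractions(reference_contacts, pose_contacts):
    """Return (fnat, fnonnat) for two sets of (receptor_res_id, ligand_res_id)."""
    nan = float("nan")
    native = reference_contacts & pose_contacts      # contacts found in both
    non_native = pose_contacts - reference_contacts  # contacts only in the pose
    fnat = len(native) / len(reference_contacts) if reference_contacts else nan
    fnonnat = len(non_native) / len(pose_contacts) if pose_contacts else nan
    return fnat, fnonnat


# Made-up contacts as (receptor residue ID, ligand residue ID) tuples
reference_contacts = {(10, 55), (11, 55), (12, 60), (15, 61)}
pose_contacts = {(10, 55), (12, 60), (20, 70)}
fnat, fnonnat = contact_fractions(reference_contacts, pose_contacts)
```

Two of the four reference contacts are reproduced, so fnat is 0.5. One of the three pose contacts is not present in the reference, so fnonnat is one third.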