Custom metrics and selectors
peppr is written with extensibility in mind: You can easily define your own Metric and Selector classes to extend the functionality of the package.
Creating a custom metric
To create a custom metric, one only needs to subclass the Metric base class.
For the scope of this tutorial we will create an all-atom RMSD as a metric.
import peppr
import numpy as np
import biotite.structure as struc

class RMSD(peppr.Metric):
    # This is mandatory
    @property
    def name(self):
        return "My awesome RMSD"

    # This optionally enables binning for `Evaluator.summarize_metrics()`
    @property
    def thresholds(self):
        return {"good": 0.0, "ok": 10.0, "bad": 20.0}

    # The core of each metric
    def evaluate(self, reference, pose):
        # Limit system to common atoms between reference and pose
        not_nan_mask = ~np.isnan(reference.coord[:, 0]) & ~np.isnan(pose.coord[:, 0])
        reference = reference[not_nan_mask]
        pose = pose[not_nan_mask]
        # It is not guaranteed that `reference` and `pose` are superimposed
        pose, _ = struc.superimpose(reference, pose)
        return struc.rmsd(reference, pose).item()

    # Defines whether a smaller or larger value indicates a better prediction
    # Used by the `Selector` classes
    def smaller_is_better(self):
        return True
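The evaluate() method above superimposes the pose onto the reference before computing the RMSD. To illustrate why this step matters, here is a minimal sketch on plain NumPy arrays. It uses a hand-written Kabsch fit, which is an assumption made for illustration and not necessarily how struc.superimpose() is implemented; the function names and coordinates are hypothetical.

```python
import numpy as np


def rmsd(a, b):
    """Root-mean-square deviation between two (n, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))


def kabsch_superimpose(fixed, mobile):
    """Rigidly fit `mobile` onto `fixed` (illustrative Kabsch fit)."""
    fixed_center = fixed.mean(axis=0)
    mobile_center = mobile.mean(axis=0)
    f = fixed - fixed_center
    m = mobile - mobile_center
    # The SVD of the covariance matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(m.T @ f)
    # Flip the last axis if needed to avoid a reflection
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return m @ rotation + fixed_center


# Made-up reference coordinates
reference = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
)
# The "pose" is the same structure, rotated by 90 degrees around z and shifted
rot_z_90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pose = reference @ rot_z_90.T + np.array([5.0, -3.0, 2.0])

rmsd_before = rmsd(reference, pose)
rmsd_after = rmsd(reference, kabsch_superimpose(reference, pose))
print(f"before: {rmsd_before:.2f}, after: {rmsd_after:.2f}")
```

Without the fit, the RMSD mostly reflects the arbitrary placement of the pose; after the fit it reflects only the structural difference, which is zero in this constructed case.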
A few notes on the evaluate() method:
The purpose of a Metric is to be used in the context of an Evaluator. As the Evaluator already automatically preprocesses the input systems passed to Evaluator.feed(), we can rely on some properties of each system that make our life easier:
- The input system will only contain heavy atoms (i.e. no hydrogen).
- The input system always contains matching atoms. This means that atom reference[i] corresponds to the atom pose[i]. Finding the optimal atom mapping in ambiguous scenarios, as they appear in homomers or symmetric molecules, is already handled under the hood beforehand.
- If either the pose or the reference has atoms that are not present in the other, the respective coordinates of these atoms are NaN.

The Metric should respect the hetero annotation of the input biotite.structure.AtomArray. An atom where hetero is False is defined as polymer and where hetero is True as small molecule. For example, a peptide chain with hetero being True should still be interpreted as a small molecule.

Atoms with the same chain_id should be considered as part of the same molecule, atoms with different chain_id should be considered as different molecules.
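The NaN and hetero conventions described above can be sketched with plain NumPy arrays standing in for reference.coord, pose.coord and the hetero annotation. All values are made up for illustration, and the helper function is hypothetical; a real metric would operate on the AtomArray objects directly.

```python
import numpy as np

# Made-up coordinates standing in for `reference.coord` and `pose.coord`
ref_coord = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [np.nan, np.nan, np.nan],  # atom that is missing in the reference
        [0.0, 2.0, 0.0],
    ]
)
pose_coord = np.array(
    [
        [0.0, 0.0, 1.0],
        [1.0, 0.0, 1.0],
        [5.0, 5.0, 5.0],
        [0.0, 2.0, 1.0],
    ]
)
# Made-up `hetero` annotation: only the last atom is part of a small molecule
hetero = np.array([False, False, False, True])


def rmsd_without_fit(ref, pose):
    """RMSD between two (n, 3) arrays, without any superimposition."""
    return float(np.sqrt(np.mean(np.sum((ref - pose) ** 2, axis=-1))))


# Atoms present in both structures have no NaN coordinate in either array
common = ~np.isnan(ref_coord).any(axis=-1) & ~np.isnan(pose_coord).any(axis=-1)
# Split the common atoms into polymer and small molecule atoms
polymer = common & ~hetero
small_molecule = common & hetero

polymer_rmsd = rmsd_without_fit(ref_coord[polymer], pose_coord[polymer])
small_molecule_rmsd = rmsd_without_fit(
    ref_coord[small_molecule], pose_coord[small_molecule]
)
print(polymer_rmsd, small_molecule_rmsd)
```

Filtering by the NaN mask first keeps missing atoms from propagating NaN into the result, and the hetero mask lets a metric treat polymers and small molecules separately.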
Creating a custom selector
Custom selectors can also be created by inheriting from the Selector base class.
Let’s create a selector that picks the worst-case prediction, i.e. the pose with the worst value for a given metric.
class WorstCaseSelector(peppr.Selector):
    @property
    def name(self):
        return "worst case"

    def select(self, values, smaller_is_better):
        if smaller_is_better:
            return np.nanmax(values)
        else:
            return np.nanmin(values)
Note that the Selector.select() method does not know anything about the actual pose or metric it is selecting from.
It only receives the values to select from (which may contain NaN values) and whether smaller or larger values are better.
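To make the selection logic concrete, the following sketch mirrors WorstCaseSelector.select() as a standalone function, so it runs without peppr. The function name and the input values are hypothetical, and the guard for all-NaN input is an addition that the class above does not have.

```python
import numpy as np


def worst_case(values, smaller_is_better):
    """Return the worst value, ignoring NaN entries."""
    values = np.asarray(values, dtype=float)
    # Added guard: without any valid value there is nothing to select
    if np.isnan(values).all():
        return float("nan")
    if smaller_is_better:
        # For metrics like RMSD the largest value is the worst one
        return float(np.nanmax(values))
    # For score-like metrics the smallest value is the worst one
    return float(np.nanmin(values))


# Made-up metric values for three poses, one of which could not be evaluated
rmsd_values = [2.5, np.nan, 7.0]
score_values = [0.9, np.nan, 0.4]

worst_rmsd = worst_case(rmsd_values, smaller_is_better=True)
worst_score = worst_case(score_values, smaller_is_better=False)
print(worst_rmsd, worst_score)
```

Using the NaN-aware reductions means a single pose without a valid value does not turn the whole selection into NaN.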
Bringing it all together
Now that we have created our custom metric and selector, we can use them in the Evaluator to evaluate our predictions.
import biotite.structure.io.pdbx as pdbx

# `path_to_systems` is assumed to be a `pathlib.Path` pointing to a directory
# that contains one subdirectory per system
evaluator = peppr.Evaluator([RMSD()])
for system_dir in sorted(path_to_systems.iterdir()):
    if not system_dir.is_dir():
        continue
    system_id = system_dir.name
    # Each system has one reference structure...
    pdbx_file = pdbx.CIFFile.read(system_dir / "reference.cif")
    reference = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
    # ...and any number of predicted poses
    poses = []
    for pose_path in sorted(system_dir.glob("poses/*.cif")):
        pdbx_file = pdbx.CIFFile.read(pose_path)
        poses.append(pdbx.get_structure(pdbx_file, model=1, include_bonds=True))
    evaluator.feed(system_id, reference, poses)
evaluator.tabulate_metrics(selectors=[WorstCaseSelector()])
| | My awesome RMSD (worst case) |
|---|---|
| 7jto__1__1.A_1.D__1.J_1.O | 9.98 |
| 7znt__2__1.F_1.G__1.J | 6.46 |
| 8bdt__1__1.A_1.D__1.I | 9.01 |
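The loop above expects one directory per system, each holding a reference.cif file and a poses subdirectory with the predicted structures. The following sketch builds such a layout from empty placeholder files in a temporary directory and walks it the same way. The system and file names are hypothetical, and no structures are parsed here.

```python
import tempfile
from pathlib import Path


def make_example_layout(root):
    """Create empty placeholder files in the layout the loop expects."""
    for system_id in ["system_a", "system_b"]:
        poses_dir = root / system_id / "poses"
        poses_dir.mkdir(parents=True)
        (root / system_id / "reference.cif").touch()
        for i in range(2):
            (poses_dir / f"pose_{i}.cif").touch()
    # A stray top-level file, which the `is_dir()` check skips
    (root / "README.txt").touch()


def discover_systems(path_to_systems):
    """Map each system ID to the sorted file names of its poses."""
    systems = {}
    for system_dir in sorted(path_to_systems.iterdir()):
        if not system_dir.is_dir():
            continue
        pose_paths = sorted(system_dir.glob("poses/*.cif"))
        systems[system_dir.name] = [path.name for path in pose_paths]
    return systems


with tempfile.TemporaryDirectory() as tmp:
    path_to_systems = Path(tmp)
    make_example_layout(path_to_systems)
    systems = discover_systems(path_to_systems)

print(systems)
```

Sorting both the system directories and the pose files keeps the order of the poses passed to the evaluator reproducible across runs.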
And finally, we will also see the bin thresholds for the RMSD metric in action.
for name, value in evaluator.summarize_metrics(
    selectors=[WorstCaseSelector()]
).items():
    print(f"{name}: {value:.2f}")
My awesome RMSD good (worst case): 1.00
My awesome RMSD ok (worst case): 0.00
My awesome RMSD bad (worst case): 0.00
My awesome RMSD mean (worst case): 8.48
My awesome RMSD median (worst case): 9.01
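The numbers above can be reproduced by hand from the three worst-case values in the table. The sketch below assumes that each threshold is the lower bound of its bin, which is consistent with the output shown here but is an interpretation made for this example; the helper function is hypothetical and not part of peppr.

```python
from bisect import bisect_right
from statistics import mean, median

# Thresholds of the RMSD metric and the worst-case values from the table
thresholds = {"good": 0.0, "ok": 10.0, "bad": 20.0}
values = [9.98, 6.46, 9.01]


def bin_fractions(values, thresholds):
    """Fraction of values per bin, treating thresholds as lower bounds."""
    names = list(thresholds)
    lower_bounds = list(thresholds.values())
    counts = dict.fromkeys(names, 0)
    for value in values:
        # Index of the last bin whose lower bound is <= value
        index = bisect_right(lower_bounds, value) - 1
        # Values below the first lower bound fall into no bin
        if index >= 0:
            counts[names[index]] += 1
    return {name: count / len(values) for name, count in counts.items()}


fractions = bin_fractions(values, thresholds)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
print(f"mean: {mean(values):.2f}")
print(f"median: {median(values):.2f}")
```

All three values lie between 0.0 and 10.0, so the entire fraction lands in the good bin, matching the summary above.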