Custom metrics and selectors#

peppr is written with extensibility in mind: You can easily define your own Metric and Selector classes to extend the functionality of the package.

Creating a custom metric#

To create a custom metric, one only needs to subclass the Metric base class. For the scope of this tutorial we will create an all-atom RMSD as metric.

import peppr
import numpy as np
import biotite.structure as struc

class RMSD(peppr.Metric):

    # This is mandatory
    @property
    def name(self):
        return "My awesome RMSD"

    # This optionally enables binning for `Evaluator.summarize_metrics()`
    @property
    def thresholds(self):
        return {"good": 0.0, "ok": 10.0, "bad": 20.0}

    # The core of each metric
    def evaluate(self, reference, pose):
        # It is not guaranteed that `reference` and `pose` are superimposed
        pose, _ = struc.superimpose(reference, pose)
        return struc.rmsd(reference, pose).item()

    # Defines whether a smaller or larger value indicates a better prediction
    # Used by the `Selector` classes
    def smaller_is_better(self):
        return True

A few notes on the evaluate() method: The purpose of a Metric is to be used in context of an Evaluator. As the Evaluator already automatically preprocesses the input systems passed to Evaluator.feed(), we can rely on some properties of each system that make our life easier:

The input system will only contain heavy atoms (i.e. no hydrogen).
The input system always contains matching atoms. This means that atom reference[i] corresponds to the atom pose[i]. Finding the optimal atom mapping in case of ambiguous scenarios, as they appear in homomers or symmetric molecules, is already handled under the hood beforehand.
The Metric should respect the hetero annotation of the input biotite.structure.AtomArray. An atom where hetero is False is defined as polymer and where hetero is True as small molecule. For example a peptide chain with hetero being True should still be interpreted as a small molecule.
Atoms with the same chain_id should be considered as part of the same molecule, atoms with different chain_id should be considered as different molecules.

Creating a custom selector#

Custom selectors can also be created by inheriting from the Selector base class. Let’s create a selector that picks the worst case prediction, i.e. the pose with the worst value for a given metric.

class WorstCaseSelector(peppr.Selector):

    @property
    def name(self):
        return "worst case"

    def select(self, values, smaller_is_better):
        if smaller_is_better:
            return np.nanmax(values)
        else:
            return np.nanmin(values)

Note that the Selector.select() method does not know anything about the actual pose or metric it is selecting from. It only obtains values to select from (may also contain NaN values) and their ordering.

Bringing it all together#

Now that we have created our custom metric and selector, we can use them in the Evaluator to evaluate our predictions.

import biotite.structure.io.pdbx as pdbx

evaluator = peppr.Evaluator([RMSD()])

for system_dir in sorted(path_to_systems.iterdir()):
    if not system_dir.is_dir():
        continue
    system_id = system_dir.name
    pdbx_file = pdbx.CIFFile.read(system_dir / "reference.cif")
    reference = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
    poses = []
    for pose_path in sorted(system_dir.glob("poses/*.cif")):
        pdbx_file = pdbx.CIFFile.read(pose_path)
        poses.append(pdbx.get_structure(pdbx_file, model=1, include_bonds=True))
    evaluator.feed(system_id, reference, poses)

evaluator.tabulate_metrics(selectors=[WorstCaseSelector()])

	My awesome RMSD (worst case)
7jto__1__1.A_1.D__1.J_1.O	9.98
7znt__2__1.F_1.G__1.J	6.46
8bdt__1__1.A_1.D__1.I	9.01

And finally, we will also the bin thresholds for the RMSD metric in action.

for name, value in evaluator.summarize_metrics(
    selectors=[WorstCaseSelector()]
).items():
    print(f"{name}: {value:.2f}")

My awesome RMSD good (worst case): 1.00
My awesome RMSD ok (worst case): 0.00
My awesome RMSD bad (worst case): 0.00
My awesome RMSD mean (worst case): 8.48
My awesome RMSD median (worst case): 9.01

Custom metrics and selectors#

Creating a custom metric#

Creating a custom selector#

Bringing it all together#

This Page