Python API#

This tutorial demonstrates how to use peppr from within Python, from loading structures to evaluating and reporting metrics.

Loading and preparing structures#

peppr leverages the structure representation from the Biotite package, namely the biotite.structure.AtomArray. Consequently, peppr does not provide a way to load or edit structures by itself: all of this is done ‘upstream’ via functionality from Biotite. Have a look at the corresponding Biotite tutorials for a more in-depth introduction to its capabilities.

For the scope of this tutorial we will load our reference structures and predicted poses from CIF files. The files used in this tutorial are proteolysis targeting chimera (PROTAC) complexes predicted from the PLINDER dataset and can be downloaded here. Because an input structure for peppr may comprise any number of peptide and nucleotide chains as well as small molecules, it is called a system throughout this package.
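
The following snippets assume that path_to_systems points to the directory containing the downloaded example systems. A minimal sketch of this setup, where the actual location is a placeholder you need to adjust:

from pathlib import Path

# Placeholder: adjust to wherever the downloaded systems were extracted
path_to_systems = Path("path/to/systems")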

import biotite.structure.io.pdbx as pdbx

system_dir = path_to_systems / "7jto__1__1.A_1.D__1.J_1.O"
pdbx_file = pdbx.CIFFile.read(system_dir / "reference.cif")
ref = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
pdbx_file = pdbx.CIFFile.read(system_dir / "poses" / "pose_0.cif")
pose = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
print(type(ref))
print(type(pose))
<class 'biotite.structure.AtomArray'>
<class 'biotite.structure.AtomArray'>

There are two important things to note here:

  • model=1 ensures that only a single model is loaded from the CIF file, as a file may contain multiple models. You can omit it if you want peppr to evaluate multiple poses per system, as discussed later.

  • include_bonds=True instructs the parser to also load the bonds between atoms. If they are missing, peppr will raise an exception when it encounters the system; a check is sketched below. The full list of requirements on input systems is documented in the API reference.
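
Whether bonds are present can be checked via the bonds attribute of the loaded AtomArray, which is None if no bonds are associated with the structure. A minimal sketch:

# The bonds are stored as a BondList in the `bonds` attribute;
# it is None if the file was parsed without `include_bonds=True`
assert ref.bonds is not None
assert pose.bonds is not None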

As already indicated above, biotite.structure.AtomArray objects can also be loaded from a variety of other formats, such as PDB and MOL/SDF. Take a look at biotite.structure.io for a comprehensive list.
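
For instance, a structure could be read from a PDB file in much the same way. A minimal sketch, using a placeholder file name:

import biotite.structure.io.pdb as pdb

# "my_structure.pdb" is a placeholder path
pdb_file = pdb.PDBFile.read("my_structure.pdb")
structure = pdb.get_structure(pdb_file, model=1, include_bonds=True)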

Evaluation of a metric on a structure#

Now that we have the reference and poses loaded, we can evaluate metrics on them. Each metric is represented by a Metric object.

import peppr

lddt_metric = peppr.GlobalLDDTScore(backbone_only=False)
print("Name of the metric:", lddt_metric.name)
print(
    "Which values are considered better predictions?",
    "Smaller ones." if lddt_metric.smaller_is_better() else "Larger ones."
)
Name of the metric: global all-atom lDDT
Which values are considered better predictions? Larger ones.

The Metric.evaluate() method takes the reference and a pose of a system and returns a scalar value.

print(lddt_metric.evaluate(ref, pose))
0.7892155413495742

Some metrics may not be defined for a given system. For example, if a system contains no protein-protein interface, the interface RMSD (iRMSD) is undefined. In such cases, Metric.evaluate() returns NaN. The current system does contain a protein-protein interface, so a finite value is returned here.

irmsd_metric = peppr.InterfaceRMSD()
print(irmsd_metric.evaluate(ref, pose))
2.658182382583618
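
When some metrics in a collection may be undefined, the returned NaN values can be handled explicitly, for example with numpy. A minimal sketch using the metrics created above:

import numpy as np

for metric in [lddt_metric, irmsd_metric]:
    value = metric.evaluate(ref, pose)
    if np.isnan(value):
        print(f"{metric.name}: not defined for this system")
    else:
        print(f"{metric.name}: {value:.2f}")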

Feeding the evaluator#

Up to now, we have looked at each prediction in isolation. However, to evaluate a structure prediction model, one usually wants to run evaluations on many systems and eventually aggregate the results. This is the purpose of the Evaluator class. Given the desired metrics, predictions can be fed into it one after the other. As a bonus, the Evaluator also takes care of matching the corresponding atoms between the reference structure and its poses, in case some atoms are missing from either one, so you do not have to.

# An evaluator focused on metrics for protein-protein and protein-ligand interactions
evaluator = peppr.Evaluator(
    [
        peppr.DockQScore(),
        peppr.ContactFraction(),
        peppr.InterfaceRMSD(),
        # Only defined for protein-protein interactions
        peppr.LigandRMSD(),
        # Only defined for protein-ligand interactions
        peppr.PocketAlignedLigandRMSD(),
    ]
)

for system_dir in sorted(path_to_systems.iterdir()):
    if not system_dir.is_dir():
        continue
    system_id = system_dir.name
    pdbx_file = pdbx.CIFFile.read(system_dir / "reference.cif")
    ref = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
    pdbx_file = pdbx.CIFFile.read(system_dir / "poses" / "pose_0.cif")
    pose = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
    evaluator.feed(system_id, ref, pose)

Finally, we can report the evaluation run as a table that lists every chosen metric for each system (indicated by the given system ID).

evaluator.tabulate_metrics()
                             DockQ  fnat  iRMSD  LRMSD  PLI-LRMSD
7jto__1__1.A_1.D__1.J_1.O     0.54  0.75   2.66  16.82       9.36
7znt__2__1.F_1.G__1.J         0.74  0.53   1.89   8.09       3.27
8bdt__1__1.A_1.D__1.I         0.59  0.41   2.41   6.92       5.97
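
Evaluator.tabulate_metrics() returns a pandas.DataFrame, so the table can be processed further or written to disk with the usual pandas tools. A small sketch, with a placeholder output path:

table = evaluator.tabulate_metrics()
table.to_csv("metrics.csv")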

As mentioned above, you probably also want to aggregate the results over the systems. This can be done by calling Evaluator.summarize_metrics(), which returns a dictionary mapping the metric names to their aggregated values.

for name, value in evaluator.summarize_metrics().items():
    print(f"{name}: {value:.2f}")
DockQ incorrect: 0.00
DockQ acceptable: 0.00
DockQ medium: 1.00
DockQ high: 0.00
DockQ mean: 0.62
DockQ median: 0.59
fnat mean: 0.56
fnat median: 0.53
iRMSD mean: 2.32
iRMSD median: 2.41
LRMSD mean: 10.61
LRMSD median: 8.09
PLI-LRMSD mean: 6.20
PLI-LRMSD median: 5.97

As you can see, the mean and median value are reported for each metric. You might also notice that, in the case of DockQScore, the fractions of incorrect, acceptable, medium and high predictions are reported as well. For example, the value for DockQ acceptable reads as

A fraction of <x> of all applicable systems are within the acceptable threshold.

where the thresholds are defined in the thresholds attribute of some Metric classes such as DockQScore.

for bin, lower_threshold in peppr.DockQScore().thresholds.items():
    print(f"{bin}: {lower_threshold:.2f}")
incorrect: 0.00
acceptable: 0.23
medium: 0.49
high: 0.80
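
Based on these lower thresholds, a small helper for assigning a DockQ value to its quality bin could look like the following sketch (dockq_bin is a hypothetical function, not part of peppr):

def dockq_bin(value, thresholds):
    # Pick the best bin whose lower threshold the value still reaches;
    # this relies on `thresholds` being ordered from worst to best bin
    selected = None
    for name, lower_threshold in thresholds.items():
        if value >= lower_threshold:
            selected = name
    return selected

# For the first system above (DockQ = 0.54) this falls into the "medium" bin
print(dockq_bin(0.54, peppr.DockQScore().thresholds))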

Selecting the right pose#

Until now we fed only a single pose per system to the Evaluator. However, many structure prediction models return a number of poses, potentially ranked by their predicted confidence. To reflect this, the Evaluator accepts multiple poses per system - either as a list of biotite.structure.AtomArray objects or as a biotite.structure.AtomArrayStack, which represents multiple poses with the same atoms.

system_dir = path_to_systems / "7jto__1__1.A_1.D__1.J_1.O"
pdbx_file = pdbx.CIFFile.read(system_dir / "reference.cif")
reference = pdbx.get_structure(pdbx_file, model=1, include_bonds=True)
poses = []
for pose_path in sorted(system_dir.glob("poses/*.cif")):
    pdbx_file = pdbx.CIFFile.read(pose_path)
    poses.append(pdbx.get_structure(pdbx_file, model=1, include_bonds=True))
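
Alternatively, if all poses were stored as separate models in a single multi-model CIF file, omitting the model parameter would return a biotite.structure.AtomArrayStack containing all of them at once. A sketch with a hypothetical file name:

# "all_poses.cif" is a placeholder for a multi-model CIF file
pdbx_file = pdbx.CIFFile.read(system_dir / "all_poses.cif")
# Without `model`, all models are returned as an AtomArrayStack
poses = pdbx.get_structure(pdbx_file, include_bonds=True)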

But only one value per metric is eventually reported, so how does the Evaluator choose a single result from multiple poses? The answer is that with the current setup it cannot; you will see an exception instead.

evaluator = peppr.Evaluator([peppr.DockQScore()])
evaluator.feed("foo", reference, poses)
evaluator.tabulate_metrics()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 3
      1 evaluator = peppr.Evaluator([peppr.DockQScore()])
      2 evaluator.feed("foo", reference, poses)
----> 3 evaluator.tabulate_metrics()

File /opt/hostedtoolcache/Python/3.11.12/x64/lib/python3.11/site-packages/peppr/evaluator.py:237, in Evaluator.tabulate_metrics(self, selectors)
    189 def tabulate_metrics(
    190     self, selectors: Iterable[Selector] | None = None
    191 ) -> pd.DataFrame:
    192     """
    193     Create a table listing the value for each metric and system.
    194 
   (...)
    235     8jmr__1__1.A_1.B__1.C_1.D             5.058327       0.730324           0.868684
    236     """
--> 237     columns = self._tabulate_metrics(selectors)
    238     # Convert (metric, selector)-tuples to strings
    239     columns = {
    240         (
    241             f"{metric.name} ({selector.name})"
   (...)
    245         for (metric, selector), values in columns.items()
    246     }

File /opt/hostedtoolcache/Python/3.11.12/x64/lib/python3.11/site-packages/peppr/evaluator.py:339, in Evaluator._tabulate_metrics(self, selectors)
    337     condensed_values.append(np.nan)
    338 elif len(array) > 1:
--> 339     raise ValueError(
    340         "At least one selector is required for multi-pose predictions"
    341     )
    342 else:
    343     condensed_values.append(array[0])

ValueError: At least one selector is required for multi-pose predictions

The Evaluator needs a way to select the desired value from the metric evaluated on each pose. Which value is desired depends on the use case: Do we only want the value for the most confident prediction, the best prediction, or the average over all predictions? Hence, the user needs to supply one or multiple Selector objects to the Evaluator.tabulate_metrics() and Evaluator.summarize_metrics() methods. For example, the TopSelector selects the best value among the k most confident predictions.

Note

Evaluator.feed() expects that poses are sorted from highest to lowest confidence, i.e. the first pose in the list is the most confident prediction.

evaluator.tabulate_metrics(selectors=[peppr.TopSelector(3)])
     DockQ (Top3)
foo          0.59

Note the added (Top3) in the column name: for every given selector, a column appears that shows the value chosen by that Selector.

evaluator.tabulate_metrics(selectors=[peppr.TopSelector(3), peppr.MedianSelector()])
     DockQ (Top3)  DockQ (median)
foo          0.59            0.29
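
Analogously to the table above, the selectors can also be passed to Evaluator.summarize_metrics() to aggregate the selected values over all systems, e.g.:

for name, value in evaluator.summarize_metrics(
    selectors=[peppr.TopSelector(3)]
).items():
    print(f"{name}: {value:.2f}")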