peppr.find_optimal_match

peppr.find_optimal_match(reference: AtomArray, pose: AtomArray, min_sequence_identity: float = 0.95, use_heuristic: bool = True, max_matches: int | None = None) → tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]]

Find the atom indices for the given reference and pose structures that bring these structures into a corresponding order that minimizes the RMSD between them.
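To illustrate what "corresponding order" means, the NumPy sketch below uses plain coordinate arrays in place of AtomArray objects and hand-written index arrays in place of a real find_optimal_match() result (both are assumptions made for the example). The same atoms stored in a different chain order give a non-zero RMSD until both structures are indexed into a common order.

```python
import numpy as np


def rmsd(a, b):
    """Plain RMSD between two coordinate arrays of equal shape (no superposition)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=-1))))


rng = np.random.default_rng(0)
# Toy "reference": two chains of 4 atoms each (atoms 0-3 and 4-7).
reference_coord = rng.normal(size=(8, 3))
# Toy "pose": the same atoms, but the two chains are stored in swapped order.
pose_coord = np.concatenate([reference_coord[4:], reference_coord[:4]])

# Hand-written index arrays of the kind the function returns:
# atom i of reference[reference_indices] corresponds to atom i of pose[pose_indices].
reference_indices = np.arange(8)
pose_indices = np.array([4, 5, 6, 7, 0, 1, 2, 3])

print(rmsd(reference_coord, pose_coord) > 0)  # True: atoms are mismatched
print(rmsd(reference_coord[reference_indices], pose_coord[pose_indices]))  # 0.0
```

With real structures, the returned arrays would be applied analogously, as reference[reference_indices] and pose[pose_indices].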

Parameters:
reference : AtomArray, shape=(p,)

The reference structure.

pose : AtomArray, shape=(q,)

The pose structure.

min_sequence_identity : float

The minimum sequence identity two chains must share to be considered the same entity.

use_heuristic : bool or int

Whether to employ a fast heuristic [1] to find the optimal chain permutation. This heuristic represents each chain by its centroid, i.e. instead of exhaustively superimposing all atoms for each permutation, only the centroids are superimposed and the closest match between the reference and pose is selected.

max_matches : int, optional

The maximum number of atom mappings to try if use_heuristic is set to False.
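The sketch below illustrates how a threshold like min_sequence_identity can group chains into entities. It assumes a deliberately simple, gap-free identity (identical residues divided by length, defined only for equal-length sequences) and a greedy grouping via the hypothetical helpers sequence_identity() and group_entities(); how peppr actually aligns sequences and computes identity may differ.

```python
def sequence_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical residues, ASSUMING equal-length, pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        return 0.0
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def group_entities(sequences: dict[str, str], min_sequence_identity: float = 0.95) -> list[list[str]]:
    """Greedily group chain IDs into entities (hypothetical helper, not peppr API)."""
    entities: list[list[str]] = []
    for chain_id, seq in sequences.items():
        for entity in entities:
            # Compare against the first chain of each existing entity.
            if sequence_identity(sequences[entity[0]], seq) >= min_sequence_identity:
                entity.append(chain_id)
                break
        else:
            # No existing entity is similar enough: start a new one.
            entities.append([chain_id])
    return entities


chains = {
    "A": "MKTAYIAKQRQISFVKSHFSRQ",
    "B": "MKTAYIAKQRQISFVKSHFSRQ",  # identical copy of A
    "C": "MKTAYIAKQRQISFVKSHFSRA",  # 21 of 22 residues identical (~0.955)
    "D": "GSHMLEDPVDAFQQAQ",  # unrelated, different length
}
print(group_entities(chains))  # [['A', 'B', 'C'], ['D']]
print(group_entities(chains, 0.99))  # [['A', 'B'], ['C'], ['D']]
```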
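A rough sketch of a centroid-based heuristic follows. It is loosely in the spirit of the chain matching in [1] and is not peppr's implementation: each pose chain is tried as the partner of an anchor reference chain, the pose is superimposed via that chain pair, the remaining chains are paired greedily by closest centroid, and the candidate with the lowest centroid RMSD is kept. The helpers kabsch() and centroid_heuristic_match() are hypothetical, and all chains are assumed to be copies of a single entity with identical atom counts.

```python
import numpy as np


def kabsch(mobile, target):
    """Return rotation r and translation t so that mobile @ r.T + t best fits target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that r is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_center - mobile_center @ r.T
    return r, t


def centroid_heuristic_match(ref_chains, pose_chains):
    """Hypothetical sketch: map reference chain i to pose chain perm[i]."""
    ref_centroids = np.array([c.mean(axis=0) for c in ref_chains])
    best_perm, best_rmsd = None, np.inf
    # Try each pose chain as the partner of reference chain 0 (the anchor).
    for anchor in range(len(pose_chains)):
        r, t = kabsch(pose_chains[anchor], ref_chains[0])
        pose_centroids = np.array([(c @ r.T + t).mean(axis=0) for c in pose_chains])
        perm, free = [anchor], set(range(len(pose_chains))) - {anchor}
        # Greedily pair every other reference chain with the closest free pose chain.
        for i in range(1, len(ref_chains)):
            j = min(free, key=lambda k: np.linalg.norm(pose_centroids[k] - ref_centroids[i]))
            perm.append(j)
            free.remove(j)
        diff = pose_centroids[perm] - ref_centroids
        rmsd = float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
        if rmsd < best_rmsd:
            best_perm, best_rmsd = perm, rmsd
    return best_perm, best_rmsd


rng = np.random.default_rng(1)
base = rng.normal(size=(6, 3))
ref_chains = [base + offset for offset in ([0, 0, 0], [10, 0, 0], [0, 10, 0])]
# Pose: rotate by 90 degrees about z, shift, and store the chains in order (2, 0, 1).
rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pose_chains = [ref_chains[k] @ rot.T + 5.0 for k in (2, 0, 1)]
perm, rmsd = centroid_heuristic_match(ref_chains, pose_chains)
print(perm)  # [1, 2, 0]
```

Only one superposition per candidate anchor is needed here, rather than one per chain permutation, which illustrates why such a heuristic scales much better than exhaustive enumeration.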
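For contrast, an exhaustive search can be sketched as trying chain permutations one by one, superimposing all atoms each time, with max_matches capping how many candidates are evaluated. The helpers below are hypothetical and only permute whole chains of one entity, whereas max_matches is documented here in terms of atom mappings.

```python
import itertools

import numpy as np


def superimposed_rmsd(a, b):
    """RMSD between a and b after optimal superposition (closed-form Kabsch)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the optimal fit would be a reflection.
    s[-1] *= np.sign(np.linalg.det(u) * np.linalg.det(vt))
    msd = (np.sum(a**2) + np.sum(b**2) - 2.0 * np.sum(s)) / len(a)
    return float(np.sqrt(max(msd, 0.0)))


def exhaustive_chain_match(ref_chains, pose_chains, max_matches=None):
    """Hypothetical sketch: evaluate chain permutations, at most max_matches of them."""
    reference = np.concatenate(ref_chains)
    # islice(..., None) yields every permutation, i.e. no cap.
    candidates = itertools.islice(itertools.permutations(range(len(pose_chains))), max_matches)
    best_perm, best_rmsd, n_tried = None, np.inf, 0
    for perm in candidates:
        pose = np.concatenate([pose_chains[j] for j in perm])
        rmsd = superimposed_rmsd(reference, pose)
        n_tried += 1
        if rmsd < best_rmsd:
            best_perm, best_rmsd = list(perm), rmsd
    return best_perm, best_rmsd, n_tried


rng = np.random.default_rng(1)
base = rng.normal(size=(6, 3))
ref_chains = [base + offset for offset in ([0, 0, 0], [10, 0, 0], [0, 10, 0])]
rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pose_chains = [ref_chains[k] @ rot.T + 5.0 for k in (2, 0, 1)]
print(exhaustive_chain_match(ref_chains, pose_chains)[::2])  # ([1, 2, 0], 6)
print(exhaustive_chain_match(ref_chains, pose_chains, max_matches=1)[::2])  # ([0, 1, 2], 1)
```

With a cap of one candidate, the search simply returns the first permutation it tried, which shows how a small max_matches trades optimality for speed.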

Returns:
reference_indices : np.array, shape=(n,), dtype=int

The atom indices that should be applied to reference.

pose_indices : np.array, shape=(n,), dtype=int

The atom indices that should be applied to pose.
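To show how a chain-level mapping can translate into two index arrays of this form, the hypothetical helper below concatenates per-chain atom ranges. It assumes that every paired chain has the same atom count and atom order, so all atoms are matched; the documented return values only guarantee that both arrays have the same length n.

```python
import numpy as np


def chain_perm_to_indices(ref_chain_sizes, pose_chain_sizes, perm):
    """Hypothetical helper: reference chain i is paired with pose chain perm[i]."""
    # Index of the first atom of each chain in the concatenated structure.
    ref_starts = np.concatenate([[0], np.cumsum(ref_chain_sizes)[:-1]])
    pose_starts = np.concatenate([[0], np.cumsum(pose_chain_sizes)[:-1]])
    reference_indices = np.concatenate(
        [ref_starts[i] + np.arange(n) for i, n in enumerate(ref_chain_sizes)]
    )
    # Walk the pose chains in the order dictated by the reference chains.
    pose_indices = np.concatenate(
        [pose_starts[j] + np.arange(ref_chain_sizes[i]) for i, j in enumerate(perm)]
    )
    return reference_indices, pose_indices


# Reference chains A (2 atoms) and B (3 atoms); the pose stores them as B, A.
ref_idx, pose_idx = chain_perm_to_indices([2, 3], [3, 2], perm=[1, 0])
print(ref_idx.tolist())  # [0, 1, 2, 3, 4]
print(pose_idx.tolist())  # [3, 4, 0, 1, 2]
```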

Notes

The heuristic used by default is much faster than the exhaustive approach: especially for larger complexes with many homomers or small-molecule copies, the number of possible mappings grows combinatorially. However, the heuristic might not find the optimal permutation in all cases, especially for poses that only remotely resemble the reference.
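The combinatorial growth can be made concrete by counting chain-level mappings: copies of the same entity can be permuted freely, so the count is the product of the factorials of the copy numbers. This ignores any additional atom-level symmetries (for example, symmetry-equivalent atoms within a small molecule), which would multiply the count further; n_chain_mappings() is a hypothetical helper.

```python
from math import factorial, prod


def n_chain_mappings(copies_per_entity):
    """Number of chain-level mappings when copies of each entity may be permuted freely."""
    return prod(factorial(n) for n in copies_per_entity)


# Hypothetical complex: a homotetramer plus 6 copies of one bound ligand.
print(n_chain_mappings([4, 6]))  # 17280 (= 4! * 6!)
# Hypothetical 12-mer of a single protein entity.
print(n_chain_mappings([12]))  # 479001600 (= 12!)
```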

References

[1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034