peppr.find_optimal_match

peppr.find_optimal_match(reference: AtomArray, pose: AtomArray, min_sequence_identity: float = 0.95, use_heuristic: bool = True, max_matches: int | None = None, allow_unmatched_entities: bool = False) → tuple[AtomArray, AtomArray]

Match the atoms from the given reference and pose structure so that the RMSD between them is minimized.

‘Matching’ has two effects here:

  • Chains and atoms within each residue that have a counterpart in the other structure are reordered, if necessary, so that both structures list them in the same order.

  • A matched annotation is added, which is False for all atoms that do not have a counterpart.

Parameters:
reference : AtomArray, shape=(p,)

The reference structure.

pose : AtomArray, shape=(q,)

The pose structure.

min_sequence_identity : float, optional

The minimum sequence identity between two chains to be considered the same entity.

use_heuristic : bool or int, optional

Whether to employ a fast heuristic [1] to find the optimal chain permutation. This heuristic represents each chain by its centroid, i.e. instead of exhaustively superimposing all atoms for each permutation, only the centroids are superimposed and the closest match between the reference and pose is selected.

max_matches : int, optional

The maximum number of atom mappings to try if use_heuristic is set to False.

allow_unmatched_entities : bool, optional

If set to True, allow entire entities to be unmatched. This is useful if a pose is compared to a reference which may contain different molecules.
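The centroid idea behind use_heuristic can be illustrated with a small, standalone sketch. This is not peppr's implementation: it assumes the two structures already share a coordinate frame (so the superimposition step is skipped), represents chains as plain lists of (x, y, z) tuples, and uses a simple greedy nearest-centroid assignment. The names centroid and greedy_centroid_match are hypothetical.

```python
from math import dist


def centroid(coords):
    """Mean position of a chain's atoms, given as (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def greedy_centroid_match(ref_chains, pose_chains):
    """Assign each reference chain the closest unused pose chain.

    Hypothetical helper: assumes both structures are in the same frame
    and contain the same number of interchangeable chains.
    Entry i of the result is the pose chain index for reference chain i.
    """
    pose_centroids = [centroid(chain) for chain in pose_chains]
    unused = set(range(len(pose_chains)))
    order = []
    for ref_chain in ref_chains:
        ref_centroid = centroid(ref_chain)
        closest = min(unused, key=lambda j: dist(ref_centroid, pose_centroids[j]))
        unused.remove(closest)
        order.append(closest)
    return order


# Toy example: the pose lists the same two chains in swapped order.
ref_chains = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
]
pose_chains = [
    [(10.1, 0.0, 0.0), (12.1, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (2.1, 0.0, 0.0)],
]
order = greedy_centroid_match(ref_chains, pose_chains)
```

Each chain contributes a single point, so the cost per candidate assignment no longer depends on the number of atoms. A greedy assignment like this one is not guaranteed to find the best overall permutation.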

Returns:
matched_reference, matched_pose : AtomArray, shape=(p,) or (q,)

The input atoms, with chains and atoms within each residue brought into corresponding order. Atoms that are matched between the reference and the pose are annotated with matched=True; all other atoms are annotated with matched=False. Consequently, indexing each structure with its matched annotation as a boolean mask returns two structures with the same number of atoms.
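The effect of the matched mask can be sketched with plain Python lists standing in for the two returned structures. The atom names and masks below are made-up illustration data, not output of find_optimal_match.

```python
from itertools import compress

# Stand-ins for the reordered structures: atom names plus a 'matched' mask.
ref_atoms = ["N", "CA", "C", "O", "CB"]
ref_matched = [True, True, True, True, False]   # CB has no counterpart in the pose

pose_atoms = ["N", "CA", "C", "O", "OXT"]
pose_matched = [True, True, True, True, False]  # OXT has no counterpart in the reference

# Boolean-mask indexing keeps only atoms that have a counterpart.
ref_common = list(compress(ref_atoms, ref_matched))
pose_common = list(compress(pose_atoms, pose_matched))
```

After masking, both lists have the same length and corresponding entries line up. With the actual return values, the analogous step would presumably be indexing each AtomArray with its own matched annotation, for example matched_reference[matched_reference.matched].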

Notes

Atoms that are not matched (matched=False) are positioned in the reordered return value as follows:

  • Unmatched chains are appended to the end.

  • Unmatched residues within a matched chain are kept at their original sequence position.

  • Unmatched atoms within a matched residue are kept at their original position.

Note that the heuristic used by default is much faster than the exhaustive approach: especially for larger complexes with many homomers or small-molecule copies, the number of possible mappings grows combinatorially. However, the heuristic might not find the optimal permutation in all cases, especially for poses that only remotely resemble the reference.
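To give a feel for this growth, the following sketch counts chain permutations only, assuming that copies of the same entity are freely interchangeable and ignoring any additional atom-level symmetry. The function count_chain_permutations is a hypothetical helper, not part of peppr.

```python
from math import factorial, prod


def count_chain_permutations(copies_per_entity):
    """Number of chain assignments if the copies of each entity are interchangeable.

    Each entity with n copies contributes n! orderings; the contributions
    of different entities multiply.
    """
    return prod(factorial(n) for n in copies_per_entity)


# Hypothetical complex: a homohexamer plus four copies of the same ligand.
n_mappings = count_chain_permutations([6, 4])  # 6! * 4! = 720 * 24 = 17280
```

Even this modest example yields 17280 candidate chain assignments for an exhaustive search, which is where the max_matches limit becomes relevant.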

References

[1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034