peppr.find_optimal_match
- peppr.find_optimal_match(reference: AtomArray, pose: AtomArray, min_sequence_identity: float = 0.95, use_heuristic: bool = True, max_matches: int | None = None, allow_unmatched_entities: bool = False, use_entity_annotation: bool = False) → tuple[AtomArray, AtomArray]
- Match the atoms from the given reference and pose structures so that the RMSD between them is minimized.
- ‘Matching’ has two effects here:
  - Chains, and atoms within each residue, that have a counterpart in the respective other structure are reordered, if necessary, so that they are in the same order.
  - A `matched` annotation is added, which is `False` for all atoms that do not have a counterpart.
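The two effects can be illustrated for a single residue. The following is a hypothetical pure-Python sketch, not peppr's implementation: atoms are represented only by their names (assumed unique within a residue), the reference order is assumed to be the canonical one, and unmatched pose atoms keep their index, following the rule stated in the Notes.

```python
def match_residue_atoms(ref_atoms, pose_atoms):
    """Reorder the atoms of one pose residue to follow the reference order.

    Hypothetical helper. Returns (ref_matched, reordered_pose, pose_matched).
    """
    common = set(ref_atoms) & set(pose_atoms)
    # The reference order is kept; flag reference atoms without a counterpart
    ref_matched = [name in common for name in ref_atoms]
    # Indices in the pose that currently hold a matched atom
    slots = [i for i, name in enumerate(pose_atoms) if name in common]
    # The matched atoms, sorted into reference order
    ordered = [name for name in ref_atoms if name in common]
    reordered = list(pose_atoms)
    for slot, name in zip(slots, ordered):
        reordered[slot] = name
    # Unmatched pose atoms were never moved, so they keep their index
    pose_matched = [name in common for name in reordered]
    return ref_matched, reordered, pose_matched


ref_atoms = ["N", "CA", "C", "O", "CB"]
ref_matched, pose_atoms, pose_matched = match_residue_atoms(
    ref_atoms, ["CA", "N", "OXT", "C", "O"]
)
print(pose_atoms)    # ['N', 'CA', 'OXT', 'C', 'O']
print(pose_matched)  # [True, True, False, True, True]
print(ref_matched)   # [True, True, True, True, False]
```

Selecting the matched atoms from both sides yields the same atom names in the same order, which is what makes a direct coordinate comparison possible.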
 - Parameters:
- reference : AtomArray, shape=(p,)
- The reference structure. 
- pose : AtomArray, shape=(q,)
- The pose structure. 
- min_sequence_identity : float, optional
- The minimum sequence identity between two chains to be considered the same entity. 
- use_heuristic : bool or int, optional
- Whether to employ a fast heuristic [1] to find the optimal chain permutation. This heuristic represents each chain by its centroid: instead of exhaustively superimposing all atoms for each permutation, only the chain centroids are superimposed, and the closest match between the reference and the pose is selected.
- max_matches : int, optional
- The maximum number of atom mappings to try if `use_heuristic` is set to `False`.
- allow_unmatched_entities : bool, optional
- If set to `True`, allow entire entities to be unmatched. This is useful if a pose is compared to a reference that may contain different molecules.
- use_entity_annotation : bool, optional
- If set to `True`, use the `entity_id` annotation to determine which chains are the same entity and can therefore be mapped to each other. By default, the entity is determined from sequence identity for polymers and from residue name for small molecules.
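How `min_sequence_identity` and `use_entity_annotation` could decide which chains count as the same entity is sketched below. This is a hypothetical illustration, not peppr's code: chains are plain dicts with the assumed keys `entity_id`, `is_polymer`, `sequence` and `res_name`, and sequence identity is assumed to be the number of identical positions in the alignment that maximizes them with free gaps (the longest common subsequence), divided by the length of the longer sequence. peppr may align and normalize differently.

```python
def sequence_identity(seq_a, seq_b):
    """Identical positions divided by the length of the longer sequence.

    Assumption: identical positions are counted via the longest common
    subsequence, i.e. an alignment with unpenalized gaps.
    """
    if not seq_a or not seq_b:
        return 0.0
    prev = [0] * (len(seq_b) + 1)
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            # Extend the diagonal on identity, otherwise keep the best neighbor
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(seq_a), len(seq_b))


def assign_entities(chains, min_sequence_identity=0.95, use_entity_annotation=False):
    """Return one entity label per chain (hypothetical helper)."""
    if use_entity_annotation:
        # Trust the annotation as given
        return [chain["entity_id"] for chain in chains]
    representatives, labels = [], []
    for chain in chains:
        for label, rep in enumerate(representatives):
            if chain["is_polymer"] != rep["is_polymer"]:
                continue
            if chain["is_polymer"]:
                identity = sequence_identity(chain["sequence"], rep["sequence"])
                same = identity >= min_sequence_identity
            else:
                # Small molecules: same residue name means same entity
                same = chain["res_name"] == rep["res_name"]
            if same:
                labels.append(label)
                break
        else:
            # No existing entity fits, so this chain starts a new one
            representatives.append(chain)
            labels.append(len(representatives) - 1)
    return labels


chains = [
    {"entity_id": 1, "is_polymer": True, "sequence": "MKTAYIAKQR"},
    {"entity_id": 2, "is_polymer": True, "sequence": "MKTAYIAKQR"},
    {"entity_id": 3, "is_polymer": True, "sequence": "MKTAYIAKQW"},
    {"entity_id": 4, "is_polymer": False, "res_name": "ATP"},
    {"entity_id": 4, "is_polymer": False, "res_name": "ATP"},
]
print(assign_entities(chains))                              # [0, 0, 1, 2, 2]
print(assign_entities(chains, min_sequence_identity=0.9))   # [0, 0, 0, 1, 1]
print(assign_entities(chains, use_entity_annotation=True))  # [1, 2, 3, 4, 4]
```

The third chain differs from the first in one of ten positions (identity 0.9), so it forms its own entity at the default threshold of 0.95 but joins the first entity at 0.9. With the annotation, the two identical sequences stay separate because their `entity_id` values differ.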
 
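The contrast between `use_heuristic=True` and the exhaustive search capped by `max_matches` can be sketched in pure Python. All function names here are hypothetical, every chain is assumed to belong to one entity and to have the same number of atoms, and two simplifications are made: the exhaustive search does not superimpose (both structures are assumed to share a coordinate frame), and the heuristic anchors each chain pair by translation only, whereas the AlphaFold-Multimer procedure [1] also finds the optimal rotation.

```python
import math
from itertools import islice, permutations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation of two equally long coordinate lists."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def centroid(coords):
    """Mean position of a list of (x, y, z) tuples."""
    return tuple(sum(axis) / len(coords) for axis in zip(*coords))


def exhaustive_mapping(ref, pose, max_matches=None):
    """Score every assignment of pose chains to reference chains.

    Simplification: no superposition; a common frame is assumed.
    """
    ref_ids = list(ref)
    candidates = permutations(pose, len(ref_ids))
    if max_matches is not None:
        # Stop after a fixed number of candidate mappings
        candidates = islice(candidates, max_matches)
    best_map, best_rmsd = None, math.inf
    for perm in candidates:
        ref_xyz = [xyz for r in ref_ids for xyz in ref[r]]
        pose_xyz = [xyz for p in perm for xyz in pose[p]]
        score = rmsd(ref_xyz, pose_xyz)
        if score < best_rmsd:
            best_map, best_rmsd = dict(zip(ref_ids, perm)), score
    return best_map, best_rmsd


def heuristic_mapping(ref, pose):
    """Greedy nearest-centroid assignment, tried once per anchor chain pair.

    Simplification: an anchor pair only fixes a translation here.
    """
    ref_c = {r: centroid(xyz) for r, xyz in ref.items()}
    pose_c = {p: centroid(xyz) for p, xyz in pose.items()}
    best_map, best_score = None, math.inf
    for anchor_r in ref_c:
        for anchor_p in pose_c:
            # Translate the pose so that the two anchor centroids coincide
            shift = [a - b for a, b in zip(ref_c[anchor_r], pose_c[anchor_p])]
            moved = {
                p: tuple(c + s for c, s in zip(xyz, shift))
                for p, xyz in pose_c.items()
            }
            free, mapping = set(moved), {}
            for r, rc in ref_c.items():
                nearest = min(free, key=lambda p: math.dist(rc, moved[p]))
                mapping[r] = nearest
                free.remove(nearest)
            score = rmsd(list(ref_c.values()), [moved[p] for p in mapping.values()])
            if score < best_score:
                best_map, best_score = mapping, score
    return best_map


def count_mappings(copies_per_entity):
    """Chain-level mappings faced by the exhaustive search."""
    return math.prod(math.factorial(n) for n in copies_per_entity)


ref = {"A": [(0, 0, 0), (1, 0, 0)], "B": [(10, 0, 0), (11, 0, 0)]}
pose = {"X": [(10, 0, 0), (11, 0, 0)], "Y": [(0, 0, 0), (1, 0, 0)]}
print(exhaustive_mapping(ref, pose))                 # ({'A': 'Y', 'B': 'X'}, 0.0)
print(exhaustive_mapping(ref, pose, max_matches=1))  # ({'A': 'X', 'B': 'Y'}, 10.0)
print(heuristic_mapping(ref, pose))                  # {'A': 'Y', 'B': 'X'}
print(count_mappings([4, 2, 6]))                     # 34560
```

Capping the search with `max_matches=1` misses the swapped chain assignment, while the heuristic finds it after only a few anchor trials. `count_mappings` counts chain permutations within each entity only and ignores any atom-level permutations: four copies of one protein, two of another and six ligand copies already give 4! * 2! * 6! = 34560 mappings.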
- Returns:
- matched_reference, matched_pose : AtomArray, shape=(p,) or (q,)
- The input atoms, where the chains and atoms within each residue are brought into the corresponding order. Atoms that are matched between the reference and the pose are annotated with `matched=True`. All other atoms are annotated with `matched=False`. This means that indexing both structures with `matched` as a boolean mask will return structures with the same number of atoms.
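The boolean-mask property is what allows an RMSD to be computed directly on the matched atoms. With AtomArrays this corresponds to indexing with the annotation, e.g. `matched_pose[matched_pose.matched]`. The sketch below mimics this with plain coordinate lists and hypothetical names, and it skips the superposition that an evaluation would usually perform first.

```python
import math


def masked_rmsd(ref_xyz, ref_matched, pose_xyz, pose_matched):
    """RMSD over the atoms flagged as matched in both structures."""
    ref_sel = [xyz for xyz, m in zip(ref_xyz, ref_matched) if m]
    pose_sel = [xyz for xyz, m in zip(pose_xyz, pose_matched) if m]
    # Matched selections must correspond one-to-one
    if len(ref_sel) != len(pose_sel):
        raise ValueError("matched selections differ in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(ref_sel, pose_sel))
    return math.sqrt(total / len(ref_sel))


ref_xyz = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
pose_xyz = [(0, 0, 1), (9, 9, 9), (1, 0, 1)]
# The last reference atom and the middle pose atom have no counterpart
print(masked_rmsd(ref_xyz, [True, True, False], pose_xyz, [True, False, True]))  # 1.0
```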
 
- Notes:
- Atoms that are not matched (`matched=False`) are positioned in the reordered return value as follows:
  - Unmatched chains are appended to the end.
  - Unmatched residues within a matched chain are kept at their original sequence position.
  - Unmatched atoms within a matched residue are kept at their original position.
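The chain-level rule can be sketched on chain IDs alone. In this hypothetical example the chain mapping is assumed to be given (e.g. as the outcome of a permutation search), matched chains are ordered after the reference, and the remaining chains of each structure are appended in their original order.

```python
def order_chains(ref_chains, pose_chains, mapping):
    """Reorder chain IDs and return chain-level matched flags.

    mapping: reference chain ID -> pose chain ID (hypothetical input).
    Returns (ref_order, ref_flags, pose_order, pose_flags).
    """
    ref_hit = [r for r in ref_chains if r in mapping]
    pose_hit = [mapping[r] for r in ref_hit]
    used = set(pose_hit)
    # Unmatched chains keep their relative order and go to the end
    ref_rest = [r for r in ref_chains if r not in mapping]
    pose_rest = [p for p in pose_chains if p not in used]
    ref_flags = [True] * len(ref_hit) + [False] * len(ref_rest)
    pose_flags = [True] * len(pose_hit) + [False] * len(pose_rest)
    return ref_hit + ref_rest, ref_flags, pose_hit + pose_rest, pose_flags


print(order_chains(["A", "B", "C"], ["P", "Q", "R", "L"], {"A": "R", "C": "P"}))
# (['A', 'C', 'B'], [True, True, False], ['R', 'P', 'Q', 'L'], [True, True, False, False])
```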
- Note that the heuristic used by default is much faster than the exhaustive approach, especially for larger complexes with many homomers or small-molecule copies, where the number of possible mappings grows combinatorially. However, the heuristic might not find the optimal permutation in all cases, especially for poses that only remotely resemble the reference.
- References:
- [1] Protein complex prediction with AlphaFold-Multimer, Section 7.3, https://doi.org/10.1101/2021.10.04.463034