\section*{A.1.6. Geometric Attention}
As mentioned in Appendix A.1.5.1, ESM3 processes structural information in two independent ways:
\textbf{Geometric Attention} Described in Algorithm 6, this leverages fine-grained 3D information via conditioning on exact coordinates. We find that conditioning on coordinates is critical to good inverse folding performance. Coordinates are only used as model inputs.
\textbf{Structure Tokens} Described in Appendix A.1.7, structure tokens enable faster learning due to rich local neighborhood semantics being compressed into tokens. Structure tokens are generally used as model outputs.
Geometric attention enables high-throughput encoding of protein structures. Protein backbone structure can be represented by the relative distance and orientation of frames defined by each residue's backbone coordinates. Reasoning over the relative orientation of frames is important to capture the local backbone geometry when only partial structure is provided. Geometric attention is an $SE(3)$ invariant all-to-all attention mechanism which reasons over the relative distances and orientations of all defined frames in the input (Fig. S2). Because this attention operation can be realized using the same computational primitives as attention, it is readily scalable.
We first provide an overview of frames, and then describe how geometric attention uses them:
Please explain this in detail to a beginning grad student in machine learning.
Response: Certainly! Let's delve into the concept of geometric attention as used in ESM3, and explain it step by step. We'll start from the basics and build up to the detailed workings of geometric attention.
ESM3 is a protein language model designed to understand and generate protein sequences by integrating both the sequence (the order of amino acids) and structural information (how the protein folds in 3D space). In computational biology, incorporating structural information is crucial because the function of a protein is highly dependent on its 3D shape.
Geometric attention is a mechanism in ESM3 that allows the model to reason about the 3D structure of proteins using attention mechanisms that are invariant under rotations and translations in 3D space (SE(3) invariance).
Proteins are biological molecules made up of chains of amino acids. These chains fold into specific three-dimensional shapes, which are essential for their function. The structure of a protein is typically described at four levels: primary (the amino acid sequence), secondary (local motifs such as α-helices and β-sheets), tertiary (the overall 3D fold of a single chain), and quaternary (the arrangement of multiple chains).
Understanding protein structures is important because the function of a protein is closely tied to its shape. Changes in the amino acid sequence can lead to different folding patterns and, consequently, different functions.
Traditional machine learning models for proteins have often focused on the amino acid sequence alone. However, to capture the full picture, it's beneficial to incorporate structural information. Attention mechanisms in models like transformers allow for interactions between all parts of the input data, making them a good candidate for modeling proteins.
Geometric attention extends the standard attention mechanism to incorporate geometric information—specifically, the relative positions and orientations of amino acids in 3D space. This is achieved by representing each amino acid (residue) with a local coordinate frame, and defining attention mechanisms that are invariant under rotations and translations (SE(3) invariance).
A frame in 3D space is a coordinate system consisting of an origin and axes defining orientation. In the context of proteins:
Position ($\mathbf{p}_i$): The position vector of the Cα atom of residue $i$.
Orientation ($\mathbf{R}_i$): A rotation matrix representing the orientation of residue $i$. This can be constructed using the positions of the N, Cα, and C atoms:
Construct local axes. One common Gram-Schmidt-style construction is: take $\mathbf{a}_1$ as the unit vector from Cα to C, take $\mathbf{a}_2$ as the component of the Cα-to-N vector orthogonal to $\mathbf{a}_1$ (normalized), and set $\mathbf{a}_3 = \mathbf{a}_1 \times \mathbf{a}_2$.
Combine the axes into an orientation matrix:
$$\mathbf{R}_i = [\mathbf{a}_1 \;\; \mathbf{a}_2 \;\; \mathbf{a}_3]$$
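As a concrete illustration, here is a minimal NumPy sketch of the Gram-Schmidt-style construction above; the function name `residue_frame` and the particular axis convention are illustrative assumptions, not necessarily ESM3's exact choices.

```python
import numpy as np

def residue_frame(n, ca, c):
    """Build a local frame (R_i, p_i) for one residue from its N, CA, C coordinates.

    The axis convention (Gram-Schmidt on CA->C and CA->N) is one common
    choice for backbone frames; ESM3's exact convention may differ.
    """
    n, ca, c = (np.asarray(x, dtype=float) for x in (n, ca, c))
    a1 = c - ca
    a1 /= np.linalg.norm(a1)                 # first axis: along CA -> C
    u = n - ca
    a2 = u - np.dot(u, a1) * a1              # remove the component along a1
    a2 /= np.linalg.norm(a2)                 # second axis: lies in the N-CA-C plane
    a3 = np.cross(a1, a2)                    # third axis: completes a right-handed frame
    R = np.stack([a1, a2, a3], axis=1)       # columns are the local axes
    return R, ca

# Example with made-up coordinates; R_i is orthonormal by construction.
R_i, p_i = residue_frame(n=[1.5, 0.2, 0.0], ca=[0.0, 0.0, 0.0], c=[0.9, 1.1, 0.3])
assert np.allclose(R_i.T @ R_i, np.eye(3))
```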
Using frames allows us to describe each residue's local geometry in a common format and to express the geometry of any pair of residues relative to one another, independent of how the whole protein happens to be positioned or oriented in space.
For each residue $i$, we have a frame consisting of its position $\mathbf{p}_i$ and orientation $\mathbf{R}_i$.
For each pair of residues $(i, j)$, we compute:
Relative Position ($\mathbf{d}_{ij}$): the position of residue $j$ expressed in residue $i$'s local frame, $\mathbf{d}_{ij} = \mathbf{R}_i^\top (\mathbf{p}_j - \mathbf{p}_i)$.
Relative Orientation ($\mathbf{Q}_{ij}$): the rotation relating the two frames, $\mathbf{Q}_{ij} = \mathbf{R}_i^\top \mathbf{R}_j$.
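Assuming the `residue_frame` helper from the previous sketch, these two pairwise quantities take only a few lines (again, an illustrative sketch rather than ESM3's exact implementation):

```python
import numpy as np

def relative_geometry(R_i, p_i, R_j, p_j):
    """Relative geometry of residue j, expressed in residue i's local frame.

    d_ij = R_i^T (p_j - p_i)  : where j sits, as seen from i's frame
    Q_ij = R_i^T R_j          : how j's frame is rotated relative to i's
    Both are unchanged if the same rotation and translation are applied to all coordinates.
    """
    d_ij = R_i.T @ (p_j - p_i)
    Q_ij = R_i.T @ R_j
    return d_ij, Q_ij
```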
Represent Relative Position and Orientation as features:
Relative Position Encoding: Use features derived from $\mathbf{d}_{ij}$, such as the distance $\|\mathbf{d}_{ij}\|$ and the unit direction $\mathbf{d}_{ij}/\|\mathbf{d}_{ij}\|$.
Relative Orientation Encoding: Extract features from $\mathbf{Q}_{ij}$, such as its matrix entries or an equivalent angle or quaternion representation.
Feature Vectors ($\mathbf{f}_{ij}$): Concatenate these geometric features into a single pairwise feature vector $\mathbf{f}_{ij}$.
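A minimal sketch of one possible featurization; the specific feature set (distance, unit direction, flattened relative rotation) is an assumption for illustration, not ESM3's exact parameterization.

```python
import numpy as np

def pair_features(d_ij, Q_ij):
    """Concatenate simple invariant geometric features for a residue pair into f_ij."""
    dist = np.linalg.norm(d_ij)
    direction = d_ij / (dist + 1e-8)     # unit vector from i to j, in i's frame
    # Shape (1 + 3 + 9,) = (13,): distance, direction, flattened relative rotation.
    return np.concatenate([[dist], direction, Q_ij.reshape(-1)])
```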
Attention Mechanism Overview: In standard attention, each position produces a query, a key, and a value vector from its embedding; attention weights come from query-key dot products passed through a softmax.
Modified Attention with Geometry:
For residue $i$: compute a query vector $\mathbf{q}_i$ from its embedding.
For residue $j$: compute a key vector $\mathbf{k}_j$ from its embedding.
Incorporate Geometric Features: combine $\mathbf{k}_j$ with the pairwise geometric features $\mathbf{f}_{ij}$ (for example, through a learned projection) to form a geometry-aware key $\mathbf{k}_{ij}$.
Attention Score ($\alpha_{ij}$):
$$\alpha_{ij} = \frac{ \exp\left( \mathbf{q}_i^\top \mathbf{k}_{ij} \right) }{ \sum_{k} \exp\left( \mathbf{q}_i^\top \mathbf{k}_{ik} \right) }$$
This ensures that the attention weights between residues depend not only on their sequence embeddings but also on their geometric relationships.
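As a concrete illustration, here is a toy NumPy sketch of this scoring step; how the geometric features enter the keys (a simple additive projection here) and the function name `geometric_attention_weights` are illustrative assumptions rather than ESM3's exact formulation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    return np.exp(x) / np.exp(x).sum()

def geometric_attention_weights(q_i, K, F_i, W_f):
    """Attention weights alpha_ij for residue i over all residues j.

    q_i : (d,)    query for residue i
    K   : (L, d)  per-residue keys k_j
    F_i : (L, m)  geometric features f_ij for this i and every j
    W_f : (m, d)  projection mixing geometry into the keys
                  (a placeholder for a learned weight matrix in a real model)
    """
    K_pair = K + F_i @ W_f                  # geometry-aware keys k_ij
    scores = K_pair @ q_i                   # q_i^T k_ij for every j
    return softmax(scores)                  # alpha_ij, sums to 1 over j

# Toy usage with random placeholders.
rng = np.random.default_rng(0)
L, d, m = 4, 8, 13
alpha = geometric_attention_weights(rng.normal(size=d), rng.normal(size=(L, d)),
                                    rng.normal(size=(L, m)), rng.normal(size=(m, d)))
assert np.isclose(alpha.sum(), 1.0)
```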
Using Relative Measures: By computing features based on relative positions and orientations, any global rotation or translation cancels out.
Invariant Feature Functions: Any feature computed from $\mathbf{d}_{ij}$ and $\mathbf{Q}_{ij}$ (distances, directions expressed in the local frame, relative rotations) is therefore itself unchanged by global rigid motions, and so are the attention scores built from such features.
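One way to convince yourself of this is a quick numerical check: apply the same random rotation and translation to every atom and verify that $\mathbf{d}_{ij}$ and $\mathbf{Q}_{ij}$ do not change. A sketch, reusing the `residue_frame` and `relative_geometry` helpers from the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random global rigid transform: rotation R_g (from a QR decomposition) and translation t_g.
R_g, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_g) < 0:
    R_g[:, 0] *= -1                                   # make it a proper rotation (det = +1)
t_g = rng.normal(size=3)

# Made-up backbone atoms (N, CA, C) for two residues.
atoms_i = [rng.normal(size=3) for _ in range(3)]
atoms_j = [rng.normal(size=3) for _ in range(3)]

def geom(ai, aj):
    R_i, p_i = residue_frame(*ai)
    R_j, p_j = residue_frame(*aj)
    return relative_geometry(R_i, p_i, R_j, p_j)

d1, Q1 = geom(atoms_i, atoms_j)
d2, Q2 = geom([R_g @ a + t_g for a in atoms_i], [R_g @ a + t_g for a in atoms_j])
assert np.allclose(d1, d2) and np.allclose(Q1, Q2)    # unchanged by the global motion
```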
Contextualized Embeddings: Each residue's updated representation is an attention-weighted sum of value vectors,
$$\mathbf{h}_i = \sum_{j} \alpha_{ij} \mathbf{v}_j$$
where $\mathbf{v}_j$ is the value vector for residue $j$, possibly including geometric information.
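A minimal sketch of this aggregation step (values here are random placeholders standing in for learned projections of the residue embeddings):

```python
import numpy as np

def aggregate(alpha_i, V):
    """Contextualized embedding for residue i: h_i = sum_j alpha_ij * v_j.

    alpha_i : (L,)   attention weights of residue i over all residues j
    V       : (L, d) value vectors v_j (optionally carrying geometric information)
    """
    return alpha_i @ V

# Toy usage.
rng = np.random.default_rng(1)
L, d = 5, 8
alpha_i = rng.random(L)
alpha_i /= alpha_i.sum()                             # a valid attention distribution over j
h_i = aggregate(alpha_i, rng.normal(size=(L, d)))    # shape (d,)
```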
Captures Fine-Grained 3D Relationships: By considering both distance and orientation, the model understands how residues are arranged in space.
Handles Partial Structures: Even if only part of the protein structure is known, geometric attention can make use of the available frames to infer structural relationships.
Improves Inverse Folding Performance: Incorporating exact coordinates helps the model predict sequences that would fold into the given structure more accurately.
Scalable Computations: Using attention mechanisms allows the approach to scale to large proteins efficiently.
Imagine each residue is a person in a social network:
Geometric Attention: When considering whom to talk to, a person (residue) considers not only who the other person is (their sequence identity), but also how far away they are and which way they are facing, judged from their own position and orientation.
The attention mechanism incorporates both social (sequence) and spatial (structural) information.
Geometric attention in ESM3 allows the model to incorporate detailed 3D structural information of proteins by representing each residue with a local frame, computing relative positions and orientations between frames, and feeding these $SE(3)$-invariant features into an all-to-all attention computation.
This approach enhances the model's ability to perform tasks that depend on precise structural understanding, such as inverse protein folding.
To deepen your understanding, you might explore Algorithm 6 in the paper, which specifies the exact computation, along with related work on $SE(3)$-invariant attention mechanisms.
If you have any questions or need clarification on any part of this explanation, feel free to ask! Understanding these concepts takes time, and exploring them thoroughly is an excellent step in your graduate studies in machine learning.