ESM (Evolutionary Scale Modeling) is Meta's family of protein language models that leverage transformer architectures to learn evolutionary patterns from protein sequences. These models have emerged as powerful tools for understanding protein structure, function, and evolution, with significant applications in neurodegenerative disease research. By training on millions of protein sequences from diverse organisms, ESM captures the evolutionary constraints that shape protein architecture and function, enabling predictions that were previously impossible without extensive experimental characterization.
ESM represents a paradigm shift in computational biology, moving from traditional sequence alignment methods to deep learning approaches that capture evolutionary information encoded in protein sequences. The model was developed by Meta AI (formerly Facebook AI Research) and first released in 2019, with subsequent versions (ESM-1b, ESM-2) demonstrating increasingly powerful capabilities. Unlike sequence alignment methods that rely on pairwise comparisons, ESM learns rich contextual representations that encode evolutionary relationships, structural constraints, and functional annotations simultaneously.
The fundamental innovation of ESM lies in its ability to learn from the "evolutionary experiment" that nature has performed over billions of years. By training on the vast corpus of naturally occurring protein sequences, the model learns which amino acid substitutions are tolerated, which positions are structurally important, and which residues participate in functional interactions. This learned knowledge can then be applied to predict the effects of disease-causing mutations, generate novel protein designs, and identify therapeutic targets.
| Version | Release Year | Parameters | Key Improvements | Context Length |
|---------|--------------|------------|------------------|----------------|
| ESM-1 | 2019 | 670M | First transformer for proteins | 1022 |
| ESM-1b | 2020 | 650M | Optimized architecture | 1022 |
| ESM-2 | 2022 | 8M–15B (model family) | Scale, zero-shot capabilities | ~1024 (training crop) |
ESM employs a transformer architecture specifically designed for protein sequences[@rives2021]. Unlike natural language transformers that process word sequences, ESM processes amino acid sequences — treating each residue as a "word" in a biological language:
Attention Mechanisms
- Multi-head attention captures long-range evolutionary dependencies between amino acid residues
- Different attention heads learn different aspects of protein biology (structure, function, evolution)
- Attention maps can be visualized to identify functional domains and interaction interfaces
Masked Language Modeling
- Pre-trained using masked token prediction to learn residue-level patterns
- The model learns to predict masked amino acids based on their context
- This self-supervised objective forces the model to learn comprehensive protein representations
Position Encodings
- ESM-1 and ESM-1b use learned absolute position embeddings, which cap inputs at roughly 1022 residues
- ESM-2 adopts rotary position embeddings (RoPE), which encode relative position directly in attention
- Relative position information helps capture both local and global sequence context
Evolutionary Training
- Trained on millions of protein sequences from UniRef90, UniProt, and other databases
- Training data includes sequences from all domains of life
- This diverse training enables the model to learn general principles of protein evolution
- Residue-level attention: Each attention head can capture different evolutionary constraints, from local secondary structure to long-range domain interactions
- Multiple sequence alignment (MSA) integration: Some variants incorporate MSA information to enrich evolutionary context
- Contact prediction: Attention maps can predict residue-residue contacts useful for structure prediction[@waldisphl2022]
- Zero-shot learning: Pre-trained models can generalize to new protein families without fine-tuning[@lin2023]
- Per-residue embeddings: Each amino acid position receives a contextualized embedding vector (1280-dimensional for the 650M-parameter ESM-2; larger variants are wider)
- MSA sampling: For some applications, multiple sequence alignments are sampled to provide evolutionary context
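The contact-prediction point above can be made concrete. Below is a minimal sketch of the underlying idea, using a hypothetical attention matrix as input; in practice fair-esm returns contact maps directly (`return_contacts=True`), computed from a learned combination of attention heads rather than the simple average used here:

```python
# Sketch: turning a self-attention map into residue-residue contact
# predictions by symmetrizing and ranking pairs. The attention matrix
# here is a toy stand-in for a real ESM attention head.

def predict_contacts(attn, top_k, min_separation=6):
    """Symmetrize an L x L attention map and return the top_k
    highest-scoring residue pairs at least min_separation apart."""
    n = len(attn)
    scored = []
    for i in range(n):
        for j in range(i + min_separation, n):
            # Symmetrize: attention i->j and j->i both count as evidence
            scored.append(((attn[i][j] + attn[j][i]) / 2.0, i, j))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:top_k]]

# Toy 8x8 attention map with one strong long-range pair (0, 7):
attn = [[0.1] * 8 for _ in range(8)]
attn[0][7] = attn[7][0] = 0.9
```

The sequence-separation filter mirrors standard contact-prediction practice, where trivially adjacent residues are excluded before ranking.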
ESM generates high-dimensional per-residue embeddings (1280-dimensional for the 650M-parameter ESM-2) that capture[@brandes2022]:
Structural Information
- Secondary structure propensity (alpha-helix, beta-sheet)
- Fold topology and domain organization
- Disordered region identification
- Solvent accessibility predictions
Functional Annotations
- Enzyme Commission (EC) numbers
- Gene ontology (GO) terms
- Signal peptide and transmembrane predictions
- Post-translational modification sites
Evolutionary Conservation
- Conservation scores at each position
- Mutational tolerance profiles
- Evolutionary constraints on functional residues
- Phylogenetic relationships
These embeddings can be used for:
- Clustering proteins by structure/function similarity
- Identifying homologous relationships
- Feature extraction for downstream machine learning tasks
- Protein-protein interaction prediction
- Drug target identification
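Most of the embedding uses listed above reduce to simple vector arithmetic. A minimal sketch with toy 2-d vectors standing in for real per-residue ESM embeddings (which would be 1280-d for ESM-2 650M):

```python
# Sketch: protein-level comparison from per-residue embeddings.
# Mean-pool each protein's residue vectors into one vector, then
# compare proteins with cosine similarity for clustering or
# homology screening. Vectors here are illustrative toys.
import math

def mean_pool(per_residue):
    """Average per-residue embeddings into one protein-level vector."""
    dim = len(per_residue[0])
    n = len(per_residue)
    return [sum(vec[d] for vec in per_residue) / n for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two protein-level vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

protein_a = [[1.0, 0.0], [0.9, 0.1]]  # hypothetical 2-residue protein
protein_b = [[0.0, 1.0], [0.1, 0.9]]
similarity = cosine(mean_pool(protein_a), mean_pool(protein_b))
```

Mean pooling is the simplest aggregation; attention-weighted or CLS-token pooling are common alternatives for downstream tasks.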
ESM has proven valuable for predicting the effects of genetic variants[@liu2023]:
Variant Pathogenicity Scoring
- ESM embeddings can distinguish pathogenic from benign variants
- Disease-associated mutations often disrupt evolutionary patterns learned by the model
- Embedding distances correlate with functional impact
Missense Variant Analysis
- Particularly useful for interpreting variants of uncertain significance (VUS)
- Can predict whether a variant will disrupt protein structure/function
- Helps prioritize variants for experimental characterization
Evolutionary Constraint
- Models learn which residues are evolutionarily conserved
- Identifies critical functional regions that cannot tolerate changes
- Highlights positions under purifying selection
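A common way to turn these ideas into a number is the "masked-marginal" score: mask the mutated position, read off the model's probability distribution over amino acids there, and compare mutant to wild type. The sketch below uses a hypothetical distribution in place of the model's softmax output:

```python
# Sketch of masked-marginal variant scoring with an ESM-style model:
#     score = log P(mutant aa | context) - log P(wild-type aa | context)
# Negative scores indicate substitutions disfavored by the evolutionary
# patterns the model has learned. `p` is a hypothetical softmax output
# at one masked position; a real pipeline would obtain it by masking
# the site and running the model forward.
import math

def masked_marginal_score(p_at_site, wt_aa, mut_aa):
    """Log-likelihood ratio of mutant vs wild-type amino acid."""
    return math.log(p_at_site[mut_aa]) - math.log(p_at_site[wt_aa])

# Hypothetical distribution at a well-conserved alanine site:
p = {"A": 0.80, "T": 0.02, "S": 0.15, "G": 0.03}
score = masked_marginal_score(p, wt_aa="A", mut_aa="T")  # strongly negative
```

Scores like this are typically thresholded or combined with other predictors (e.g., conservation scores) rather than used in isolation to call pathogenicity.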
ESM has numerous applications in Alzheimer's disease (AD) research:
| Protein | ESM Application | Disease Relevance |
|---------|-----------------|-------------------|
| APP | Mutation effect prediction, proteolytic cleavage modeling | Amyloid precursor protein processing |
| Tau (MAPT) | Isoform-specific embedding analysis | Tau aggregation and propagation |
| APOE | Variant effect on protein structure | AD risk factor |
| Amyloid-beta (Aβ) | Aggregation propensity prediction | Amyloid plaque formation |
| TREM2 | Variant pathogenicity analysis | Microglial response to Aβ |
| BACE1 | Inhibitor design targets | Beta-secretase drug discovery |
Specific Applications:
- Tau (MAPT) isoforms: ESM embeddings distinguish between the six tau isoforms expressed in the human brain, helping identify isoform-specific pathological mechanisms
- APOE variants: The ε4 allele represents the strongest genetic risk factor for late-onset AD; ESM helps predict how APOE4 disrupts lipid transport and Aβ clearance
- TREM2 variants: Variants like R47H impair microglial phagocytosis of Aβ; ESM predicts variant effects on TREM2 structure and signaling
- Aβ aggregation: ESM predicts which Aβ sequences are more likely to form toxic oligomers
ESM applications in Parkinson's disease (PD) include:
| Protein | ESM Application | Disease Relevance |
|---------|-----------------|-------------------|
| alpha-synuclein (SNCA) | Aggregation prediction, mutation effects | Lewy body formation |
| LRRK2 | Kinase domain variant analysis | PD risk gene |
| GBA | Glucocerebrosidase variant classification | Gaucher disease/Parkinsonism |
| PRKN (Parkin) | E3 ubiquitin ligase domain variants | Mitophagy |
| PINK1 | Kinase domain mutation effects | Mitophagy |
| ATP13A2 | Lysosomal function predictions | Juvenile parkinsonism |
Specific Applications:
- alpha-synuclein (SNCA): ESM predicts how mutations (A53T, A30P, E46K) affect aggregation propensity and membrane binding
- LRRK2: Over 100 LRRK2 variants have been identified; ESM helps classify pathogenic vs. benign variants
- GBA variants: Glucocerebrosidase variants increase Parkinsonism risk; ESM predicts variant effects on lysosomal function
- PINK1/Parkin: Critical for mitophagy; ESM identifies variants disrupting this pathway
Amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) share common molecular mechanisms, and ESM has proven valuable for studying both:
| Protein | ESM Application | Disease Relevance |
|---------|-----------------|-------------------|
| C9orf72 | Dipeptide repeat protein modeling | Most common genetic cause of ALS/FTD |
| TDP-43 (TARDBP) | RNA-binding protein aggregation | ALS/FTD pathology |
| SOD1 | Variant pathogenicity and aggregation prediction | Familial ALS |
| FUS | RNA binding, phase separation | ALS-associated |
Specific Applications:
- C9orf72 hexanucleotide repeat: The most common genetic cause of both familial ALS and FTD; because the expansion itself is non-coding, ESM is applied to the dipeptide repeat proteins it yields through repeat-associated non-AUG (RAN) translation, modeling their toxic gain-of-function
- TDP-43 pathology: Found in 97% of ALS cases and 50% of FTD cases; ESM predicts aggregation-prone sequences
- SOD1 mutations: Over 150 ALS-associated SOD1 variants; ESM classifies pathogenicity and predicts aggregation
- FUS proteinopathy: FUS mutations cause aggressive ALS; ESM predicts effects on nuclear localization and phase separation
ESM and AlphaFold are highly complementary tools in computational biology[@burkhardt2021]:
AlphaFold2/3
- Provides high-accuracy 3D structure predictions for individual proteins
- Requires input sequences only, no template information needed
- Revolutionized structural biology with near-experimental accuracy
ESM
- Provides evolutionary context encoded in sequence patterns
- Embeddings carry enough structural signal to support fast, MSA-free structure prediction (via ESMFold)
- Zero-shot capabilities enable predictions without any fine-tuning
Combined Approach
- ESM embeddings can guide AlphaFold modeling by providing evolutionary constraints
- AlphaFold structures can validate ESM predictions about functional residues
- Combined approaches show improved accuracy for difficult targets
Meta developed ESMFold, a structure prediction model based on ESM2[@huang2022]:
- End-to-end structure prediction from sequence alone
- Comparable accuracy to AlphaFold2 for many proteins
- Particularly useful for proteins without known homologs
- Much faster inference than AlphaFold2, largely because no MSA search is required
- Uses ESM2 embeddings as the backbone representation
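ESMFold reports per-residue confidence (pLDDT, on a 0-100 scale) in the B-factor column of the PDB text it returns. A minimal parser sketch, using a hypothetical one-line PDB fragment in place of real ESMFold output:

```python
# Sketch: reading per-residue pLDDT confidence from ESMFold-style
# PDB output. pLDDT is stored in the B-factor field (columns 61-66
# of an ATOM record); taking one value per CA atom gives a
# per-residue confidence profile.

def plddt_profile(pdb_text):
    """Return the pLDDT value for each CA atom in a PDB-format string."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))  # B-factor / pLDDT field
    return values

# Hypothetical single-residue fragment of an ESMFold-style PDB file:
pdb = "ATOM      1  CA  MET A   1      11.104  13.207   2.100  1.00 88.30"
```

Low-confidence stretches in the resulting profile often correspond to disordered regions, which is itself a useful signal for aggregation-prone proteins.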
When choosing between ESMFold and AlphaFold2:
| Factor | ESMFold | AlphaFold2 |
|--------|---------|------------|
| Speed | Faster | Slower |
| Accuracy (hard targets) | Better when homologs are scarce | Better overall |
| Multi-domain proteins | May struggle | Better |
| MSA availability | Not required | Needed for best accuracy |
| GPU memory | Lower requirements | Higher |
Limitations and Challenges
- Context length: ESM-2 is trained on sequences cropped to roughly 1024 residues, so very long proteins and protein complexes must be truncated or analyzed in windows
- Structural resolution: Embeddings capture structural information but not at atomic resolution
- Dynamic states: ESM provides static representations, missing conformational dynamics
- Post-translational modifications: Limited ability to predict effects of phosphorylation, ubiquitination, etc.
- Computational resources: ESM2 requires significant GPU memory for inference
- Batch processing: Large-scale analyses require careful memory management
- Interpretation: Extracting meaningful biological insights from embeddings requires expertise
- Predictions should be validated experimentally where possible
- ESM predictions work best as hypotheses to guide research
- Critical findings should be confirmed with multiple approaches
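The batch-processing caveat above is usually eased by length-sorted batching: grouping sequences of similar length keeps per-batch padding, and therefore peak GPU memory, small. A minimal sketch, with the residue budget as an illustrative knob:

```python
# Sketch: memory-aware batching for large-scale ESM inference.
# Sequences are sorted by length so each batch pads only to its
# longest member; the residue budget bounds (batch size x longest
# sequence), a rough proxy for activation memory.

def length_sorted_batches(seqs, max_residues_per_batch=4096):
    """Group sequence indices into batches whose padded size stays
    under a residue budget; returns lists of indices into seqs."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    batches, current = [], []
    for i in order:
        # Ascending length order: seqs[i] is the longest so far, so the
        # padded cost after adding it is (len(current) + 1) * len(seqs[i]).
        if current and (len(current) + 1) * len(seqs[i]) > max_residues_per_batch:
            batches.append(current)
            current = []
        current.append(i)
    if current:
        batches.append(current)
    return batches
```

Each batch of indices would then be fed through the model under `torch.no_grad()`; the budget should be tuned to the available GPU memory and model size.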
- ESM Atlas (esmatlas.org): Browse predicted structures for millions of proteins
- Hugging Face Models: Pre-trained ESM2 models available for download
- GitHub Repository: Meta's official ESM implementation
- OpenFold: Open-source PyTorch reproduction of AlphaFold2, components of which are reused by ESMFold
- ESM-DA: Downstream analysis toolkit for ESM embeddings
- ProteinBERT: Alternative protein language model for comparison
- Variant effect prediction pipelines: Combine ESM with other tools
- Structure visualization: PyMOL, ChimeraX integration
- Machine learning frameworks: PyTorch, TensorFlow compatibility
- Rives A, et al. Biophysical 3D structure from protein language models (2021)
- Lin Z, et al. Language models of protein sequences at scale (2022)
- Lin Z, et al. Protein language models for protein engineering (2023)
- Su J, et al. Large language models in molecular biology (2023)
- Brandes N, et al. Protein function prediction with ESM embeddings (2022)
- Liu S, et al. Mutation effect on protein structure prediction (2023)
- Huang Y, et al. ESM2: Scaling language modeling for proteins (2022)
- Waldispühl J, et al. Predicting protein contacts with ESM (2022)
- Burkhardt Q, et al. Protein structure prediction with ESM (2021)
- Madani A, et al. Large language models for protein engineering (2023)