Introduction: CRISPR–Cas systems have transformed genome editing, yet in vivo and clinical performance remains hampered by variable on-target efficacy, unpredictable repair outcomes, and off-target liabilities that depend on locus, cell type, and delivery context. Traditional design heuristics—PAM rules, simple mismatch penalties, or motif-based features—capture only a fraction of the underlying biology and fail to generalize. Large language models (LLMs) trained on biological sequences offer a principled alternative: by learning context-rich representations from large corpora of DNA, RNA, and protein sequences, they capture higher-order dependencies that govern editing efficiency and specificity. Emerging genomic LMs improve prediction of gRNA activity; paired-sequence transformers improve off-target risk prediction; outcome models predict indel spectra; and dedicated frameworks predict base- and prime-editing product profiles. In parallel, protein language models enable data-driven Cas variant design with tailored PAM scopes and improved fidelity; multimodal models integrate chromatin accessibility and epigenetic state for context-dependent prediction; and, beyond individual predictors, "agentic" LLM systems orchestrate end-to-end workflows—tool selection, sequence design, protocol composition, and validation planning—to enable reproducible, higher-throughput pipelines. This review synthesizes these advances, surveys datasets and benchmarks, addresses interpretability and safety concerns, and delineates an achievable roadmap for co-optimizing efficacy and specificity across nucleases, base editors, and prime editors.
Methods: We conducted a scoping review (2019–2025) of LLMs for improving CRISPR efficacy and specificity. Sources: PubMed, Web of Science, arXiv, bioRxiv. Searches combined CRISPR/Cas, base/prime editing, off-target, and genomic language model. Inclusion criteria: quantitative evaluations and datasets/benchmarks. We catalogued tasks, datasets, metrics (AUROC, PR-AUC, calibration), model types, and reproducibility artifacts. Bias assessment covered data leakage and validation design.
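To make the evaluation criteria concrete, the two metrics most relevant to ranking and trusting model scores can be computed without any library dependencies. The sketch below is illustrative only (function names, the Mann–Whitney formulation of AUROC, and the binning choice for the calibration summary are our own, not taken from any tool cited in this review):

```python
from statistics import mean

def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive outscores a randomly chosen negative
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def expected_calibration_error(labels, probs, bins=5):
    """Bin predictions by confidence and average |accuracy - confidence|,
    weighted by bin occupancy: a common scalar calibration summary."""
    buckets = [[] for _ in range(bins)]
    for y, p in zip(labels, probs):
        buckets[min(int(p * bins), bins - 1)].append((y, p))
    n = len(labels)
    ece = 0.0
    for b in buckets:
        if b:
            acc = mean(y for y, _ in b)   # observed positive rate in bin
            conf = mean(p for _, p in b)  # mean predicted probability
            ece += len(b) / n * abs(acc - conf)
    return ece
```

A well-calibrated model has low expected calibration error even when its AUROC is modest; the review's emphasis on calibration reflects that guide-ranking decisions use the predicted probabilities directly, not just their order.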
Results: Large language models (LLMs) are revolutionizing CRISPR–Cas design by replacing heuristic, feature-engineered scoring with learned DNA, RNA, and protein representations. Genomic language models (gLMs) pretrained on k-mer corpora (e.g., DNABERT, Nucleotide Transformer) offer task-agnostic embeddings that transfer to on-target activity prediction, improving cross-cell-line generalization and probability calibration over shallow baselines. For off-target (OT) risk, transformer architectures frame sgRNA–site pairs as sequence-pair classification, capturing bulges, mismatch tolerance, and local context; newer RNA-aware models discriminate bona fide OT loci more finely. Because editing efficacy ultimately rests on repair, modern outcome predictors go beyond binary cut/no-cut to predict indel spectra and frameshift probability, enabling "repair-aware" guide prioritization. For base editors, attention-based methods such as BE-DICT and system-identification approaches such as BE-Hive predict efficiency and product distributions across editing windows and bystander contexts; for prime editing, deep transfer-learning methods optimize pegRNA design (PBS, RT template, nicking strategy), outperforming rule-based software and shortening the design–build–test cycle. Precise insertions via HDR are aided by learning-guided donor design that optimizes homology arms and junction context to improve knock-in consistency.
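The sequence-pair framing of off-target risk can be sketched at its simplest: featurize a guide–site pair by per-position mismatches, with PAM-proximal ("seed") positions weighted more heavily because they tolerate mismatches least. The weighting scheme and field names below are hypothetical illustrations, not a published scoring model such as CFD, and a transformer would learn such dependencies rather than hard-code them:

```python
def pair_features(guide, site):
    """Featurize an sgRNA-site pair for off-target classification:
    mismatch count, a PAM-proximal-weighted penalty, and a seed-region
    flag. Assumes equal-length, pre-aligned 20-nt protospacers with the
    PAM at the 3' (rightmost) end, as for SpCas9."""
    assert len(guide) == len(site)
    n = len(guide)
    mismatches = [int(g != s) for g, s in zip(guide, site)]
    # Linear weight ramp toward the PAM-proximal end (illustrative choice).
    penalty = sum(m * (i + 1) / n for i, m in enumerate(mismatches))
    return {"n_mismatch": sum(mismatches),
            "weighted_penalty": round(penalty, 4),
            "seed_mismatch": any(mismatches[-8:])}
```

A learned sequence-pair model subsumes features like these while also handling bulges and flanking context, which is why transformer classifiers outperform additive mismatch penalties.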
At the protein level, language models enable generative exploration of Cas effectors, offering routes to altered PAM compatibilities, enhanced fidelity, and reduced molecular size, thereby easing practical constraints on delivery and specificity. Multimodal models integrate chromatin accessibility, nucleosome occupancy, and epigenetic modifications with sequence embeddings to capture state-dependent accessibility, improving transferability across tissues and primary cells. Beyond individual predictors, "agentic" LLM platforms orchestrate end-to-end workflows—modality selection (nuclease/BE/PE), guide and donor/pegRNA design, protocol optimization, and validation planning—standardizing outputs and scaling expert design in the process. These developments depend on high-quality data and readouts: standardized benchmarks for on/off-target activity and editing outcomes; genome-wide assays (GUIDE-seq, CIRCLE-seq, DISCOVER-Seq) for orthogonal ground truth; and novel cleavage/repair profiling technologies enabling external validation. Interpretability (attribution across tokens and modalities), calibrated uncertainty, and prospective validation are increasingly viewed as prerequisites for translational claims.
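The multimodal integration described above is, at its core, a fusion of sequence features with cell-state covariates. A minimal late-fusion sketch is shown below; the feature layout, the choice of ATAC signal and methylation as covariates, and the function name are assumptions for illustration (real systems fuse learned embeddings, not one-hot vectors):

```python
def fused_features(seq, atac_signal, methylation):
    """Late-fusion sketch: concatenate a one-hot encoding of the target
    sequence with scalar chromatin covariates (ATAC accessibility,
    CpG methylation) so a downstream prediction head can condition
    editing-activity estimates on cell state."""
    onehot = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
              "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    seq_vec = [bit for base in seq for bit in onehot[base]]
    return seq_vec + [atac_signal, methylation]
```

Conditioning on such covariates is what lets a single model transfer across tissues: the same guide sequence can receive different activity estimates in open versus closed chromatin.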
Open problems include systematic cross-editor generalization (Cas9→Cas12/13 and other editors), simultaneous co-design of guide/pegRNA/donor under delivery and manufacturability constraints, explicit representation of repair-pathway utilization and cell-state dynamics, and active-learning loops that tightly couple mini-library experimentation with model updates. A practical plan: begin with gLM-based on-target scoring; screen candidates with LLM-based off-target prediction; incorporate chromatin priors; apply modality-specific outcome models (inDelphi-class, BE-DICT/BE-Hive, PE efficiency estimators) to maximize functional success; validate with unbiased genome-wide assays; and iterate via active learning with open reporting and model cards. Together, these elements chart an ordered, evidence-based path to quantifiable gains in efficacy and specificity.
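The active-learning step of this plan reduces to a selection rule: spend the next mini-library on the guides the current model is least sure about. The sketch below uses simple uncertainty sampling (distance of the predicted probability from 0.5); the function name is hypothetical and `predict_proba` stands in for any calibrated model:

```python
def select_next_library(candidates, predict_proba, batch_size=3):
    """Uncertainty-sampling step for the experiment-model loop: rank
    unlabeled guides by how close the model's predicted editing
    probability is to 0.5 and return the most uncertain batch for the
    next mini-library screen."""
    ranked = sorted(candidates, key=lambda g: abs(predict_proba(g) - 0.5))
    return ranked[:batch_size]
```

In practice the acquisition function would also weight candidate diversity and experimental cost, but even this simple rule concentrates labeling effort where model updates help most.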
Conclusion: LLMs are shifting CRISPR design from piecewise heuristic optimization to a data-driven, systematic framework. Sequence foundation models, off-target transformers, repair-outcome predictors, and prime/base-editing estimators collectively enable rational guide and editor selection; protein LMs expand the effector repertoire; and multimodal integration grounds predictions in the relevant chromatin context. To reproduce these gains consistently, the field must prioritize unbiased genome-wide readouts, robust external validation, calibrated uncertainty, and transparent reporting via shared benchmarks and model cards. Open challenges—cross-editor generalization, joint design of guide/pegRNA/donor under delivery constraints, dynamic modeling of repair pathways, and active-learning loops that close the experiment–model cycle—are now tractable with standardized data and tooling. A practical pipeline—gLM-initialized on-target modeling, LLM-based off-target screening, chromatin-aware refinement, modality-specific outcome prediction, orthogonal validation, and iterative learning—traces a clear route to measurable gains in efficacy and specificity for preclinical and translational programs.