• Predicting the Structural and Functional Impact of CRISPR-Mediated Mutations: A Review of Computational Strategies for Protein Engineering
  • Mohammad Mehdi Sadehsani,1,* Fatemeh Zahra Shakerian,2 Zahra Zolfagharzadeh,3 Ali Akbar Sahfienejad,4
    1. Department of Cellular and Molecular Biology, Faculty of Basic Science, Sari Branch, Islamic Azad University, Sari, Iran
    2. Department of Cellular and Molecular Biology, Faculty of Basic Science, Sari Branch, Islamic Azad University, Sari, Iran
    3. Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, Russia
    4. Department of Cellular and Molecular Biology, Faculty of Basic Science, Sari Branch, Islamic Azad University, Sari, Iran


  • Introduction: 1. Introduction
1.1 CRISPR-Cas9: The Revolution in Precision Genome Editing
The clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (Cas9) system, comprising the Cas9 enzyme and a guide RNA (gRNA), enables DNA edits with unparalleled precision and efficiency [16,76-79]. The Cas9 protein and its associated gRNA were first reported as part of an adaptive immune system in prokaryotes [7,11]. The gRNA directs Cas9 to a complementary DNA sequence, where Cas9 acts as molecular scissors and creates a double-stranded break (DSB) at the target site [1,14,21]. The break is then repaired by the cell's repair machinery, through non-homologous end joining (NHEJ) or homology-directed repair (HDR), which can produce insertions, deletions, or precise sequence changes [7,14,80]. The strength of this technology lies in its programmability: by simply changing the gRNA sequence, Cas9 can be targeted to nearly any genomic site adjacent to a protospacer adjacent motif (PAM) [15]. Beyond programmability, another important feature is its relative ease of use and low cost compared with earlier gene editing methods, including zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), which has made the CRISPR-Cas9 system more accessible to researchers [16,81]. CRISPR-Cas9 has been successfully applied across a wide range of organisms, including yeast, worms, insects, plants, and mammals (including rodents and primates) [16]. It has been especially effective at rapidly generating cellular and animal models for disease research [3]. For example, in vivo CRISPR-mediated editing in the pancreas has been used to study genes associated with tumor suppression as well as genes involved in metastasis [4], whereas genome-wide CRISPR screens have uncovered vulnerabilities in oral cancer cells and identified genes that contribute to chemotherapy resistance [5].
At an integrative systems level, Cas9-mediated perturbation allows researchers to investigate genome organization and to gain insight into the causal relationships between heritable genetic variation and phenotype [6]. CRISPR-Cas9 offers important opportunities for advancing research and for medical application. It has the potential to efficiently correct disease-causing mutations, restore gene function, and support gene-based treatments tailored to individual genetic profiles [2,8,20]. The potential for clinical application has been strengthened by successes in human stem cells, which open new possibilities in regenerative and personalized medicine [8]. The application of CRISPR is not limited to human health; agriculture has begun using CRISPR to improve crop traits such as pest resistance, yield, and nutritional value [7,19]. The CRISPR toolbox continues to expand, and newer technologies, including base editors and CRISPR/Cpf1-mediated systems, offer greater precision and scope [9,18]. Although CRISPR has developed significantly, a number of issues remain, such as off-target activity, efficiency optimization, and the still largely uncharted territory of bacterial genome editing [9,40-42]. For CRISPR to realise its therapeutic and biotechnological promise, these issues need to be addressed.
1.2 Genetic Edits and Their Effects on Protein Structure and Function
The function of a protein is inherently dependent on its structure, and even relatively small genetic edits can dramatically change the folding, stability, or activity of a protein. Proteins evolve new functionalities by accumulating mutations in their amino acid sequence, but the same mutations that generate new features may compromise the integrity of the altered protein.
Recent studies using deep mutational scanning suggest that the structural sensitivity of mutations often depends on position; in other words, similar substitutions in different contexts can be tolerated by one protein while destabilizing another [29]. Proteins are fundamental to biological processes, which they mediate by interacting dynamically with other proteins, small molecules, and cellular structures. Although protein-coding exons represent roughly 1% of the human genome, they are disproportionately important: around 85% of Mendelian diseases result from mutations in exonic regions, and proteins constitute the majority of drug targets [31]. Advances in structural biology, in the form of crystallography and nuclear magnetic resonance (NMR), are driving the growth of structural genomics, which systematically maps the protein structures encoded by genomes [31]. Further, RNA modifications such as N6-methyladenosine (m6A), N1-methyladenosine (m1A), and pseudouridylation can also modulate protein expression by altering the stability, structure, and protein-binding properties of RNA [32]. This pattern illustrates a more general idea: structural changes at the molecular level, in RNA or in proteins, can have significant regulatory and functional implications. Understanding these relationships underpins protein engineering, which applies structure-function knowledge to design or optimize proteins. Approaches such as directed evolution, structure-guided engineering, and computational protein design now yield stabilized enzymes, re-engineered protein complexes, and synthetic proteins with new activities [34–37]. Rational protein design, which deliberately accounts for stability, oligomerization, and orientation, is emerging quickly as a way to address theoretical questions in protein chemistry and to open pathways to therapeutic development [36–37].
Likewise, post-translational modifications (PTMs), such as phosphorylation, can regulate many aspects of protein function and govern the stages of signaling pathways. This suggests a strategy: using base editing to alter a PTM site can modify protein function without removing the protein, opening new therapeutic avenues [38–39]. For CRISPR-Cas9 applications, these findings emphasize the value of predicting structural outcomes before making edits. Relatedly, engineered Cas9 variants such as eSpCas9, SpCas9-HF1, HypaCas9, evoCas9, and Sniper-Cas9 have been developed, each demonstrating higher specificity while maintaining editing efficiency [41–42]. These developments are important steps towards the safe and reliable use of gene editing in therapeutic applications.
1.3 Predicting Protein-Level Outcomes Before Editing
Anticipating the structural and functional implications of a targeted mutation is one of the most significant hurdles in genome editing. Mutations can undermine protein stability, folding, or activity, with knock-on effects on evolutionary fitness, cellular function, and human health [23,28]. The recent influx of large mutational datasets has catalyzed the use of machine learning and deep learning models to predict the effects of variants. Methods such as support vector machines, random forests, Gaussian processes, and neural networks have been used to estimate protein stability and function from sequence and structural data [22]. These methods are promising: they provide meaningful information about the effects of single mutations and expand our capacity to predict pathogenicity. Within the realm of CRISPR, prediction also encompasses the discovery and classification of Cas proteins. Because Cas proteins show low sequence conservation and diverse functions, homology-based predictive tools are of limited utility.
Computational methods have become essential tools for discovering new Cas proteins, as well as for distinguishing core Cas proteins, which carry out CRISPR immunity (i.e., crRNA processing and interference), from auxiliary Cas proteins, which provide supporting functions [24–26]. Efforts to classify Cas proteins not only enhance our understanding of CRISPR biology but also expand the genome editing toolkit with candidate nucleases for therapeutics. Protein structure prediction has advanced markedly thanks to recent computational innovations such as AlphaFold. Predicting the 3D structure of a protein, down to atomic coordinates, from its amino acid sequence has long been considered a grand challenge of molecular biology. AlphaFold, with its novel machine learning architecture, brings protein structure prediction close to experimental accuracy [65-67]. Work with AlphaFold and similar predictive methods demonstrates what is possible when computational biology complements gene editing by modeling outcomes at the protein level prior to experimentation.
1.4 Advances and Limitations in gRNA Design Tools
The success of CRISPR-Cas9 applications depends to a large degree on designing efficient and specific gRNAs. Although a growing number of independently developed computational and machine learning methods for predicting gRNA activity have appeared in the last decade, inconsistencies in how models are validated, and in the quality and characteristics of the underlying datasets, hinder our ability to select gRNAs whose predicted activity is reliable. The Cas9/gRNA complex recognizes a target site comprising a 20-nucleotide sequence adjacent to an NGG PAM, where Cas9 generates a DSB [43–44,47–48]. Two major issues, cleavage efficiency and off-target activity, still require attention.
Off-target cleavage occurs when Cas9 binds to sequences that exhibit partial complementarity to the gRNA, and it poses a serious danger to any real CLIA (Clinical Laboratory Improvement Amendments)-regulated application [49]. Both experimental [50] and computational [51] methods have been developed to address these issues, but the demand for high specificity and efficiency has yet to be fully satisfied. CRISPR interference (CRISPRi) can target not just protein-coding genes but can also systematically perturb noncoding cis-regulatory elements (CREs), enabling simultaneous functional dissection of regulatory networks [52–53]. Predictive models of CRISPRi outcomes are needed, since experimentally testing every possible gRNA in a given biological system is impractical [54–55]. CRISPR-based cell and gene therapy applications have developed significantly in the last few years, including ongoing clinical trials [56]. CRISPR-based clinical interventions have been designed to correct disease-causing mutations, knock out (inactivate) defective genes, add protective modifications, and potentially treat a wide variety of diseases and conditions, including sickle cell anemia, cystic fibrosis, Alzheimer's disease, and HIV [57–58]. Alongside other programmable nucleases such as TALENs, CRISPR continues to drive innovation in synthetic biology, neuroscience, agriculture, and medical research [59]. Nevertheless, a key gap exists: the majority of contemporary tools optimize gRNA design for target recognition and cleavage activity but do not consider the predicted downstream protein-level consequences of editing. Accounting for these downstream effects is necessary for predictable and safe therapeutic gene editing.
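As a minimal illustration of the target-site rule described in this section (a 20-nucleotide protospacer followed by an NGG PAM) and of partial complementarity as the source of off-target sites, the sketch below scans one strand of a DNA string and counts mismatches against a guide. The function names, the example sequences, and the mismatch threshold are assumptions made for illustration; they are not taken from any of the cited tools, which also score position-dependent mismatch effects, the reverse strand, and chromatin context.

```python
import re

# Zero-width lookahead so that overlapping candidate sites are all reported.
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_spcas9_sites(dna: str) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each 20-nt site followed by NGG.

    Only the strand given is scanned; a fuller search would also scan the
    reverse complement.
    """
    dna = dna.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in _SITE.finditer(dna)]


def count_mismatches(guide: str, protospacer: str) -> int:
    """Count positions at which the guide and a candidate site differ."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), protospacer.upper()))


def candidate_sites(guide: str, dna: str, max_mismatches: int = 3):
    """List PAM-adjacent sites within max_mismatches of the guide.

    A site with 0 mismatches is the intended target; sites with 1 or more
    are potential off-targets (hypothetical threshold, for illustration only).
    """
    hits = []
    for start, protospacer, pam in find_spcas9_sites(dna):
        mismatches = count_mismatches(guide, protospacer)
        if mismatches <= max_mismatches:
            hits.append((start, protospacer, pam, mismatches))
    return hits


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"          # made-up 20-nt guide
    near_match = "ACGTACGTACGTACGTACGA"     # same guide with one changed base
    dna = "TTT" + guide + "TGG" + "AAA" + near_match + "AGG"
    for hit in candidate_sites(guide, dna):
        print(hit)
```

Counting raw mismatches treats every position equally, which is a deliberate simplification; it is meant only to make the notion of partial complementarity concrete.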
1.5 Objective: Connect gRNA Design with Protein Structure Prediction
Gene editing relies on targeted modification of the genome, generally through Cas-induced DSBs and subsequent DNA repair [60–63]. Although CRISPR has been successfully leveraged for programmable targeting of genomes and transcriptomes [64], the downstream implications of edits, especially for proteins, remain less predictable. Recent advances in computational methods make integrated approaches conceivable. AlphaFold has transformed structural biology by providing unprecedented accuracy in predicting protein structures [65-67]. By using structural modeling and gRNA design tools together, researchers may predict not only whether an edit will be made efficiently but also how the protein will behave afterwards. Such integration is particularly important as gene editing takes on a role in personalised medicine. CRISPR-based therapies could improve the treatment of many heritable disorders [68–70], but uptake has been slowed by issues including cost, stakeholder acceptance, and the need to demonstrate superiority over traditional strategies [71–75]. One way to enhance therapeutic precision, safety, and individualisation is to combine gRNA selection with prediction of editing outcomes at the protein level, which would move the field closer to precision medicine.
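The first step in linking an edit to its protein-level outcome can be sketched very simply: translate the coding sequence before and after the edit and classify the change. The example below is an assumption-laden illustration, not a method from the reviewed literature. It uses the standard genetic code, assumes the input is an in-frame coding sequence starting at the first codon, and uses hypothetical function names and a made-up sequence. In an integrated pipeline, the edited protein sequence would then be passed to a structure or stability predictor.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence up to the first stop codon.

    A trailing incomplete codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_edit(wild_type_cds: str, edited_cds: str) -> str:
    """Give a coarse, sequence-level label for the protein consequence of an edit."""
    length_change = len(edited_cds) - len(wild_type_cds)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame indel"
    wild_type, edited = translate(wild_type_cds), translate(edited_cds)
    if edited == wild_type:
        return "synonymous"
    if len(edited) < len(wild_type):
        return "nonsense"      # premature stop codon
    if len(edited) > len(wild_type):
        return "stop-loss"
    return "missense"


if __name__ == "__main__":
    wild_type = "ATGGCTGAAAAGTAA"  # made-up CDS encoding M-A-E-K
    edits = {
        "substitution A": "ATGGCTGTAAAGTAA",
        "substitution B": "ATGGCTTAAAAGTAA",
        "1-bp deletion": "ATGGCGAAAAGTAA",
    }
    for name, edited in edits.items():
        print(name, classify_edit(wild_type, edited), translate(edited))
```

These labels say nothing about folding or stability; they only separate edits that change the amino acid sequence from those that do not, which is the input a structural model would need.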
  • Methods: A literature search was performed in PubMed, Scopus, Web of Science, and Google Scholar to locate studies published up to July 2025 that focused on CRISPR-Cas9 genome editing, guide RNA (gRNA) design, protein structure prediction, and computational approaches for interpreting the functional consequences of mutations. Searches used keywords and Boolean combinations of terms such as "CRISPR-Cas9," "gRNA design," "off-target prediction," "protein structural modeling," "AlphaFold," "computational protein engineering," and "machine learning for mutation prediction". Articles that described the methodology of computational strategies linking CRISPR-mediated edits to their protein-level consequences were considered for inclusion, while papers lacking methodological clarity, non-English publications, and articles not concerned with protein-level outcomes were excluded. Reference management software (EndNote X9 and Mendeley) was used for citation management and duplicate removal. All included studies were then examined and grouped into thematic domains: gRNA design algorithms, protein structure and function modeling tools, machine learning-based predictors of mutational impact, and integrative CRISPR-protein engineering frameworks. The computational strategies were critically evaluated against four main criteria: accuracy, scalability, integration potential, and reported limitations. The extracted data were synthesized to highlight current achievements, remaining challenges, and potential follow-up studies in computational approaches to predicting the structural and functional outcomes of CRISPR genome editing.
  • Results: The literature survey identified over 150 publications relevant to the topic, of which 87 met the inclusion criteria and were analyzed further. About 40% of these dealt with computational approaches to gRNA design, the most frequently mentioned being the CRISPR-Cas Designer, CHOPCHOP, and DeepCRISPR platforms. All three were reported to predict on-target efficiency with high accuracy; however, their success in reducing off-target effects was variable. Protein structural modeling and prediction of mutation impact accounted for about 35% of the studies, in which the recurring techniques were AlphaFold, Rosetta, and molecular dynamics simulations, used to model conformational changes caused by CRISPR-mediated edits. Machine learning methods, which made up around 20% of the selected articles, showed promise in correlating genomic edits with protein stability and functional outcomes, particularly when trained on experimentally validated mutation datasets. Overall, the analyzed papers indicated that the combination of gRNA design with downstream protein-level modeling still lags substantially, despite progress in each individual domain. The few tools that do connect the two fields showed strengths not only in gene-editing efficiency but also in predicting the structural and functional changes caused by induced mutations. This suggests a movement towards integrated computational pipelines that could allow a seamless transition from CRISPR edit design to protein engineering applications; however, issues of scale, accuracy, and experimental validation still need to be overcome.
  • Conclusion: 2. Conclusion
CRISPR-Cas9 has undoubtedly changed genome editing through its programmability and efficiency in DNA editing. Collective advances in gRNA design, Cas9 engineering, and computational prediction over the past decade have improved editing specificity and reduced off-target effects. However, as indicated throughout this review, the field has concentrated on predicting editing efficiency at the DNA level and has largely neglected how mutations affect the structure and function of the encoded protein. Proteins are the chief effectors of cellular processes, and even slight changes in amino acid sequence can have important consequences for folding, stability, activity, and interactions. The convergence of protein engineering principles, structural biology, and computational modeling, particularly through successive AlphaFold advances, now offers an unprecedented platform for anticipating these outcomes prior to experimental work. Combining gRNA selection with protein-level prediction enables scientists to anticipate functional implications, minimize unwanted effects, and design safer, more effective therapeutic strategies. These combined strategies also hold promise for applications beyond medicine. In agriculture, stable protein-level edits could optimize crop traits, improve abiotic stress tolerance, and enhance nutritional content. In biotechnology, engineered proteins with bespoke properties can improve industrial enzyme production and support synthetic biology. Finally, the integration of gene editing with predictive structural biology constitutes a new genome engineering paradigm that aims not only for precise DNA targeting but also for functional precision and safety at the protein level.
  • Keywords: CRISPR-Cas9, Gene Editing, gRNA Design, Protein Structure