MarketAlert – Real-Time Market & Crypto News, Analysis & Alerts
© Market Alert News. All Rights Reserved.

Deep learning guided design of protease substrates – Nature Communications

Last updated: January 6, 2026 4:50 pm
Published: 3 months ago

CleaveNet predicts cleavage efficiencies of peptides by MMPs

We trained our CleaveNet models on a publicly available dataset of ca. 18,500 mRNA-displayed synthetic substrates and their continuous cleavage efficiencies across 18 MMPs, quantified as a normalized Z-score ($Z_{s,m}$) representing the relative strength of cleavage of substrate s by protease m. Although sequences with higher Z-scores can generally be expected to be cleaved with higher likelihood, we note that this readout does not directly distinguish cleaved from non-cleaved substrates.

Since a critical component of substrate design is the down-selection of candidate substrates from a large combinatorial space, we first developed a predictive deep learning model, the CleaveNet Predictor, to score and virtually screen sequences in silico. We formulated the task of the CleaveNet Predictor as a multi-output sequence-to-function regression problem (Fig. 2A). Given an input amino acid sequence s, the model predicts continuous cleavage values for 18 MMPs. Their associated uncertainty scores are quantified as the standard deviation of predicted Z-scores measured by training an ensemble of five predictor models over the mRNA-display dataset. Continuous values can optionally be converted to a binary classification output (i.e., cleaved vs. not cleaved) based on a desired threshold Z.
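
The ensemble-based uncertainty and the optional binarization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ensemble_summary`, the array layout, and the use of the ensemble mean as the point prediction are assumptions; only the five-member ensemble, the standard deviation as uncertainty, and the thresholding step come from the text.

```python
import numpy as np

def ensemble_summary(member_predictions, threshold=2.5):
    """Summarize an ensemble of predictors (hypothetical helper).

    member_predictions: array of shape (n_models, n_sequences, n_proteases)
    holding the Z-scores predicted by each ensemble member.
    """
    preds = np.asarray(member_predictions, dtype=float)
    mean_z = preds.mean(axis=0)        # assumed point prediction: ensemble mean
    uncertainty = preds.std(axis=0)    # uncertainty: spread across members
    cleaved = mean_z > threshold       # optional binary call at threshold Z
    return mean_z, uncertainty, cleaved

# Toy input: 5 models, 3 sequences, 18 MMPs (random numbers, not real data).
rng = np.random.default_rng(0)
toy = rng.normal(size=(5, 3, 18))
mean_z, uncertainty, cleaved = ensemble_summary(toy, threshold=2.5)
print(mean_z.shape, uncertainty.shape, cleaved.dtype)
```

The threshold is left as a parameter because the text treats it as a user choice rather than a fixed property of the model.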

We investigated two model architectures common for sequence modeling — a recurrent bidirectional LSTM architecture and a transformer architecture. LSTMs learn patterns sequentially, while transformers learn patterns by looking at all elements of a sequence simultaneously and are the dominant architecture for protein language modeling. To assess model performance and generalizability, we evaluated prediction performance across two datasets: sequences obtained from a 20% random split of the training dataset that were further filtered for homology against the training dataset (see “Methods”) and never seen during training (mRNA-display test) and sequences obtained from an independent set of 71 Förster resonance energy transfer (FRET)-paired sequences that were previously screened against seven recombinant MMPs in vitro (fluorescence test). Substrates across the two test sets differed in length (10-mers vs. 7- to 14-mers for mRNA-display vs. fluorescence, respectively) and amino acid composition (Fig. 2A), and were assayed by very distinct experimental methods.

Both the LSTM and transformer models displayed similar predictive performance on the test sets, as supported by their comparable mean absolute errors between the true and predicted cleavage scores (Fig. 2B and Supplementary Table 1) and by the strong correspondence between predicted and true Z-scores for both models on the mRNA-display test set (Supplementary Figs. 1 and 2). With the goal of optimizing performance for substrates in the larger and more diverse mRNA-display dataset, we used the transformer-based model as the CleaveNet Predictor for subsequent analyses. The models performed better on some MMPs than others, with better performance achieved for MMPs that had either a higher fraction of training examples with high Z-scores or that displayed a wider dynamic range of Z-scores between cleaved and uncleaved sequences (Supplementary Fig. 3). These results suggest that our models may be more effective at learning cleavage patterns for proteases with more cleaved training examples or more specific cleavage patterns, respectively.

We observed strong correspondence between predicted and true Z-scores for MMP13 (Pearson’s r = 0.80; Fig. 2C) and for other MMPs in the mRNA-display test set (Supplementary Fig. 2). To assess model performance for classification, we assigned true cleaved labels to sequences whose measured Z-score exceeded a threshold Z, for Z ∈ {0, 1.0, 1.5, 2.0, 2.5}. We then calculated receiver operating characteristic (ROC) curves from the predicted Z-scores over this range of possible cleavage thresholds. CleaveNet predictions were robust over the range of Z evaluated, and classification performance improved as the Z-threshold increased, reaching an AUC value of 0.98 at Z = 2.5 (Fig. 2D and Supplementary Figs. 4 and 5); a threshold of 2.5 was previously reported to be broadly consistent with confident cleavage across all MMPs for this dataset.
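
The threshold sweep can be illustrated with a small, dependency-light sketch. It labels sequences as cleaved when the measured Z-score exceeds each threshold and computes the area under the ROC curve from pairwise comparisons of predicted scores. The helper names and the toy numbers are hypothetical, and the strict inequality used for labeling is an assumption.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined when only one class is present
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def auc_by_threshold(true_z, pred_z, thresholds=(0, 1.0, 1.5, 2.0, 2.5)):
    """Binarize measured Z-scores at each threshold and score the predictions."""
    true_z = np.asarray(true_z, dtype=float)
    return {t: roc_auc(true_z > t, pred_z) for t in thresholds}

# Toy values for one protease (not real data).
true_z = [-1.0, 0.5, 1.2, 3.0]
pred_z = [-0.8, 0.4, 1.0, 2.7]
print(auc_by_threshold(true_z, pred_z))
```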

To assess the robustness and generalizability of the models, we next evaluated their performance on the biochemically distinct fluorescence set (Supplementary Figs. 6-9). The Z-scores predicted by the transformer model displayed a strong positive linear correlation with true Z-scores (m = MMP13, Pearson’s r = 0.80, Fig. 2E), especially for the higher Z-scores that are more likely to correspond to cleaved substrates. Given that Z-score values depend on the nature of the experimental assay and the cleavage range of the substrate library being profiled, which differ between the mRNA-display and fluorescence sets, the relative rank of the predicted Z-scores is a more appropriate metric to assess performance on the fluorescence test set. Independent of their exact value, we expect low Z-scores to be non-cleaved and higher scores to be cleaved. The CleaveNet Predictor consistently predicted lower Z-scores for sequences that were not cleaved (blue) relative to substrates that were cleaved (red) (Fig. 2F). The uncertainty of the model coarsely correlated with the absolute error in the predicted Z-scores (Supplementary Fig. 10) and reflected the confidence of individual predictions, with relatively low uncertainty for most non-cleaved substrates or top cleaved substrates (Fig. 2G and Supplementary Figs. 11 and 12).
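
One common way to compare relative ranks across assays with different scales is the Spearman rank correlation. The sketch below is offered only as an illustration of a rank-based comparison; the text does not state that this specific statistic was used, and the helper names are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Pearson correlation computed on ranks rather than raw values."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Predicted scores that preserve the measured ordering on a different scale
# (toy numbers, not real data).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [10.0, 20.0, 25.0, 100.0]
print(spearman(measured, predicted))
```

Because only the ordering matters, a monotonic rescaling of either assay leaves the statistic unchanged.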

Taken together, these results support the accuracy and robustness of the CleaveNet Predictor in producing MMP cleavage scores for synthetic substrates, even across biochemically distinct assay setups.

Having validated the CleaveNet Predictor and with the aim of automating the substrate nomination step, we next sought to develop a generative model — the CleaveNet Generator — that would learn pan-MMP cleavage specificities and enable unconditional sampling of diverse MMP-cleavable substrates without further input. Ideally, generated sequences should capture the amino acid distribution and the biophysical and functional properties of training sequences, while still being diverse from each other and distinct from the training set. To generate sequences relevant to MMP activity, we trained an autoregressive transformer model, which can generate variable-length sequences, on the training split of the mRNA-display dataset (see “Methods” for details). As a baseline, we generated sequences by sampling randomly and independently from the position-wise distribution of amino acids across all training sequences (referred to as the site-independent baseline, Fig. 3A).
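
The site-independent baseline can be sketched as below: estimate an amino acid distribution at each position from the training sequences, then sample each position independently. This assumes equal-length sequences; the function names and the toy training set are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positionwise_frequencies(sequences):
    """Return a (length, 20) matrix of per-position amino acid frequencies."""
    length = len(sequences[0])
    counts = np.zeros((length, len(AMINO_ACIDS)), dtype=float)
    for seq in sequences:
        for pos, aa in enumerate(seq):
            counts[pos, AMINO_ACIDS.index(aa)] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def sample_site_independent(freqs, n_samples, seed=0):
    """Draw each position independently from its own distribution."""
    rng = np.random.default_rng(seed)
    letters = np.array(list(AMINO_ACIDS))
    columns = [rng.choice(letters, size=n_samples, p=freqs[pos])
               for pos in range(freqs.shape[0])]
    return ["".join(chars) for chars in zip(*columns)]

# Toy training set (hypothetical 5-mers, not sequences from the dataset).
train = ["PLGLA", "PLGMA", "PAGLA"]
freqs = positionwise_frequencies(train)
print(sample_site_independent(freqs, n_samples=5))
```

Because each position is drawn on its own, this baseline cannot reproduce dependencies between neighboring residues, which is the property the comparison with the generator is designed to expose.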

We unconditionally generated 20,000 sequences using the CleaveNet Generator and the site-independent baseline and compared them to the mRNA-display test sequences (Fig. 3B, Supplementary Fig. 13). As expected for a method that considers each site independently, the baseline closely matched the test set position-wise amino acid distributions (average KL divergence = 0.237; Supplementary Fig. 13B). The CleaveNet Generator was effective at capturing the canonical MMP motif proline-X-X-hydrophobic residue and exceeded the site-independent baseline at recapitulating the amino acid distributions in positions P3 to P2′, which are most relevant for cleavage (position-wise KL divergence averaged over P3 to P2′ = 0.25 for CleaveNet-generated vs. 0.404 for site-independent baseline; Fig. 3B and Supplementary Fig. 13). The CleaveNet-generated and the site-independent baseline sequences exhibited biophysical properties consistent with those of the mRNA-display test dataset (Fig. 3C and Supplementary Fig. 14), supporting their quality and plausibility. Scoring each set of sequences with the CleaveNet Predictor (Fig. 3D) revealed that, across MMPs, the cleavage scores for the CleaveNet-generated sequences closely matched the distribution of scores for mRNA-display sequences, while the sequences from the site-independent baseline displayed significantly lower values across all MMPs (Fig. 3E).
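
A position-wise KL divergence between two sequence sets can be computed along these lines. The direction of the divergence (reference relative to generated) and the pseudocount used to avoid division by zero are assumptions, since the excerpt does not specify either; the helper names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequencies(sequences, pseudocount=1.0):
    """Per-position amino acid frequencies with additive smoothing."""
    length = len(sequences[0])
    counts = np.full((length, len(AMINO_ACIDS)), pseudocount, dtype=float)
    for seq in sequences:
        for pos, aa in enumerate(seq):
            counts[pos, AMINO_ACIDS.index(aa)] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def positionwise_kl(reference, generated, pseudocount=1.0):
    """KL(reference || generated) at each position, in nats."""
    p = frequencies(reference, pseudocount)
    q = frequencies(generated, pseudocount)
    return (p * np.log(p / q)).sum(axis=1)

# Toy sets that differ only at the middle position (not real data).
reference = ["PLG", "PLG"]
generated = ["PAG", "PAG"]
kl = positionwise_kl(reference, generated)
print(kl, kl.mean())
```

Averaging the returned vector over a window of positions gives a single summary value of the kind quoted in the text.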

To further evaluate the plausibility of generated sequences, we inspected the cumulative distribution function (CDF) of unique position-independent k-mers in each set of generated or mRNA-display sequences (Fig. 3F). Each of the mRNA-display, CleaveNet-generated, and site-independent baseline sequence sets effectively sampled the true 3-mer space, as demonstrated by saturation of the CDF by 8000 unique k-mers. However, as the space of possible unique k-mers became increasingly large for 4-, 5-, and 6-mers, the site-independent baseline covered the space near linearly, suggesting generation closer to random across the space of all possible k-mers. In contrast, the distribution of unique k-mers was more consistent between the mRNA-display and CleaveNet-generated sequences across all k-mer lengths, indicating that CleaveNet generations exhibit similar k-mer diversity to sequences from the mRNA-display set. Inspection of the top occurring 5- and 6-mers in each set of sequences revealed that those from CleaveNet were closer to canonical MMP motifs than 5-mers from the site-independent baseline (Fig. 3F). Further, only 95 of the 20,000 sequences generated by CleaveNet were exact matches to training set sequences, indicating that CleaveNet was not simply memorizing training data. Together, these results demonstrate that the CleaveNet Generator can produce biophysically plausible sequences that exhibit expected cleavage profiles across MMPs and possess diverse, distinct motifs while retaining sequence identities compatible with MMP cleavage.
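
A simple way to look at k-mer diversity is to track how many distinct k-mers have been seen as sequences are added. The sketch below is one possible construction of such a cumulative curve, not necessarily the one used for the figure; the helper names and toy sequences are hypothetical. Note that the 3-mer space over 20 amino acids has 20³ = 8000 members, which matches the saturation point mentioned above.

```python
def unique_kmers(sequences, k):
    """Set of all position-independent k-mers found in a list of sequences."""
    return {seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)}

def cumulative_unique_kmers(sequences, k):
    """Number of distinct k-mers observed after each successive sequence."""
    seen, curve = set(), []
    for seq in sequences:
        seen.update(seq[i:i + k] for i in range(len(seq) - k + 1))
        curve.append(len(seen))
    return curve

# Toy sequences (hypothetical, not from the dataset).
toy = ["PLGLR", "PLGLA"]
print(sorted(unique_kmers(toy, 3)))
print(cumulative_unique_kmers(toy, 3))
print(20 ** 3)  # size of the full 3-mer space
```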

To delve deeper into the biological validity of the CleaveNet pipeline, we virtually screened the 20,000 unconditional generations and analyzed the sequence determinants and activity profiles of top-scoring substrates across individual MMPs. First, we inspected position-wise determinants of specificity by plotting IceLogos, which reflect position-specific amino acid compositions, for each set. Despite containing no overlapping sequences, the amino acid profiles of top-scoring generated sequences closely matched those in the mRNA display test set for individual MMPs (Fig. 4A). Shared trends also emerged across MMPs within the same subclass (i.e., collagenases, gelatinases, MT1-MMP, MT2-MMP, and stromelysins). Although the PXXL motif was prominent across all MMPs, less expected amino acid preferences also emerged in the CleaveNet-generated sequences. Notably, a preference for methionine at P4 was identified (fold-increase above natural frequency of: 11.4-fold for gelatinases, 10.9-fold for collagenases, 14.0-fold for MT1-MMP, 9.2-fold for MT2-MMP, and 12.2-fold for stromelysins), suggesting a previously unexplored relevance of this amino acid to the cleavage efficiency of short peptides for the MMP catalytic class.

Closer inspection of specificity determinants for the gelatinases MMP2 and MMP9, which have well-characterized cleavage preferences, revealed trends consistent with those previously reported (Fig. 4A). For instance, we noted a higher prevalence of proline at P3 (99% of sequences) in the top Z-scoring generated sequences for gelatinases over other MMP classes, a preference for small amino acids such as glycine and alanine over aromatic and larger aliphatic residues at P1 and P3′, and a preference towards the positively charged amino acid arginine (27.5% of sequences) at P2′.

We next evaluated our pipeline’s ability to learn complex inter-amino acid relationships critical to substrate recognition, such as subsite cooperativity. Such interactions between multiple subsites help enable cleavage but cannot be captured by position-wise IceLogos. To understand whether our pipeline was capturing such effects, we inspected the frequency and the identity of 3-mers shared across the 50 top-ranking MMP13 substrates in the CleaveNet-generated, mRNA display, and site-independent baseline sets (Fig. 4B). We observed significantly higher frequencies of some 3-mers over others for the generated and the mRNA display sets, suggesting a strong advantage for cleavage when specific amino acids are positioned contiguously. The canonical motif PLG emerged as the top occurring 3-mer for both datasets, and the top-4 occurring 3-mers were shared between the generated and mRNA-display sets, suggesting a high degree of meaningful conservation in generated sequences. In contrast, the site-independent baseline set displayed little preference towards any 3-mer and did not enrich for the canonical PLG motif. These results lend further support to the CleaveNet generation strategy and suggest that CleaveNet can learn complex inter-amino acid relationships that are otherwise overlooked by simply sampling from position-wise amino acid distributions independently.

As a final measure of biological validity, we used the CleaveNet Predictor to score the cleavage profiles of generated sequences, and then clustered the activity profiles of the top 25 Z-scoring sequences for each MMP (Fig. 4C). The cleavage profiles for all MMPs, except MMP11 and MMP12, clustered based on the phylogenetic distance of their catalytic domains (Supplementary Fig. 15). Group 1 consisted of the 4 transmembrane-type MT-MMP family members (MMP14, MMP15, MMP16, and MMP24); group 2 was collagenases (MMP1 and MMP8); group 3 included gelatinases (MMP2 and MMP9), the collagenase MMP13, and the stromelysin MMP3; group 4 was the glycosylphosphatidylinositol-anchored MT-MMPs (MMP17 and MMP25); and group 5 included the non-furin regulated MMPs MMP7, MMP10, and MMP20.

Altogether, these results reinforce the biological validity of CleaveNet sequences by demonstrating their alignment with well-established MMP cleavage preferences, their ability to capture subsite cooperativity patterns, and the clustering of their activity profiles according to the phylogenetic relationships of MMPs. Moreover, CleaveNet generated data comparable to, but not overlapping with, those obtained through the time- and resource-intensive mRNA-display screen used to collect the training data, providing a rapid in silico method to increase the scale and diversity of sequences that can be explored for tasks of interest.

To further validate the CleaveNet pipeline, we used CleaveNet to nominate efficient substrates for MMP13 and tested them experimentally through an in vitro cleavage assay. To this end, we virtually screened the 20,000 unconditionally generated substrates with the CleaveNet Predictor and rank-ordered them by their uncertainty-aware predicted cleavage scores for MMP13 (Fig. 5A). To maximize the likelihood of identifying distinct substrates, we selected 24 substrates with the highest predicted MMP13 cleavage scores and non-overlapping 5-mers. As controls for this CleaveNet-guided substrate design pipeline, we included site-independent substrates that were obtained using the site-independent baseline alone or after rank-ordering with the CleaveNet Predictor. The latter group was included, given the poor performance of the site-independent baseline alone in previous analyses, and to assess the value of in silico screening with the CleaveNet Predictor on substrates not designed by the CleaveNet Generator. The five top MMP13 substrates and five uncleaved substrates from the mRNA display training set were included as positive and negative controls, respectively. We synthesized these substrates as fluorogenic FRET-probes and screened them in vitro against recombinant MMP13 by measuring increases in fluorescence indicative of cleavage over time (Fig. 5A). Fold changes in fluorescence at endpoint were calculated as a proxy for cleavage activity for each substrate.
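
The nomination step, ranking by an uncertainty-aware score and keeping only candidates with non-overlapping 5-mers, can be sketched as a greedy filter. The excerpt does not define the uncertainty-aware score, so the mean minus one standard deviation used here is purely an assumption, as are the function names and the toy scores.

```python
def kmers(seq, k=5):
    """All k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def select_diverse(candidates, n_select, k=5):
    """Greedy pick of top-scoring sequences that share no k-mer.

    candidates: list of (sequence, mean_z, std_z) tuples.
    Score (assumed): mean_z - std_z, a conservative lower bound.
    """
    ranked = sorted(candidates, key=lambda c: c[1] - c[2], reverse=True)
    chosen, used = [], set()
    for seq, mean_z, std_z in ranked:
        seq_kmers = kmers(seq, k)
        if seq_kmers & used:
            continue  # shares a k-mer with an already selected substrate
        chosen.append(seq)
        used |= seq_kmers
        if len(chosen) == n_select:
            break
    return chosen

# Toy candidates with made-up scores (not real predictions).
candidates = [
    ("APLGLTASGA", 3.0, 0.1),
    ("APLGLRRRRR", 2.9, 0.1),  # shares the 5-mer APLGL with the first entry
    ("LFPLAMMDMT", 2.5, 0.2),
]
print(select_diverse(candidates, n_select=2))
```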

For ease of interpretation, fluorescence fold changes were transformed to cleavage efficiencies, with a value of 0 for substrates that were not cleaved, a value of 1 for the substrate with the highest cleavage rate, FC, and a fractional cleavage efficiency between 0 and 1 for all other cleaved substrates. We first assessed the hit rate, defined as the fraction of substrates with a measurable cleavage in vitro, across the different sequence groups. In agreement with our CleaveNet-predicted cleavage scores, all CleaveNet-generated substrates were cleaved (100%, 24/24), whereas only one of the eight substrates nominated with the site-independent baseline was cleaved (12.5%, 1/8) (Fig. 5B). Complementing the site-independent baseline with the CleaveNet Predictor enriched for sequences with high predicted cleavage scores and increased the hit rate to 100% (8/8) (Fig. 5B).
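
The transformation to cleavage efficiencies and the hit-rate calculation can be illustrated as follows. The text only fixes the endpoints (0 for non-cleaved substrates, 1 for the most highly cleaved one), so the linear scaling in between and the `cleaved_cutoff` parameter are assumptions, and the fold-change values are invented for the example.

```python
def cleavage_efficiencies(fold_changes, cleaved_cutoff):
    """Map fluorescence fold changes onto [0, 1] (assumed linear scaling).

    fold_changes: dict of substrate name -> endpoint fold change.
    cleaved_cutoff: hypothetical fold change at or below which a
    substrate is treated as not cleaved.
    """
    fc_max = max(fold_changes.values())
    if fc_max <= cleaved_cutoff:
        return {name: 0.0 for name in fold_changes}
    span = fc_max - cleaved_cutoff
    return {name: max(0.0, (fc - cleaved_cutoff) / span)
            for name, fc in fold_changes.items()}

def hit_rate(efficiencies):
    """Fraction of substrates with measurable cleavage (efficiency > 0)."""
    return sum(e > 0 for e in efficiencies.values()) / len(efficiencies)

# Invented fold changes for three hypothetical substrates.
fold_changes = {"sub_a": 1.0, "sub_b": 3.0, "sub_c": 5.0}
eff = cleavage_efficiencies(fold_changes, cleaved_cutoff=1.0)
print(eff, hit_rate(eff))
```

With this definition, the hit rates quoted in the text correspond to 24/24 = 1.0 and 1/8 = 0.125.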

We next assessed the absolute values of MMP13 cleavage efficiencies across the groups (Supplementary Table 3). In addition to significantly outperforming the site-independent baseline (p < 0.01), both CleaveNet-guided approaches yielded substrates with cleavage efficiencies superior to those of positive controls from the training set (median cleavage efficiencies of 0.22, 0.37, and 0.64 for the mRNA-displayed, CleaveNet-generated, and site-independent + CleaveNet Predictor groups, respectively; Fig. 5B and Supplementary Table 3). Across its two design groups, CleaveNet produced 18 substrates with MMP13 cleavage efficiencies higher than that of the most highly cleaved training substrate, DL57 (RMPLGLRAPA, efficiency 0.46).

To identify sequences that were cleaved preferentially by MMP13, we compared the IceLogos for all sequences cleaved by MMP13 (n = 38; 24 CleaveNet-generated, 1 site-independent baseline, 8 site-independent baseline + CleaveNet Predictor, and five training sequences) with the subset that displayed efficiencies greater than 0.6 (n = 7; 3 CleaveNet-generated, 4 site-independent baseline + CleaveNet Predictor) (Fig. 5C). Whereas MMP13 was able to cleave 38/50 substrates and thus tolerated a large diversity of sequences, those that were most efficiently cleaved displayed more constrained sequence preferences. The top seven best cleaved substrates shared PL at positions P3-P2, suggesting a strong bias of MMP13 for leucine at P2, which may have been previously overlooked. The P1 and P1' sites were restricted to hydrophobic amino acids, as is canonical for MMPs, but displayed greater diversity than P3 and P2, with P1 allowing glycine, alanine, or proline and P1' allowing methionine, leucine, or isoleucine. Albeit less dominant, a preference for alanine at P4 and for alanine or aspartic acid at P3' also emerged in the seven best-cleaved substrates. The 4-mers shared across the top seven sequences — namely APLG, LGLT, PLAM, PLGI, and PLGL — mostly overlapped with the P4-P2' region, but the 4-mer TASG overlapped with the P2'-P5' region. The top MMP13 substrate DL73 (LFPLAMMDMT) exhibited an unconventional MMP cleavage sequence, with phenylalanine at P4 and methionines at P1' and P2'. The two next best cleaved sequences, DL6 and DL50, shared the 6-mer APLGLT. These observations of the CleaveNet-guided substrates indicate that positions beyond the canonical P3-P1' region may still be important contributors to MMP13 efficiency. Performing a similar analysis with the training dataset revealed distinct and less conserved amino acid preferences at all positions, with the exception of proline at P3 (Supplementary Fig. 16). 
These findings suggest that CleaveNet-designed substrates reveal distinct sequence determinants of MMP13 efficiency beyond those that could be inferred from the training set.

To characterize the sequences of these seven highest-scoring MMP13 substrates, we identified the training sequences with which they shared the longest k-mers and their corresponding cleavage Z-scores (Supplementary Table 4). Sequences DL5, DL6, and DL50 shared 5-, 6-, and 8-mers with at least one training sequence with high Z-scores, suggesting a degree of overlap with training sequences. In contrast, DL73 (the top MMP13 substrate) and DL52 shared at most a single 4-mer with a small subset of training sequences exhibiting variable Z-scores. Further, the training sequences with the longest k-mer that matched DL3 and DL49 were not cleaved in the training set, supporting the distinction of some CleaveNet-guided sequences.

Performing in silico screening of the top two MMP13 sequences across other MMPs using the CleaveNet Predictor suggested that, although cleavable by MMP13, these substrates could also be susceptible to cleavage by other MMPs (Fig. 5D), making them inappropriate for applications where MMP13 selectivity is required.

Altogether, these results suggest that substrates designed for efficient MMP13 cleavage using CleaveNet-guided strategies are cleavable by recombinant MMP13 in vitro, outperform the site-independent baseline, and exhibit higher cleavage efficiencies than, and motifs distinct from, top-scoring substrates in the mRNA-display training set. However, these substrates are predicted to be cleaved promiscuously across multiple MMPs. To enrich for sequences with target cleavage profiles, such as high selectivity for MMP13, we next developed a guided generation approach.

Identifying substrates that meet a predefined cleavage profile can be achieved by generating sequences unconditionally and filtering them based on their predicted cleavage profiles across MMPs. For instance, to design for substrates with high selectivity, sequences may be filtered by a selectivity metric calculated from predicted cleavage scores. However, this approach is untargeted (i.e., it may require a large number of unconditional generations to arrive at a selective sequence) and would be hard to generalize to nuanced cleavage patterns (e.g., if designing for substrates that are cleaved by multiple proteases of interest but not by others). To overcome these challenges, the CleaveNet Generator was trained with conditioning tags specifying target cleavage profiles across MMPs to guide generations to match desired profiles (Fig. 6A).

To test this capability, we sought to use CleaveNet to design substrates selective for MMP13. We used the CleaveNet Generator to produce 20,000 sequences unconditionally or conditionally given a conditioning tag specifying high MMP13 selectivity. As baselines, 20,000 sequences were also sampled from the amino acid distribution of all training sequences (unconditional site-independent baseline) and from the distribution of the top 50 MMP13-selective sequences (conditional site-independent baseline). Here, the selectivity score refers to the cleavage of a sequence by one MMP relative to the average cleavage across all other MMPs, as previously described (see "Methods").
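
A selectivity score of this kind can be sketched as the target protease's score minus the mean score across the remaining proteases. Whether the original metric is a difference, a ratio, or something else is defined in the Methods, which are not part of this excerpt, so the difference form below is an assumption, as is the function name.

```python
import numpy as np

def selectivity_scores(z_scores, target_index):
    """Target Z-score minus the mean Z-score over all other proteases.

    z_scores: array of shape (n_sequences, n_proteases).
    target_index: column of the protease of interest (for example MMP13).
    The difference form is an assumption, not the documented metric.
    """
    z = np.asarray(z_scores, dtype=float)
    others = np.delete(z, target_index, axis=1)
    return z[:, target_index] - others.mean(axis=1)

# Toy matrix: two sequences scored against three proteases (not real data).
z = [[3.0, 1.0, 1.0],   # cleaved mostly by the target
     [2.0, 2.0, 2.0]]   # cleaved equally by all
print(selectivity_scores(z, target_index=0))
```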

Both sets of CleaveNet-guided sequences displayed significantly higher selectivity scores than the mRNA-display sequences (p < 0.0001; Fig. 6B). However, conditionally generated sequences achieved even greater selectivity than unconditionally generated ones (p < 0.0001; 5.5-fold higher median selectivity, Fig. 6B), highlighting the effectiveness of the conditional generation strategy. Accordingly, the substrates conditionally generated for MMP13 selectivity (Fig. 6C) were quite distinct from those in the mRNA-display training dataset (Fig. 3A, IceLogo).

To investigate the distinctness of the conditionally generated sequences, we calculated the fraction of shared and unique k-mers in the combined pool of mRNA-display sequences and the sequences conditionally generated for MMP13 selectivity (Fig. 6D). While most 3-mers were shared, the fraction of shared k-mers decreased with k-mer length, leading to almost mutually exclusive sets of 6-mers between the two datasets. This observation indicates a divergence of CleaveNet's conditional generations from training sequences and raises the prospect of discovering motifs dictating MMP13 selectivity.
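
The shared versus unique k-mer comparison can be sketched with set operations on the k-mers of each pool. The helper names and toy sequences are hypothetical; the fraction is taken over the combined pool of distinct k-mers, as the text describes.

```python
def kmer_set(sequences, k):
    """All distinct k-mers present in a list of sequences."""
    return {seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)}

def shared_fraction(set_a, set_b, k):
    """Fraction of k-mers in the combined pool that occur in both sets."""
    a, b = kmer_set(set_a, k), kmer_set(set_b, k)
    pool = a | b
    return len(a & b) / len(pool) if pool else 0.0

# Toy pools standing in for training and generated sequences (not real data).
training = ["PLGLR"]
generated = ["PLGMA"]
for k in (3, 4, 5):
    print(k, shared_fraction(training, generated, k))
```

As k grows, a single differing residue is enough to make k-mers distinct, which is why the shared fraction tends to fall with k-mer length.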

To demonstrate the generalizability of our approach, we used CleaveNet to design substrates selective for MMP9, a biologically distinct enzyme that plays direct functional roles in multiple hallmarks of cancer and has been a challenging target for the design of selective substrates, due to high overlap of substrate recognition with MMP2. CleaveNet's conditionally generated sequences achieved significantly greater MMP9 selectivity scores than both unconditional generations and samples from the site-independent baselines (Supplementary Fig. 17) and demonstrated substantial sequence divergence from the training set (Supplementary Table 5), supporting the generalizability of the CleaveNet pipeline to multiple MMPs. Together, these in silico results suggest that the CleaveNet conditional generation strategy may offer a powerful tool to guide generations towards sequences with desired cleavage profiles.

We next sought to validate experimentally that conditionally generated CleaveNet sequences indeed exhibited superior selectivity over other design strategies. To this end, we performed an in vitro screen of 95 FRET-paired, fluorogenic substrates (n = 40 selected for MMP13 efficiency, n = 40 designed for MMP13 selectivity, n = 15 total sequences from the mRNA-display set as controls for top efficient, top selective, and negative cleavage) against 12 recombinant proteases spanning all activity clusters in Fig. 4C (Fig. 7A and Supplementary Fig. 18). The 95-substrate panel was highly diverse in sequence space, as evidenced by a mean pairwise sequence similarity of 2.1 across the 95 tested substrates (Supplementary Figs. 19 and 20).

There was strong agreement between technical duplicates for each MMP (0.74 < R < 0.98; Supplementary Fig. 21) and no dependence of cleavage on the properties of the crude peptides used (Supplementary Fig. 22). Moreover, given that cleavage events were detected for all MMPs in the fluorogenic screen, we could identify individual thresholds of predicted Z-scores associated with true cleavage for each MMP (ranging between CleaveNet-predicted threshold scores of 0.3 and 2.4; Supplementary Table 6 and Supplementary Fig. 23). Importantly, such cleavage thresholds could not previously be inferred from the mRNA-display dataset alone, as this dataset did not distinguish cleaved from non-cleaved substrates.

Visualizing the cleavage efficiencies of all protease-substrate pairs demonstrated that substrates designed for MMP13 efficiency were highly cleaved by MMP13 but also fairly promiscuously cleaved by other proteases, while substrates designed for MMP13 selectivity were more selectively cleaved by MMP13, consistent with design expectations (Fig. 7B and Supplementary Fig. 24). We next compared selectivity scores across the different MMP13-selective design groups (Fig. 7C and Supplementary Table 7). Similar to the results observed for MMP13 efficiency, both CleaveNet-guided approaches (conditional CleaveNet generated and conditional site-independent baseline + CleaveNet Predictor) outperformed the conditional site-independent baseline alone. However, given that some of the selective substrates from training were uniquely cleaved by MMP13 and thus had the highest selectivity score attainable, the CleaveNet-guided substrates at best only reached equivalent selectivity to these training examples.

Albeit highly specific, the MMP13-selective training substrates had relatively low efficiency scores (E < 0.16; Supplementary Table 7). We were thus curious to investigate how substrates from different groups compared as a function of both their cleavage efficiency and selectivity for MMP13. Plotting efficiency scores versus selectivity scores for each substrate enabled division of substrates into 4 quadrants: those with low efficiency-low selectivity (n = 56), low efficiency-high selectivity (n = 19), high efficiency-low selectivity (n = 15), and high efficiency-high selectivity (n = 5) (Fig. 7D). There were three examples of generated substrates that achieved perfect selectivity for MMP13 (i.e., DL41, DL32, and DL28), matching the selectivity score of the best selective substrate in training (DL93) but at the expense of low efficiency (Fig. 7D, E).
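
The quadrant assignment can be sketched as a pair of cutoffs applied to each substrate's efficiency and selectivity. The cutoff values are not given in this excerpt, so they appear here as hypothetical parameters, and the example substrates and scores are invented.

```python
from collections import Counter

def quadrant(efficiency, selectivity, eff_cutoff, sel_cutoff):
    """Label a substrate by which side of each cutoff it falls on."""
    eff = "high" if efficiency >= eff_cutoff else "low"
    sel = "high" if selectivity >= sel_cutoff else "low"
    return f"{eff} efficiency-{sel} selectivity"

# Invented (efficiency, selectivity) pairs and arbitrary cutoffs of 0.5.
substrates = {
    "sub_1": (0.10, 0.20),
    "sub_2": (0.05, 0.95),
    "sub_3": (0.80, 0.30),
    "sub_4": (0.70, 0.90),
}
counts = Counter(quadrant(e, s, eff_cutoff=0.5, sel_cutoff=0.5)
                 for e, s in substrates.values())
print(counts)
```

The four quadrant sizes reported in the text (56, 19, 15, and 5) sum to the 95 substrates screened.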

A handful of substrates, like DL48, emerged in the upper right quadrant (Fig. 7D, E), characterized by much higher efficiency while maintaining high selectivity. By virtue of these properties, substrates in this quadrant may prove useful for engineering applications, such as the numerous conditionally activated therapeutics in development. Notably, this quadrant contained only fully CleaveNet-designed substrates. Additionally, all other substrates in this quadrant were distinct from the training set, with the closest training-set sequences seldom cleaved by MMP13 and never selectively (Supplementary Table 8), with the exception of DL16, which shared a 4-mer with the third most selective substrate in training.

Closer inspection of the sequences in each quadrant of interest reinforced previously identified determinants of high MMP13 efficiency (Fig. 5C), while shedding light on determinants of MMP13 selectivity (Fig. 7F). Substrates with high selectivity but low efficiency were highly enriched for arginine at P2, aromatic residues, especially phenylalanine, at P1', and aspartic acid at P3', while those with both high efficiency and high selectivity shared traits from both the high-efficiency and the high-selectivity groups.

Taken together, this large 95-substrate versus 12-MMP in vitro screen validated and calibrated our CleaveNet Predictor model across MMPs spanning all catalytic activity clusters, and successfully established the ability of the CleaveNet Generator to design substrates conditioned on a target cleavage profile. A subset of sequences conditionally designed for high MMP13 selectivity matched the maximum selectivity levels observed in training (low efficiency-high selectivity quadrant), while a different set exhibited higher cleavage efficiency while maintaining high selectivity (high efficiency-high selectivity quadrant), a cleavage profile that is desirable for engineering applications and that was largely absent from training.

This news is powered by Nature
