What is antibody developability and why is it important?

Developing antibody therapeutics is challenging. For an antibody to reach the clinic and, ultimately, the market, it must perform well in two main domains: 1) target (antigen) binding and 2) developability.

While target binding generally reflects the specificity and potency of the antibody therapeutic, developability is a wider term that is commonly used to refer to several pragmatic considerations of therapeutic antibody design such as its safety, manufacturability and storability ^1,2.

Overall developability is assessed by quantifying several physicochemical and biophysical parameters of the antibody molecule, including solubility, stability, immunogenicity, structural dynamics and aggregation, among others (Figure 1) ^3,4.

This multifactorial nature of developability makes it tricky to achieve acceptable values for all developability parameters (DPs) at once while keeping the efficacy of the antibody intact.

**Figure 1.** Classes of antibody developability parameters, which can be further divided into sequence- and structure-based parameters. Overlaid on the pembrolizumab structure ³².

How to quantify antibody developability?

Developability is usually assessed in the lab by measuring the developability parameters of antibody molecules with a range of assays ^5,6. However, these assays are slow and require substantial resources and experienced staff, so only a handful of antibodies can be assessed at once.

For these reasons, researchers have been developing high-throughput computational tools that can predict or calculate the values of antibody developability parameters for large datasets of antibodies in silico ⁷.

These tools are gradually being adopted in antibody discovery campaigns at large pharmaceutical companies as a first screening step, narrowing down the pool of antibody candidates carried forward for experimental developability assessment ^8,9.

Computational tools for in silico antibody developability prediction

In abstract terms, the developability of an antibody is governed by its amino acid sequence and by the conformation of the molecule in 3D space. Computational developability tools can therefore be broadly divided into two main categories, depending on their required input:

Sequence-based developability prediction
Structure-based developability prediction

Sequence-based tools

Sequence-based tools require only the amino acid sequence of the antibody to predict DP values. The input for such tools is usually a simple sequence string or a FASTA file. Examples of these tools are BioPhi for immunogenicity prediction and sequence humanization ¹⁰ and SoluProt for solubility estimation ¹¹.
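As a minimal sketch of the kind of input a sequence-based tool consumes, the following Python snippet parses FASTA text into a dictionary of sequences. The function name `read_fasta`, the record names and the sequence fragments are hypothetical and serve only as illustration; real tools typically ship their own parsers.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            header = line[1:].strip()
            current_id = header.split()[0] if header else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; concatenate them under the current ID.
            records[current_id] += line.upper()
    return records


# Hypothetical two-record input (illustrative fragments, not real therapeutics).
example_fasta = """>mAb1_VH hypothetical heavy-chain fragment
EVQLVESGGGLVQPGG
SLRLSCAAS
>mAb1_VL hypothetical light-chain fragment
DIQMTQSPSSLSASVGDRVTITC
"""

sequences = read_fasta(example_fasta.splitlines())
```

The resulting dictionary maps each record ID to one unwrapped, upper-case sequence string, which is the form most sequence-based calculations expect.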

Structure-based tools

Structure-based tools require knowledge of the antibody structure, either an experimentally determined structure (for example, a crystal structure) or a predicted model of the antibody molecule. In both cases, the input is usually a PDB file, although the newer CIF format is also used. Examples of these tools are FreeSASA for calculating solvent-accessible surface areas ¹² and PROPKA for calculating electrochemical properties of the molecule, such as charge heterogeneity and the 3D-based isoelectric point ^13,14.
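To illustrate what a PDB input looks like to a program, here is a minimal sketch that reads the fixed-column `ATOM` records of a PDB file and recovers the amino acid sequence of each chain from its C-alpha atoms. The function name `chain_sequences` and the three example records are hypothetical; the sketch handles only the 20 standard residues and is not a substitute for a full structure parser.

```python
from typing import Dict, Iterable

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines: Iterable[str]) -> Dict[str, str]:
    """Return {chain_id: one-letter sequence} from the C-alpha ATOM records."""
    chains: Dict[str, str] = {}
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # only read the first model of a multi-model file
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        res_name = line[17:20]           # residue name, columns 18-20
        chain_id = line[21:22]           # chain identifier, column 22
        # Residue number plus insertion code (columns 23-27); insertion codes
        # are common in antibody numbering schemes, so both are kept.
        res_key = (chain_id, line[22:27])
        if res_key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(res_key)
        chains[chain_id] = chains.get(chain_id, "") + THREE_TO_ONE.get(res_name, "X")
    return chains


# Hypothetical records: two heavy-chain (H) residues and one light-chain (L) residue.
example_pdb = [
    "ATOM      2  CA  GLU H   1      11.104  13.207   2.100",
    "ATOM     11  CA  VAL H   2      12.560  10.100   3.400",
    "ATOM     18  CA  ASP L   1      30.000   5.000   1.000",
]
example_chains = chain_sequences(example_pdb)
```

In practice the lines would come from a file handle, for example `chain_sequences(open("model.pdb"))`, where `model.pdb` is a placeholder path. Non-standard residues are reported as `X` rather than dropped, so sequence positions stay aligned with the structure.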

A major limitation of structure-based in silico developability estimation is the current variance among structure prediction tools, together with the need to incorporate molecular dynamics simulations into structural developability calculations ^15–18. We will expand on this point in a future article.

It is worth highlighting that developability screening tools are becoming increasingly accessible, not only to bioinformaticians but also to scientists with less experience in data science or bioinformatics. Some tools offer user-friendly web-server interfaces, such as the Therapeutic Antibody Profiler (TAP) ¹⁷, which assesses overall developability relative to clinical-stage antibodies, and CamSol, for solubility prediction and sequence optimisation ¹⁹.

**Figure 2.** Structure- and sequence-based antibody developability prediction tools (NetMHCIIpan, BioPhi, SoluProt, CamSol, FreeSASA, PROPKA and TAP), shown as a Venn diagram with their FASTA and PDB file inputs.

The role of machine learning in antibody developability prediction

Some developability estimation tools are based on relatively simple computations, such as the instability index ²⁰ and the sequence-based charge calculation ²¹, where a few data inputs are sufficient to calculate or estimate the DP value. Such parameters have previously been referred to as low-level DPs. However, estimating the values of most developability parameters is more complex and relies on larger data inputs.
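As an illustration of a low-level DP, the sketch below estimates the net charge of a single chain at a given pH with the Henderson-Hasselbalch equation, and locates the isoelectric point by bisection. The pKa values are one illustrative set and are an assumption here; published scales differ, and the function names are hypothetical. The sketch also ignores that cysteines in disulfide bonds do not ionise and that a full antibody has several chains, each with its own termini.

```python
# Illustrative pKa values (an assumption; published pKa scales differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.4, "H": 6.0}
NEGATIVE_PKA = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.0}
N_TERM_PKA = 9.69
C_TERM_PKA = 2.34


def net_charge(sequence: str, ph: float = 7.0) -> float:
    """Estimate the net charge of one polypeptide chain at the given pH."""
    # Terminal groups: the N-terminus is basic, the C-terminus is acidic.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            # Fraction of the basic side chain that is protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            # Fraction of the acidic side chain that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH at which the estimated net charge is zero, by bisection."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Net charge decreases monotonically as pH increases.
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Only the sequence and a pH value are needed, which is what makes this kind of parameter cheap to compute across large antibody datasets.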

For example, the solubility of an antibody molecule is associated with several factors, including its full-sequence charge, the content and distribution of hydrophobic amino acids, full-sequence hydrophobicity, the pH of the solution and the expressibility of the antibody in the desired organism, among others. Parameters such as solubility have been referred to as high-level DPs. This complexity has encouraged the use of machine learning (ML) to develop models that learn the underlying patterns of antibody sequences in order to predict the values of high-level DPs ^22,23.
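Two of the sequence-level factors listed above, hydrophobic content and full-sequence hydrophobicity, can be sketched as simple features of the kind that might feed an ML model. The snippet below uses the Kyte-Doolittle hydropathy scale to compute the mean hydropathy (often called GRAVY). The function names and the choice of which residues count as hydrophobic are assumptions made for illustration, not the feature set of any tool cited here.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: residues treated as hydrophobic; other groupings are also used.
HYDROPHOBIC = set("AVILMFW")


def _validated(sequence: str) -> str:
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the sequence."""
    seq = _validated(sequence)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydrophobic_fraction(sequence: str) -> float:
    """Fraction of residues that belong to the HYDROPHOBIC set."""
    seq = _validated(sequence)
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
```

Neither feature captures how hydrophobic residues are distributed along the sequence or across the surface, which is part of why high-level DPs such as solubility are usually predicted from many features at once.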

The developability literature provides comprehensive reviews of ML-aided developability prediction tools ^7,22,24,25. For example, NetMHCIIpan is an ML model that uses artificial neural networks (ANNs) to predict the immunogenicity of proteins, including antibodies, from sequence input ²⁶. This model was trained on experimental measurements of peptide binding to MHC II molecules, which largely reflect immunogenicity. Another example is SSH2.0, which uses a support vector machine (SVM)-based ensemble model, trained on experimental data from 131 antibodies, to predict the hydrophobic interaction risk of antibodies from sequence input ²⁷.

However, as experimental developability data is scarce and the number of clinically approved antibodies is small (a few hundred), it is challenging to generalise these ML models to new antibody candidates. Indeed, training ML models on a small number of sequences is likely to result in overfitting, as the high diversity of the potential antibody sequence space (an estimated 10¹³ sequences for humans) is not captured during training ^28,29.

Thus, the need to better capture the true diversity of antibody sequences has motivated the search for larger training datasets. Recent efforts have shown promising results in developing generative ML models trained on much larger datasets of synthetic ³⁰ or natural ³¹ antibodies.

Such models have been shown to generate antibodies with comparable or improved developability characteristics. They could potentially eliminate the need to screen each developability parameter individually by ensuring that the generated antibodies have desirable overall developability. Learning developability rules from natural antibody repertoires is a topic that will be expanded on in a separate article.

References

1. Bailly, M. et al. Predicting Antibody Developability Profiles Through Early Stage Discovery Screening. MAbs 12, 1743053 (2020).

2. Fernández-Quintero, M. L. et al. Assessing developability early in the discovery process for novel biologics. MAbs 15, 2171248 (2023).

3. Ahmed, L., Gupta, P. & Martin, K. P. Intrinsic physicochemical profile of marketed antibody-based biotherapeutics. Proceedings of the (2021).

4. Negron, C., Fang, J., McPherson, M. J., Stine, W. B., Jr & McCluskey, A. J. Separating clinical antibodies from repertoire antibodies, a path to in silico developability assessment. MAbs 14, 2080628 (2022).

5. Jain, T. et al. Biophysical properties of the clinical-stage antibody landscape. Proc. Natl. Acad. Sci. U. S. A. 114, 944–949 (2017).

6. Mieczkowski, C. et al. Blueprint for antibody biologics developability. MAbs 15, 2185924 (2023).

7. Akbar, R. et al. Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies. MAbs 14, 2008790 (2022).

8. Xu, Y. et al. Structure, heterogeneity and developability assessment of therapeutic antibodies. MAbs 11, 239–264 (2019).

9. Jain, T., Boland, T. & Vásquez, M. Identifying developability risks for clinical progression of antibodies using high-throughput in vitro and in silico approaches. MAbs 15, 2200540 (2023).

10. Prihoda, D. et al. BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. MAbs 14, (2022).

11. Hon, J. et al. SoluProt: prediction of soluble protein expression in Escherichia coli. Bioinformatics 37, 23–28 (2021).

12. Mitternacht, S. FreeSASA: An open source C library for solvent accessible surface area calculations. F1000Res. 5, 189 (2016).

13. Søndergaard, C. R., Olsson, M. H. M., Rostkowski, M. & Jensen, J. H. Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values. J. Chem. Theory Comput. 7, 2284–2295 (2011).

14. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. J. Chem. Theory Comput. 7, 525–537 (2011).

15. Licari, G. et al. Embedding Dynamics in Intrinsic Physicochemical Profiles of Market-Stage Antibody-Based Biotherapeutics. Mol. Pharm. (2022) doi:10.1021/acs.molpharmaceut.2c00838.

16. Park, E. & Izadi, S. Molecular Surface Descriptors to Predict Antibody Developability. bioRxiv 2023.07.18.549448 (2023) doi:10.1101/2023.07.18.549448.

17. Raybould, M. I. J., Turnbull, O. M., Suter, A., Guloglu, B. & Deane, C. M. Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling. bioRxiv 2023.06.28.546839 (2023) doi:10.1101/2023.06.28.546839.

18. Bashour, H. et al. Cartography of the developability landscapes of native and human-engineered antibodies. bioRxiv 2023.10.26.563958 (2023) doi:10.1101/2023.10.26.563958.

19. Rosace, A. et al. Automated optimisation of solubility and conformational stability of antibodies and proteins. Nat. Commun. 14, 1937 (2023).

20. Guruprasad, K., Reddy, B. V. B. & Pandit, M. W. Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Eng. Des. Sel. 4, 155–161 (1990).

21. Osorio, D., Rondon-Villarreal, P. & Torres, R. Peptides: A Package for Data Mining of Antimicrobial Peptides. The R Journal vol. 7 4–14 Preprint at (2015).

22. Kim, J., McFee, M., Fang, Q., Abdin, O. & Kim, P. M. Computational and artificial intelligence-based methods for antibody development. Trends Pharmacol. Sci. 44, 175–189 (2023).

23. Waight, A. B. et al. A machine learning strategy for the identification of key in silico descriptors and prediction models for IgG monoclonal antibody developability properties. MAbs 15, 2248671 (2023).

24. Norman, R. A. et al. Computational approaches to therapeutic antibody design: established methods and emerging trends. Brief. Bioinform. 21, 1549–1567 (2020).

25. Khetan, R. et al. Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. MAbs 14, 2020082 (2022).

26. Reynisson, B. et al. Improved Prediction of MHC II Antigen Presentation through Integration and Motif Deconvolution of Mass Spectrometry MHC Eluted Ligand Data. J. Proteome Res. 19, 2304–2315 (2020).

27. Zhou, Y. et al. SSH2.0: A Better Tool for Predicting the Hydrophobic Interaction Risk of Monoclonal Antibody. Front. Genet. 13, 842127 (2022).

28. Elhanati, Y. et al. Inferring processes underlying B-cell repertoire diversity. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, (2015).

29. Robert, P. A. et al. Unconstrained generation of synthetic antibody–antigen structures to guide machine learning methodology for antibody specificity prediction. Nature Computational Science 2, 845–865 (2022).

30. Akbar, R. et al. In silico proof of principle of machine learning-based antibody design at unconstrained scale. MAbs 14, 2031482 (2022).

31. Shuai, R. W., Ruffolo, J. A. & Gray, J. J. IgLM: Infilling language modeling for antibody sequence design. Cell Syst (2023) doi:10.1016/j.cels.2023.10.001.

32. Scapin, G., Yang, X., Prosise, W. et al. Structure of full-length human anti-PD1 therapeutic IgG4 antibody pembrolizumab. Nat. Struct. Mol. Biol. 22, 953–958 (2015). https://doi.org/10.1038/nsmb.3129.
