DNA synthesis in antibody discovery

Introduction

Based on data from a subset of DNA synthesis service providers, error rates for DNA synthesis can range from 0.007% to 0.05% per nucleotide. Re-sequencing synthesized sequences before amplification and subsequent functional analysis can therefore be valuable for mitigating the risk of downstream errors.

With the cost of synthesizing a base pair hovering around $0.10, the total cost of synthesizing tens or hundreds of sequences remains fairly affordable for most labs. In the case of repertoire sequencing or synthetic libraries, synthesizing hit-picked antibody sequences is necessary for subsequent functional assays, while engineered sequences or synthetic conjugates would also be synthesized prior to functional assays.

To put the error rates into context, for a 96-well plate of IgG heavy chain variable region (VH) sequences at an average length of 110 amino acids (330 nucleotides), as many as 16 sequences could contain a nucleotide error.

Example 1. Illustration of a 96-well plate and potential errors arising from DNA synthesis. If the errors are distributed so that each sequence contains at most one error, the percentage of sequences containing an error can be high.

Checking for synthesis errors, for example by re-sequencing synthesized sequences before downstream analyses, can therefore be well worth the effort.

De novo gene synthesis technologies

Standard gene synthesis has largely relied on the phosphoramidite process developed by Beaucage and Caruthers in 1981, which allowed production of stable, solid nucleoside phosphites under normal laboratory conditions¹. In chemical synthesis, deoxynucleotide coupling generally achieves more than 99% efficiency per synthetic cycle².

Figure 1. Phosphoramidite chemistry for DNA synthesis. Original authors: Masaki et al., 2022². Accessed on NCBI, Sept. 8th 2022.

With synthesis cycles of around 100 nucleotides, a typical IgG VH would be constructed by assembling several synthesized oligonucleotides with overlapping ends. To ensure correct synthesis and assembly, service providers therefore implement various quality control (QC) measures, such as mass spectrometry, enzyme-based or chromatographic methods, light spectroscopy or sequencing.

Since the standard phosphoramidite-based DNA synthesis cycle can reliably produce at most around 200 nucleotides, longer DNA sequences require assembling several oligonucleotides³. As an alternative to the gold-standard chemical approach, methods for enzymatic synthesis are being developed by companies such as Ansa Biotechnologies and DNA Script, whose solutions are based on template-independent terminal deoxynucleotidyl transferase (TdT). These technologies promise not only higher quality (DNA Script has reached 99.7% and Camena Bioscience 99.9% coupling efficiency) but also faster production of synthesized DNA⁴. Successful commercialization will, however, still require reaching a feasible scale and throughput.
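To see why per-cycle coupling efficiency limits oligo length, a common back-of-the-envelope approximation treats every coupling as an independent step, so the fraction of full-length product is roughly the efficiency raised to the number of couplings. The Python sketch below applies that approximation to the efficiencies quoted above. The function name and the 200-nucleotide example length are illustrative choices, 99% is used as a lower bound for the chemical approach, and real yields also depend on factors this simple model ignores.

```python
def full_length_fraction(coupling_efficiency: float, length_nt: int) -> float:
    """Approximate fraction of full-length oligo after (length_nt - 1) couplings.

    Assumes every coupling succeeds independently with the same efficiency.
    """
    return coupling_efficiency ** (length_nt - 1)


# 99% (chemical, lower bound), 99.7% and 99.9% (enzymatic figures quoted above).
for efficiency in (0.99, 0.997, 0.999):
    fraction = full_length_fraction(efficiency, length_nt=200)
    print(f"{efficiency:.1%} per cycle -> {fraction:.1%} full-length at 200 nt")
```

Under this approximation, small gains in per-cycle efficiency compound into large differences in full-length product as the oligo gets longer.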

Because longer DNA fragments must be assembled from oligos, chemical synthesis technologies need extensive QC to reach high levels of accuracy. Assembly technologies and solutions from, among others, Codex DNA (Gibson assembly) can potentially streamline antibody discovery workflows by combining steps such as cloning, colony-picking, culture, prep, synthesis and assembly into one hardware solution. However, in chemical synthesis, larger aggregate numbers of assembled synthesized nucleotides inevitably increase the potential for errors, which highlights the need for effective QC.

Quality control of synthesized sequences

After successfully screening your antibody library and hit-picking your sequences, you'd normally have your antibody sequences synthesized for epitope mapping, binding kinetics studies and more. With plenty of synthesis options available, you can generally expect to receive high-quality synthesized DNA. Nevertheless, as we pointed out above, the process is still imperfect, and oligo synthesis equipment vendors and service providers therefore disclose maximum error rates for their products. Different vendors also have different QC processes, which is likely also reflected in the price of the synthesized sequences.

We collected a subset of error rate examples from different providers and different products (gene synthesis, oligo pools) available in the market.

Provider	Product	Reported error rate	Unit	In % (3 decimals)
Twist Bioscience	Oligo Pools	<1 : 2,000	nt	0.050%
Codex DNA	BioXp™ 3250	<1 : 14,000	nt	0.007%
ThermoFisher Scientific	GeneArt Strings DNA Fragments	<1 : 6,757	nt	0.015%
IDTDNA	gBlocks	<1 : 5,000	nt	0.020%

Table 1. A subset of synthesis service providers. Notably, some of the services are not designed for perfect synthesis of sequences.
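The percentage column in Table 1 follows directly from the reported ratios. A minimal Python sketch of that conversion is shown here. The dictionary and function names are illustrative, and the ratios are simply the upper bounds copied from the table.

```python
# Upper-bound error ratios from Table 1, expressed as "1 error per N nucleotides".
REPORTED_NT_PER_ERROR = {
    "Twist Bioscience Oligo Pools": 2_000,
    "Codex DNA BioXp 3250": 14_000,
    "ThermoFisher GeneArt Strings DNA Fragments": 6_757,
    "IDTDNA gBlocks": 5_000,
}


def ratio_to_percent(nt_per_error: int) -> float:
    """Convert '1 error per N nucleotides' into a per-nucleotide error rate in percent."""
    return 100.0 / nt_per_error


for product, nt_per_error in REPORTED_NT_PER_ERROR.items():
    print(f"{product}: <{ratio_to_percent(nt_per_error):.3f}%")
```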

The reliability of nucleotide sequences obtained from gene synthesis is inversely correlated with the number of nucleotides being synthesized and assembled: the more sequences you have and the longer they are, the more likely it is that some contain errors. Although the variable regions of IgG heavy (VH) and light (VL) chains are relatively short, typically around 110 amino acids (or 330 base pairs) long, larger synthesized sequence libraries inevitably run the risk of containing errors unless they are extensively quality controlled.

For example, assume we’ve hit-picked a 96-well plate of IgG variable regions of the heavy chain (VH) to synthesize. At the typical length of 330 nucleotides per VH, we will have a total of 31,680 nucleotides synthesized.

Sequences	96
Nts/seq	330
Total nt count	31,680

Table 2. Synthesizing a 96-well plate full of IgG VH sequences

Depending on the applied methodology for synthesis and the associated QC before delivery, our de novo synthesized sequences could contain anywhere from 2 to 16 errors in our 96 sequences.

Error rate	Potential error count (nt)
0.050%	16
0.007%	2
0.015%	5
0.020%	6

Table 3. Maximum number of nucleotide errors per 96-well plate
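The counts in Table 3 are the total nucleotide count from Table 2 multiplied by each upper-bound error rate and rounded to a whole nucleotide. The Python sketch below reproduces that arithmetic. The constant and function names are illustrative, and rounding to the nearest integer is an assumption about how the table values were derived.

```python
SEQUENCES = 96          # wells on the plate
NT_PER_SEQUENCE = 330   # about 110 amino acids x 3 nucleotides
TOTAL_NT = SEQUENCES * NT_PER_SEQUENCE  # 31,680 nucleotides in total

# Upper-bound per-nucleotide error rates from Table 1, in percent.
ERROR_RATES_PERCENT = [0.050, 0.007, 0.015, 0.020]


def max_errors(total_nt: int, rate_percent: float) -> int:
    """Upper-bound error count, rounded to the nearest whole nucleotide."""
    return round(total_nt * rate_percent / 100)


for rate in ERROR_RATES_PERCENT:
    print(f"{rate:.3f}% -> about {max_errors(TOTAL_NT, rate)} errors per plate")
```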

While there is no guarantee of how the errors are distributed across our sequences, if they are spread evenly so that no single VH contains more than one error, anywhere from roughly 2% to 17% of our sequences could contain an error.

Error count (nt)	Maximum error rate across 96 sequences
16	16.7%
2	2.1%
5	5.2%
6	6.3%

Table 4. Maximum error percentage on sequence-level in 96-well plate
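Table 4 is a worst case in which every error lands in a different sequence. For comparison, the Python sketch below also computes the share of affected sequences under a different assumption that is not taken from the vendors' data: that errors occur independently at each nucleotide, so a sequence of length n is affected with probability 1 - (1 - p)^n. The function names are illustrative.

```python
SEQUENCES = 96
NT_PER_SEQUENCE = 330


def worst_case_fraction(error_count: int, sequences: int = SEQUENCES) -> float:
    """Fraction of sequences affected if every error lands in a different sequence."""
    return min(error_count, sequences) / sequences


def independent_fraction(rate_percent: float, length: int = NT_PER_SEQUENCE) -> float:
    """Probability of at least one error per sequence, assuming independent errors."""
    p = rate_percent / 100
    return 1 - (1 - p) ** length


# (error rate in %, error count from Table 3)
for rate_percent, error_count in [(0.050, 16), (0.007, 2), (0.015, 5), (0.020, 6)]:
    print(
        f"{rate_percent:.3f}%: worst case {worst_case_fraction(error_count):.2%}, "
        f"independent model {independent_fraction(rate_percent):.2%}"
    )
```

The independent model gives slightly lower shares than the worst case because it allows some sequences to carry more than one error.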

Concluding remarks

We are excited to see the next generation of gene synthesis technologies being commercialized, giving researchers access to faster and more affordable methods for producing synthesized DNA for a range of applications, therapeutic research among them. Despite the often extensive QC that vendors perform on synthesized DNA products, validating your antibody sequences by re-sequencing mitigates the risk of evaluating incorrect antibodies in the downstream functional research and development cycle.

References

1. Beaucage, S.L. & Caruthers, M.H. Deoxynucleoside phosphoramidites—A new class of key intermediates for deoxypolynucleotide synthesis. Tetrahedron Lett 22, 1859–1862 (1981). https://doi.org/10.1016/S0040-4039(01)90461-7
2. Masaki, Y., Onishi, Y. & Seio, K. Quantification of synthetic errors during chemical synthesis of DNA and its suppression by non-canonical nucleosides. Sci Rep 12, 12095 (2022). https://doi.org/10.1038/s41598-022-16222-2
3. Palluk, S., Arlow, D., de Rond, T. et al. De novo DNA synthesis using polymerase-nucleotide conjugates. Nat Biotechnol 36, 645–650 (2018). https://doi.org/10.1038/nbt.4173
4. Eisenstein, M. Enzymatic DNA synthesis enters new phase. Nat Biotechnol 38, 1113–1115 (2020). https://doi.org/10.1038/s41587-020-0695-9