Canonical cysteines are the conserved cysteine residues, encoded by variable and constant gene segments, that form the disulfide bond in the core of each Ig domain and the disulfide bonds that connect the polypeptide chains of an antibody molecule.2 In contrast, non-canonical cysteines are typically encoded in humans by diversity (D) gene segments, primarily those of the IGHD2 family, although other D gene families also contribute.1
Non-canonical cysteines are more prevalent than in humans in species such as chickens3, camels4, llamas5, sharks6, and cows7. Non-canonical cysteine residues participate in forming diverse intra-heavy-chain disulfide bonds within the CDR-H3 of antibody variable domains.2 They can also mediate disulfide bonding between the CDR-H3 loop and other CDRs or framework regions (FRs).8
The disulfide bridges formed by non-canonical cysteines help generate the substantial conformational diversity of antibody repertoires and confer distinct structural conformations on antigen-binding sites. These disulfide bonds have also been implicated in modulating various effector functions of antibody molecules. The presence and positioning of non-canonical cysteines are therefore molecular determinants of the functional capabilities of the antibody repertoire.8
Non-canonical cysteines in Bactrian camels
Camelids, such as camels and llamas, possess a unique ability to produce heavy-chain-only antibodies (HCAbs), whose variable domains (VHHs) have an exceptionally long CDR-H3 region.9 This extended CDR-H3 loop can form non-canonical disulfide bonds, enabling recognition of unique epitopes that are inaccessible to conventional antibodies.1
In a study by Liu et al.10, the canonical cysteines at positions 23 and 104 were present in all VHs and VHHs, whereas non-canonical cysteines were enriched specifically in VHHs. A comparison of their locations showed that non-canonical cysteines were situated primarily in CDR1 in the Bactrian camel and dromedary, but in FR2 (position 55) in the alpaca. Moreover, the proportion of VHHs harboring non-canonical cysteines was significantly higher in the Bactrian camel and dromedary than in the alpaca.10
Based on the positions of their non-canonical cysteine residues, VHHs can be classified into eight types, whose distributions vary across camelid species. Types 2b, 2c, and 3a predominated in the Bactrian camel and dromedary, while Types 1 and 3b predominated in the alpaca.10 Non-canonical cysteines allow additional disulfide bonds to form, enhancing the structural diversity of VHHs, and this effect is more pronounced in the Bactrian camel and dromedary than in the alpaca.10
Notably, extensive usage of non-canonical cysteines in VHHs has been identified in the Bactrian camel at both the germline and rearranged levels; such cysteines are rarely observed in VHs or in non-camelid species.11 It is hypothesized that cysteine residues in CDR3 may form disulfide bridges with the cysteine in FR2 or the cysteine in CDR1, producing a distinctive disulfide bond configuration.11,12
Disulfide bonds formed by cysteine residues affect protein folding. The additional non-canonical cysteines in heavy-chain antibodies (HCAbs) could produce novel loop conformations, thereby increasing structural diversity and stabilizing the VHH domain.13 Additionally, compared with the alpaca, the Bactrian camel and dromedary exhibited significantly more, and more varied, non-canonical cysteines, implying a greater diversity of HCAb structures in these two species.10
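As a minimal sketch of how cysteines can be sorted into canonical and non-canonical ones by position, the Python below assumes each VH/VHH has already been numbered with the IMGT unique numbering scheme, which is consistent with the canonical positions 23 and 104 and the FR2 site 55 mentioned above. The region boundaries are the standard IMGT V-domain ranges; the input format (a dict from integer position to residue) and the function names are hypothetical, and IMGT insertion positions such as 111.1 are ignored for simplicity.

```python
from typing import Dict, List, Tuple

# Standard IMGT unique numbering boundaries for a V domain (inclusive ranges).
IMGT_REGIONS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 38),
    ("FR2", 39, 55),
    ("CDR2", 56, 65),
    ("FR3", 66, 104),
    ("CDR3", 105, 117),
    ("FR4", 118, 128),
]

# Conserved (canonical) cysteines forming the intradomain disulfide bond.
CANONICAL_CYS = {23, 104}


def imgt_region(position: int) -> str:
    """Return the IMGT region name for an integer position (insertions ignored)."""
    for name, start, end in IMGT_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the IMGT V-domain range 1-128")


def non_canonical_cysteines(numbered: Dict[int, str]) -> List[Tuple[int, str]]:
    """List (position, region) for every cysteine not at a canonical position.

    `numbered` is a hypothetical input format: IMGT position -> one-letter residue.
    """
    return [
        (pos, imgt_region(pos))
        for pos, residue in sorted(numbered.items())
        if residue == "C" and pos not in CANONICAL_CYS
    ]


# Toy example (not real data): canonical Cys23/Cys104 plus three extra cysteines.
example = {23: "C", 33: "C", 50: "A", 55: "C", 104: "C", 110: "C"}
print(non_canonical_cysteines(example))  # [(33, 'CDR1'), (55, 'FR2'), (110, 'CDR3')]
```

Counting, per species, the VHHs for which this list is non-empty would give proportions of the kind compared by Liu et al., although their exact pipeline is not described here.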
The introduction of non-canonical cysteines, along with extended CDR3 lengths and increased hypermutation hotspots, may contribute to the acquisition of a diverse antigen-binding repertoire in HCAbs, compensating for the absence of light chains.13,14
Non-canonical disulfide bonds across camelids
A prominent example of non-canonical disulfide bonding between cysteine residues outside the traditional antibody framework regions occurs in the dromedary camel (Camelus dromedarius), where the most common non-canonical disulfide bond links the complementarity-determining regions CDR1 and CDR3, and the cysteine in CDR1 (position 33) is germline encoded.12 The prevalence of this structural motif suggests that it plays an important role in the structure and/or function of the HCAbs encoded by this germline gene segment.
Systematic investigation of five camel VHH domains (the single N-terminal variable domains of HCAbs) has led to two key hypotheses about the potential functional significance of the CDR1-CDR3 disulfide bond15:
Antigen Binding Affinity Hypothesis
It is proposed that the CDR1-CDR3 disulfide bond contributes to the antigen-binding affinity of HCAbs/VHHs by reducing entropic penalties. The CDR3 loops of camelid VHHs are on average three residues longer than those of conventional antibodies.16 By constraining the conformational flexibility of these elongated loops, the non-canonical disulfide bond could lower the entropic cost of immobilizing the loop upon antigen binding. However, experimental evidence indicates that the effect on affinity varies among VHHs rather than reflecting a universal entropy-driven mechanism.15
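The entropic argument can be made explicit with the standard Gibbs relation. This is a simplified, illustrative decomposition that assumes the binding entropy can be split into a loop conformational term and all other contributions, and that the enthalpy and the other terms are unchanged by the disulfide:

$$
\Delta G_{\mathrm{bind}} = \Delta H_{\mathrm{bind}} - T\,\Delta S_{\mathrm{bind}}, \qquad \Delta S_{\mathrm{bind}} = \Delta S_{\mathrm{conf}} + \Delta S_{\mathrm{other}}
$$

Freezing a flexible CDR3 loop on binding gives $\Delta S_{\mathrm{conf}} < 0$, which adds the unfavourable term $-T\,\Delta S_{\mathrm{conf}} > 0$ to $\Delta G_{\mathrm{bind}}$. If a disulfide pre-constrains the free loop so that $\Delta S_{\mathrm{conf}}^{\mathrm{SS}} > \Delta S_{\mathrm{conf}}^{\mathrm{free}}$ (less negative), then

$$
\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{SS}} - \Delta G_{\mathrm{bind}}^{\mathrm{free}} = -T\left(\Delta S_{\mathrm{conf}}^{\mathrm{SS}} - \Delta S_{\mathrm{conf}}^{\mathrm{free}}\right) < 0,
$$

that is, binding becomes more favourable. Because the assumption of unchanged enthalpy and other entropic terms need not hold, this sketch does not predict the size, or even the sign, of the effect for any particular VHH, which is consistent with the variable results noted above.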
Thermal Stability Adaptation Hypothesis
The second hypothesis assigns the CDR1-CDR3 disulfide bond an evolutionary role: it enhances the biophysical properties of the VHH domain to prevent heat-induced aggregation in the absence of a light-chain partner.15 This disulfide cross-link may be a key adaptation that allows camels to cope with the extreme desert climate of their natural habitat and the high body temperatures it produces.17 Camels conserve water by letting their body temperature rise as high as 40°C during the day and dissipating the stored heat only at night, which avoids evaporative water loss.18
Experiments monitoring the aggregation propensity of a camel VHH (R303) and a variant lacking the CDR1-CDR3 bond (R303C33A/C102A) at moderately elevated temperatures support the notion that this disulfide enhances the reversible folding and solubility of the isolated VHH domain under the thermal stress found in the camel's body.16 Although not conclusive, these results suggest that the thermal stability hypothesis is plausible.
Among camelid species, the CDR1-CDR3 disulfide bonding pattern is found almost exclusively in Camelus dromedarius, whereas llamas and alpacas more commonly exhibit non-canonical bonds between CDR2 and CDR3.19,20,21 Furthermore, around 30% of camel VHH sequences contain the germline-encoded cysteine at position 33 that enables CDR1-CDR3 bonding, compared with only ~12% in alpacas and ~3.5% in llamas.15
This phylogenetic distribution suggests the CDR1-CDR3 disulfide bond represents a specialized adaptation of camels compared to other camelids.15 The higher frequency in camels may reflect an evolutionary response to the more extreme desert environments and heat stress faced by this particular species within the Camelidae family.15
Overview across other species
Because non-canonical cysteines in CDR-H3s can form intra- or inter-chain disulfide bonds, they can stabilize unique conformations and create novel paratope shapes.2 Chicken antibodies, known as immunoglobulin Y (IgY)22, are renowned for their high abundance of non-canonical cysteines in CDR-H3s.23 These cysteines form various intra-CDR-H3 disulfide bonds, contributing to the structural diversity and antigen-binding capabilities of chicken antibodies.24
In sharks, the exceptional diversity of non-canonical cysteines in the antibody repertoire has been proposed to be an evolutionary adaptation to the aquatic environment.6 The shark antibody isotype IgNAR (immunoglobulin new antigen receptor) possesses long CDR3s with multiple non-canonical cysteines, forming intricate disulfide bond networks that contribute to its structural stability and antigen-binding capabilities.8
Cows also exhibit a remarkable diversity of non-canonical cysteines in their antibody repertoire, with some antibodies containing up to 28 cysteines in their CDR-H3s.7 These ultra-long CDR-H3s, stabilized by an array of disulfide bonds, are thought to play a role in recognizing complex antigens, such as carbohydrate epitopes found on pathogens.11
Rabbit IgG antibodies frequently contain non-canonical disulfide bonds between CDR-H1 and CDR-H2, as well as between the kappa light chain and CDR-H1/CDR-H2.28 These non-canonical disulfides are often viewed as liabilities when developing therapeutic antibodies from rabbits.27 Rabbit kappa light chains of the dominant K1 isotype also contain an intra-chain disulfide between cysteines 80 and 171.26 This poses challenges when making chimeric rabbit/human Fabs, because human kappa constant domains lack cysteine 171, leaving cysteine 80 unpaired. Chimeric Fabs from rabbits lacking this K1 intrachain disulfide show higher sequence diversity and affinity.26 Some rabbit allotypes, such as b9, instead have an alternative intrachain disulfide in the kappa light chain between cysteines 108 and 171. The high diversity of rabbit light chains is functionally important for antigen binding.26
Non-canonical cysteines in human repertoire
The landscape of non-canonical cysteines in the human antibody repertoire has been elucidated through the analysis of large next-generation sequencing (NGS) datasets. By analyzing nearly 3 billion VH sequences from ten individuals, Prabakaran et al.1 identified 12 million unique VH sequences containing non-canonical cysteines in CDR-H3s. These findings challenge the prevailing notion that non-canonical cysteines are rare or absent in human antibodies. The number of non-canonical cysteines in CDR-H3s ranged from one to eight, with two cysteines being the most prevalent. Surprisingly, higher numbers of non-canonical cysteines, typically associated with antibodies from other species such as chicken, shark, and cow, were also observed in human CDR-H3s.1 These cysteines formed diverse patterns, including contiguous runs of two (duplets), three (triplets), and even seven (septuplets) cysteines, which had not previously been reported in humans.1
These non-canonical cysteine motifs in human CDR-H3s are remarkably diverse, with over 4,000 unique patterns identified.1 The CXnC motif, in which Xn denotes the n residues separating the two cysteines, was the most prevalent, and the CX4C motif accounted for nearly 75% of all CXnC motifs. In total, 34,266 unique tetrapeptides were found within CX4C motifs, highlighting their sequence diversity.1
The presence of non-canonical cysteines in human CDR-H3s is reminiscent of the patterns observed in other species, suggesting an evolutionary relationship between the VHs of humans and those of other vertebrates. For instance, the two-cysteine CXnC motifs found in human CDR-H3s resemble those observed in chicken, camel, llama, shark, and cow antibodies.8 However, the diversity and complexity of non-canonical cysteine motifs in human CDR-H3s appear to be more extensive, encompassing a broader range of patterns and potential disulfide bonds.1
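The per-sequence counting described above can be sketched in Python. This is a minimal illustration, not the pipeline used by Prabakaran et al.: it assumes CDR-H3 amino-acid sequences have already been extracted without the flanking canonical Cys104 and Trp118 anchor residues (so every cysteine counted is non-canonical), and the function names and toy sequences are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def cysteine_profile(cdrh3: str) -> Dict[str, object]:
    """Count cysteines in one CDR-H3 and record lengths of contiguous cysteine runs."""
    runs: List[int] = [len(m.group()) for m in re.finditer(r"C+", cdrh3)]
    return {
        "n_cys": cdrh3.count("C"),
        "runs": runs,  # e.g. 2 = duplet "CC", 3 = triplet "CCC"
        "longest_run": max(runs, default=0),
    }


def cysteine_count_distribution(cdrh3s: Iterable[str]) -> Counter:
    """Tally unique CDR-H3s by number of non-canonical cysteines (zero excluded)."""
    distribution: Counter = Counter()
    for seq in set(cdrh3s):  # deduplicate so each unique sequence counts once
        n = seq.count("C")
        if n:
            distribution[n] += 1
    return distribution


# Toy sequences for illustration only.
print(cysteine_profile("ARCCGCCCW"))  # {'n_cys': 5, 'runs': [2, 3], 'longest_run': 3}
print(cysteine_count_distribution(["ARCGGCW", "ARCGGCW", "ARDYW", "ACCCD"]))
```

Deduplicating before tallying mirrors the focus on unique VH sequences; whether runs of length one should also be reported separately is a design choice left open here.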
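A minimal sketch of CXnC motif extraction follows. It assumes, as our own simplification rather than the original study's rule, that a motif is called only for CDR-H3s containing exactly two cysteines, matching the two-cysteine CXnC motifs discussed in this section; n is the number of residues between the cysteines, so adjacent cysteines give CX0C. Function names and example sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def cxnc_motif(cdrh3: str) -> Optional[Tuple[int, str]]:
    """Return (n, spacer) for a CDR-H3 with exactly two cysteines, else None."""
    positions = [i for i, aa in enumerate(cdrh3) if aa == "C"]
    if len(positions) != 2:
        return None
    first, second = positions
    spacer = cdrh3[first + 1:second]  # the n residues between the two cysteines
    return len(spacer), spacer


def tally_motifs(cdrh3s: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count CXnC motif classes and the tetrapeptide spacers of CX4C motifs."""
    motif_counts: Counter = Counter()
    cx4c_tetrapeptides: Counter = Counter()
    for seq in cdrh3s:
        motif = cxnc_motif(seq)
        if motif is None:
            continue
        n, spacer = motif
        motif_counts[f"CX{n}C"] += 1
        if n == 4:
            cx4c_tetrapeptides[spacer] += 1
    return motif_counts, cx4c_tetrapeptides


# Toy sequences for illustration only.
motifs, tetrapeptides = tally_motifs(["ARDGCSSTSCYFDY", "ARCGGCW", "ARDYW", "ACCCD"])
cx4c_share = motifs["CX4C"] / sum(motifs.values())
print(motifs, tetrapeptides, cx4c_share)
```

On real data, `cx4c_share` corresponds to the fraction of CXnC motifs that are CX4C, and `len(tetrapeptides)` to the number of unique embedded tetrapeptides.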