Here we use the PipeBio Bioinformatics Cloud to analyse affibodies, a type of non-antibody scaffold, from ERR3474167.fastq, downloaded from the European Nucleotide Archive. Because of their small size, non-antibody scaffolds can make excellent therapeutics, although there can be trade-offs.
Biologic drugs are becoming increasingly important for the treatment of various diseases, including cancer, infectious diseases and inflammatory diseases. Classical antibody scaffolds and structures are being challenged by smaller but equally potent molecules known as “non-antibody scaffolds”, which have a number of benefits over the large, bulky IgG molecule. Their potential as therapeutic drugs explains the strong interest in these scaffolds (Frejd & Kim, 2017). Many different scaffolds exist, but here we focus on only a few of them.
Figure 1. From Vazquez-Lombardi et al.
Traditionally, there has been very little interest in developing general-purpose software tools for non-antibody scaffolds, and companies and academic research groups have often analysed their data by hand or with internally developed software. Analysing high-throughput next-generation sequencing (NGS) data from these scaffolds has been a very challenging task.
Pipe | bio offers an easy-to-use, cloud-based sequence repository and bioinformatics platform that can be configured to fit various annotation requirements for both antibodies and non-antibody scaffolds.
In this application note we focus primarily on affibodies, but the platform can easily be configured for other scaffolds such as knottins, bicyclic peptides and DARPins.
Data and configuration
We used the first 2 million affibody sequences from ERR3474167.fastq, downloaded from the European Nucleotide Archive (BioProject PRJEB33942, https://www.ebi.ac.uk/ena/browser/view/PRJEB33942). The sequences were generated on the Illumina MiSeq platform.
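Taking only the first 2 million reads of a FASTQ file is easy to reproduce outside the platform. The sketch below is a minimal Python reader, assuming the common unwrapped four-line FASTQ layout (name, sequence, "+" separator, quality); read_fastq and first_reads are hypothetical helpers, not PipeBio functions.

```python
# Illustrative sketch; not PipeBio code.
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse FASTQ assuming the unwrapped 4-line layout:
    @name, sequence, '+' separator, quality string."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            sequence = next(it).rstrip("\r\n")
            separator = next(it).rstrip("\r\n")
            quality = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def first_reads(lines: Iterable[str], n: int) -> list[FastqRecord]:
    """Return at most the first n records; the input is consumed lazily."""
    return list(islice(read_fastq(lines), n))
```

Passing an opened copy of ERR3474167.fastq and n = 2_000_000 to first_reads would return the subset used here; if the download is gzip-compressed, it can be opened in text mode with gzip.open(path, "rt"). Because islice stops early, the rest of the file is never read.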
Before the annotation pipeline is run, the required scaffold must be configured once. A scaffold can be an IgG, scFv, nanobody, non-antibody scaffold, etc.; Figure 2 shows a simplified example of an affibody scaffold. As part of the scaffold configuration, it is also possible to specify liabilities, disallowed frameshifts, stop codons, etc., and how these should be reported in the tabular output.
Multiple scaffolds can be configured side by side, each with its own configuration.
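To make the idea of a scaffold configuration concrete, here is a minimal Python sketch. It is not PipeBio's configuration format: the classes Region and ScaffoldConfig are hypothetical, the region coordinates in AFFIBODY_EXAMPLE are placeholders rather than a validated affibody definition, and the liability motifs (the N-glycosylation sequon N-X-S/T with X not proline, NG deamidation and DG isomerization) are simply common examples.

```python
# Illustrative sketch of a scaffold configuration; not PipeBio's format.
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 0-based, inclusive, amino-acid coordinates
    end: int    # 0-based, exclusive


@dataclass
class ScaffoldConfig:
    name: str
    regions: list[Region]
    # Liability label -> regular expression applied to the translated sequence.
    liabilities: dict[str, str] = field(default_factory=dict)
    # Flags an annotation step could consult; not used by the methods below.
    allow_stop_codons: bool = False
    allow_frameshifts: bool = False

    def extract_regions(self, protein: str) -> dict[str, str]:
        """Slice every configured region out of a translated sequence."""
        return {r.name: protein[r.start:r.end] for r in self.regions}

    def find_liabilities(self, protein: str) -> list[tuple[str, int]]:
        """Return (label, 0-based position) for every motif hit, sorted by position."""
        hits = []
        for label, pattern in self.liabilities.items():
            # A zero-width lookahead also reports overlapping hits.
            for match in re.finditer(f"(?={pattern})", protein):
                hits.append((label, match.start()))
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical example: region names and coordinates are placeholders, not a
# validated affibody definition; the motifs are common sequence liabilities.
AFFIBODY_EXAMPLE = ScaffoldConfig(
    name="affibody (illustrative)",
    regions=[Region("helix1", 6, 19), Region("helix2", 22, 37), Region("helix3", 40, 56)],
    liabilities={
        "N-glycosylation": "N[^P][ST]",
        "deamidation": "NG",
        "isomerization": "DG",
    },
)
```

Keeping each scaffold as a separate configuration object mirrors the idea of configuring several scaffolds side by side: an IgG, an scFv and an affibody would each get their own regions and rules.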
Figure 2. Schematic view of the affibody scaffold. Scaffolds and associated rules can be customized in Pipe | bio to meet your organization’s needs.
The Pipe | bio platform provides a large toolbox for data analysis, and which tools are useful depends on the biological application. Here we show a simple workflow in which we import sequence data, annotate regions of interest, plot charts and cluster on the region of interest.
The initial output of the annotation pipeline is a result document that shows the results in a table, aligned with a graphical view of the sequences. This lets the user filter and visually inspect the data in great detail. The annotation results are accompanied by charts showing a breakdown of the identified liabilities and overall summary statistics.
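A summary breakdown of this kind can be sketched as "assign each read one outcome, then count outcomes". The Python below is an assumption-laden illustration, not the PipeBio pipeline: it translates reads in frame 1 with the standard genetic code, uses "length not a multiple of three" as a crude stand-in for a frameshift, and checks outcomes in a fixed order chosen for this sketch. The function names translate, classify and summarise are hypothetical.

```python
# Illustrative sketch of per-read annotation outcomes; not the PipeBio pipeline.
import re
from collections import Counter
from collections.abc import Iterable

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate reading frame 1; codons with ambiguous bases (e.g. N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def classify(dna: str, liabilities: dict[str, str]) -> str:
    """Assign one outcome per read, checked in a fixed order chosen for this sketch."""
    if len(dna) % 3 != 0:
        return "frameshift"  # crude proxy: length not a multiple of three
    protein = translate(dna)
    if "*" in protein:
        return "stop codon"
    for label, pattern in liabilities.items():
        if re.search(pattern, protein):
            return f"liability: {label}"
    return "ok"


def summarise(reads: Iterable[str], liabilities: dict[str, str]) -> Counter:
    """Outcome -> number of reads; the kind of breakdown a pie chart displays."""
    return Counter(classify(read, liabilities) for read in reads)
```

A real pipeline would compare each read against the configured scaffold (expected length, region boundaries) rather than rely on length modulo three, but the counting step that feeds a summary chart looks much the same.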
Figure 3. Pie chart summarising the results of the annotation pipeline, with the individual identified errors shown in a table. Annotated sequences are shown in the lower half of the screen, with both tabular information and a graphical sequence view.
To support visual inspection of your analyses, various charts can be plotted, either per annotated region or for the full sequence. All charts are interactive: clicking a region of a chart applies the corresponding filter to the result table, updating both the tabular and the sequence view. For synthetic scaffolds and affinity maturation, for example, it is very valuable to be able to click codons of interest in a codon usage plot, or a particular sequence length in a length distribution chart.
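The chart-to-filter interaction can be thought of as two operations on the same table: aggregate the rows for the chart, then keep only the rows behind the clicked bar. The Python sketch below shows this for a length distribution; the Row type and both functions are hypothetical and only illustrate the idea, not how PipeBio implements it.

```python
# Illustrative sketch of chart data plus click-to-filter; not PipeBio code.
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of a hypothetical result table."""
    seq_id: str
    region: str  # translated region of interest (or the full sequence)


def length_distribution(rows: list[Row]) -> dict[int, int]:
    """Region length -> number of sequences, sorted by length (the bar chart data)."""
    return dict(sorted(Counter(len(row.region) for row in rows).items()))


def filter_by_length(rows: list[Row], length: int) -> list[Row]:
    """What clicking one bar does in this sketch: keep rows of that region length."""
    return [row for row in rows if len(row.region) == length]
```

Running the same two functions on the full sequence instead of one annotated region gives the "full sequence" version of the chart.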
A number of different charts are supported, and others can be added on request:
- Codon usage
- Length distribution
- Sequence logo
- Amino acid heatmap
- And more
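Amino acid heatmaps and sequence logos are both built from the same underlying table: the frequency of each amino acid at each position. The Python sketch below computes that table, assuming the sequences are already aligned to equal length (for example, one annotated region); position_frequencies is a hypothetical helper, and residues outside the 20 standard amino acids (such as X or *) count towards the total but are not reported.

```python
# Illustrative sketch of per-position frequencies; not PipeBio code.
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(proteins: list[str]) -> list[dict[str, float]]:
    """Per-position amino-acid frequencies for equal-length (pre-aligned) sequences.

    Entry i is one column of a heatmap; scaling the same frequencies by the
    column's information content gives the letter heights of a sequence logo.
    """
    if not proteins:
        return []
    length = len(proteins[0])
    if any(len(p) != length for p in proteins):
        raise ValueError("sequences must be aligned to the same length")
    frequencies = []
    for i in range(length):
        counts = Counter(p[i] for p in proteins)
        total = len(proteins)
        frequencies.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return frequencies
```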
Below is an example of a codon usage chart, which is very useful for library QC. All cells are clickable; clicking a cell applies a filter to the result table.
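The data behind a codon usage chart is a count of codons at each in-frame position, and clicking a cell corresponds to keeping the reads that carry that codon at that position. A minimal Python sketch follows; codon_usage and filter_by_codon are hypothetical helpers, and the reading frame is assumed to start at nucleotide offset `start`.

```python
# Illustrative sketch of per-position codon usage; not PipeBio code.
from collections import Counter


def codon_usage(dna_seqs: list[str], start: int = 0) -> list[Counter]:
    """Count codons at each in-frame position, starting at nucleotide offset `start`.

    Returns one Counter per codon position (index 0 = first codon).
    """
    table: list[Counter] = []
    for seq in dna_seqs:
        seq = seq.upper()
        for pos, i in enumerate(range(start, len(seq) - 2, 3)):
            if pos == len(table):
                table.append(Counter())  # first read long enough to reach this codon
            table[pos][seq[i:i + 3]] += 1
    return table


def filter_by_codon(dna_seqs: list[str], codon_pos: int, codon: str, start: int = 0) -> list[str]:
    """Mimic clicking one cell: keep sequences with `codon` at codon position `codon_pos`."""
    i = start + 3 * codon_pos
    return [s for s in dna_seqs if s[i:i + 3].upper() == codon.upper()]
```

For library QC, the counts at each randomised position can be compared with what the library design predicts; for instance, if a library had been built with NNK codons (K = G or T), codons ending in A or C at those positions would point to a problem.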
Clustering is a great way to reduce data complexity and redundancy and to get a condensed overview of the data.
In PipeBio, the user can “slice and dice” the clustered data and view it in different ways. The following screenshots show only the overview of the clusters, but each cluster can also be expanded to inspect the individual sub-clusters in more detail.
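PipeBio's own clustering method is not described in this note. To illustrate what clustering at an identity threshold means, the Python sketch below uses greedy incremental clustering, the general idea behind tools such as CD-HIT, with two simplifying assumptions: identity is the fraction of matching positions between equal-length sequences (no alignment), and unique sequences are processed from most to least abundant. The functions identity and greedy_cluster are hypothetical.

```python
# Illustrative greedy clustering sketch; not PipeBio's algorithm.
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        return 0.0  # simplification: without alignment, different lengths never cluster
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 1.0


def greedy_cluster(seqs: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy centroid clustering.

    Unique sequences are visited from most to least abundant; each joins the
    first existing cluster whose centroid is at least `threshold` identical,
    otherwise it founds a new cluster. Duplicates are kept so that cluster
    sizes count every read.
    """
    abundance = Counter(seqs)
    centroids: list[str] = []
    members: list[list[str]] = []
    for seq, n in abundance.most_common():
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                members[idx].extend([seq] * n)
                break
        else:
            centroids.append(seq)
            members.append([seq] * n)
    return members
```

Every new sequence is compared with every existing centroid, so this sketch scales poorly; at millions of reads, dedicated tools add k-mer or word-based filters to skip most comparisons.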
Clustering the 2 million annotated sequences at 85% identity gives 4,651 clusters in total. The largest cluster contains 328,492 sequences, of which 108,498 are unique. The most abundant sequence in that cluster occurs only 255 times, indicating very high diversity.
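The three numbers quoted for the largest cluster (total sequences, unique sequences, and the count of the most frequent identical sequence) are straightforward to compute for any cluster. A minimal Python sketch, with cluster_stats as a hypothetical helper and a cluster represented as a list of its member sequences:

```python
# Illustrative sketch; not PipeBio code.
from collections import Counter


def cluster_stats(cluster: list[str]) -> dict[str, int]:
    """Total sequences, unique sequences and the size of the largest identical group."""
    counts = Counter(cluster)
    return {
        "sequences": len(cluster),
        "unique": len(counts),
        "max_identical": max(counts.values(), default=0),
    }
```

Applied to the figures reported above, about a third of the reads in the largest cluster are unique (108,498 / 328,492 ≈ 33%), and its most abundant sequence accounts for less than 0.1% of the cluster (255 / 328,492 ≈ 0.08%), which is what makes the cluster so diverse.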
In the bar chart for the largest cluster, high variability is clearly visible at positions 10, 18, 28 and 35.
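The note does not state which variability measure the bar chart uses. One common choice is the Shannon entropy of the residues observed at each position, sketched below in Python under the assumption that the cluster's sequences are aligned to equal length; shannon_entropy, variable_positions and the 1-bit default cut-off are all hypothetical choices for illustration.

```python
# Illustrative sketch using Shannon entropy; the chart's actual metric is not specified.
import math
from collections import Counter


def shannon_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues observed at one position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def variable_positions(proteins: list[str], min_bits: float = 1.0) -> list[int]:
    """1-based positions whose entropy is at least `min_bits`.

    Assumes equal-length, pre-aligned sequences from a single cluster.
    """
    return [
        i + 1
        for i, column in enumerate(zip(*proteins))
        if shannon_entropy("".join(column)) >= min_bits
    ]
```

A fully conserved position scores 0 bits, while a position where all 20 amino acids are equally frequent scores log2(20) ≈ 4.3 bits.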
Cherry-pick alternative scaffolds into the cart
Use the sequence cart to cherry-pick interesting sequences and clones, and either store them for later use or download them directly.
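Downloaded picks usually end up in a standard sequence format. Assuming FASTA as the export format (the note does not say which formats the cart offers), a minimal Python sketch of writing cherry-picked sequences looks like this; to_fasta is a hypothetical helper and the 60-character line width is a common convention, not a requirement.

```python
# Illustrative FASTA export sketch; the cart's real export formats may differ.
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {sequence id: sequence} as FASTA text, wrapping sequences at `width`."""
    lines: list[str] = []
    for seq_id, seq in records.items():
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)
```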
Customise your Sequence Store for alternative antibody scaffolds
After cherry-picking, it can be useful to query the Sequence Store, a repository of all the sequences you have previously analysed. This quickly shows whether you have analysed identical sequences before and in which documents they were found.
The Sequence Store can also be used, for example, to store patent sequences and other data from public sources. This makes it quick and easy to check whether the sequences you are currently analysing already appear in the public domain.
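Conceptually, an exact-match lookup like this is a hash index from sequence to the documents that contain it. The toy in-memory Python version below illustrates the idea; the SequenceStore class, its methods and the case/whitespace normalisation are assumptions for illustration and say nothing about how PipeBio's Sequence Store is actually built.

```python
# Illustrative in-memory index; not PipeBio's Sequence Store implementation.
from collections import defaultdict


class SequenceStore:
    """Toy index: exact sequence -> documents it has been seen in."""

    def __init__(self) -> None:
        self._index: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _key(seq: str) -> str:
        return seq.strip().upper()  # normalise whitespace and case before hashing

    def add(self, document: str, seqs: list[str]) -> None:
        for seq in seqs:
            self._index[self._key(seq)].add(document)

    def lookup(self, seq: str) -> set[str]:
        """Documents containing an identical sequence (empty set if none)."""
        return set(self._index.get(self._key(seq), set()))
```

Adding a document of patent sequences to the same index is all that is needed for the public-domain check: a lookup that returns the patent document means the sequence has been disclosed before.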
A rich, integrated bioinformatics suite for antibody and antibody-like drug discovery
The platform offers much more than is described here, and new features are being added all the time:
- API for integration with other systems
- Merge paired-end NGS data
- Screen immune repertoires to extract variants with potential in vitro maturation sites and residues
- Compare multiple samples, e.g. to track enrichment across panning rounds or to improve potency
- Subtract one sample from another
- Labeling of sequences
- And a lot more
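Two of the sample comparisons listed above, enrichment and subtracting one sample from another, can be sketched in a few lines of Python. The functions enrichment and subtract are hypothetical; the fold change with a pseudocount of 1 is one common convention, assumed here, for giving sequences absent from the earlier sample a finite value.

```python
# Illustrative sample-comparison sketch; not PipeBio code.
from collections import Counter


def enrichment(before: list[str], after: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Fold change in relative frequency between two samples, e.g. panning rounds.

    A pseudocount is added to every count so that sequences absent from
    `before` still get a finite value.
    """
    if not before or not after:
        raise ValueError("both samples must contain at least one read")
    counts_before, counts_after = Counter(before), Counter(after)
    return {
        seq: ((counts_after[seq] + pseudocount) / len(after))
        / ((counts_before[seq] + pseudocount) / len(before))
        for seq in counts_after
    }


def subtract(sample: list[str], background: list[str]) -> list[str]:
    """Keep reads from `sample` whose exact sequence never occurs in `background`."""
    background_seqs = set(background)
    return [seq for seq in sample if seq not in background_seqs]
```

Values above 1 indicate sequences that became relatively more frequent in the later sample; subtracting a background sample (for example, binders to an irrelevant target) removes sequences that are likely non-specific.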
Vazquez-Lombardi, R., Phan, T. G., Zimmermann, C., Lowe, D., Jermutus, L., & Christ, D. (2015). Challenges and opportunities for non-antibody scaffold drugs. Drug Discovery Today, 20(10), 1271–1283. https://doi.org/10.1016/j.drudis.2015.09.004
Frejd, F., & Kim, K. (2017). Affibody molecules as engineered protein drugs. Experimental & Molecular Medicine, 49, e306. https://doi.org/10.1038/emm.2017.35