PipeBio supports alternative scaffolds such as affibodies, bicyclic peptides and knottins, as well as more conventional formats such as VHH, IgG and scFv.

Here we use the PipeBio Bioinformatics Cloud to analyse affibodies, a class of non-antibody scaffold, from ERR3474167.fastq downloaded from the European Nucleotide Archive. Non-antibody scaffolds can make great therapeutics due to their small size, although there can be tradeoffs.


Biologic drugs are becoming increasingly important as therapeutics for various diseases, including cancer and infectious and inflammatory diseases. Classical antibody scaffolds are being challenged by smaller but equally potent molecules called “non-antibody scaffolds”, which have a number of benefits over the large, bulky IgG molecule. This makes them attractive as therapeutic drugs and explains the strong interest in them (Frejd, F. et al). Many different scaffolds exist, but here we focus on only a few.

Figure 1. From Vazquez-Lombardi et al.

Traditionally, there has been very little interest in developing general software tools for these non-antibody scaffolds, so companies and academic research groups have often analysed data by hand or with internally developed software. Analysing high-throughput (NGS) sequencing data from these scaffolds has therefore been a challenging task.

Pipe | bio offers an easy-to-use, cloud-based sequence repository and bioinformatics platform that can be configured to fit various annotation requirements for both antibodies and non-antibody scaffolds.

In this application note we focus primarily on the analysis of affibodies, but the platform can just as easily be configured for other scaffolds such as knottins, bicyclic peptides and DARPins.

Data and configuration

We used the first 2 million affibody sequences from ERR3474167.fastq, downloaded from the European Nucleotide Archive. The reads in this Bioproject were sequenced on the Illumina MiSeq platform.
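
Taking the first N records from a FASTQ file can be sketched in a few lines of Python. This is a minimal illustration, not part of the platform: it assumes the common four-lines-per-record FASTQ layout, and the function names `read_fastq` and `first_reads` are our own.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:].strip(), sequence, quality


def first_reads(handle: TextIO, limit: int) -> List[Record]:
    """Return at most `limit` records from the start of the stream."""
    return list(islice(read_fastq(handle), limit))


# Tiny in-memory example with three reads; we keep the first two.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
)
subset = first_reads(example, 2)
print([read_id for read_id, _, _ in subset])
```

For the data set used here, the same function would be called on `open("ERR3474167.fastq")` with a limit of 2,000,000.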

Scaffold configuration

Before running the annotation pipeline, the required scaffold is configured once. A scaffold can be an IgG, scFv, nanobody, non-antibody scaffold, etc.; below we show a simplified example of an affibody scaffold. As part of the scaffold configuration it is also possible to specify liabilities, disallowed frameshifts, stop codons, etc., and how these should be reported in the tabular output.

Multiple scaffolds can be configured, each with its own regions and rules.

Figure 2. Schematic view of the affibody scaffold. Scaffolds and associated rules can be customized in Pipe | bio to meet your organization’s needs.
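
To make the idea of a scaffold with associated rules concrete, here is a small Python sketch. It is not the Pipe | bio configuration format: the `Scaffold` class, the region names and the coordinates are hypothetical, and a "frameshift" is approximated crudely as a read length that is not a multiple of three.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Scaffold:
    """Hypothetical scaffold definition: a name, an expected length and named regions."""
    name: str
    expected_length_nt: int
    # Region name -> (start, end) in 0-based, half-open nucleotide coordinates.
    regions: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def find_liabilities(scaffold: Scaffold, seq: str) -> List[str]:
    """Return a list of rule violations for one in-frame nucleotide read."""
    issues = []
    if len(seq) != scaffold.expected_length_nt:
        issues.append("length mismatch")
    if len(seq) % 3 != 0:
        issues.append("frameshift")  # crude proxy, see lead-in
    complete = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, complete, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        issues.append("stop codon")
    return issues


def extract_regions(scaffold: Scaffold, seq: str) -> Dict[str, str]:
    """Slice the read into the scaffold's named regions."""
    return {name: seq[start:end] for name, (start, end) in scaffold.regions.items()}


# Toy scaffold with made-up coordinates, for illustration only.
toy = Scaffold("toy_scaffold", 18, {"helix1": (0, 9), "helix2": (9, 18)})
print(find_liabilities(toy, "ATGGCTAAAGATGCTTGG"))
print(find_liabilities(toy, "ATGTAAAAAGATGCTTG"))
print(extract_regions(toy, "ATGGCTAAAGATGCTTGG"))
```

Keeping the rules separate from the region definitions means the same checks can be reused across several scaffolds.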

Analysis pipeline

The Pipe | bio platform has a large toolbox for analysing data, and which tools are used depends on the biological application. Here we show a simple workflow in which we imported sequence data, annotated regions of interest, plotted different charts and clustered on the region of interest.

Annotation results 

The initial output of the annotation pipeline is a result document that shows the results in tabular form, aligned with a graphical view of the sequences. This enables the user to filter and visually inspect the data in great detail. The annotation results are accompanied by graphs showing a breakdown of the identified liabilities and overall summary statistics.

Figure 3. Pie chart showing summary of the annotation pipeline and individual identified errors as a tabular representation. Annotated sequences are shown in the lower half of the screen with both tabular information as well as graphical sequence view.  


To support visual inspection of your analyses, it is possible to plot various charts. All charts and analyses can be produced per annotated region or for the full sequence. All charts are interactive: clicking a region of a chart applies the relevant filter to the result table, for both the tabular and the sequence data. For example, for synthetic scaffolds and affinity maturation it is very valuable to be able to click interesting codons in a codon usage plot, or a particular sequence length in a length distribution chart.

A number of different charts are supported, and others can be added on request:

  • Codon usage
  • Length distribution
  • Sequence logo
  • Amino acid heatmap
  • And more

“Make chart” dialog from the PipeBio Bioinformatics Cloud tools.

Below is an example of a codon usage plot, which is useful for library QC. All cells are clickable; clicking a cell applies a filter to the result table.

Codon usage plot in PipeBio Bioinformatics Cloud Platform.
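
The numbers behind a codon usage plot, and the filter applied when a cell is clicked, can be approximated as follows. This is a sketch under the assumption that all reads are in frame and start at the same codon; the function names are ours and do not reflect how the platform computes the chart.

```python
from collections import Counter
from typing import Iterable, List


def codon_usage(seqs: Iterable[str]) -> List[Counter]:
    """Count codons per codon position across in-frame nucleotide reads."""
    table: List[Counter] = []
    for seq in seqs:
        for pos in range(len(seq) // 3):
            if pos == len(table):
                table.append(Counter())  # first read that reaches this position
            table[pos][seq[3 * pos:3 * pos + 3]] += 1
    return table


def filter_by_codon(seqs: Iterable[str], position: int, codon: str) -> List[str]:
    """Keep reads carrying `codon` at the 0-based codon `position`."""
    return [s for s in seqs if s[3 * position:3 * position + 3] == codon]


reads = ["ATGGCTAAA", "ATGGCAAAA", "ATGGCTAAG"]
usage = codon_usage(reads)
print(dict(usage[1]))
print(filter_by_codon(reads, 1, "GCA"))
```

Each `Counter` corresponds to one column of the plot, and `filter_by_codon` mimics clicking a single cell.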


Reducing data complexity by clustering is a great way to get a condensed overview of the data and reduce data redundancy. 

In PipeBio the user is able to “slice and dice” and have different views on clustered data. In the following screenshots we only look at the overview of the clusters, but it is also possible to expand the content and look into more details of the individual sub-clusters. 

From the 2 million annotated sequences, clustering at 85% identity gives 4651 clusters in total. The largest cluster has 328,492 sequences, of which 108,498 are unique. No sequence in that cluster occurs more than 255 times, which indicates very high diversity.
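
The three figures quoted per cluster (total sequences, unique sequences and the highest copy number) can be illustrated with a simple greedy clustering sketch. This is not the algorithm used by PipeBio: we assume equal-length sequences, define identity as the fraction of matching positions, and compare every sequence to every centroid, which would be far too slow for 2 million reads.

```python
from collections import Counter
from typing import Dict, Iterable


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; sequences of unequal length score 0."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Iterable[str], threshold: float = 0.85) -> Dict[str, Counter]:
    """Assign each unique sequence, most abundant first, to the first matching centroid."""
    clusters: Dict[str, Counter] = {}
    for seq, copies in Counter(seqs).most_common():
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid][seq] = copies
                break
        else:
            clusters[seq] = Counter({seq: copies})  # seq becomes a new centroid
    return clusters


def summarise(members: Counter) -> Dict[str, int]:
    """Total reads, unique sequences and highest copy number in one cluster."""
    return {
        "total": sum(members.values()),
        "unique": len(members),
        "max_copies": max(members.values()),
    }


reads = ["AAAAAAAAAA"] * 3 + ["AAAAAAAAAC"] * 2 + ["CCCCCCCCCC"]
clusters = greedy_cluster(reads)
print(len(clusters))
print(summarise(clusters["AAAAAAAAAA"]))
```

A low `max_copies` relative to `total`, as in the largest cluster above, is what indicates high diversity.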

Clustering view in PipeBio, sorted with the largest clusters at the top. The top pane shows an amino acid bar chart in which it is very easy to identify four variable positions.

For the largest cluster, the bar chart makes it easy to see that there is high variability at positions 10, 18, 28 and 35.
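
One way to find such variable positions numerically is to compute the Shannon entropy of each column of the cluster members. The sketch below is our own illustration, not the calculation behind the bar chart; it assumes equal-length amino acid sequences, and the 1-bit cutoff is an arbitrary choice.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(seqs: Sequence[str]) -> List[float]:
    """Shannon entropy in bits for each column of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs)
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        entropies.append(-sum((c / total) * math.log2(c / total) for c in counts.values()))
    return entropies


def variable_positions(seqs: Sequence[str], min_bits: float = 1.0) -> List[int]:
    """Return 1-based positions whose entropy is at least `min_bits`."""
    return [i + 1 for i, h in enumerate(column_entropy(seqs)) if h >= min_bits]


members = ["AKDE", "AKNE", "AKQE", "ARDE"]
print(variable_positions(members))
```

In this toy example only the third column, with three different residues, exceeds the cutoff.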

Cherry-pick alternative scaffolds to the cart

Use the sequence cart to cherry pick interesting sequences and clones and store them for later use or download them directly. 

Hit-pick / cherry-pick diverse antibody scaffold sequences to the cart.
Right click in any document within PipeBio to add interesting sequences to the cart.

Customise your Sequence Store for alternative-antibody scaffolds

After cherry-picking, it may be useful to query the Sequence Store, which is a repository of all the sequences you have analysed before. That way you can very quickly identify whether you have analysed identical sequences before and in which documents they are found.

The Sequence Store can also be used, for example, to store patent sequences and other data from public sources. It is then quick and easy to look up whether the sequences you are currently analysing have already been found in the public domain.
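
The exact-match lookup described here boils down to an index from sequence to the documents that contain it. The following sketch shows the idea with an in-memory dictionary; the `SequenceIndex` class and the document names are hypothetical and say nothing about how the Sequence Store is implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SequenceIndex:
    """Hypothetical exact-match index: sequence -> names of documents containing it."""

    def __init__(self) -> None:
        self._documents: Dict[str, Set[str]] = defaultdict(set)

    def add(self, sequence: str, document: str) -> None:
        """Register one occurrence of a sequence in a document (case-insensitive)."""
        self._documents[sequence.upper()].add(document)

    def lookup(self, sequence: str) -> List[str]:
        """Return the sorted names of documents containing this exact sequence."""
        return sorted(self._documents.get(sequence.upper(), ()))


index = SequenceIndex()
index.add("VDNKFNKE", "panning_round_1")    # made-up document names
index.add("VDNKFNKE", "public_patent_set")
index.add("AKDEQQNA", "panning_round_2")

print(index.lookup("vdnkfnke"))
print(index.lookup("CCCCCCCC"))
```

An empty result means the sequence has not been seen before, either in earlier analyses or in the stored public data.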

Customise your Sequence Store repository for your alternative antibody scaffolds.
Sequence Store showing antibody CDR-H3 sequences and labels. 

A rich integrated Bioinformatics suite for Antibody and Antibody-like drug discovery

The platform offers much more than is described here, and more is being added all the time:

  • API for integration with other systems
  • Merge paired-end NGS data
  • Screen immune repertoires to extract variants with potential in vitro maturation sites and residues
  • Compare multiple samples, e.g. for enrichment, panning or to improve potency
  • Subtract one sample from another
  • Reporting
  • Labeling of sequences
  • Cloning
  • And a lot more


