Single cell sequencing and data analysis on PipeBio

Introduction

The high-throughput assays enabled by multiplexing has opened up a world of interesting applications, as the technology has become a standard. We have previously given an overview of single-cell sequencing technologies in antibody research. We will analyze data from King et al. in the 2021 paper: Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics. We will provide an overview of the Chromium platform by 10x Genomics, as well as details on how to get your 10x data ready for analysis.

‍

Index:

Single-cell applications in immunology
BCR repertoire analysis on PipeBio
Appendix 1. FAQ: 10x data on PipeBio
Appendix 2. 10x Genomics technology and Cell Ranger outputs
References

‍

Applications in immunology

Availability of high-throughput single-cell sequencing and RNA-seq have prompted a range of studies in BCR and TCR and repertoire profiling in humans, mice, non-human primates, including deeper analysis of B-cell maturation and class switching . Studies such one from Setliff et al. (2019) have linked single B-cells to antigen specificity by first sorting antigen-stained B-cells, then tagging both the antigen and the BCRs with the same barcode and finally creating a score for specificity from sequence counts.

‍

A later study from the same lab by Shiakolas et al. (2021) amended the method with ligand-blocking information to both bind and block the target antigen. Both studies are summarized in a previous blog post from us. Analyzing the relationship between single-cell transcriptomics and receptor repertoires by mapping B cell or T cell clonotype functionality with BCR and TCR receptor sequencing has also garnered interest among researchers .

‍

Another recent development in high-throughput single cell analysis potential a further enhancement to 10x Genomics’ Single Cell ATAC (Assay for Transposase Accessible Chromatin). The method, PHAGE-ATAC, developed by Fiskin et al. (2022), allows higher throughput simultaneous single cell measurements by using phage displayed nanobodies as reagents (more in our previous article).

‍

Needless to say, single cell sequencing has much more yet to contribute to clinical therapies. Next we will dive into the details of the data that 10x Genomics libraries and NGS sequencing yields.

‍

Preparing 10x Genomics sequence data for analysis

After library preparation and a run on the Chromium chip, the 10x barcoded and UMI-tagged samples can be sequenced by supported Illumina NovaSeq, HiSeq, NextSeq or MiSeq sequencers. During an Illumina sequencing run, the sequencers will yield base call (BCL) files, which contain per-cycle data. These files need to subsequently be converted into the more usable FASTQ file format, which contain per read data. This can be done with Illumina’s bcl2fastq conversion software or directly with cellranger mkfastq, which also uses bcl2fastq for conversion and demultiplexing. You can upload demultiplexed data or fastq files without the associated data, in case you have performed cell-sorting prior to sequencing.

‍

A closer look at B-cell stages on PipeBio

In their 2021 paper, King et al. investigate the different stages of B-cell development in tonsils; from naive B-cells through to the affinity maturation phase in germinal centers to differentiation into memory B-cells or plasmablasts. In the study, B-cells from each stage were obtained from 9 donors, FACS-sorted and UMI-indexed with 10x Genomics Chromium and subsequently sequenced both for single-cell transcriptomics and in bulk antibody repertoires.

‍

We decided to analyze some of this data on the PipeBio bioinformatics platform and will now take a deeper look at one of the donors’ tonsillar antibody repertoire. For this analysis, we used the unprocessed antibody repertoire data from patient BCP1, a total of 10,204,106 sequences.

Cell type	Sequences	Correct	Incorrect	% of Correct by cell type
Naive	1,139,019	788,565	350,454	11.12%
Germinal center	3,581,487	2,549,839	1,031,648	35.95%
Memory B-cell	1,350,308	892,062	458,246	12.58%
Plasmablast	2,440,487	1,719,199	721,288	24.24%
Total	1,692,805	1,142,467	550,338	16.11%
Sequences total	10,204,106	7,092,132	3,111,974	100.00%

Table 1. Sequences containing stop-codons in regions, missing regions, frameshifts were classified as ‘Incorrect’

‍

After importing and merging paired-end reads, we re-annotated the data in PipeBio, which produced 7,092,132 correctly annotated reads (without frameshifts, stop codons and major liabilities).

‍

The isotype distribution of the sequences for this patient confirms, as expected, that the sequences from naive B-cells are predominantly unswitched, mostly displaying the IgM isotype. Furthermore, the germinal center and memory B-cells contain both switched and unswitched sequences, whereas the sequences from plasmablasts are predominantly switched IgG and IgA isotypes.

Figure of iImmunoglobulin isotype count by cell stage in tonsil immune cell repertoire — Figure 1. Isotype distribution across B-cell stages in individual BCP1

We also decided to investigate patterns of isotype switching for individual antibody sequences in the dataset. First, we compared the largest clusters (clustered at 90% identical AA-sequences) to see clusters for isotype-switched sequences from naive cells to germinal centers to either plasmablasts or memory B-cells. We performed a differential enrichment analysis on clusters across samples to find statistically significant (FDR-corrected p-value >0.05) fold-changes for clusters. In the volcano plots below, we can see the fold changes between differentiation stages.

Patterns of isotype switching for individual antibodies in B cell repertoire — Figure 2. Fold changes in the comparison samples for Naive, GerminalCenter, Plasmablast and Memory

As expected, the fold changes were greatest for isotypes IgG1, IgG2, IgG3, IgA1 and specific IgM clusters when comparing germinal centers and plasmablasts. Comparing germinal centers and memory B-cells, specific IgM clusters as well as IgG1, IgG2, IgG3 and IgD had the greatest fold changes.

We also clustered the patient’s sequences at 90% identity of CDR-H3 to visualize the overall isotype landscape of this dataset. The subset ‘Total’ (B-cells FACS sorted only on CD19⁺, present in all cells) was excluded, since sequences from these cells did not contain information on the presumed cell stage. We identified a cluster containing multiple isotypes: IgM, IgA1 and IgA2.

‍

Cluster diagram of immunoglobulin isotypes by cell stage in antibody repertoire, including germinal center, memory b cell, plasmablast and naive b cell — Figure 3. The cluster view chart was used for overlaying isotype, cluster size and the cell type

‍

This cluster contained at least one cluster from each cell stage, denoted by the shapes of the cluster symbols. The combined cluster contained 3,113 sequences in total, predominantly of isotype IgM, indicated by the red color.

‍

Cluster diagram showing predominant IgM and a singleton IgA1-cluster in single b cell repertoire subset — Figure 4. The cluster shows predominant IgM and a singleton IgA1-cluster

‍

Focusing in on the subcluster, we identified and quantified the present isotypes per cell stage:

Cell stage	IgM	IgA1	IgA2	IgG3	IgD	In total
Plasmablast	1	21				22
Memory	152					152
Naive	1					1
GerminalCenter	2,683	102	104	22	27	2,938
In total	2,837	123	104	22	27	3,113

Table 2. Isotype distribution in subcluster 152606

‍

The most diverse set of isotypes were, as expected, found in the germinal center, while the other IgA1 isotypes were also found in a plasmablast sequence subcluster.

‍

Additionally filtering the remaining 3,113 sequences by a V-gene identity of 97.5% provided us with 229 sequences and an interesting subset of both IgM and IgA1 isotypes in germinal center, memory and plasmablast B-cells.

‍

An antibody sequence alignment of IgM and IgA1 immunoglobulins — Figure 5. An alignment of the filtered sequences with variant amino acids highlighted

‍

In this small subset of sequences, as expected, most diversity could be found in the class-switched IgA1, while it is actually the conserved Alanine in the CDR-H3 that determines the isotype at the conjunction of the germinal center and plasmablast in the graph.

Returning to cluster 152606, we also performed a sequence alignment on the 22 sequences (see Table 2.) sequenced originating from plasmablasts. The subset contained sequences with isotype IgA1 and a single IgM isotypes. Figure 6. below shows us the phylogenetic tree of the sequences.

Sequence alignment of a plasmablast subset from cluster 152606 — Figure 6. Alignment of the plasmablast subset from cluster 152606

‍

From the sequence alignment above, we can observe a variety of sequences of the class-switched IgA1 isotype and one sequence of isotype IgM. The data used in our analysis is sufficient to describe some relationships between individual sequences and clusters of sequences, but combining the data with transcriptomics data would provide deeper knowledge of class-switching.

For a thorough transcriptomic analysis of single B-cell maturation and class switching, we recommend reading the original article by King et al. (2021).

‍

In conclusion

While combining transcriptomics data and sequence data is required for deep examination of the cell stages and the predictive capability of gene expression, repertoire analysis remains important and provides valuable insights into the development of the antibody repertoire in humans. Having the right tools for analyzing and visualizing the data makes the process faster, easier and repeatable.

For more information on how to import 10x data to PipeBio, see Appendix 1 at the end of this post.

If you’re looking to analyze antibody or BCR repertoire data, you can request to start a trial with us today.

‍

Appendix 1: 10x data on PipeBio.

What data to upload to PipeBio?

We recommend uploading demultiplexed data to PipeBio. Any Cell Ranger pipeline involving vdj will give you an all_contig.fasta file containing contigs and an all_contig_annotations.csv file containing associated barcodes, UMI counts and annotations. To keep all single-cell data connected with your sequences, upload the demultiplexed all_contig-file to PipeBio, followed by the annotations csv-file.

‍

When you upload the csv-file to PipeBio, you only have to select the correct mapping (contig_id <-> name) and you’re good to go.

QC of 10x data on PipeBio

While Cell Ranger already incorporates several QC steps, you also have the option to perform QC on PipeBio by labeling sequences with

Read counts below threshold
UMI counts below threshold
Specific chains.

Running the pipeline will give you additional data with which you can filter and select sequences for further analysis. You can also choose to initially keep all sequences and filter out data later during the analysis.

‍

Appendix 2: 10x Genomics and Cell Ranger

The single-cell technology by 10x Genomics

The single cell library prep and barcoding technology by 10x Genomics allows analyzing up to hundreds of thousands single cells in single runs. By multiplexing cells, that is labeling single cells and then pooling them together, samples can be analyzed all in one batch. The result of all of this is a high-throughput, more affordable method to get multidimensional data on single cells, including

‍

Full-length V(D)J sequences for paired B-cell or T-cell receptors (BCRs and TCRs)
Cell surface protein expression
Antigen specificity
Gene expression

‍

The unique barcodes in each gel bead-in-emulsion (GEM) allow tying each data point to a single cell. By additionally labeling each transcript within a single cell with a unique molecular identifier (UMI), it is possible to analyze gene expression counts per cell, control for quality based on expression, and correct for amplification bias.

Cell Ranger pipelines and outputs

Cell Ranger – the analysis pipeline software by 10x – has a variety of different processing pipelines for distinct library types (e.g. antibody capture, gene expression) and the outputs vary accordingly. The pipelines in Cell Ranger essentially take as input

the libraries used and

Cell ranger fastq files, sample and library type

the feature reference sheet, containing information about barcode mapping, barcode sequences, how to extract the barcode sequence from the read and and feature type

Cell Ranger feature reference sheet with ID, name, read, pattern, barcode sequence and feature type

‍

Depending on the pipeline you’re running (vdj, count or multi), the Cell Ranger will demultiplex and analyze your data on V(D)J repertoires, cell surface proteins, antigen specificity, or gene expression. In brief, what Cell Ranger to demultiplex sequences is

determine whether the sequences are TCR or BCR
use barcodes and UMIs to assemble contigs of V(D)J transcripts per cell
align them to a reference sequence
output aggregate files that include UMI counts per cell, feature cell matrices and barcoded contigs

‍

After demultiplexing on Cell Ranger, you are then able to take output files and upload them to PipeBio for further analysis.

References

Lee, R.D., Munro, S.A., Knutson, T.P. et al. Single-cell analysis identifies dynamic gene expression networks that govern B cell development and transformation. Nat Commun 12, 6843 (2021). https://doi.org/10.1038/s41467-021-27232-5
Evan S. Walsh, Tammy S. Tollison, Hayden N. Brochu, Brian I. Shaw, Kayleigh R. Diveley, Hsuan Chou, Lynn Law, Allan D. Kirk, Michael Gale Jr. and Xinxia Peng. Single-Cell–Based High-Throughput Ig and TCR Repertoire Sequencing Analysis in Rhesus Macaques. J Immunol February 1, 2022, 208 (3) 762-771; DOI: https://doi.org/10.4049/jimmunol.2100824
King HW, Orban N, Riches JC, Clear AJ, Warnes G, Teichmann SA, James LK. Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics. Sci Immunol. 2021 Feb 12;6(56):eabe6291. doi: 10.1126/sciimmunol.abe6291. PMID: 33579751.
Bashford-Rogers, R., Bergamaschi, L., McKinney, E. F., Pombal, D. C., Mescia, F., Lee, J. C., Thomas, D. C., Flint, S. M., Kellam, P., Jayne, D., Lyons, P. A., & Smith, K. (2019). Analysis of the B cell receptor repertoire in six immune-mediated diseases. Nature, 574(7776), 122–126. https://doi.org/10.1038/s41586-019-1595-3
Setliff, I., Shiakolas, A. R., Pilewski, K. A., Murji, A. A., Mapengo, R. E., Janowska, K., ... & Georgiev, I. S. (2019). High-throughput mapping of B cell receptor sequences to antigen specificity. Cell, 179(7), 1636-1646.
Shiakolas, A.R., Kramer, K.J., Johnson, N.V. et al. Efficient discovery of SARS-CoV-2-neutralizing antibodies via B cell receptor sequencing and ligand blocking. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-022-01232-2
Zhang, Z., Xiong, D., Wang, X. et al. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat Methods 18, 92–99 (2021). https://doi.org/10.1038/s41592-020-01020-3
Zhang, Z., Chang, W.Y., Wang, K. et al. Interpreting the B-cell receptor repertoire with single-cell gene expression using Benisse. Nat Mach Intell (2022). https://doi.org/10.1038/s42256-022-00492-6
Sebastiaan Valkiers, Nicky de Vrij, Sofie Gielis, Sara Verbandt, Benson Ogunjimi, Kris Laukens, Pieter Meysman, Recent advances in T-cell receptor repertoire analysis: Bridging the gap with multimodal single-cell RNA sequencing, ImmunoInformatics, Volume 5, 2022, 100009, ISSN 2667-1190, https://doi.org/10.1016/j.immuno.2022.100009.
Fiskin, E., Lareau, C.A., Ludwig, L.S. et al. Single-cell profiling of proteins and chromatin accessibility using PHAGE-ATAC. Nat Biotechnol 40, 374–381 (2022). https://doi.org/10.1038/s41587-021-01065-5
https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-8999/sdrf
The cells were FACS sorted for the following gene expressions: total (CD19+ ), naïve (CD19+ IgD+ CD38− ), germinal center (CD19+ IgD− CD38+ ), memory (CD19+ IgD− CD38− ), and plasmablasts (CD19+ IgD− CD38++)
10x Genomics: Assembly Algorithm. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/assembly