Single-cell sequencing and analyzing 10x data

Sequence clusters displaying cell stages and isotypes for class-switching analysis

How to analyze single B-cell 10x Genomics NGS data on PipeBio

The high-throughput assays enabled by multiplexing has opened up a world of interesting applications, as the technology has become a standard. We have previously given an overview of single-cell sequencing technologies in antibody research and in this article, we focus on other aspects of single-cell sequence analysis from 10x Genomics. We also analyze single B-cell sequences from the paper Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics by King et al. (2021)[1]. In appendices 1 and 2 we provide an overview of the Chromium platform by 10x Genomics, as well as details on how to get your 10x data ready for analysis on PipeBio.

Index

Applications in immunology

Availability of high-throughput single-cell sequencing and RNA-seq have prompted a range of studies in  BCR and TCR and repertoire profiling in humans, mice[2], non-human primates[3], including deeper analysis of B-cell maturation and class switching[1] [4]. Studies such as one from Setliff et al. (2019) have linked single B-cells to antigen specificity by first sorting antigen-stained B-cells, then tagging both the antigen and the BCRs with the same barcode and finally creating a score for specificity from sequence counts[5].

A later study from the same lab by Shiakolas et al. (2021) amended the method with ligand-blocking information to both bind and block the target antigen[6]. Both studies are summarized in a previous blog post from us. Analyzing the relationship between single-cell transcriptomics and receptor repertoires by mapping B cell or T cell clonotype functionality with BCR and TCR receptor sequencing has also garnered interest among researchers[7] [8] [9].

Another recent development in high-throughput single cell analysis potential a further enhancement to 10x Genomics’ Single Cell ATAC (Assay for Transposase Accessible Chromatin). The method, PHAGE-ATAC, developed by Fiskin et al. (2022)[10], allows higher throughput simultaneous single cell measurements by using phage displayed nanobodies as reagents (more in our previous article).

Needless to say, single cell sequencing has much more yet to contribute to clinical therapies. Next we will dive into the details of the data that 10x Genomics libraries and NGS sequencing yields.

Preparing 10x Genomics sequence data for analysis

After library preparation and a run on the Chromium chip, the 10x barcoded and UMI-tagged samples can be sequenced by supported Illumina NovaSeq, HiSeq, NextSeq or MiSeq sequencers. During an Illumina sequencing run, the sequencers will yield base call (BCL) files, which contain per-cycle data. These files need to subsequently be converted into the more usable FASTQ file format, which contain per read data. This can be done with Illumina’s bcl2fastq conversion software or directly with cellranger mkfastq, which also uses bcl2fastq for conversion and demultiplexing. You can upload demultiplexed data onto PipeBio or FASTQ files without the associated data, in case you have performed cell-sorting prior to sequencing. However, it is also important to consider amplification bias arising from PCR in droplet-based sequencing, which is often corrected by collapsing reads with identical or nearly identical UMIs. Especially with low input amounts of DNA or RNA, there is a risk for obtaining non-representative repertoires[11, 12].

A closer look at B-cell stages on PipeBio

In their 2021 paper, King et al. investigate the different stages of B-cell development in tonsils; from naive B-cells through to the affinity maturation phase in germinal centers to differentiation into memory B-cells or plasmablasts. In the study, B-cells from each stage were obtained from 9 donors, sorted by gene expression[A] and UMI-indexed with 10x Genomics Chromium and subsequently sequenced both for single-cell transcriptomics and in bulk antibody repertoires.


We decided to analyze some of this data on the PipeBio bioinformatics platform and will now take a deeper look at one of the donors’ tonsillar antibody repertoire. For this analysis, we used the unprocessed antibody repertoire data from patient BCP1[B], a total of 10,204,106 sequences. As FACS was used to sort the cells prior to sequencing, each sample’s cell stage was known.

Cell stageSequencesCorrectIncorrect% of ‘Correct’ by cell type
Naive1,139,019788,565350,45411.12%
Germinal center3,581,4872,549,8391,031,64835.95%
Memory B-cell1,350,308892,062458,24612.58%
Plasmablast2,440,4871,719,199721,28824.24%
Total1,692,8051,142,467550,33816.11%
Sequences total10,204,1067,092,1323,111,974100%
Table 1. Sequences containing stop-codons in regions, missing regions, frameshifts were classified as ‘Incorrect’

After importing and merging paired-end reads, we re-annotated the data in PipeBio, which produced 7,092,132 correctly annotated reads (without frameshifts, stop codons and major liabilities). Sequences that could not be annotated were omitted from further analyses.

The isotype distribution of the sequences for this patient confirms, as expected, that the sequences from naive B-cells are predominantly unswitched, mostly displaying the IgM isotype. Furthermore, the germinal center and memory B-cells contain both switched and unswitched sequences, whereas the sequences from plasmablasts are predominantly switched IgG and IgA isotypes.

Figure 1. Isotype distribution across B-cell stages in individual BCP1

We also decided to investigate patterns of isotype switching for individual antibody sequences in the dataset. First, we compared the largest clusters (clustered at 90% identical AA-sequences) to see clusters for isotype-switched sequences from naive cells to germinal centers to either plasmablasts or memory B-cells. We performed a differential enrichment analysis on clusters across samples to find statistically significant (FDR-corrected p-value >0.05) fold-changes for clusters. In the volcano plots below, we can see the fold changes between differentiation stages.

Figure 2. Fold changes in the comparison samples for Naive, GerminalCenter, Plasmablast and Memory

As expected, the fold changes were greatest for isotypes IgG1, IgG2, IgG3, IgA1 and specific IgM clusters when comparing germinal centers and plasmablasts. Comparing germinal centers and memory B-cells, specific IgM clusters as well as IgG1, IgG2, IgG3 and IgD had the greatest fold changes.


We also clustered the patient’s sequences at 90% identity of CDR-H3 to visualize the overall isotype landscape of this dataset. The subset ‘Total’ (B-cells FACS sorted only on CD19+, present in all cells) was excluded, since sequences from these cells did not contain information on the presumed cell stage. We identified a cluster containing multiple isotypes: IgM, IgA1 and IgA2.

Cluster chart for visualization of b-cell repertoire landscape
Figure 3. The cluster view chart was used for overlaying isotype, cluster size and the cell type

This cluster contained at least one cluster from each cell stage, denoted by the shapes of the cluster symbols. The combined cluster contained 3,113 sequences in total, predominantly of isotype IgM, indicated by the red color.

Cluster chart displaying clusters of antibody isotypes and cell stages
Figure 4. The cluster shows predominant IgM and a singleton IgA1-cluster

Focusing in on the subcluster, we identified and quantified the present isotypes per cell stage:

Isotype /
Cell stage
IgMIgA1IgA2IgG3IgDIn total
Plasmablast12122
Memory152152
Naive11
GerminalCenter2,68310210422272,938
In total2,83712310422273,113
Table 2. Sequence counts by isotype and cell stage in subcluster 152606

The most diverse set of isotypes were, as expected, found in the germinal center, while the other IgA1 isotypes were also found in a plasmablast sequence subcluster. 

Additionally filtering the remaining 3,113 sequences by a V-gene identity of 97.5% provided us with 229 sequences and an interesting subset of both IgM and IgA1 isotypes in germinal center, memory and plasmablast B-cells.

Sequence alignment of a subcluster with IgM and IgA1 isotypes in germinal center, plasmablast and memory B-cells
Figure 5. An alignment of the filtered sequences with a V-gene identity of 97.5%, variant amino acids highlighted

In this small subset of sequences, as expected, most diversity could be observed in the class-switched IgA1. We can also observe a mutated alanine in the CDR-H3 region determines the isotype at the conjunction of the germinal center and plasmablast in the graph.

Returning to cluster 152606, we also performed a sequence alignment on the 22 sequences (see Table 2.) sequenced originating from plasmablasts. The subset contained sequences with isotype IgA1 and a single IgM isotypes. Figure 6. below shows us the phylogenetic tree of the sequences.

Figure 6. Alignment of the plasmablast subset from cluster 152606

From the sequence alignment above, we can observe a variety of sequences of the class-switched IgA1 isotype and one sequence of isotype IgM. The data used in our analysis is sufficient to describe some relationships between individual sequences and clusters of sequences, but combining the data with transcriptomics data would provide deeper knowledge of class-switching. For a thorough transcriptomic analysis of single B-cell maturation and class switching, we recommend reading the original article by King et al. (2021).

In conclusion

While combining transcriptomics data and sequence data is required for deep examination of the cell stages and the predictive capability of gene expression, repertoire analysis remains important and provides valuable insights into the development of the antibody repertoire in humans. Having the right tools for analyzing and visualizing the data makes the process faster, easier and repeatable. For more information on how to import 10x data to PipeBio, see Appendix 1 below.

If you’re looking to analyze antibody or BCR repertoire data, you can request to start a trial with us today.

Want to analyze 10x Genomics data on PipeBio?

You can register for a free trial here


Appendix 1: 10x data on PipeBio.

What data to upload to PipeBio?

We recommend uploading demultiplexed data to PipeBio. Any Cell Ranger pipeline involving vdj will give you an all_contig.fasta file containing contigs and an all_contig_annotations.csv file containing associated barcodes, UMI counts and annotations. To keep all single-cell data connected with your sequences, upload the demultiplexed all_contig-file to PipeBio, followed by the annotations csv-file.

When you upload the csv-file to PipeBio, you only have to select the correct mapping (contig_id <-> name) and you’re good to go.

QC of 10x data on PipeBio

While Cell Ranger already incorporates several QC steps, you also have the option to perform QC on PipeBio by labeling sequences with 

  • Read counts below threshold
  • UMI counts below threshold
  • Specific chains. 

Running the pipeline will give you additional data with which you can filter and select sequences for further analysis. You can also choose to initially keep all sequences and filter out data later during the analysis.

Appendix 2: 10x Genomics and Cell Ranger

The single-cell technology by 10x Genomics

The single cell library prep and barcoding technology by 10x Genomics allows analyzing up to hundreds of thousands single cells in single runs. By multiplexing cells, that is labeling single cells and then pooling them together, samples can be analyzed all in one batch. The result of all of this is a high-throughput, more affordable method to get multidimensional data on single cells, including

  • Full-length V(D)J sequences for paired B-cell or T-cell receptors (BCRs and TCRs)
  • Cell surface protein expression
  • Antigen specificity
  • Gene expression

The unique barcodes in each gel bead-in-emulsion (GEM) allow tying each data point to a single cell. By additionally labeling each transcript within a single cell with a unique molecular identifier (UMI), it is possible to analyze gene expression counts per cell, control for quality based on expression, and correct for amplification bias.

Single-cell barcoding technology by 10x Genomics
Image: The microfluidics-based technology from 10x Genomics

Cell Ranger pipelines and outputs

Cell Ranger – the analysis pipeline software by 10x – has a variety of different processing pipelines for distinct library types (e.g. antibody capture, gene expression) and the outputs vary accordingly. The pipelines in Cell Ranger essentially take as input:

1. the libraries used and

Output files VDJ Cell Ranger analysis pipeline

2. the feature reference sheet, containing information about barcode mapping, barcode sequences, how to extract the barcode sequence from the read and and feature type

Output files VDJ Cell Ranger analysis pipeline

Depending on the pipeline you’re running (vdj, count or multi), the Cell Ranger will demultiplex and analyze your data on V(D)J repertoires, cell surface proteins, antigen specificity, or gene expression[C]. In brief, what Cell Ranger to demultiplex sequences is

  1. determine whether the sequences are TCR or BCR 
  2. use barcodes and UMIs to assemble contigs of V(D)J transcripts per cell
  3. align them to a reference sequence
  4. output aggregate files that include UMI counts per cell, feature cell matrices and barcoded contigs

After demultiplexing on Cell Ranger, you are then able to take output files and upload them to PipeBio for further analysis.

References

[1] King HW, Orban N, Riches JC, Clear AJ, Warnes G, Teichmann SA, James LK. Single-cell analysis of human B cell maturation predicts how antibody class switching shapes selection dynamics. Sci Immunol. 2021 Feb 12;6(56):eabe6291. doi: 10.1126/sciimmunol.abe6291. PMID: 33579751.

[2] Lee, R.D., Munro, S.A., Knutson, T.P. et al. Single-cell analysis identifies dynamic gene expression networks that govern B cell development and transformation. Nat Commun 12, 6843 (2021). https://doi.org/10.1038/s41467-021-27232-5

[3] Evan S. Walsh, Tammy S. Tollison, Hayden N. Brochu, Brian I. Shaw, Kayleigh R. Diveley, Hsuan Chou, Lynn Law, Allan D. Kirk, Michael Gale Jr. and Xinxia Peng. Single-Cell–Based High-Throughput Ig and TCR Repertoire Sequencing Analysis in Rhesus Macaques. J Immunol February 1, 2022, 208 (3) 762-771; DOI: https://doi.org/10.4049/jimmunol.2100824

[4] Bashford-Rogers, R., Bergamaschi, L., McKinney, E. F., Pombal, D. C., Mescia, F., Lee, J. C., Thomas, D. C., Flint, S. M., Kellam, P., Jayne, D., Lyons, P. A., & Smith, K. (2019). Analysis of the B cell receptor repertoire in six immune-mediated diseases. Nature, 574(7776), 122–126. https://doi.org/10.1038/s41586-019-1595-3

[5] Setliff I, Shiakolas AR, Pilewski KA, Murji AA, Mapengo RE, Janowska K, Richardson S, Oosthuysen C, Raju N, Ronsard L, Kanekiyo M, Qin JS, Kramer KJ, Greenplate AR, McDonnell WJ, Graham BS, Connors M, Lingwood D, Acharya P, Morris L, Georgiev IS. High-Throughput Mapping of B Cell Receptor Sequences to Antigen Specificity. Cell. 2019 Dec 12;179(7):1636-1646.e15. doi: 10.1016/j.cell.2019.11.003. Epub 2019 Nov 28. PMID: 31787378; PMCID: PMC7158953.

[6] Shiakolas, A.R., Kramer, K.J., Johnson, N.V. et al. Efficient discovery of SARS-CoV-2-neutralizing antibodies via B cell receptor sequencing and ligand blocking. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-022-01232-2

[7] Zhang, Z., Xiong, D., Wang, X. et al. Mapping the functional landscape of T cell receptor repertoires by single-T cell transcriptomics. Nat Methods 18, 92–99 (2021). https://doi.org/10.1038/s41592-020-01020-3

[8] Zhang, Z., Chang, W.Y., Wang, K. et al. Interpreting the B-cell receptor repertoire with single-cell gene expression using Benisse. Nat Mach Intell (2022). https://doi.org/10.1038/s42256-022-00492-6

[9] Sebastiaan Valkiers, Nicky de Vrij, Sofie Gielis, Sara Verbandt, Benson Ogunjimi, Kris Laukens, Pieter Meysman, Recent advances in T-cell receptor repertoire analysis: Bridging the gap with multimodal single-cell RNA sequencing, ImmunoInformatics, Volume 5, 2022, 100009, ISSN 2667-1190, https://doi.org/10.1016/j.immuno.2022.100009.

[10] Fiskin, E., Lareau, C.A., Ludwig, L.S. et al. Single-cell profiling of proteins and chromatin accessibility using PHAGE-ATAC. Nat Biotechnol 40, 374–381 (2022). https://doi.org/10.1038/s41587-021-01065-5

[11] Barennes, P., Quiniou, V., Shugay, M. et al. Benchmarking of T cell receptor repertoire profiling methods reveals large systematic biases. Nat Biotechnol 39, 236–245 (2021). https://doi.org/10.1038/s41587-020-0656-3

[12] Chen, YJ., Takahashi, C.N., Organick, L. et al. Quantifying molecular bias in DNA data storage. Nat Commun 11, 3264 (2020). https://doi.org/10.1038/s41467-020-16958-3

Footnotes

[A] The cells were sorted by flow cytometry for the following gene expressions: total (CD19+ ), naïve (CD19+ IgD+ CD38− ), germinal center (CD19+ IgD− CD38+ ), memory (CD19+ IgD− CD38− ), and plasmablasts (CD19+ IgD− CD38++)

[B] https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-8999/sdrf

[C] 10x Genomics: Assembly Algorithm. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/algorithms/assembly