Immunoprofiling: How it works

Posted from Discovering Biology in a Digital World by Todd Smith on Wed Dec 05, 2018

As previously discussed, immunoprofiling is the quantitative measurement of antigen receptors (ARs; antibodies or T-cell receptors) in a sample. Massively parallel (NextGen) DNA sequencing is a commonly used method for this purpose because receptor diversity can be quickly and cost-effectively measured with the added benefit that individual receptors can be quantified.  

AR diversity is a result of random recombination plus DNA base insertions

To understand the basis and power of DNA sequencing assays, we need to understand the basics of AR development and maturation. The adaptive immune systems of all jawed vertebrates are similar in that an immense AR diversity is created through a DNA rearrangement process [1] that joins together genes from different groups. Each receptor locus has discrete groups of genes called Variable (V), Diversity (D), and Joining (J). There are also Constant (C) genes, but for the purpose of antigen recognition the V(D)J genes form the "business" end of the AR. Hence this post focuses on V(D)J recombination and does not discuss C genes or class switching.

Human AR loci. The genes for each AR chain exist in separate locations (loci) in the genome. Color shades indicate the different gene groups (V: red, D: green, J: yellow, and C: blue). General gene lengths (in nucleotides [nt, bases]) are given below the gene group names. The number of genes in each group is indicated below each locus diagram. The chromosome (chr) where each locus is located is indicated at the end of each diagram. Data for the figure were obtained from the IMGT database (Nov 14, 2018).

In the AR recombination process, one gene from each gene group (V, D, or J) is joined to a gene from another group. As an example, an antibody (BCR) is built from one heavy chain molecule (IGH) and one light chain molecule (kappa [IGK] or lambda [IGL]). The heavy chain locus has V, D, and J genes, whereas the light chain loci have only V and J genes. In IGH recombination, a D gene is first joined to a J gene, and the resulting DJ unit is then joined to a V gene. Light chain recombination simply joins a V gene to a J gene. Because the choice of genes is random, the number of possible VDJ or VJ combinations is the product of the number of V, D, and J genes, or of V and J genes, respectively. The total diversity is the product of the IGH VDJ and IGL VJ combinations plus the product of the IGH VDJ and IGK VJ combinations (each BCR binding unit pairs one IGH chain with either an IGL or an IGK chain).
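The arithmetic can be sketched in a few lines of Python. The gene counts below are hypothetical placeholders chosen only to show the calculation; they are not the IMGT counts shown in the figure, and the function names are illustrative.

```python
from math import prod


def chain_combinations(*gene_counts: int) -> int:
    """Distinct gene combinations for one chain: the product of the number of
    genes in each group (V, D, J for heavy chains; V, J for light chains)."""
    return prod(gene_counts)


def bcr_combinations(igh_vdj: int, igk_vj: int, igl_vj: int) -> int:
    """Each BCR pairs one heavy chain with either a kappa or a lambda light
    chain, so the two heavy-light products are summed."""
    return igh_vdj * igk_vj + igh_vdj * igl_vj


# Hypothetical gene counts, for illustration only (NOT the IMGT values).
igh = chain_combinations(40, 25, 6)  # V, D, J
igk = chain_combinations(40, 5)      # V, J
igl = chain_combinations(30, 5)      # V, J

total = bcr_combinations(igh, igk, igl)
print(f"IGH VDJ: {igh:,}  IGK VJ: {igk:,}  IGL VJ: {igl:,}  BCRs: {total:,}")
```

Even with these made-up counts, combinatorial pairing alone lands in the low millions (2,100,000 here), the same order of magnitude described in the next paragraph.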

When the above math is done, the number of possible receptors is in the millions. While millions seems big, it is actually small when one considers that the number of antigens that can be recognized is effectively limitless. How is this possible? The simple answer is that the recombination process is “sloppy.” During recombination, a protein complex brings the gene segments together and loops out the intervening DNA [2]. The loop is cut out, and the free gene ends are then joined by enzymes that can add a variable number of random DNA bases at each V, D, or J junction, creating a nearly limitless number of receptor sequences.
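A toy Python model can make the effect of junctional insertions concrete. The segment strings, the helper names, and the five-base insertion limit are hypothetical, and the model ignores the base trimming that can also occur at real junctions; it only shows how random insertions multiply the number of possible sequences for a single V-D-J choice.

```python
import random

BASES = "ACGT"


def random_insertion(max_len: int, rng: random.Random) -> str:
    """A random run of 0..max_len bases, standing in for the extra bases
    added at one junction."""
    length = rng.randint(0, max_len)  # inclusive of both ends
    return "".join(rng.choice(BASES) for _ in range(length))


def join_vdj(v: str, d: str, j: str, max_len: int, rng: random.Random) -> str:
    """Join V, D, and J segments with independent random insertions at the
    V-D and D-J junctions."""
    return v + random_insertion(max_len, rng) + d + random_insertion(max_len, rng) + j


def insertion_variants(max_len: int) -> int:
    """Number of distinct insertion sequences of length 0..max_len."""
    return sum(4 ** n for n in range(max_len + 1))


# Hypothetical toy segments, not real gene sequences.
V, D, J = "TGTGCCAGC", "GGGACAGGG", "TTTGGCCAA"
rng = random.Random(1)
for _ in range(3):
    print(join_vdj(V, D, J, max_len=5, rng=rng))

# Number of possible (V-D insertion, D-J insertion) pairs with up to 5 bases
# per junction. A few pairs can spell the same final sequence, so this is an
# upper bound on distinct junctions for one V-D-J choice.
print(insertion_variants(5) ** 2)
```

With at most five added bases per junction, one V-D-J choice already allows up to 1,365 × 1,365 insertion pairs, which is why the combinatorial count in the millions badly underestimates real receptor diversity.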

The recombination process. For loci with D genes the first step (1) is to combine one D gene with one J gene. Next (2), a V gene is combined with the DJ gene to create a VDJ unit (3). The additional bases are indicated by pink bars between V-D-J junctions. As V genes also contain promoters for transcription, a pre-mRNA is made that has the VDJ unit, any extra J genes, an intron, and the adjacent C gene (4). The last step is to splice out the intron, and any "extra" J genes (5) to create the mature AR mRNA. 

Immunoprofiling samples the sequences of V(D)J regions

As one might expect, the V(D)J* junctions are the areas of highest diversity in ARs. The antigen recognition domain of an AR protein has three regions that interact with antigens, known as complementarity determining regions (CDRs). The first two, CDR1 and CDR2, are encoded by the V gene. CDR3 is encoded by the V(D)J junction region and, from an immunoprofiling perspective, is the most important region. Because CDR3 segments are between 60 and 100 bases in length, they are ideal candidates for massively parallel, high-throughput short-read sequencing on the Illumina platform. This good fit with short-read sequencing is one reason immunoprofiling is a growth area in biotechnology.

In immunoprofiling assays, the CDR3 segment is sequenced from DNA or from RNA (converted to cDNA). In either case, PCR with V gene and J gene primers** is used to amplify the DNA containing CDR3. Even though the number of possible V(D)J combinations in a sample is large, the number of primers required is simply the total number of V and J genes for a given AR locus. For example, profiling TCR beta receptors (above figure) requires 61 primers (48 V gene and 13 J gene primers).
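A short Python sketch makes the contrast explicit using the TCR beta counts from the text: the number of primers grows with the sum of V and J genes, while the number of V-J combinations they can amplify grows with the product. The primer names are hypothetical placeholders.

```python
from itertools import product

# TCR beta gene counts from the text; primer names are hypothetical placeholders.
v_primers = [f"TRBV_primer_{i}" for i in range(1, 49)]  # 48 V gene primers
j_primers = [f"TRBJ_primer_{i}" for i in range(1, 14)]  # 13 J gene primers

# The primer count scales with the SUM of V and J genes...
n_primers = len(v_primers) + len(j_primers)

# ...but every V primer can pair with every J primer, so the number of
# amplifiable V-J combinations scales with the PRODUCT.
v_j_pairs = list(product(v_primers, j_primers))

print(n_primers)       # 61
print(len(v_j_pairs))  # 624
```

The product, 624, matches the number of synthetic DNAs given for the TCR beta example, since each V-J primer combination needs its own control.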

Despite the modest number of primers needed in an immunoprofiling assay, differences in the sequences that the primers bind to cause significant differences in PCR amplification rates between individual receptor molecules. This is because hybridization efficiency is affected by the local DNA sequence: primers that bind more efficiently produce greater amounts of amplified DNA. Thus, to make immunoprofiling a quantitative assay, amplification differences need to be accounted for.
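To see why small differences in primer binding matter, consider a simplified model (an assumption, not from the original post) in which each PCR cycle multiplies the number of copies by (1 + e), where e is the per-cycle efficiency between 0 and 1. The efficiencies, starting amount, and cycle count below are hypothetical.

```python
def copies_after_pcr(start_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential PCR: each cycle multiplies copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# Two receptors present at the same starting amount, amplified by primer
# pairs with hypothetical efficiencies of 90% and 80% per cycle.
cycles = 30
strong = copies_after_pcr(100, 0.90, cycles)
weak = copies_after_pcr(100, 0.80, cycles)

print(f"fold difference after {cycles} cycles: {strong / weak:.1f}")
```

Under this model, a 10-point difference in per-cycle efficiency compounds into roughly a five-fold difference in amplified copies after 30 cycles, even though both receptors started at the same abundance. Real PCR plateaus and is less tidy, but the compounding effect is the reason raw read counts cannot be taken at face value.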

To account for amplification differences, Adaptive Biotechnologies (Seattle, WA) developed synthetic DNA molecules that contain the same V and J gene primer binding sequences as the receptors being sequenced [3]. In our TCR beta example, up to 624 synthetic DNAs are needed, one for each V and J primer combination (48 × 13). The synthetic DNA molecules are added (spiked in) to assays at a defined concentration. Once sequencing is complete, the final dataset will contain some number of sequences, called reads, that match each synthetic DNA. The ratio of the number of reads to the corresponding number of spiked-in molecules gives the PCR amplification frequency for that primer combination. Because the reads derived from sample material share many, if not all, of the same primer combinations (with very different “middle” sequences), the ratios determined from the synthetic DNA reads can be applied to normalize the sample data.
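The normalization step can be sketched as follows. This is a minimal illustration of the ratio idea described above, not Adaptive Biotechnologies' actual pipeline; the function names, V/J labels, clone names, read counts, and spike-in amounts are all hypothetical.

```python
from typing import Dict, Tuple

VJ = Tuple[str, str]          # (V gene, J gene) primer combination
Clone = Tuple[str, str, str]  # (V gene, J gene, CDR3 identifier)


def amplification_factors(spike_reads: Dict[VJ, int],
                          spike_molecules: Dict[VJ, int]) -> Dict[VJ, float]:
    """Reads observed per spiked-in synthetic molecule, for each V-J pair."""
    return {vj: spike_reads[vj] / spike_molecules[vj]
            for vj in spike_reads if spike_molecules.get(vj)}


def estimate_input_molecules(sample_reads: Dict[Clone, int],
                             factors: Dict[VJ, float]) -> Dict[Clone, float]:
    """Divide each clone's reads by the factor for its V-J pair; clones whose
    pair has no synthetic control are skipped."""
    return {clone: reads / factors[clone[:2]]
            for clone, reads in sample_reads.items() if clone[:2] in factors}


# Hypothetical spike-in data: 10 synthetic molecules of each V-J pair.
spike_molecules = {("V1", "J1"): 10, ("V2", "J2"): 10}
spike_reads = {("V1", "J1"): 500, ("V2", "J2"): 2000}

# Two hypothetical sample clones with identical raw read counts.
sample_reads = {("V1", "J1", "clone_A"): 1000, ("V2", "J2", "clone_B"): 1000}

factors = amplification_factors(spike_reads, spike_molecules)
print(factors)  # {('V1', 'J1'): 50.0, ('V2', 'J2'): 200.0}
print(estimate_input_molecules(sample_reads, factors))
# {('V1', 'J1', 'clone_A'): 20.0, ('V2', 'J2', 'clone_B'): 5.0}
```

Although both clones produced the same number of reads, the correction suggests clone_A started from four times as many molecules, because its primer pair amplified less efficiently.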

Structure of V(D)J region and immunoprofiling. Either DNA or RNA (cDNA) can be sequenced with the same V and J gene primers. The bottom diagram shows the AR binding region. CDRs are flanked by framework regions (FWRs). V gene and J gene primers bind to the FWR3 and FWR4 regions, respectively. The pink regions indicate the additional bases that are added during recombination.


DNA sequencing-based immunoprofiling quantitatively measures AR diversity in samples by determining the sequences of V(D)J junctions. AR diversity is vast due to a combinatorial rearrangement process that inserts a variable number of random DNA bases at each junction. In the sequencing process, V(D)J junctions are amplified with V and J gene specific primers and, to be quantitative, differences in amplification rates that are due to primer sequences must be factored into each assay.

References / notes: 

* The (D) in V(D)J indicates that a D gene is present in some chains (IGH, TCRB, TCRD) but not in others (IGL, IGK, TCRA, TCRG).

1. Litman GW, Rast JP, Fugmann SD. The origins of vertebrate adaptive immunity. Nat Rev Immunol. 2010;10(8):543-53.

2. Video on AR rearrangement -

3. Carlson CS, Emerson RO, Sherwood AM, Desmarais C, Chung MW, Parsons JM, Steen MS, LaMadrid-Herrmannsfeldt MA, Williamson DW, Livingston RJ, Wu D, Wood BL, Rieder MJ, Robins H. Using synthetic templates to design an unbiased multiplex PCR assay. Nat Commun. 2013;4:2680.

** There are other ways to prepare DNA for immunoprofiling that do not require individual primers. Such assays have more steps, which can introduce other kinds of artifacts, and (or) are limited to RNA sequencing.