MHC genotyping of non-model organisms using next-generation sequencing: a new methodology to deal with artefacts and allelic dropout.

Sommer S, Courtiol A, Mazzoni CJ - BMC Genomics (2013)

Bottom Line: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which in addition to artefacts considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Combining our workflow with the study of amplification efficiency offers the chance for researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence over the downstream analyses and subsequent applications.


Affiliation: Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, D-10315 Berlin, Germany. sommer@izw-berlin.de

ABSTRACT

Background: The Major Histocompatibility Complex (MHC) is the most important genetic marker for studying patterns of adaptive genetic variation that determine pathogen resistance and associated life history decisions. It is used in many research fields, ranging from human medicine and molecular evolution to functional biodiversity studies. Correct assessment of the individual allelic diversity pattern and the underlying structural sequence variation is the basic requirement for addressing the functional importance of MHC variability. Next-generation sequencing (NGS) technologies are likely to replace traditional genotyping methods to a great extent in the near future, but the first empirical studies strongly indicate the need for a rigorous quality control pipeline. Strict approaches for data validation and allele calling are required to distinguish true alleles from artefacts.

Results: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which, in addition to artefacts, considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Finally, we developed a method to assess the confidence level per genotype a posteriori, which helps to decide which alleles and individuals should be included in further downstream analyses. The latter method could also be used to optimize experimental designs in the future.

Conclusions: Combining our workflow with the study of amplification efficiency allows researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence in downstream analyses and subsequent applications.
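
To make the link between unbalanced amplification and allelic dropout concrete, here is a minimal sketch. It is not the authors' procedure. It assumes that reads within an individual are drawn independently with probabilities proportional to each allele's amplification efficiency, so the read count of one allele is binomial. The function names, the three-allele individual and the minimum read threshold are hypothetical. The efficiency values 1.0, 0.19 and 2.40 are those reported for Desu-DRB*001, Desu-DRB*028 and Desu-DRB*091 in the results excerpt on this page, but whether these alleles co-occur in one individual is an assumption made for illustration.

```python
from math import comb


def expected_share(efficiencies, allele):
    """Expected read fraction of `allele`, assuming reads are drawn in
    proportion to relative amplification efficiency (an assumption)."""
    return efficiencies[allele] / sum(efficiencies.values())


def dropout_probability(share, n_reads, min_reads):
    """Binomial probability of observing fewer than `min_reads` reads of an
    allele whose expected read fraction is `share`, out of `n_reads` reads."""
    return sum(
        comb(n_reads, k) * share**k * (1.0 - share) ** (n_reads - k)
        for k in range(min_reads)
    )


# Hypothetical individual: the allele combination is assumed, the efficiency
# values are the ones quoted in the results excerpt.
individual = {"Desu-DRB*001": 1.0, "Desu-DRB*028": 0.19, "Desu-DRB*091": 2.40}
share = expected_share(individual, "Desu-DRB*028")

# The threshold of 3 reads is an arbitrary illustrative cut-off.
for depth in (20, 50, 200):
    print(depth, round(dropout_probability(share, depth, min_reads=3), 4))
```

Under this simplified model, the chance of missing the weakly amplified allele falls as sequencing depth per individual rises, which is why a per-genotype confidence estimate depends on both efficiency and coverage.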


Figure 5: Standardised allele amplification efficiency. The amplification efficiency is estimated for each ‘putative allele’ by maximum likelihood and represented by a blue dot. Each dot is connected by a vertical line to the full horizontal line representing efficiency 1.0, which was defined using the first allele (Desu-DRB*001) as a reference (see text for details). The dashed horizontal line represents an efficiency of two, which could represent duplicated or homozygous alleles. ‘Low efficiency putative alleles’ are not represented (Desu-DRB*009a, Desu-DRB*053b, Desu-DRB*053c).

Mentions: We used a maximum likelihood approach to estimate the amplification efficiency of each allele (see Methods) and found substantial variation in amplification efficiency among alleles (Figure 5). The lowest amplification efficiency was observed for the allele Desu-DRB*028, at 0.19, i.e. more than five times lower than that of the allele Desu-DRB*001 used as a reference (amplification efficiency = 1) (Additional file 2: Table S2). The highest amplification efficiency was observed for the allele Desu-DRB*091, which, with an efficiency of 2.40, is more than 12 times more efficient than Desu-DRB*028. No other allele had an amplification efficiency close to or equal to two, suggesting that Desu-DRB*091 was the only one corresponding to either a duplicated allele at different loci or a homozygous locus. This allele was present in four individuals, always as the most frequent one (Cluster1) (Additional file 2: Table S2), and its frequency relative to the second most frequent cluster (Cluster2) ranged on average from 1.9- to 3.9-fold in these individuals, reinforcing the hypothesis that this allele is always present in duplicate. Even when Desu-DRB*091 is omitted, efficiency still differed 8-fold between the least and the most efficient alleles.
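
The likelihood used by the authors is described in their Methods and is not reproduced on this page. As a rough illustration of how relative efficiencies can be estimated by maximum likelihood, the sketch below assumes a simple multinomial model: within each individual, the reads of allele a occur with probability e_a divided by the sum of e_b over the alleles that individual carries. The estimate is found with a standard minorise-maximise (MM) fixed-point update and then scaled so that the reference allele has efficiency 1. The function name, the allele labels "A", "B", "C" and the read counts are all hypothetical.

```python
def estimate_efficiencies(read_counts, reference, tol=1e-12, max_iter=10000):
    """Maximum likelihood estimate of relative amplification efficiencies
    under an assumed multinomial model (not necessarily the authors' model).

    read_counts: list of dicts, one per individual, mapping allele -> reads.
    reference:   allele whose efficiency is fixed at 1.0 (must have reads).
    """
    alleles = sorted({a for ind in read_counts for a in ind})

    # Total reads observed for each allele across all individuals.
    total_reads = {a: 0 for a in alleles}
    for ind in read_counts:
        for a, n in ind.items():
            total_reads[a] += n

    eff = {a: 1.0 for a in alleles}
    for _ in range(max_iter):
        # MM update: e_a = W_a / sum over carriers of (depth_i / S_i),
        # where S_i is the summed efficiency of the alleles in individual i.
        denom = {a: 0.0 for a in alleles}
        for ind in read_counts:
            depth = sum(ind.values())
            eff_sum = sum(eff[a] for a in ind)
            for a in ind:
                denom[a] += depth / eff_sum
        new = {a: total_reads[a] / denom[a] for a in alleles}

        # Standardise against the reference allele.
        scale = new[reference]
        new = {a: v / scale for a, v in new.items()}

        change = max(abs(new[a] - eff[a]) for a in alleles)
        eff = new
        if change < tol:
            break
    return eff


# Toy read counts (hypothetical), built to match efficiencies 1 : 0.5 : 2.
toy = [
    {"A": 200, "B": 100},
    {"B": 100, "C": 400},
    {"A": 100, "C": 200},
]
estimates = estimate_efficiencies(toy, reference="A")
for allele, value in estimates.items():
    print(allele, round(value, 3))
```

Because the toy counts are exactly proportional to efficiencies of 1, 0.5 and 2, the estimates should converge to those values. The update only identifies all efficiencies if the individuals link every allele to the reference through shared alleles. Note also that a model of this kind cannot by itself distinguish an allele that amplifies twice as well from one present in two copies, which matches the interpretation given for Desu-DRB*091.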


