MHC genotyping of non-model organisms using next-generation sequencing: a new methodology to deal with artefacts and allelic dropout.

Sommer S, Courtiol A, Mazzoni CJ - BMC Genomics (2013)

Bottom Line: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which in addition to artefacts considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Combining our workflow with the study of amplification efficiency offers the chance for researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence over the downstream analyses and subsequent applications.


Affiliation: Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, D-10315 Berlin, Germany. sommer@izw-berlin.de

ABSTRACT

Background: The Major Histocompatibility Complex (MHC) is the most important genetic marker to study patterns of adaptive genetic variation determining pathogen resistance and associated life history decisions. It is used in many different research fields, ranging from human medicine and molecular evolution to functional biodiversity studies. Correct assessment of the individual allelic diversity pattern and the underlying structural sequence variation is the basic requirement to address the functional importance of MHC variability. Next-generation sequencing (NGS) technologies are likely to replace traditional genotyping methods to a great extent in the near future, but the first empirical studies strongly indicate the need for a rigorous quality control pipeline. Strict approaches for data validation and allele calling to distinguish true alleles from artefacts are required.

Results: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which in addition to artefacts considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Finally, we developed a method to assess the confidence level per genotype a posteriori, which helps to decide which alleles and individuals should be included in any further downstream analyses. The latter method could also be used for optimizing experimental designs in the future.

Conclusions: Combining our workflow with the study of amplification efficiency offers the chance for researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence over the downstream analyses and subsequent applications.



Figure 3: Differences in the individual number of alleles detected by cloning/Sanger (blue bars) and 454 pyrosequencing (red bars).

Mentions: The subsequent workflow (Figure 2) allowed us to classify most of the filtered reads (63,091 out of 63,166 reads, 99.88%) as either ‘putative artefacts’ (33,400 or ~53% of filtered reads) or ‘putative alleles’ (29,691 or ~47% of filtered reads). The remaining 75 reads were marked as ‘unclassified variants’, as they did not fulfil all assumptions to be called a ‘putative allele’ but could not be classified as ‘putative artefacts’ either. We checked the classification of each ‘unclassified variant’ across amplicons and detected three variants that were classified as a ‘putative allele’ in at least one amplicon but were more often defined as an ‘unclassified variant’, because of either lower frequency compared to ‘putative artefacts’ or absence in one individual’s amplicon replicate. The frequency inconsistencies of these variants suggest lower amplification efficiency compared with other alleles. The presence of all three variants could be confirmed by Sanger sequencing after designing new allele-specific primers. We accepted those three variants as true alleles, but labelled them as ‘putative low efficiency alleles’. Those three alleles corresponded to two additional amino acid sequences (Desu-DRB*009, Desu-DRB*053). We found a total of 64 unique true MHC variants at the nucleotide level, which translated into 57 putative MHC alleles at the amino acid level (Additional file 1: Figure S2). The ‘putative alleles’ were called MHC alleles for simplicity even though it was not possible to assign them to a particular locus. The alleles were denominated as Desu-DRB*X, where X corresponds to an allele number between 01 and 124 (GenBank accession nos. KF134719-KF134782), following the nomenclature of Klein et al. (1990). Excluding the ‘putative low efficiency alleles’, between two and nine (5.4 ± 1.5) alleles were detected per individual, suggesting at least 5 copies of DRB in Delomys sublineatus. If the ‘putative low efficiency alleles’ are taken into account, the range of alleles detected by 454 pyrosequencing increases to 3-11 per individual (Figure 3).
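
The read counts and the cross-amplicon check described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the read totals are taken from the text, whereas the function names, the variant names and the per-amplicon labels in `example_calls` are hypothetical, and the rule "more amplicons labelled 'unclassified variant' than 'putative allele'" is an assumed reading of "more often defined as unclassified variant".

```python
from collections import Counter

# Read counts reported in the text.
FILTERED_READS = 63_166
PUTATIVE_ARTEFACTS = 33_400
PUTATIVE_ALLELES = 29_691


def read_summary(filtered, artefacts, alleles):
    """Return the number of unclassified reads and the share of each class (%)."""
    unclassified = filtered - artefacts - alleles
    return {
        "unclassified_reads": unclassified,
        "pct_classified": round(100 * (artefacts + alleles) / filtered, 2),
        "pct_artefacts": round(100 * artefacts / filtered, 1),
        "pct_alleles": round(100 * alleles / filtered, 1),
    }


def low_efficiency_candidates(calls):
    """Flag variants that may be low-efficiency alleles (assumed rule).

    `calls` maps a variant name to the labels it received, one per amplicon.
    A variant is flagged if it was called 'putative allele' at least once
    but 'unclassified variant' in more amplicons than that.
    """
    candidates = []
    for variant, labels in calls.items():
        counts = Counter(labels)
        n_allele = counts["putative allele"]
        n_unclassified = counts["unclassified variant"]
        if n_allele >= 1 and n_unclassified > n_allele:
            candidates.append(variant)
    return sorted(candidates)


# Hypothetical per-amplicon labels, invented purely for illustration.
example_calls = {
    "variant_A": ["putative allele", "putative allele", "putative allele"],
    "variant_B": ["putative allele", "unclassified variant", "unclassified variant"],
    "variant_C": ["putative artefact", "putative artefact"],
}

summary = read_summary(FILTERED_READS, PUTATIVE_ARTEFACTS, PUTATIVE_ALLELES)
candidates = low_efficiency_candidates(example_calls)
print(summary)
print(candidates)
```

In the study itself, flagged variants were accepted as alleles only after confirmation by Sanger sequencing with allele-specific primers, so a filter like this would at most produce a shortlist for validation.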

