MHC genotyping of non-model organisms using next-generation sequencing: a new methodology to deal with artefacts and allelic dropout.

Sommer S, Courtiol A, Mazzoni CJ - BMC Genomics (2013)

Bottom Line: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which in addition to artefacts considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Combining our workflow with the study of amplification efficiency offers the chance for researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence over the downstream analyses and subsequent applications.


Affiliation: Evolutionary Genetics, Leibniz-Institute for Zoo and Wildlife Research, Alfred-Kowalke-Straße 17, D-10315 Berlin, Germany. sommer@izw-berlin.de

ABSTRACT

Background: The Major Histocompatibility Complex (MHC) is the most important genetic marker to study patterns of adaptive genetic variation determining pathogen resistance and associated life history decisions. It is used in many different research fields, ranging from human medicine and molecular evolution to functional biodiversity studies. Correct assessment of the individual allelic diversity pattern and the underlying structural sequence variation is the basic requirement to address the functional importance of MHC variability. Next-generation sequencing (NGS) technologies are likely to replace traditional genotyping methods to a great extent in the near future, but the first empirical studies strongly indicate the need for a rigorous quality control pipeline. Strict approaches for data validation and allele calling to distinguish true alleles from artefacts are required.

Results: We developed the analytical methodology and validated a data processing procedure which can be applied to any organism. It allows the separation of true alleles from artefacts and the evaluation of genotyping reliability, which in addition to artefacts considers for the first time the possibility of allelic dropout due to unbalanced amplification efficiencies across alleles. Finally, we developed a method to assess the confidence level per genotype a posteriori, which helps to decide which alleles and individuals should be included in any further downstream analyses. The latter method could also be used to optimize experimental designs in the future.

Conclusions: Combining our workflow with the study of amplification efficiency allows researchers to evaluate enormous amounts of NGS-generated data in great detail, improving confidence in downstream analyses and subsequent applications.
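
To make the idea of allelic dropout under unbalanced amplification more concrete, the following is a minimal sketch and not the method developed in the paper. It assumes a simple binomial model in which each read is drawn independently and an allele's share of reads is proportional to its relative amplification efficiency. The function name `dropout_probability`, the `efficiencies` mapping and the `min_reads` threshold are all hypothetical and introduced only for illustration.

```python
from math import comb


def dropout_probability(n_reads, efficiencies, allele, min_reads=1):
    """Probability that `allele` receives fewer than `min_reads` reads.

    ASSUMPTION (not from the paper): reads are independent draws and the
    chance that a read comes from an allele equals that allele's relative
    amplification efficiency divided by the sum over all alleles.
    """
    total = sum(efficiencies.values())
    p = efficiencies[allele] / total
    # Sum the binomial probabilities of observing 0 .. min_reads-1 reads.
    return sum(
        comb(n_reads, k) * p**k * (1 - p) ** (n_reads - k)
        for k in range(min_reads)
    )


if __name__ == "__main__":
    balanced = {"A": 1.0, "B": 1.0}
    skewed = {"A": 9.0, "B": 1.0}
    print(dropout_probability(10, balanced, "B"))
    print(dropout_probability(10, skewed, "B"))
```

Under these assumptions, an allele that amplifies nine times less efficiently than its partner is missed entirely in about 35% of amplicons covered by only 10 reads (0.9^10 ≈ 0.349), whereas with balanced efficiencies the same risk is below 0.1%. This is the kind of reasoning that motivates evaluating genotype confidence against read depth.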



Figure 1: 454 reads filtering: pyrosequencing data quality assessment. Pathway illustrating initial filtering steps to ensure data quality of reads obtained by 454 pyrosequencing and to facilitate the subsequent allele and artefact identification workflow. The number of reads included in a filtering step is indicated on the left of each arrow and the percentage of the initial number of reads in brackets. The number of discarded reads is shown on the right of each arrow.

Mentions: We included all 40 individuals subjected to cloning/Sanger sequencing in duplicate (amplicon replicates) in one region (1/8th) of a 454 FLX Titanium picotiter plate. All 80 amplicons were barcoded (Additional file 1: Figure S1). All terms used to describe the subsequent results are outlined in Table 1. We obtained 86,153 sequence reads passing the filters of the GS Run Processor (Figure 1). Of these, 81,309 reads had the correct length with recognizable f- and r-MIDs; read numbers ranged from 311 to 2,276 per amplicon (1,028 ± 387) and from 1,187 to 3,193 per individual (summing both amplicon replicates). A total of 63,166 (73.3%) reads passed the initial filtering steps by showing the expected read length, complete primer sequences, high quality and no frameshifts (Figure 1).
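
The filtering criteria listed above (expected read length, complete primer sequences, high quality, no frameshifts) can be sketched as a single predicate. This is an illustrative sketch only, not the authors' pipeline: the `Read` class, the `passes_filters` and `percent_retained` functions, the mean-quality threshold and the length tolerance are all hypothetical, and the reverse primer is assumed to be supplied already in read orientation.

```python
from dataclasses import dataclass


@dataclass
class Read:
    sequence: str        # read with barcodes (MIDs) already removed
    mean_quality: float  # average Phred score of the read (assumed metric)


def passes_filters(read, fwd_primer, rev_primer, expected_length,
                   length_tolerance=0, min_quality=20.0):
    """Return True if a read meets all four illustrative criteria."""
    seq = read.sequence
    # 1. Complete primer sequences at both ends.
    if not (seq.startswith(fwd_primer) and seq.endswith(rev_primer)):
        return False
    # 2. Expected length of the region between the primers.
    insert_length = len(seq) - len(fwd_primer) - len(rev_primer)
    offset = insert_length - expected_length
    if abs(offset) > length_tolerance:
        return False
    # 3. No frameshift: any length difference must be a multiple of 3.
    if offset % 3 != 0:
        return False
    # 4. Sufficient quality.
    return read.mean_quality >= min_quality


def percent_retained(n_passed, n_initial):
    """Percentage of the initial reads that survive filtering."""
    return round(100.0 * n_passed / n_initial, 1)


if __name__ == "__main__":
    # Read counts reported in the text.
    print(percent_retained(63166, 86153))
```

Applied to the counts reported in the text, `percent_retained(63166, 86153)` reproduces the stated 73.3%.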

