The Influence of Hepatitis C Virus Genetic Region on Phylogenetic Clustering Analysis.

Lamoury FM, Jacka B, Bartlett S, Bull RA, Wong A, Amin J, Schinkel J, Poon AF, Matthews GV, Grebely J, Dore GJ, Applegate TL - PLoS ONE (2015)

Bottom Line: Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal method for clustering analysis to address a range of molecular epidemiological questions.


Affiliation: The Kirby Institute, University of New South Wales Australia, Sydney, Australia.

ABSTRACT
Sequencing is important for understanding the molecular epidemiology and viral evolution of hepatitis C virus (HCV) infection. To date, there is little standardisation among sequencing protocols, in part due to the high genetic diversity that is observed within HCV. This study aimed to develop a novel, practical sequencing protocol that covered both conserved and variable regions of the viral genome and to assess the influence of each subregion, sequence concatenation and unrelated reference sequences on phylogenetic clustering analysis. The Core to the hypervariable region 1 (HVR1) of envelope-2 (E2) and non-structural-5B (NS5B) regions of the HCV genome were amplified and sequenced from participants in the Australian Trial in Acute Hepatitis C (ATAHC), a prospective study of the natural history and treatment of recent HCV infection. Phylogenetic trees were constructed using a general time-reversible substitution model and sensitivity analyses were completed for every subregion. Pairwise distance, genetic distance and bootstrap support were computed to assess the impact of HCV region on clustering results as measured by the identification and percentage of participants falling within all clusters, cluster size, average patristic distance, and bootstrap value. The Robinson-Foulds metric was also used to compare phylogenetic trees among the different HCV regions. Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal method for clustering analysis to address a range of molecular epidemiological questions.
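
As an illustration of the tree comparison mentioned in the abstract, the following is a minimal sketch of the Robinson-Foulds distance between two tree topologies. It is not the authors' pipeline: the nested-tuple tree encoding, the function names and the toy five-taxon trees are all assumptions made for this example. The distance is counted as the number of non-trivial bipartitions (splits) present in one tree but not in the other, with trees treated as unrooted.

```python
def _collect_clades(tree, out):
    """Return the leaf set under `tree`; record every internal clade in `out`."""
    if isinstance(tree, str):  # a leaf is represented by its taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of a tree, each stored as the side
    that does NOT contain a fixed reference taxon (hypothetical helper)."""
    clades = set()
    all_leaves = _collect_clades(tree, clades)
    ref = min(all_leaves)  # arbitrary but deterministic reference taxon
    result = set()
    for clade in clades:
        side = clade if ref not in clade else all_leaves - clade
        # keep only splits with at least two taxa on each side
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy topologies (assumed data, not from the study)
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(tree_1, tree_1))
print(robinson_foulds(tree_1, tree_2))
```

A distance of zero means the two topologies contain the same splits; larger values indicate that trees built from different genomic regions disagree on more groupings.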


pone.0131437.g006: Clustering results among the 50 GT1a ATAHC sequences: genetic distance, percentage of sequences, tree, patristic distance and bootstrap values. Panel i: The genetic distance distribution is shown for both ATAHC sequences (dark colour) and Los Alamos HCV database (LANL) reference sequences (clear colour). The vertical dotted lines represent the thresholds for clustering, which were estimated by determining the point of overlap/uncertainty region between the two curves of most-closely related (ATAHC sequences) and distantly related (both ATAHC and LANL sequences) for each HCV region. Panel ii shows the ATAHC clustering patterns using Cluster Picker with the bootstrap support threshold fixed at 90% and the maximum genetic distance threshold varied between 0.01 and 0.08 (colour lines: ATAHC sequences; grey lines: LANL reference sequences). Solid lines represent the percentage of clustered sequences; dotted lines correspond to average cluster size. The vertical dotted line indicates the clustering threshold (as per panel i) used to determine the percentage of clustered sequences and average cluster size (Table 1). Panel iii shows the phylloclade with participants highlighted when defined as part of a cluster by both the clustering threshold (panel i) and the bootstrap support criterion (above 90%; Cluster Picker).
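
The threshold sweep described for panel ii can be sketched as follows. This is a deliberately simplified stand-in, not Cluster Picker itself: Cluster Picker works on tree clades and also requires bootstrap support, whereas this sketch applies single-linkage grouping to a pairwise distance matrix only. The function name, the toy distance matrix and the threshold values are assumptions chosen for illustration.

```python
def cluster_summary(dist, threshold):
    """Group sequences whose pairwise distance is <= threshold
    (single linkage) and return (percent clustered, mean cluster size)."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1

    # a cluster needs at least two sequences
    clusters = [size for size in sizes.values() if size >= 2]
    clustered = sum(clusters)
    percent = 100.0 * clustered / n
    mean_size = clustered / len(clusters) if clusters else 0.0
    return percent, mean_size


# Toy symmetric distance matrix for five sequences (assumed data)
toy_dist = [
    [0.00, 0.02, 0.05, 0.20, 0.20],
    [0.02, 0.00, 0.06, 0.20, 0.20],
    [0.05, 0.06, 0.00, 0.20, 0.20],
    [0.20, 0.20, 0.20, 0.00, 0.07],
    [0.20, 0.20, 0.20, 0.07, 0.00],
]

for t in (0.01, 0.03, 0.05, 0.08):
    percent, mean_size = cluster_summary(toy_dist, t)
    print(f"threshold={t:.2f} clustered={percent:.0f}% mean_size={mean_size:.1f}")
```

Plotting the two returned values against the threshold gives curves of the same kind as the solid and dotted lines in panel ii, under the simplifications stated above.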

Mentions: The Core-E2 region clustering threshold of 0.045 genetic distance decreased to 0.03 once HVR1 was removed (Fig 5, panel i, Core-E2 and Core-E2 w/o HVR1). The E1-HVR1 region demonstrated a wider distribution and the highest clustering threshold (0.06 genetic distance; Fig 6, panel i). A clustering threshold could not be estimated for the Core region because the genetic distances for this sequence fell within a narrow, low range, although a threshold of 0.015 was possible once Core was concatenated to NS5B (Fig 7, panel i, NS5B and S3–S5 Figs, panel i). Concatenation of HCV regions with NS5B only moderately affected the clustering threshold in all other regions. Overall, removing HVR1 shifted the pairwise distribution curves to the left, that is, towards lower genetic distances, as seen for example for Core-E2 and E1-HVR1 (Fig 5, panel i).
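
The leftward shift seen when HVR1 is removed can be illustrated with a small sketch. The study computed model-based genetic distances; the uncorrected p-distance used here is a simplification, and the function name, the two toy sequences and the masked positions are hypothetical.

```python
def p_distance(seq_a, seq_b, exclude=()):
    """Proportion of differing sites between two aligned sequences,
    ignoring gap characters and any alignment positions in `exclude`."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = set(exclude)
    compared = 0
    differences = 0
    for pos, (a, b) in enumerate(zip(seq_a, seq_b)):
        if pos in skip or a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else float("nan")


# Toy aligned sequences (assumed data); positions 4-6 stand in for a
# hypervariable stretch that carries most of the differences.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTGCTAT"
hypervariable = range(4, 7)

full = p_distance(seq_1, seq_2)
masked = p_distance(seq_1, seq_2, exclude=hypervariable)
print(f"with variable stretch:    {full:.3f}")
print(f"without variable stretch: {masked:.3f}")
```

Because the masked stretch contributes a disproportionate share of the differences, the distance drops once it is excluded, which is the same direction of change as the lower clustering threshold reported for Core-E2 without HVR1.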

