The Influence of Hepatitis C Virus Genetic Region on Phylogenetic Clustering Analysis.

Lamoury FM, Jacka B, Bartlett S, Bull RA, Wong A, Amin J, Schinkel J, Poon AF, Matthews GV, Grebely J, Dore GJ, Applegate TL - PLoS ONE (2015)

Bottom Line: Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal method for clustering analysis to address a range of molecular epidemiological questions.


Affiliation: The Kirby Institute, University of New South Wales Australia, Sydney, Australia.

ABSTRACT
Sequencing is important for understanding the molecular epidemiology and viral evolution of hepatitis C virus (HCV) infection. To date, there is little standardisation among sequencing protocols, in part due to the high genetic diversity that is observed within HCV. This study aimed to develop a novel, practical sequencing protocol that covered both conserved and variable regions of the viral genome and to assess the influence of each subregion, sequence concatenation and unrelated reference sequences on phylogenetic clustering analysis. The Core to the hypervariable region 1 (HVR1) of envelope-2 (E2) and non-structural-5B (NS5B) regions of the HCV genome were amplified and sequenced from participants in the Australian Trial in Acute Hepatitis C (ATAHC), a prospective study of the natural history and treatment of recent HCV infection. Phylogenetic trees were constructed using a general time-reversible substitution model and sensitivity analyses were completed for every subregion. Pairwise distance, genetic distance and bootstrap support were computed to assess the impact of HCV region on clustering results as measured by the identification and percentage of participants falling within all clusters, cluster size, average patristic distance, and bootstrap value. The Robinson-Foulds metric was also used to compare phylogenetic trees among the different HCV regions. Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal method for clustering analysis to address a range of molecular epidemiological questions.
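As an illustration of the Robinson-Foulds comparison mentioned in the abstract, the following is a minimal Python sketch, not the software the authors used (the abstract does not name it). It assumes trees are given as nested tuples of leaf names, treats them as unrooted, and counts the non-trivial splits (bipartitions of the leaves) that appear in one tree but not the other. The names `splits`, `robinson_foulds` and the toy trees are hypothetical.

```python
def leaf_sets(tree, clades):
    """Return the frozenset of leaves under `tree`, recording every internal clade."""
    if isinstance(tree, str):            # a leaf is just its name
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree):
    """Collect the non-trivial splits (leaf bipartitions) induced by internal edges."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    result = set()
    for clade in clades:
        other = all_leaves - clade
        # Skip trivial splits: one side holds fewer than two leaves.
        if len(clade) < 2 or len(other) < 2:
            continue
        # Store the split as an unordered pair so rooting does not matter.
        result.add(frozenset([clade, other]))
    return result


def robinson_foulds(tree_1, tree_2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_1) ^ splits(tree_2))


# Toy trees (assumed data, not from the study).
TREE_A = (("A", "B"), ("C", ("D", "E")))
TREE_B = (("A", "C"), ("B", ("D", "E")))
TREE_C = ((("A", "B"), "C"), ("D", "E"))  # same unrooted topology as TREE_A

print(robinson_foulds(TREE_A, TREE_B))  # 2
print(robinson_foulds(TREE_A, TREE_C))  # 0
```

A distance of 0 means the two trees share every internal split; larger values indicate more topological disagreement, which is how trees built from different HCV regions can be compared.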




pone.0131437.g007: Clustering results among the 50 GT1a ATAHC sequences: genetic distance, percentage of sequences, tree, patristic distance and bootstrap values. Panel i: The genetic distance distribution is shown for both ATAHC sequences (dark colour) and Los Alamos HCV database (LANL) reference sequences (clear colour). The vertical dotted lines represent the thresholds for clustering, which were estimated by determining the point of overlap/uncertainty region between the two curves of most closely related (ATAHC sequences) and distantly related (both ATAHC and LANL sequences) for each HCV region. Panel ii shows the ATAHC clustering patterns using Cluster Picker with the bootstrap support threshold fixed at 90% and the maximum genetic distance threshold varied between 0.01 and 0.08 (coloured lines: ATAHC sequences; grey lines: LANL reference sequences). Solid lines represent the percentage of clustered sequences; dotted lines correspond to average cluster size. The vertical dotted line indicates the clustering threshold (as per panel i) used to determine the percentage of clustered sequences and average cluster size (Table 1). Panel iii shows the phylloclade with participants highlighted when defined as part of a cluster under the clustering threshold (panel i) and bootstrap support above 90% criteria (Cluster Picker).
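The threshold sweep described for panel ii can be sketched in Python as follows. This is a deliberately simplified stand-in, not Cluster Picker's algorithm: Cluster Picker works on clades of a phylogenetic tree and also requires bootstrap support, whereas this sketch ignores the tree and bootstrap values and simply links any two sequences whose pairwise distance is at or below the threshold (connected components). The function names, sequence labels and distances are all hypothetical.

```python
from itertools import combinations


def cluster_at_threshold(names, dist, threshold):
    """Group sequences linked by pairwise distances <= threshold (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Only groups of two or more sequences count as clusters.
    return [g for g in groups.values() if len(g) >= 2]


def sweep(names, dist, thresholds):
    """For each threshold, return (threshold, % clustered, average cluster size)."""
    rows = []
    for t in thresholds:
        clusters = cluster_at_threshold(names, dist, t)
        clustered = sum(len(c) for c in clusters)
        pct = 100.0 * clustered / len(names)
        avg = clustered / len(clusters) if clusters else 0.0
        rows.append((t, pct, avg))
    return rows


# Toy data (assumed, not from the study): unlisted pairs are far apart.
NAMES = ["s1", "s2", "s3", "s4", "s5"]
TOY = {frozenset(p): 0.10 for p in combinations(NAMES, 2)}
TOY.update({
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s1", "s3")): 0.04,
    frozenset(("s2", "s3")): 0.05,
    frozenset(("s4", "s5")): 0.03,
})

for row in sweep(NAMES, TOY, [0.01, 0.03, 0.05]):
    print(row)
# (0.01, 40.0, 2.0)
# (0.03, 80.0, 2.0)
# (0.05, 100.0, 2.5)
```

Even in this toy case, both the percentage of clustered sequences and the average cluster size depend on the distance threshold, which is why the caption reports them at the threshold chosen in panel i.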


Mentions: The distribution of genetic distance for ATAHC and LANL sequences was determined for each region (Figs 5–7, panel i; see also S3–S5 Figs, panel i). A threshold for clustering (represented by the vertical dotted line in panels i and ii of Figs 5–7) was estimated from this distribution for ten of the eleven regions by differentiating the most closely related from the distantly related ATAHC sequences: we assumed that these two sequence groups are visually distinct, and took the threshold to be the point of overlap/uncertainty region between the two curves. The pairwise distance distribution of the LANL sequences was similar to that of the distantly related ATAHC sequences (Figs 5–7, panel i).
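One possible way to automate the threshold estimate described above is sketched below in Python. The text does not say how the point of overlap was located, so everything here is an assumption: the two groups of pairwise distances are supplied already separated, both are binned into normalised histograms, and the threshold is taken as the lower edge of the first bin where the distantly related curve has mass and is at least as high as the closely related curve. The function names, bin width and distances are hypothetical.

```python
def histogram(values, bin_width, n_bins):
    """Fraction of values falling in each fixed-width bin starting at 0."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v / bin_width), n_bins - 1)  # clamp large distances
        counts[idx] += 1
    return [c / len(values) for c in counts]


def overlap_threshold(close, distant, bin_width=0.01, n_bins=20):
    """Lower edge of the first bin where the distant curve meets the close curve.

    Returns None if the two curves never overlap within the binned range.
    """
    h_close = histogram(close, bin_width, n_bins)
    h_distant = histogram(distant, bin_width, n_bins)
    for i in range(n_bins):
        if h_distant[i] > 0 and h_distant[i] >= h_close[i]:
            return round(i * bin_width, 10)
    return None


# Toy pairwise distances (assumed, not from the study).
CLOSE = [0.004, 0.011, 0.015, 0.024, 0.033, 0.041]
DISTANT = [0.045, 0.052, 0.065, 0.078, 0.081, 0.093]

print(overlap_threshold(CLOSE, DISTANT))  # 0.04
```

In practice the closely and distantly related groups are not labelled in advance, so a real analysis would first need a rule for separating them, for example by inspecting the plotted distribution as the excerpt describes.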

