cFinder: definition and quantification of multiple haplotypes in a mixed sample.

Niklas N, Hafenscher J, Barna A, Wiesinger K, Pröll J, Dreiseitl S, Preuner-Stix S, Valent P, Lion T, Gabriel C - BMC Res Notes (2015)

Bottom Line: BCR-ABL1 samples containing multiple clones were used for testing, and cFinder identified all previously found clones together with their abundance and even refined some results. As a result, cFinder reports the connections of variants (haplotypes) with their readcount and relative occurrence (percentage). To our knowledge, this is the first software that enables researchers without extensive bioinformatic support to designate multiple haplotypes and determine how they contribute to a sample.

Affiliation: Red Cross Transfusion Service for Upper Austria, Krankenhausstraße 7, 4017, Linz, Austria. norbert.niklas@o.roteskreuz.at.

ABSTRACT

Background: Next-generation sequencing allows the genetic composition of a mixed sample to be determined. For instance, resistance testing for BCR-ABL1 requires identifying clones and defining compound mutations; together with exact quantification, this may complement diagnosis and therapy decisions with additional information. This applies not only to oncological questions but also to the characterization of viral, bacterial or fungal infections. Retrieving multiple haplotypes (more than two) and their proportions from such data with conventional software is difficult and cumbersome, and it demands multiple manual steps.

Results: We therefore developed a tool called cFinder that automatically detects haplotypes and accurately quantifies them within one sample. BCR-ABL1 samples containing multiple clones were used for testing, and cFinder identified all previously found clones together with their abundance and even refined some results. Additionally, reads with multiple haplotypes were simulated using GemSIM; the detection was very close to linear (R² = 0.96). Our aim is not to deduce haploblocks statistically, but to characterize the composition of one sample precisely. As a result, cFinder reports the connections of variants (haplotypes) with their readcount and relative occurrence (percentage). Download is available at http://sourceforge.net/projects/cfinder/.

Conclusions: cFinder implements an efficient algorithm that can be run on a low-performance desktop computer. Furthermore, it considers paired-end information (if available) and is generally open to any current next-generation sequencing technology and alignment strategy. To our knowledge, this is the first software that enables researchers without extensive bioinformatic support to designate multiple haplotypes and determine how they contribute to a sample.


Fig. 2: Overlapping multi-amplicon paired-end design. Schematic representation of covering a region larger than the maximum read length; haplotypes are therefore scattered over multiple sequence reads. The reference sequence is displayed in black, amplicons in forward (green) and reverse (red); pairs (but not covered regions) are denoted with a dashed line.

Specialized amplicon designs have to be developed when the read length is shorter than the region of interest (see Fig. 2). If the region is not uniformly covered by design, the amplicon positions must be loaded so that the calculations are performed correctly. For these amplicon designs, the user finds a dedicated checkbox, “Infer Relationship(s)”, which reconnects haplotypes scattered over multiple amplicons. The algorithm creates an undirected, unweighted graph in which nodes represent variants and an edge represents the co-occurrence of two variants on at least one sequence. In this graph, the largest subgraph in which every node is connected to every other node is determined (finding this subgraph is known as the maximum clique problem in graph theory). The subgraph defines a new haplotype, and the readcounts of the contributing fragments are recalculated. Fragments and connections with no reads left are removed, and the next largest subgraph is searched for. Since the maximum clique problem is NP-complete [9], this is the only computationally expensive task after loading. It is limited to 15 rounds and 20 variants so that it remains feasible on a desktop computer. The output for an overlapping amplicon design does not differ from the output described above. Some of the reported connections between variants may not be found on any read. Note that this combinatorial task is complex, and there are cases where no connection can be established at all, especially with a high number of variants and overlapping subgraphs.
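The iterative clique search described above can be sketched in Python as follows. This is a minimal illustration and not the cFinder implementation: the function names, the input format (a mapping from variant sets to readcounts), and in particular the recalculation rule (the new haplotype receives the smallest readcount among its contributing fragments, which is then subtracted from each of them) are assumptions made for this example. Only the graph construction, the repeated maximum clique search, and the limits of 15 rounds and 20 variants are taken from the text.

```python
from itertools import combinations

MAX_ROUNDS = 15    # limit stated in the text
MAX_VARIANTS = 20  # limit stated in the text

def build_graph(fragments):
    """Undirected, unweighted graph: nodes are variants, an edge means
    the two variants co-occur on at least one fragment."""
    graph = {}
    for variants in fragments:
        for v in variants:
            graph.setdefault(v, set())
        for a, b in combinations(variants, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def max_clique(graph):
    """Largest clique via Bron-Kerbosch with pivoting (exponential worst case)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:            # r is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        if len(r) + len(p) <= len(best):
            return                     # cannot beat the best clique so far
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return best

def infer_haplotypes(fragments):
    """fragments: {iterable of variants: readcount} (hypothetical input format).
    Returns a list of (haplotype, readcount) in the order they were found."""
    remaining = {frozenset(v): c for v, c in fragments.items() if c > 0 and v}
    variants = set().union(*remaining) if remaining else set()
    if len(variants) > MAX_VARIANTS:
        raise ValueError("too many variants for an exhaustive clique search")
    haplotypes = []
    for _ in range(MAX_ROUNDS):
        graph = build_graph(remaining)
        if not graph:
            break
        clique = frozenset(max_clique(graph))
        parts = [f for f in remaining if f <= clique]
        if not parts:
            break                      # no fragment lies fully inside the clique
        # ASSUMPTION: haplotype support = smallest contributing readcount.
        support = min(remaining[f] for f in parts)
        haplotypes.append((clique, support))
        for f in parts:
            remaining[f] -= support
            if remaining[f] == 0:      # drop fragments with no reads left
                del remaining[f]
    return haplotypes

def with_percentages(haplotypes):
    """Relative occurrence of each haplotype among all assigned reads."""
    total = sum(count for _, count in haplotypes)
    return [(sorted(h), count, 100.0 * count / total) for h, count in haplotypes]

example = {("A", "B"): 40, ("B", "C"): 40, ("A", "C"): 40, ("A", "D"): 20}
result = infer_haplotypes(example)
```

In this toy input, the variants A, B and C are pairwise connected and form the largest clique, so they are reported first as one haplotype; the remaining fragment then yields the haplotype containing A and D. Every round removes at least one fragment, so the loop ends even without the round limit, but the exhaustive clique search itself is what motivates the cap on the number of variants.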

