EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome.

Zeng S, Yang J, Chung BH, Lau YL, Yang W - BMC Genomics (2014)

Bottom Line: Bioinformatics analysis is essential for predicting which amino acid substitutions (AAS) are potentially causal or contributing to human diseases and merit further analysis, because each genome carries thousands of rare or private AAS, only a very small number of which are related to an underlying disease. Existing algorithms in this field still have a high false prediction rate, and novel development is needed to take full advantage of the vast amount of genomic data. This approach may help us better understand the roles of genetic variants in human disease and health.


Affiliation: Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 5 Sassoon Road, Hong Kong, China. yangwl@hkucc.hku.hk.

ABSTRACT

Background: Predicting the functional impact of amino acid substitutions (AAS) caused by nonsynonymous single nucleotide polymorphisms (nsSNPs) is becoming increasingly important as more and more novel variants are discovered. Bioinformatics analysis is essential for predicting which AAS are potentially causal or contributing to human diseases and merit further analysis, because each genome carries thousands of rare or private AAS, only a very small number of which are related to an underlying disease. Existing algorithms in this field still have a high false prediction rate, and novel development is needed to take full advantage of the vast amount of genomic data.

Results: Here we report a novel algorithm that features two innovative changes: (1) making better use of sequence conservation information by grouping the homologous protein sequences into six blocks according to their evolutionary distance to human and evaluating sequence conservation in each block independently, and (2) including as many such homologous sequences as possible in the analysis. Random forests are used to evaluate sequence conservation in each block and to predict the potential impact of an AAS on protein function. Testing of this algorithm on a comprehensive dataset showed a significant improvement in prediction accuracy over currently widely used programs. The algorithm and a web-based application tool implementing it, EFIN (Evaluation of Functional Impact of Nonsynonymous SNPs), have been made freely available to the public (http://paed.hku.hk/efin/).

Conclusions: Grouping homologous sequences into different blocks according to the evolutionary distance of the species to human and evaluating sequence conservation in each group independently significantly improved prediction accuracy. This approach may help us better understand the roles of genetic variants in human disease and health.
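
The block-wise idea described in the abstract can be illustrated with a minimal Python sketch. It is not the EFIN implementation: the distance scale, the cut-points in `BLOCK_EDGES`, and the function names are hypothetical, and a simple per-block match fraction stands in for the random-forest evaluation that the abstract describes.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical cut-points on an evolutionary-distance scale (NOT from the paper).
# Five edges split the scale into six blocks, indexed 0 (closest to human) to 5.
BLOCK_EDGES = [0.1, 0.2, 0.3, 0.4, 0.5]
N_BLOCKS = len(BLOCK_EDGES) + 1


def assign_block(distance: float) -> int:
    """Return the block index (0..5) for a homolog's distance to human."""
    return bisect_right(BLOCK_EDGES, distance)


def blockwise_conservation(
    human_residue: str,
    homologs: List[Tuple[float, str]],
) -> List[Optional[float]]:
    """Evaluate conservation at one alignment column, block by block.

    homologs holds (distance_to_human, aligned_residue) pairs.
    Each block's score is the fraction of its sequences that carry the
    human residue; a block with no sequences yields None.
    """
    matches = [0] * N_BLOCKS
    totals = [0] * N_BLOCKS
    for distance, residue in homologs:
        block = assign_block(distance)
        totals[block] += 1
        if residue == human_residue:
            matches[block] += 1
    return [m / t if t else None for m, t in zip(matches, totals)]


if __name__ == "__main__":
    column = [(0.05, "A"), (0.05, "A"), (0.15, "A"), (0.15, "G"), (0.6, "G")]
    print(blockwise_conservation("A", column))
```

Scoring each block separately keeps a substitution that is tolerated only in distant species from being averaged together with strict conservation among close relatives, which is the motivation the abstract gives for independent per-block evaluation.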


Figure 2: Box plot of NAS differences between adjacent sequences belonging to either the same block or two adjacent blocks. NAS differences larger than 0.4 were all treated as 0.4.

Mentions: The normalized alignment score (NAS) measures the similarity between a protein sequence and the query sequence. To validate the block-wise structure introduced in this study for evaluating conservation, we compared the NAS difference between two adjacent sequences in a multiple sequence alignment (MSA) in two situations: when both sequences belong to the same block, and when they belong to two different blocks (the last sequence of one block and the first sequence of the next). Using 250 randomly selected pairs for each case, a much greater difference was observed when the two adjacent sequences belong to different blocks than when they belong to the same block (Figure 2). This observation justifies the block-wise approach to sequence conservation analysis.
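
The comparison behind Figure 2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the input is assumed to be a list of (NAS, block index) pairs already in MSA order, the function name is hypothetical, and only the 0.4 cap comes from the figure caption.

```python
from typing import List, Tuple

CAP = 0.4  # differences larger than 0.4 are treated as 0.4 (figure caption)


def adjacent_nas_differences(
    sequences: List[Tuple[float, int]],
) -> Tuple[List[float], List[float]]:
    """Split capped NAS differences of adjacent sequences into two groups.

    sequences holds (NAS, block_index) pairs in alignment order (assumed).
    Returns (same_block, cross_block): differences for neighbours in the
    same block, and for neighbours that straddle a block boundary.
    """
    same_block: List[float] = []
    cross_block: List[float] = []
    for (nas_a, block_a), (nas_b, block_b) in zip(sequences, sequences[1:]):
        diff = min(abs(nas_a - nas_b), CAP)
        if block_a == block_b:
            same_block.append(diff)
        else:
            cross_block.append(diff)
    return same_block, cross_block


if __name__ == "__main__":
    msa = [(0.95, 0), (0.90, 0), (0.60, 1), (0.10, 2)]
    same, cross = adjacent_nas_differences(msa)
    print("same block:", same)
    print("cross block:", cross)
```

The text reports 250 randomly selected pairs for each case; drawing that sample from the two returned lists (for example with `random.sample`) and plotting them side by side would give a box plot of the kind the caption describes.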

