Predicting host tropism of influenza A virus proteins using random forest.

Eng CL, Tong JC, Tan TW - BMC Med Genomics (2014)

Bottom Line: All of the prediction models constructed achieved high prediction performance, indicating clear distinctions between avian and human proteins. When used together as a host tropism prediction system, zoonotic strains could potentially be identified based on different protein prediction results. Understanding and predicting host tropism of influenza proteins lay an important foundation for future work in constructing computational models capable of directly predicting interspecies transmission of influenza viruses.

ABSTRACT

Background: The majority of influenza A viruses reside and circulate among animal populations, seldom infecting humans because of host range restriction. Yet when some avian strains do acquire the ability to overcome the species barrier, they may become adapted to humans, replicating efficiently and causing disease, potentially leading to a pandemic. Given the huge influenza A virus reservoir in wild birds, the emergence of a new influenza strain able to cross the host species barrier is a cause for concern, as shown by the recent H7N9 outbreak in China. Several influenza proteins have been shown to be major determinants of host tropism. Further understanding and determining host tropism would be important in identifying zoonotic influenza virus strains capable of crossing the species barrier and infecting humans.

Results: In this study, computational models for 11 influenza proteins were constructed using the random forest machine learning algorithm to predict host tropism. The prediction models were trained on influenza protein sequences isolated from both avian and human samples, which were transformed into feature vectors of amino acid physicochemical properties. The resulting prediction models were highly accurate (ACC>96.57; AUC>0.980; MCC>0.916) and capable of determining host tropism of individual influenza proteins. In addition, features from all 11 proteins were used to construct a combined model to predict host tropism of influenza virus strains. This would help assess a novel influenza strain's host range capability.
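As a rough illustration of how a protein sequence might be turned into a physicochemical feature vector before training a classifier, the sketch below uses only three features: mean hydropathy and the fractions of positively and negatively charged residues. The choice of the Kyte-Doolittle hydropathy scale, the charge grouping, and the function name `encode` are assumptions made for illustration; this excerpt does not list the properties actually used in the study.

```python
from typing import List

# Kyte-Doolittle hydropathy index, one widely used physicochemical scale.
# Assumption: the study's actual property set is not given in this excerpt.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative charge grouping at neutral pH (hypothetical feature choice).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def encode(sequence: str) -> List[float]:
    """Map a protein sequence to a fixed-length feature vector.

    Returns [mean hydropathy, fraction positive, fraction negative].
    Non-standard residue codes (for example X) are skipped.
    """
    residues = [aa for aa in sequence.upper() if aa in HYDROPATHY]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    mean_hydropathy = sum(HYDROPATHY[aa] for aa in residues) / n
    frac_positive = sum(aa in POSITIVE for aa in residues) / n
    frac_negative = sum(aa in NEGATIVE for aa in residues) / n
    return [mean_hydropathy, frac_positive, frac_negative]
```

Because every sequence maps to a vector of the same length regardless of protein length, vectors of this kind can be passed directly to a tree-based classifier such as a random forest.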

Conclusions: All of the prediction models constructed achieved high prediction performance, indicating clear distinctions between avian and human proteins. When used together as a host tropism prediction system, zoonotic strains could potentially be identified based on different protein prediction results. Understanding and predicting host tropism of influenza proteins lay an important foundation for future work in constructing computational models capable of directly predicting interspecies transmission of influenza viruses. The models are available for prediction at http://fluleap.bic.nus.edu.sg.
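For readers unfamiliar with the metrics quoted in the Results, the following minimal sketch computes accuracy (ACC) and the Matthews correlation coefficient (MCC) for a binary avian/human classifier from its confusion counts. Treating "human" as the positive class and the function names are assumptions for illustration; AUC is omitted because it requires prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred, positive="human"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

MCC is often reported alongside accuracy because it stays informative when the two classes are of unequal size.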

Figure 1: Host tropism prediction results for sample strains. The results for four avian strains are shown at the top, while the bottom half shows results for four human strains. The prediction results are strung together to illustrate an entire influenza A genome with eight segments encoding 11 proteins. The proteins encoded by each segment are listed at the bottom of the figure. Each protein prediction is independent and is not influenced by the prediction of other proteins. Blue bars represent a prediction of avian by the corresponding protein prediction model, while red bars represent a prediction of human. Grey bars indicate that no prediction was made because the corresponding protein sequence was unavailable or incomplete. Accurate predictions were made for all 11 proteins for the first two avian strains as well as the final two human strains. However, prediction results for the remaining four strains, from the 1997 H5N1 outbreak in Hong Kong and the 2013 H7N9 outbreak in China, show a mix of avian and human protein predictions. The human strains isolated during the two outbreaks have some proteins predicted as avian, indicating that the source of infection was most likely avian. On the other hand, the avian strains isolated from chickens during the two outbreaks have several proteins predicted as human, suggesting that these proteins could have adapted to the human host.

Mentions: The host tropism prediction results for all eight strains are illustrated in Figure 1. Predictions for two human strains, A/New York/231/2003 and A/Guangdong/ST798/2008, were made accurately for all 11 proteins. These two strains belong to influenza subtypes common in humans, which circulate worldwide and infect humans during the annual flu season [4]. Hence, all of their proteins are well adapted to humans and were correctly predicted by the system. Likewise, accurate predictions for all 11 proteins were made for two avian strains, A/turkey/England/50-92/1991 and A/wild duck/Korea/SH19-50/2010. These two avian strains, of subtypes H5N1 and H7N9, were isolated from turkey and wild duck before these subtypes occurred in humans. All 11 of their proteins were clearly avian proteins, which again were correctly predicted by the system. Prediction results for these two avian and two human strains demonstrate the high accuracy of the host tropism prediction system: although each protein prediction is made independently and is not influenced by the others, the system classifies all proteins in each strain correctly.
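The way independent per-protein predictions are strung together into a strain-level profile can be sketched as follows. This is an illustrative sketch only: the protein list is the conventional set of 11 influenza A proteins (assumed here, since this excerpt does not name them), and the function names, the "mixed" label, and the single-letter rendering are hypothetical stand-ins for the coloured bars in Figure 1.

```python
from typing import Dict, Optional

# Conventional 11 influenza A proteins in segment order (assumption: the
# excerpt states "11 proteins" on eight segments but does not list them).
PROTEINS = ["PB2", "PB1", "PB1-F2", "PA", "HA", "NP",
            "NA", "M1", "M2", "NS1", "NS2"]

# Text stand-ins for the figure's bars: avian (blue), human (red),
# and no prediction (grey) when a sequence is missing or incomplete.
SYMBOL = {"avian": "A", "human": "H", None: "-"}


def render(predictions: Dict[str, Optional[str]]) -> str:
    """Render one strain's per-protein calls as an 11-character profile."""
    return "".join(SYMBOL[predictions.get(p)] for p in PROTEINS)


def summarize(predictions: Dict[str, Optional[str]]) -> str:
    """Summarize a strain from its independent per-protein calls.

    Returns "avian" or "human" when every available call agrees, "mixed"
    when both hosts appear, and "no prediction" when no call was made.
    """
    made = {predictions.get(p) for p in PROTEINS} - {None}
    if not made:
        return "no prediction"
    if len(made) == 1:
        return made.pop()
    return "mixed"


# Hypothetical usage: an avian isolate whose HA is called human.
example = {p: "avian" for p in PROTEINS}
example["HA"] = "human"
profile = render(example)
label = summarize(example)
```

A "mixed" profile corresponds to the outbreak strains in Figure 1, where disagreement between protein-level calls is what flags a strain as potentially zoonotic.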

