Accuracy of climate-based forecasts of pathogen spread


ABSTRACT

Species distribution models (SDMs) are a tool for predicting the eventual geographical range of an emerging pathogen. Most SDMs, however, rely on an assumption of equilibrium with the environment, which an emerging pathogen, by definition, has not reached. To determine if some SDM approaches work better than others for modelling the spread of emerging, non-equilibrium pathogens, we studied time-sensitive predictive performance of SDMs for Batrachochytrium dendrobatidis, a devastating infectious fungus of amphibians, using multiple methods trained on time-incremented subsets of the available data. We split our data into timeline-based training and testing sets, and evaluated models on each set using standard performance criteria, including AUC, kappa, false negative rate and the Boyce index. Of eight models examined, we found that boosted regression trees and random forests performed best, closely followed by MaxEnt. As expected, predictive performance generally improved with the length of time series used for model training. These results provide information on how quickly the potential extent of an emerging disease may be determined, and identify which modelling frameworks are likely to provide useful information during the early phases of pathogen expansion.



Figure 2. Flowchart of data divisions. Blue boxes represent presence points; green boxes represent background points. TR stands for training data and TE for test data.

We divided data into training and test sets in four ways, two of which were chronological (figure 2). For these chronological divisions, we trained models on a defined time period (1980 to year i) and tested on the remainder (year i + 1 to 2011; figure 2a). First, we used the period from 1980 to 1995 (24 presence points) and proceeded in 4-year time steps, ending up with four blocks to compare (i ∈ {1995, 1999, 2003, 2007}). An alternative chronological subsetting pattern redistributed this time-frame because years 2005 and 2006 contained 165 and 150 presence points, respectively, considerably more than most other years. In the first set-up, these two years fell into the same subset, resulting in their removal from testing data and addition to training data all at once. The alternative subset process split 2005 and 2006 into separate blocks, more evenly distributing the training and test points through time. Here, the first block was 1980–1996 (60 presence points), and blocks proceeded in 3-year time steps for five total (i ∈ {1996, 1999, 2002, 2005, 2008}). In each case, 9945 background points were used for training models that required background records. These two chronological analyses were then compared.
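The chronological division described above can be illustrated with a short sketch. This is not the authors' code: the function name, the record layout (year, identifier) and the synthetic one-record-per-year data are assumptions made only for illustration. Only the study period (1980 to 2011) and the two sets of cut-off years come from the text.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (year of observation, hypothetical point identifier)

# Cut-off years i taken from the text; everything else here is illustrative.
FOUR_YEAR_CUTOFFS = [1995, 1999, 2003, 2007]
THREE_YEAR_CUTOFFS = [1996, 1999, 2002, 2005, 2008]


def chronological_splits(
    records: List[Record],
    cutoffs: List[int],
    start_year: int = 1980,
    end_year: int = 2011,
) -> Dict[int, Tuple[List[Record], List[Record]]]:
    """For each cut-off year i, train on start_year..i and test on i+1..end_year."""
    splits = {}
    for i in cutoffs:
        train = [r for r in records if start_year <= r[0] <= i]
        test = [r for r in records if i < r[0] <= end_year]
        splits[i] = (train, test)
    return splits


# Synthetic data: one placeholder presence record per year (not the real counts).
records = [(year, f"pt{year}") for year in range(1980, 2012)]

four_year = chronological_splits(records, FOUR_YEAR_CUTOFFS)
three_year = chronological_splits(records, THREE_YEAR_CUTOFFS)

# In the 4-year scheme, 2005 and 2006 move from test to training together
# (between cut-offs 2003 and 2007); in the 3-year scheme the 2005 cut-off
# separates them.
train_2005, test_2005 = three_year[2005]
years_train = {y for y, _ in train_2005}
years_test = {y for y, _ in test_2005}
```

With this layout, the 3-year scheme places 2005 in the training block and 2006 in the test block at cut-off 2005, which is the separation the alternative subsetting was designed to achieve.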

