Linear filtering reveals false negatives in species interaction data

ABSTRACT

Species interaction datasets, often represented as sparse matrices, are usually collected through observational studies aimed at identifying species interactions. Because the required sampling effort is extensive, these datasets usually contain many false negatives, which often bias derived descriptors. We show that a simple linear filter can detect false negatives by scoring interactions based on the structure of the interaction matrix. Across 180 datasets of various sizes, sparsities and ecological interaction types, a false negative interaction received a higher score than a true negative interaction in about 75% of cases on average. Furthermore, the filter is very robust, even when the interaction matrix contains a very large number of false negatives. Our results demonstrate that unobserved interactions can be detected in species interaction datasets, even without resorting to information about the species involved.
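
The abstract does not spell out the form of the filter, so the following is only a minimal sketch of one plausible linear filter. It assumes that each cell is scored by a weighted sum of the observed value, its column mean, its row mean and the grand mean of the matrix. The function name `linear_filter` and the equal weights in `alphas` are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def linear_filter(Y, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Score every cell of an interaction matrix Y (rows x columns).

    Assumed form: a weighted sum of the cell value, the column mean,
    the row mean and the grand mean. The weights are hypothetical.
    """
    Y = np.asarray(Y, dtype=float)
    a_cell, a_col, a_row, a_all = alphas
    col_mean = Y.mean(axis=0, keepdims=True)   # shape (1, n_columns)
    row_mean = Y.mean(axis=1, keepdims=True)   # shape (n_rows, 1)
    grand_mean = Y.mean()                      # scalar
    # Broadcasting expands the row and column means to the shape of Y.
    return a_cell * Y + a_col * col_mean + a_row * row_mean + a_all * grand_mean

# Toy binary matrix: 1 = interaction observed, 0 = not observed.
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
scores = linear_filter(Y)
```

In this toy matrix the unobserved cell in row 1, column 1 scores higher than the one in row 2, column 2, because the species involved have more observed interactions. Only the structure of the matrix is used; no species traits enter the score.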


Figure 3: Results of the imputation experiments using the four datasets shown in Fig. 2. (a) ROC curves for the scores of the leave-one-out (LOO) imputation. (b) The precision of detecting true interactions as a function of the number of top-scoring interactions considered. In both plots, full lines represent experiments in which the intensity of the interactions was used and broken lines represent experiments in which the interaction dataset was binarized.

Four sizeable datasets representing different types of interactions (refs 25–29) were studied in more detail (see Fig. 2). The ROC curves in Fig. 3(a) show that a large fraction of the positive interactions can usually be detected without incurring many false positives. This matters for practical applications, because the high-scoring interactions are the ones to prioritize for validation in the field. The top-scoring interactions are strongly enriched with positives, as illustrated in Fig. 3(b), which shows the precision (the fraction of the top-scoring interactions that are positive) as a function of the number of top-scoring interactions considered. Although the individual patterns vary with the density, distribution and sampling effort of the interaction datasets, there is a clear trend: binarizing the datasets results in higher precision. Averaged over all datasets, the precision at the top 10 was 0.69 ± 0.27, substantially higher than the average density of 15%, which is the expected precision of a random scoring.
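
The two quantities discussed here can be sketched on a toy matrix. The example below is an illustration under stated assumptions, not the procedure used in the study: `linear_filter` is a hypothetical equal-weight filter, `loo_scores` is a naive leave-one-out loop that zeroes each observed interaction in turn and re-scores that cell, and the nested 4 x 4 matrix is invented for the demonstration.

```python
import numpy as np

def linear_filter(Y, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical filter: weighted sum of cell, column, row and grand means."""
    Y = np.asarray(Y, dtype=float)
    a_cell, a_col, a_row, a_all = alphas
    return (a_cell * Y
            + a_col * Y.mean(axis=0, keepdims=True)
            + a_row * Y.mean(axis=1, keepdims=True)
            + a_all * Y.mean())

def loo_scores(Y):
    """Naive leave-one-out: hide each observed interaction, then score it."""
    Y = np.asarray(Y, dtype=float)
    scores = linear_filter(Y)          # zero cells keep their ordinary score
    for i, j in zip(*np.nonzero(Y)):
        held_out = Y.copy()
        held_out[i, j] = 0.0           # pretend this interaction was missed
        scores[i, j] = linear_filter(held_out)[i, j]
    return scores

def pairwise_rank_probability(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    This is the area under the ROC curve.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring items that are positives."""
    top = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]
    return float(np.asarray(labels)[top].mean())

# Invented nested toy matrix: generalists interact with everyone.
Y = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
scores = loo_scores(Y)
labels = (Y > 0).ravel()
flat = scores.ravel()

auc = pairwise_rank_probability(flat[labels], flat[~labels])
p_at_3 = precision_at_k(flat, labels, k=3)
```

A random scoring would give a precision at the top k close to the density of the matrix, which is why the comparison against the average density is the natural baseline.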

