Filling in the GAPS: evaluating completeness and coverage of open‐access biodiversity databases in the United States

Available via PubMed Central (PMC4979697) and PubMed.

ABSTRACT

Primary biodiversity data constitute observations of particular species at given points in time and space. Open‐access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depends on how complete species inventories are at a given survey location and how uniformly survey locations are distributed along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open‐access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). We aggregated occurrence records into 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well‐surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percentage of well‐surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well‐surveyed grid cells for six of nine taxonomic groups. Compared with GBIF data, coverage of systematic surveys was less biased across spatial and environmental dimensions but more biased through time. GBIF coverage also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases.
This comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.
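The gridding and classification steps summarized above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not define its three completeness metrics, so the sketch substitutes one hypothetical rule (observed richness divided by a bias-corrected Chao1 estimate built from per-species record counts, with an assumed threshold of 0.8 and an assumed minimum of 10 records). The function names `cell_id`, `aggregate`, `chao1`, and `is_well_surveyed`, and the toy records, are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 0.1  # degrees; the 0.1° × 0.1° grid described in the abstract


def cell_id(lat, lon, size=CELL_SIZE):
    """Row/column index of the grid cell that contains a point."""
    # floor() (not int()) keeps negative longitudes in the correct cell.
    return (math.floor(lat / size), math.floor(lon / size))


def aggregate(records):
    """Group (species, lat, lon) records into {cell: Counter(species -> n)}."""
    cells = defaultdict(Counter)
    for species, lat, lon in records:
        cells[cell_id(lat, lon)][species] += 1
    return cells


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-species record counts."""
    s_obs = len(counts)
    f1 = sum(1 for n in counts.values() if n == 1)  # singletons
    f2 = sum(1 for n in counts.values() if n == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def is_well_surveyed(counts, threshold=0.8, min_records=10):
    """HYPOTHETICAL rule: enough records and observed/estimated richness >= threshold."""
    if sum(counts.values()) < min_records:
        return False
    return len(counts) / chao1(counts) >= threshold


# Toy records (made-up species codes and coordinates) falling in two grid cells.
example_records = (
    [("sp_a", 38.91, -77.04)] * 5
    + [("sp_b", 38.92, -77.05)] * 4
    + [("sp_c", 38.93, -77.06)] * 3
    + [("sp_d", 38.94, -77.07)] * 2
    + [(f"sp_{x}", 40.02, -105.27) for x in "efghi"]  # five singletons
    + [("sp_j", 40.03, -105.28)] * 5
)
cells = aggregate(example_records)
```

In the second toy cell, five of six species are singletons, so Chao1 estimates 16 species and the ratio 6/16 = 0.375 fails the assumed threshold even though the cell has ten records; the first cell has no singletons and passes.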



Fig. 4. Relationship between (A) number of species per grid cell and (B) cumulative coverage of well‐surveyed grid cells versus all surveyed grid cells derived from three open‐access biodiversity databases representing ten taxonomic groups. In (B), low values represent unbiased coverage and high values represent biased coverage relative to the background environment. Note that richness and cumulative coverage could not be plotted along the y‐axis for crayfish because no well‐surveyed grid cells were identified.
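Coverage is quantified with Kolmogorov–Smirnov comparisons of surveyed grid cells against the background environment; the two-sample D statistic behind such a test can be sketched in plain Python as below. The caption does not state how "cumulative coverage" combines gradients, so `cumulative_coverage` assumes (hypothetically) that D is summed over gradients. All function names and the elevation values are illustrative; for p-values, `scipy.stats.ks_2samp` computes the same statistic.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Fraction of values <= x in an already-sorted list."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample, background):
    """Two-sample Kolmogorov–Smirnov D: the largest gap between the two ECDFs.

    The supremum is reached at one of the pooled observations, so it is
    enough to evaluate both ECDFs at every observed value.
    """
    a, b = sorted(sample), sorted(background)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


def cumulative_coverage(surveyed, background):
    """ASSUMPTION: sum of D over all gradients (0 = unbiased on every gradient).

    Both arguments map a gradient name to the values of that variable in
    the surveyed cells and in all background cells, respectively.
    """
    return sum(ks_statistic(surveyed[g], background[g]) for g in background)


# Illustrative elevations (m): background cells vs. a low-elevation-only survey.
background_elev = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
surveyed_elev = [100, 200, 300]
d = ks_statistic(surveyed_elev, background_elev)
```

Here D = 0.7 because all surveyed cells, but only 30% of background cells, lie at or below 300 m; a surveyed sample identical to the background gives D = 0, matching the caption's reading that low values indicate unbiased coverage.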
License: Creative Commons Attribution (CC BY).


GBIF surveys represented the longest period of record (dating back to 1800), followed by the BBS surveys (1967) and FFS surveys (1990) (Fig. 3). GBIF surveys, particularly those classified as well‐surveyed, were most prevalent after approximately 1920. Nevertheless, a substantial number of well‐surveyed grid cells were available from the nineteenth century for birds (116 grid cells), mammals (22 grid cells), and plants (9 grid cells). The average number of species inventoried per grid cell (i.e., survey richness) was highest for birds, plants, and fungi and lowest for crayfish, amphibians, and mammals. For most taxa, well‐surveyed grid cells contained more species than did all (i.e., both well‐surveyed and not‐well‐surveyed) surveyed grid cells (upper left of the 1:1 line in Fig. 4A). Two exceptions were BBS birds and FFS fish, for which well‐surveyed grid cells and all grid cells contained similar numbers of species (near the 1:1 line in Fig. 4A).
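Survey richness (the average number of species inventoried per grid cell) and the well‐surveyed versus all-cells comparison plotted in Fig. 4A reduce to two means per taxonomic group. The sketch below is a minimal illustration with hypothetical names (`survey_richness`, `richness_pair`) and toy data; returning `None` when a group has no well‐surveyed cells mirrors the crayfish case, for which nothing could be plotted.

```python
from statistics import mean


def survey_richness(species_sets):
    """Mean number of species inventoried per grid cell."""
    return mean(len(s) for s in species_sets)


def richness_pair(cells, well_surveyed_ids):
    """(well-surveyed richness, all-cells richness) for one taxonomic group.

    cells maps a grid-cell id to the set of species recorded there. The
    first value is None when no cell qualifies as well-surveyed.
    """
    all_r = survey_richness(cells.values())
    ws = [cells[c] for c in well_surveyed_ids if c in cells]
    ws_r = survey_richness(ws) if ws else None
    return ws_r, all_r


# Toy group: one species-rich well-surveyed cell and two sparse cells.
toy_cells = {"c1": {"a", "b", "c", "d"}, "c2": {"a"}, "c3": {"a", "b"}}
ws_r, all_r = richness_pair(toy_cells, {"c1"})
```

In the toy group the well‐surveyed mean (4 species) exceeds the all-cells mean (7/3, about 2.33), so the pair falls above the 1:1 line, as reported for most taxa; roughly equal values would place it on the line, as for BBS birds and FFS fish.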

