Filling in the GAPS: evaluating completeness and coverage of open‐access biodiversity databases in the United States

ABSTRACT

Primary biodiversity data constitute observations of particular species at given points in time and space. Open‐access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depends on how complete species inventories are at a given survey location and how uniformly distributed survey locations are along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open‐access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). We aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well‐surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percentage of well‐surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well‐surveyed grid cells for six of nine taxonomic groups. Coverages of systematic surveys were less biased across spatial and environmental dimensions but were more biased in temporal coverage compared to GBIF data. GBIF coverages also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases.
This comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.
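
The two mechanical steps named in the abstract, binning occurrence records into 0.1° × 0.1° grid cells and comparing two distributions with a Kolmogorov–Smirnov statistic, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (`cell_index`, `aggregate_by_cell`, `ks_statistic`), the floor-based cell indexing, and the sample records are assumptions made for illustration. The three completeness metrics are not specified in the abstract and are not reproduced here; the sketch only collects the per-cell record and species counts that such metrics would start from, and it computes the KS statistic without a p-value.

```python
import math
from bisect import bisect_right
from collections import defaultdict

CELL_SIZE_DEG = 0.1  # grid resolution stated in the abstract


def cell_index(lat, lon, size=CELL_SIZE_DEG):
    """Return the integer (row, col) of the grid cell containing a point.

    Assumption: cells are indexed by flooring the coordinate divided by the
    cell size. The round() call absorbs floating-point noise (for example
    35.3 / 0.1 evaluating to slightly less than 353) before flooring.
    """
    return (math.floor(round(lat / size, 9)), math.floor(round(lon / size, 9)))


def aggregate_by_cell(records):
    """Group (species, lat, lon) records by grid cell.

    Returns {cell: {"n_records": int, "species": set}}.
    """
    cells = defaultdict(lambda: {"n_records": 0, "species": set()})
    for species, lat, lon in records:
        cell = cells[cell_index(lat, lon)]
        cell["n_records"] += 1
        cell["species"].add(species)
    return dict(cells)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical occurrence records: (species, latitude, longitude).
records = [
    ("Rana clamitans", 35.31, -90.05),
    ("Rana clamitans", 35.38, -90.01),
    ("Hyla cinerea", 35.35, -90.09),
    ("Hyla cinerea", 36.02, -91.50),
]
cells = aggregate_by_cell(records)

# Hypothetical elevations (m): surveyed cells vs. all background cells.
surveyed = [1, 2, 3, 4]
background = [3, 4, 5, 6]
gap = ks_statistic(surveyed, background)
```

In this reading, a KS statistic near 0 means the surveyed cells follow the background distribution of a gradient closely, and a value near 1 means the surveys are concentrated in one part of it.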

Fig. 1. Distribution of all surveyed grid cells and well‐surveyed grid cells throughout the contiguous United States during the contemporary time period (1990–2013) derived from three open‐access biodiversity databases representing ten taxonomic groups. Note that the square symbols are enlarged (i.e., larger than actual grid cell area) to facilitate visualization of well‐surveyed regions.
© Copyright policy: Creative Commons Attribution (CC BY)

Our compilation of open‐access biodiversity data within the contiguous United States yielded more than 6.7 million GBIF records collected between 1800 and 2013, 4.8 million BBS records collected between 1963 and 2013, and 2.1 million FFS records collected between 1990 and 2008. These records were distributed among 183,165 GBIF grid cells (i.e., surveys), 3,660 BBS grid cells, and 3,372 FFS grid cells. Since 1990, more than 1.9 million GBIF records, 3.0 million BBS records, and 2.1 million FFS records have accumulated. These contemporary records were distributed among 75,836 GBIF surveys, 3,523 BBS surveys, and 3,372 FFS surveys (Fig. 1). For the complete time period, plant surveys from GBIF were most prevalent, followed by GBIF mammals, GBIF insects, and GBIF birds. The least prevalent surveys were GBIF crayfish, FFS fish, and BBS birds (Table 1, Fig. 2). Surveys from standardized datasets (i.e., BBS and FFS) were substantially more complete than those from GBIF. Specifically, 4.7% and 3.7% of GBIF‐surveyed grid cells for the complete and contemporary time periods, respectively, were classified as well‐surveyed based on the moderate or high completeness thresholds. By contrast, 82.6% and 82.3% of BBS‐ and FFS‐surveyed grid cells for the complete and contemporary time periods, respectively, were classified as well‐surveyed (Table 1, Fig. 2).
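
The totals quoted in the abstract (>13.6 million records, >190,000 grid cells) can be checked against the per-database figures in the paragraph above with a short Python tally. The variable and function names are illustrative, and the sum treats each database's grid cells as separate surveys, as the text does; whether cells from different databases overlap geographically is not stated in this excerpt.

```python
# Counts copied from the paragraph above.
records_millions = {"GBIF": 6.7, "BBS": 4.8, "FFS": 2.1}          # complete period
cells_all = {"GBIF": 183_165, "BBS": 3_660, "FFS": 3_372}          # complete period
cells_contemporary = {"GBIF": 75_836, "BBS": 3_523, "FFS": 3_372}  # since 1990

total_records = round(sum(records_millions.values()), 1)  # 13.6 (million)
total_cells = sum(cells_all.values())                     # 190197
total_contemporary = sum(cells_contemporary.values())     # 82731


def retained_share(database):
    """Percentage of a database's surveyed cells that have records since 1990."""
    return 100 * cells_contemporary[database] / cells_all[database]


for name in cells_all:
    print(f"{name}: {retained_share(name):.1f}% of surveyed cells are contemporary")
```

The sums agree with the abstract's rounded totals. The per-database shares show that fewer than half of GBIF-surveyed cells have records from 1990 onward, whereas nearly all BBS cells and all FFS cells do.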

