Limits...
B-HIT - A Tool for Harvesting and Indexing Biodiversity Data.

Kelbert P, Droege G, Barker K, Braak K, Cawsey EM, Coddington J, Robertson T, Whitacre J, Güntsch A - PLoS ONE (2015)

Bottom Line: With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures.The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities.The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

View Article: PubMed Central - PubMed

Affiliation: Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Berlin, Germany.

ABSTRACT
With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

No MeSH data available.


Principal model of raw and improved data in the B-HIT database.Corresponding raw and improved table (i.e. identification from the provider and improved identification) are linked through a 1–1 relation. Multiple identifications can be associated to a single record and are therefore linked through a 1-n relation with the (raw) occurrence table.
© Copyright Policy
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC4636251&req=5

pone.0142240.g002: Principal model of raw and improved data in the B-HIT database.Corresponding raw and improved table (i.e. identification from the provider and improved identification) are linked through a 1–1 relation. Multiple identifications can be associated to a single record and are therefore linked through a 1-n relation with the (raw) occurrence table.

Mentions: The database schema underwent significant extension and adaptation to support several 1-to-many relationships within a single record, such as multiple identifications, multimedia urls and associations. In its current version (October 2015), the database consists of 37 tables, divided into a raw data block and an improved data block (Fig 2). All original values harvested from providers are kept and stored. In addition, the improved data block comprises tables with similar structure, but holding cleaned and improved data after quality control. This principal architecture of B-HIT also allows data portal developers to choose between raw and cleaned data for search and display issues.


B-HIT - A Tool for Harvesting and Indexing Biodiversity Data.

Kelbert P, Droege G, Barker K, Braak K, Cawsey EM, Coddington J, Robertson T, Whitacre J, Güntsch A - PLoS ONE (2015)

Principal model of raw and improved data in the B-HIT database.Corresponding raw and improved table (i.e. identification from the provider and improved identification) are linked through a 1–1 relation. Multiple identifications can be associated to a single record and are therefore linked through a 1-n relation with the (raw) occurrence table.
© Copyright Policy
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC4636251&req=5

pone.0142240.g002: Principal model of raw and improved data in the B-HIT database.Corresponding raw and improved table (i.e. identification from the provider and improved identification) are linked through a 1–1 relation. Multiple identifications can be associated to a single record and are therefore linked through a 1-n relation with the (raw) occurrence table.
Mentions: The database schema underwent significant extension and adaptation to support several 1-to-many relationships within a single record, such as multiple identifications, multimedia urls and associations. In its current version (October 2015), the database consists of 37 tables, divided into a raw data block and an improved data block (Fig 2). All original values harvested from providers are kept and stored. In addition, the improved data block comprises tables with similar structure, but holding cleaned and improved data after quality control. This principal architecture of B-HIT also allows data portal developers to choose between raw and cleaned data for search and display issues.

Bottom Line: With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures.The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities.The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

View Article: PubMed Central - PubMed

Affiliation: Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Berlin, Germany.

ABSTRACT
With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

No MeSH data available.