B-HIT - A Tool for Harvesting and Indexing Biodiversity Data.

Kelbert P, Droege G, Barker K, Braak K, Cawsey EM, Coddington J, Robertson T, Whitacre J, Güntsch A - PLoS ONE (2015)

Bottom Line: With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.


Affiliation: Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Berlin, Germany.

ABSTRACT
With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.



Fig 4 (pone.0142240.g004): Web interface of B-HIT. This extended user interface provides access to the new functionalities (i.e., Associated Datasource Harvesting, Data quality, Datasource Management) through a series of tabs.


Mentions: Every triple ID occurring in a record (both the main triple ID and any associated triple IDs) is stored, together with the relations between them. This enables portal developers to retrieve all parent and grandparent entries for a single record. B-HIT checks whether every associated triple ID exists and is available at the respective provider. It can also prepare the data source metadata (such as name, access point, collection code, and institution code) from the relationship information stored with the main dataset. A new tab has been added to the GUI for user-friendly handling of this special category of datasets (Fig 4). Missing associations are checked: if records should be linked to external or internal datasets, B-HIT automatically looks for these associated datasets and the corresponding records in the database. Specific functions are provided for the associated data sources, such as harvesting and processing only the list of missing units, or harvesting and processing their sibling units. If associated datasets or units are still missing after these operations have run, the main data source is flagged with a special mark in the overview.
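
The association handling described above can be illustrated with a minimal Python sketch. It is not B-HIT's implementation, which the text does not show. It assumes that a triple ID consists of an institution code, a collection code, and a unit ID, and it uses a hypothetical in-memory `AssociationStore` in place of the database. All class, method, and field names are invented for illustration.

```python
from collections import namedtuple

# Assumption: a triple ID is (institution code, collection code, unit ID).
TripleID = namedtuple("TripleID", ["institution_code", "collection_code", "unit_id"])


class AssociationStore:
    """Hypothetical in-memory stand-in for the stored triple IDs and relations."""

    def __init__(self):
        self.units = set()   # main triple IDs of records already harvested
        self.parents = {}    # main triple ID -> set of associated triple IDs

    def add_record(self, main_id, associated_ids=()):
        """Store a record's main triple ID and its associated triple IDs."""
        self.units.add(main_id)
        self.parents.setdefault(main_id, set()).update(associated_ids)

    def ancestors(self, triple_id):
        """Return every parent, grandparent, and so on reachable from triple_id."""
        seen = set()
        stack = list(self.parents.get(triple_id, ()))
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited; also guards against cyclic relations
            seen.add(current)
            stack.extend(self.parents.get(current, ()))
        return seen

    def missing_associations(self):
        """Return associated triple IDs that are referenced but not yet harvested."""
        referenced = set()
        for associated in self.parents.values():
            referenced |= associated
        return referenced - self.units


# Usage with made-up identifiers: a DNA sample derived from a tissue sample,
# which in turn is derived from a specimen that has not been harvested yet.
dna = TripleID("INST-A", "DNA", "D-1")
tissue = TripleID("INST-A", "Tissue", "T-1")
specimen = TripleID("INST-B", "Herbarium", "S-1")

store = AssociationStore()
store.add_record(dna, [tissue])
store.add_record(tissue, [specimen])

lineage = store.ancestors(dna)            # tissue (parent) and specimen (grandparent)
to_harvest = store.missing_associations() # only the specimen is still missing
```

In this sketch, the result of `missing_associations()` plays the role of the list of missing units that would be harvested from the associated data source. Any unit still missing after that step is what would lead to the main data source being marked in the overview.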

