jmzReader: A Java parser library to process and visualize multiple text and XML-based mass spectrometry data formats.
Bottom Line: The library is optimized to be used in conjunction with mzIdentML, the recently released standard data format for reporting protein and peptide identifications, developed by the HUPO proteomics standards initiative (PSI). mzIdentML files do not contain spectra data but contain references to different kinds of external MS data files.As a key functionality, all parsers implement a common interface that supports the various methods used by mzIdentML to reference external spectra.Thus, when developing software for mzIdentML, programmers no longer have to support multiple MS data file formats but only this one interface.
Affiliation: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.Show MeSH
Mentions: Although the adoption of the community standard format, mzML, for representation of MS data is progressing, an efficient and consistent parsing library for vendor-specific file formats, both for legacy data and current data in other formats will still be required for the foreseeable future. We, here, present the jmzReader library: a collection of Java APIs to efficiently process a multitude of MS data formats optimized for the usage with the mzIdentML standard. The jmzReader library currently consists of six independently usable Java APIs: dta-parser, mgf-parser, ms2-parser, mzdata-parser, mzxml-parser, and pkl-parser as well as a wrapper class around the existing jmzML API  (see Supporting Information File S1). All of these APIs implement a common interface and were developed based on the file format descriptions found on the Mascot documentation page (http://www.matrixscience.com/help/data_file_help.html) as well as available publications [8, 9] and file format specific project documentations (http://tools.proteomecenter.org/wiki/index.php?title=Formats:mzXML) for mzXML (supporting versions 2.1–3.2) and http://www.psidev.info/index.php?q=node/80#mzdata for mzData (supporting version 1.05). The mzml-wrapper was built around the existing jmzML API implementing the jmzReader interface and thereby adding mzML support to the jmzReader library for convenience purposes. Through the common jmzReader interface, programmers writing support for peak lists referenced in mzIdentML only have to support one interface to access any of the supported seven file formats (Fig. 1). In addition, each of the parsers contains a format specific Java object model that allows access to the whole information contained in the specific file format. Thus, the various parser APIs are not limited to the use with mzIdentML but form a solid basis for any software processing MS data. All APIs are open source, were written in 100% Java and are thus inherently platform independent.
Affiliation: EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.