LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs.
Bottom Line: Unlike existing databases, lncRNAWiki integrates information on human lncRNAs from multiple resources. It allows any user to edit, update and curate existing lncRNAs and to add newly identified ones. It harnesses the collective knowledge of the community to collect, edit and annotate human lncRNAs, and it rewards community curation by giving explicit authorship based on quantified contributions. Because LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs, it has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs.
Affiliation: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
Mentions: We integrated lncRNA sequences and annotation information (e.g. genomic location and transcript structure) from three data sources: GENCODE (version 19; 23 898 human lncRNA transcripts), NONCODE (version 4.0; 95 135 human lncRNA transcripts) and LNCipedia (version 2.1; 32 181 human lncRNA transcripts). We then removed errors and redundancy from the integrated data set in three steps. First, we removed sequences containing ‘N’ from each data source; as a result, eight lncRNAs in LNCipedia were removed. Second, we excluded lncRNAs with ambiguous identifiers. Within each data source, two or more transcripts that share 100% sequence identity over the whole transcript length (based on blastn results) and occupy the same genomic location, but carry different IDs, were considered questionable. Consequently, 14, 20 and eight lncRNAs were removed from GENCODE, NONCODE and LNCipedia, respectively. Third, because different databases use different naming schemes, a given lncRNA transcript may have different identifiers in different databases. We therefore ran blastn across the three data sources and regarded transcripts with 100% sequence identity (based on blastn results) that occupy the same genomic location as the same lncRNA. This yielded a total of 105 255 non-redundant lncRNA transcripts (Figure 1). We also searched these 105 255 lncRNAs with BLAST against the lncRNA sequences in lncRNAdb (223 lncRNAs in total as of July 21, 2014) and found that only 103 of them had been functionally annotated (Supplementary Table S1). This indicates that a large number of human lncRNAs are poorly annotated and that a platform for community annotation of lncRNAs is needed.
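The three-step cleanup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It replaces blastn with exact string comparison of full-length sequences, which stands in for "100% identity over the whole transcript length". It also represents genomic location as a hypothetical (chrom, start, end, strand) tuple and assumes that every member of an ambiguously named group is discarded. The names `Transcript`, `drop_n_containing`, `drop_questionable` and `merge_sources` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    source: str       # data source, e.g. "GENCODE", "NONCODE", "LNCipedia"
    tid: str          # transcript identifier within that source
    location: tuple   # hypothetical location key: (chrom, start, end, strand)
    sequence: str     # full-length transcript sequence


def drop_n_containing(transcripts):
    """Step 1: remove sequences containing the ambiguous base 'N'."""
    return [t for t in transcripts if "N" not in t.sequence.upper()]


def drop_questionable(transcripts):
    """Step 2 (within one source): transcripts sharing both sequence and
    location but carrying different IDs are treated as questionable and
    all of them are dropped (assumption: no representative is kept)."""
    groups = defaultdict(list)
    for t in transcripts:
        # Exact string equality stands in for blastn 100% full-length identity.
        groups[(t.location, t.sequence)].append(t)
    kept = []
    for members in groups.values():
        if len({t.tid for t in members}) > 1:
            continue  # ambiguous naming: discard the whole group
        kept.append(members[0])  # same ID repeated: keep a single copy
    return kept


def merge_sources(sources):
    """Step 3 (across sources): records with identical sequence and
    location are merged into one lncRNA; returns key -> {source: tid}."""
    merged = defaultdict(dict)
    for name, transcripts in sources.items():
        for t in drop_questionable(drop_n_containing(transcripts)):
            merged[(t.location, t.sequence)][name] = t.tid
    return dict(merged)


if __name__ == "__main__":
    loc = ("chr1", 100, 200, "+")
    demo = {
        "GENCODE": [Transcript("GENCODE", "G1", loc, "ACGTACGT")],
        "NONCODE": [Transcript("NONCODE", "N1", loc, "ACGTACGT")],
    }
    print(merge_sources(demo))  # one merged record carrying both IDs
```

Keying on both sequence and location mirrors the criterion in the text. Identical sequences at different loci remain separate records, and each merged record keeps one identifier per source, so a lncRNA known under different names in GENCODE, NONCODE and LNCipedia is counted once.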