NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection.
Bottom Line: The NrichD database currently contains 3,611,010 artificial sequences that have been generated between 27,882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.
Affiliation: IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, Karnataka, India.
Mentions: In general, the number of designed sequences for each SCOP fold was directly proportional to the number of protein families within the fold. We therefore measured the success rate of sequence design for each fold as the fraction of family pairs that could, in theory, be aligned within the fold for which qualified sequences were obtained. Out of an estimated 44,675 possible pairs of families, we could design sequences between 27,882 pairs. Success rates for the 374 folds are shown in Figure 1(a). About 88% of the folds have a success rate of more than 50%, i.e. for these folds, at least half of the theoretically possible pairs of families resulted in designed intermediate sequences.
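The success-rate measure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' pipeline: the function names `success_rate` and `fraction_above` are hypothetical, and the per-fold counts in `toy_folds` are invented placeholders, since the real per-fold numbers appear only in Figure 1(a). The overall totals (27,882 designed pairs out of 44,675 possible pairs) are taken from the text.

```python
def success_rate(designed_pairs: int, possible_pairs: int) -> float:
    """Fraction of theoretically possible family pairs that yielded designed sequences."""
    if possible_pairs <= 0:
        raise ValueError("possible_pairs must be positive")
    return designed_pairs / possible_pairs


def fraction_above(folds: dict, threshold: float = 0.5) -> float:
    """Share of folds whose success rate is strictly greater than `threshold`."""
    rates = [success_rate(designed, possible) for designed, possible in folds.values()]
    return sum(rate > threshold for rate in rates) / len(rates)


# Overall figure, using the totals reported in the text.
overall = success_rate(27_882, 44_675)
print(f"overall success rate: {overall:.1%}")

# Hypothetical per-fold counts (designed pairs, possible pairs); NOT real NrichD data.
toy_folds = {
    "fold_A": (4, 6),
    "fold_B": (1, 10),
    "fold_C": (10, 10),
    "fold_D": (2, 3),
}
print(f"folds above 50%: {fraction_above(toy_folds):.0%}")
```

With the reported totals, the overall success rate across all folds comes to roughly 62%. The per-fold statistic quoted in the text (about 88% of folds above 50%) would follow from applying `fraction_above` to the real per-fold counts.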