An algorithm for identifying novel targets of transcription factor families: application to hypoxia-inducible factor 1 targets.

Jiang Y, Cukic B, Adjeroh DA, Skinner HD, Lin J, Shen QJ, Jiang BH - Cancer Inform (2009)

Bottom Line: Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We further studied one of the potential targets, COX-2, in the biological lab, and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.


Affiliation: Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA. yue@csee.wvu.edu

ABSTRACT
Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we used suffix trees to extract 105 common patterns that occur in all 15 training genes. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potential HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 of these 258 putative genes were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the biological lab, and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
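
The pattern-extraction idea in the abstract can be illustrated with a small sketch. This is not the authors' suffix tree implementation: it swaps in a simpler k-mer set intersection, works only for one fixed pattern length, and uses made-up toy sequences. The function names `kmers` and `common_patterns` are hypothetical.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_patterns(seqs: Iterable[str], k: int) -> Set[str]:
    """Length-k substrings that occur at least once in every sequence.

    Stand-in for suffix-tree extraction: intersect the k-mer sets
    of all training sequences.
    """
    seqs = list(seqs)
    if not seqs:
        return set()
    shared = kmers(seqs[0], k)
    for s in seqs[1:]:
        shared &= kmers(s, k)
        if not shared:  # nothing left in common, stop early
            break
    return shared


# Toy training set (invented sequences, not real gene data).
training = ["TTACGTGCA", "GGACGTGCT", "ACGTGCAAT"]
print(sorted(common_patterns(training, 5)))  # ['ACGTG', 'CGTGC']
```

A suffix tree would find shared substrings of every length in one pass; the sketch above only conveys what "a pattern common to all training genes" means.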


Figure 1: The outline of the general methodology. The training genes of known HIF-1 targets are built into a suffix tree, and a set of common patterns is extracted from the suffix tree. These patterns (the extracted common patterns together with consensus sequences) are used to search the human genome database using the suffix tree algorithm. Using positional analysis, we analyze the output genes according to the relative locations of HIF-1 binding sites in the genes, and define the output genes with HIF-1 binding sites upstream of the translational start site as potential HIF-1 targets. The potential HIF-1 targets are divided into two groups: known HIF-1 target genes and candidate target genes. Finally, the candidate novel target genes are validated using available microarray data in the literature and tested in the biological lab.
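
The positional-analysis step in the caption can be sketched as follows. Several things here are assumptions rather than details from the paper: the widely cited HRE core motif 5'-RCGTG-3' (R = A or G) stands in for the binding-site patterns, only the forward strand is scanned, the start codon position is supplied by the caller, and the names `upstream_sites` and `is_potential_target` are hypothetical.

```python
import re
from typing import List

# Assumption: HRE core 5'-RCGTG-3' as the binding-site pattern.
# The lookahead lets overlapping matches be reported.
HRE_CORE = re.compile(r"(?=([AG]CGTG))")
MOTIF_LEN = 5


def upstream_sites(seq: str, start_codon_pos: int) -> List[int]:
    """0-based positions of motif matches lying entirely upstream
    of the translational start site."""
    return [m.start() for m in HRE_CORE.finditer(seq.upper())
            if m.start() + MOTIF_LEN <= start_codon_pos]


def is_potential_target(seq: str, start_codon_pos: int) -> bool:
    """True if at least one binding site precedes the start codon."""
    return bool(upstream_sites(seq, start_codon_pos))


# Toy sequence (invented): one motif before the ATG, one after it.
gene = "TTGACGTGCCATGAAACGTGA"
atg = gene.find("ATG")
print(upstream_sites(gene, atg))       # [3]
print(is_potential_target(gene, atg))  # True
```

Only the site at position 3 counts, because the second match lies downstream of the start codon.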

The general methodology used in this study is illustrated in Figure 1. In brief:
1) A suffix tree is constructed from the set of training genes. A set of common patterns that occur at least once in every training gene is extracted from the suffix tree.
2) Using multiple patterns (the common patterns from the previous step together with other known patterns, such as HIF-1 binding sites (see Table 2) and consensus sequences from the literature), the genome database is searched by applying suffix tree algorithms. This generates the output sequences.
3) Positional analysis is performed on each output sequence according to the functional DNA fragments at specific locations in the sequence.
4) The output targets from the positional analysis are grouped into known target genes and candidate targets.
5) The candidate target genes are further verified by biological experiments in the laboratory and by available microarray data from the literature.
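
Steps 2 and 4 above (multi-pattern search, then grouping into known and candidate targets) can be sketched as below. This is a minimal illustration, not the paper's method: it uses the Aho-Corasick automaton, a different linear-time multi-pattern matcher, in place of the suffix tree search. The hit rule (a sequence is reported if it contains at least one pattern), the gene names, the sequences, and the functions `build_automaton`, `find_patterns`, and `screen` are all assumptions made for the example.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_automaton(patterns: Iterable[str]):
    """Build Aho-Corasick goto/fail/output tables for the patterns."""
    goto: List[Dict[str, int]] = [{}]
    out: List[Set[str]] = [set()]
    for p in patterns:  # insert each pattern into the trie
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                out.append(set())
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:  # breadth-first pass sets failure links
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]  # inherit patterns ending here
    return goto, fail, out


def find_patterns(seq: str, automaton) -> Set[str]:
    """Return every pattern that occurs in seq, in one left-to-right scan."""
    goto, fail, out = automaton
    node, found = 0, set()
    for ch in seq:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        found |= out[node]
    return found


def screen(database: Dict[str, str], patterns: Iterable[str],
           known_targets: Set[str]) -> Tuple[List[str], List[str]]:
    """Split hits into (known targets, candidate targets).

    Assumption: a sequence is a hit if it contains any pattern.
    """
    automaton = build_automaton(patterns)
    hits = {name for name, seq in database.items()
            if find_patterns(seq, automaton)}
    return sorted(hits & known_targets), sorted(hits - known_targets)


# Toy database with invented names and sequences.
db = {"geneA": "TTACGTGCA", "geneB": "GGGGGGGG", "geneC": "CCGCGTGAA"}
print(screen(db, ["ACGTG", "GCGTG"], known_targets={"geneA"}))
# (['geneA'], ['geneC'])
```

In the full workflow, positional analysis (step 3) would filter the hits before grouping; it is left out here to keep the sketch short.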

