Relationship between gene duplicability and diversifiability in the topology of biochemical networks.

Guo Z, Jiang W, Lages N, Borcherds W, Wang D - BMC Genomics (2014)

Bottom Line: Quantitatively, the number of genes P(K) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law, P(K) ∝ K^(-α). The duplicate count K of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family. Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, which improves our understanding of gene duplicability.

Affiliation: Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229-3900, USA. wangd4@uthscsa.edu.

ABSTRACT

Background: Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes P(K) with K duplicates in a genome decreases precipitously as K increases, and often follows a power law, P(K) ∝ K^(-α). Functional diversification, either neo- or sub-functionalization, is a major evolutionary route for duplicate genes.

Results: Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored the scenario in which two pathways in the biochemical networks antagonize each other, so that synthetic knockout of the respective genes for the two pathways rescues the phenotypic defects of each individual knockout. We identified duplicate gene pairs with sufficient divergence to represent this antagonistic relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families and thus tend to have high duplicability. Second, we used distances between proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene's duplicate count, the further the proteins of this gene and its duplicates drift away from one another in the network, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes: the lower the coefficient, the more the sequences have diverged. The duplicate count K of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family.

Conclusion: Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, which improves our understanding of gene duplicability.
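
The abstract does not spell out how the sequence-homology-based clustering coefficient is computed. As a rough illustration only, the sketch below assumes the standard local clustering coefficient on a graph whose edges are BLAST hits: for a gene, it is the fraction of pairs of its duplicates that are also hits of each other. The function name `clustering_coefficient`, the `homologs` mapping, and the toy gene names are hypothetical and do not come from the paper.

```python
from itertools import combinations


def clustering_coefficient(gene, homologs):
    """Local clustering coefficient of `gene` in a homology graph.

    `homologs` maps each gene to the set of genes it hits (assumed
    format). Returns None when the gene has fewer than two duplicates,
    because no pair of duplicates exists to be tested.
    """
    neighbors = homologs.get(gene, set()) - {gene}
    if len(neighbors) < 2:
        return None
    pairs = list(combinations(sorted(neighbors), 2))
    # A pair counts as linked if either member lists the other as a hit.
    linked = sum(
        1
        for a, b in pairs
        if b in homologs.get(a, set()) or a in homologs.get(b, set())
    )
    return linked / len(pairs)


# Toy homology graph (made-up data): A hits B, C and D; only B and C hit each other.
toy_homologs = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
coefficient_a = clustering_coefficient("A", toy_homologs)
```

In this toy graph only one of the three pairs among A's duplicates is itself homologous, so A gets a low coefficient, which under the reading above corresponds to a more diverged family.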

Figure 1: Log-log plot of the number of protein-coding genes P(K) with duplicability K vs. K in the yeast S. cerevisiae and human. As described in Materials and Methods, the duplicability K of a gene was calculated as the BLAST hit count of its protein in an all-against-all BLAST, with a threshold BLAST E-value of 10^(-30). Linear relationships were observed, indicating a power-law relationship between P(K) and K, P(K) ∝ K^(-α). The slopes of the linear relationships, whose magnitudes are the α values, differed between yeast and human. To better illustrate this difference, the S. cerevisiae data points were shifted upward to overlap the leftmost data points of the two species.

Mentions: As the goal of this study is to detect a potential relationship between gene duplicability and diversifiability, it is necessary to measure gene duplicability. We performed separate all-against-all BLAST searches for the protein sequences encoded in the yeast S. cerevisiae genome and in the human genome, with a threshold E-value of 10^(-30). The BLAST hit count K was calculated for each protein. We used the value of K as a quantifier of the duplicability of the corresponding gene: the higher the value of K, the higher the duplicability. As previously done for the S. cerevisiae and the C. elegans proteomes [22], we created the log-log plot of the number of proteins with K BLAST hits, P(K), vs. K for the human proteome. A linear relationship between log(P(K)) and log(K) was observed (Figure 1). This indicates a power-law relationship, P(K) ∝ K^(-α), with the exponent α being the magnitude of the slope of the linear relationship. Moreover, log(P(K)) decreased more slowly as log(K) increased in the human proteome than in the yeast proteome, i.e., the α value of the power-law relationship was lower, reflecting the higher abundance of duplicate genes in the multicellular genome.
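
The step from per-protein hit counts to the exponent α can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the hit counts K have already been extracted from BLAST output, tabulates P(K), and estimates α as the negative of the ordinary least-squares slope of log10(P(K)) against log10(K). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def hit_count_distribution(hit_counts):
    """Return a mapping K -> P(K), the number of proteins with exactly K hits.

    Counts of zero are skipped because log(0) is undefined (an assumption
    of this sketch; the source does not say how such cases are handled).
    """
    return Counter(k for k in hit_counts if k > 0)


def fit_power_law_exponent(p_of_k):
    """Estimate alpha in P(K) ∝ K^(-alpha) from a log-log linear fit."""
    xs = [math.log10(k) for k in p_of_k]
    ys = [math.log10(p_of_k[k]) for k in p_of_k]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the slope is negative, so alpha is its magnitude


# Toy data (made up): P(K) = 1024 / K^2 for K = 1, 2, 4, ..., 32.
toy_hit_counts = [k for k in (1, 2, 4, 8, 16, 32) for _ in range(1024 // k**2)]
toy_distribution = hit_count_distribution(toy_hit_counts)
toy_alpha = fit_power_law_exponent(toy_distribution)
```

A least-squares fit on the raw log-log points is the simplest reading of "slope of the linear relationship"; with real data, the sparse high-K tail can dominate such a fit, so the result should be treated as a rough estimate.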

