Relationship between gene duplicability and diversifiability in the topology of biochemical networks.

Guo Z, Jiang W, Lages N, Borcherds W, Wang D - BMC Genomics (2014)

Bottom Line: Quantitatively, the number of genes P(K) with K duplicates in a genome decreases precipitously as K increases and often follows a power law, $P(K) \propto K^{-\alpha}$. The duplicate count K of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family. Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, a finding that improves our understanding of gene duplicability.


Affiliation: Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 8403 Floyd Curl Drive, San Antonio, TX 78229-3900, USA. wangd4@uthscsa.edu.

ABSTRACT

Background: Selective gene duplicability, the extensive expansion of a small number of gene families, is universal. Quantitatively, the number of genes P(K) with K duplicates in a genome decreases precipitously as K increases and often follows a power law, $P(K) \propto K^{-\alpha}$. Functional diversification, through either neo- or subfunctionalization, is a major evolutionary route for duplicate genes.
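Taking logarithms of both sides shows why a power law appears as a straight line on a log-log plot, with slope $-\alpha$. The constant $C$ is a normalizing factor introduced here for illustration; it is not named in the original.

$$
P(K) = C\,K^{-\alpha}
\quad\Longrightarrow\quad
\log P(K) = \log C - \alpha \log K
$$

A larger $\alpha$ therefore means a steeper drop in the number of genes as the duplicate count K grows.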

Results: Using three lines of genomic datasets, we studied the relationship between gene duplicability and diversifiability in the topology of biochemical networks. First, we explored a scenario in which two pathways in the biochemical networks antagonize each other, so that the combined (synthetic) knockout of the respective genes for the two pathways rescues the phenotypic defects of each single knockout. We identified duplicate gene pairs with sufficient divergence that represent this antagonistic relationship in the yeast S. cerevisiae. Such pairs overwhelmingly belong to large gene families and thus tend to have high duplicability. Second, we used the distances between the proteins of duplicate genes in the protein interaction network as a metric of their diversification. The higher a gene's duplicate count, the farther the proteins of this gene and its duplicates drift from one another in the network, which is especially true for genetically antagonizing duplicate genes. Third, we computed a sequence-homology-based clustering coefficient to quantify sequence diversifiability among duplicate genes: the lower the coefficient, the more the sequences have diverged. The duplicate count K of a gene is negatively correlated with the clustering coefficient of its duplicates, suggesting that gene duplicability is related to the extent of sequence divergence within the duplicate gene family.
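The two diversification metrics above are defined precisely in the paper's Materials and Methods, which are not reproduced here. The sketch below is one plausible reading, assuming (1) network distance is the shortest-path length in an unweighted, undirected protein interaction graph, and (2) the homology clustering coefficient is the standard local clustering coefficient of a gene in a graph whose edges are BLAST hits. The dict-of-sets encodings and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def network_distance(graph, source, target):
    """Shortest-path length (edge count) between two proteins.

    Assumption: `graph` is a hypothetical undirected interaction network
    given as {protein: set_of_neighbors}. Returns None if unconnected.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])  # breadth-first search from the source
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


def homology_clustering_coefficient(hits, gene):
    """Fraction of pairs among a gene's duplicates that are homologous to each other.

    Assumption: `hits` maps each gene to the set of genes in its BLAST hits;
    a pair counts as linked if either member appears in the other's hits.
    Lower values mean the duplicates have diverged more from one another.
    """
    dups = sorted(hits.get(gene, set()) - {gene})
    if len(dups) < 2:
        return None  # undefined for fewer than two duplicates
    pairs = list(combinations(dups, 2))
    linked = sum(
        1 for a, b in pairs
        if b in hits.get(a, set()) or a in hits.get(b, set())
    )
    return linked / len(pairs)
```

For example, if gene g1 has duplicates g2, g3, and g4 but only g2 and g3 hit each other, the coefficient is 1/3.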

Conclusion: Thus, a positive correlation exists between gene diversifiability and duplicability in the context of biochemical networks, a finding that improves our understanding of gene duplicability.

Figure 2: Comparison of duplicability (K) distributions between the two groups of duplicate genes: genes in genetically antagonizing (GA) duplicate pairs (dashed line and white circles) vs. genes in genetically complementing (GC) pairs (solid line and black squares). A: log(P(K)) vs. log(K) for genes whose BLAST hits enclose both proteins of a duplicate gene pair in the corresponding group. B: log(P(K)) vs. log(K) for the genes in the corresponding group of duplicate gene pairs. The gene pairs were identified as described in Materials and Methods. The horizontal axis is the logarithm of gene duplicability K, which, as described in Materials and Methods, was calculated as the BLAST hit count of a gene's protein. The vertical axis is the logarithm of P(K). Linear regression lines, regression equations, and R² values are shown. In both panels, the GC gene data (black squares) fit a power-law relationship well, whereas the GA gene data (white circles) fit poorly, if at all.

Mentions: We first assessed whether genetically antagonizing (GA) duplicate gene pairs were more likely than genetically complementing (GC) pairs to belong to larger duplicate gene families. For each of the two groups of duplicate gene pairs, we collected the set of genes whose BLAST hits enclose the proteins of both genes in a pair. We then determined which of the two sets of genes has higher K values, i.e., whether the proteins of two mutually antagonizing or two mutually complementing duplicate genes tend to co-occur in the BLAST hits of genes with higher K values. The results are shown as log-log plots in Figure 2A, where the vertical axis is the logarithm of the percentage of genes and the horizontal axis is the logarithm of K. For GC pairs, a linear decay fit the log-log data well (R² = 0.78), with a power-law exponent α of 1.63. For GA pairs, by contrast, the log-log data fit a linear decay very poorly (R² = 0.06); the power law, if it holds at all, has a much lower α of 0.35, indicating a much slower decrease in P(K) as K increases. Thus, the proteins of antagonizing duplicate gene pairs tend to be grouped together in the BLAST hits of high-duplicability genes. In other words, GC duplicate gene pairs tend to be associated with low-duplicability genes and smaller gene families, whereas GA pairs are much more likely to be associated with high-duplicability genes and larger gene families.
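The analysis above has three steps: collect the genes whose BLAST hits enclose both proteins of a pair, tabulate P(K) with K taken as a gene's BLAST hit count, and fit a line to the log-log data. The sketch below illustrates those steps under stated assumptions: `hits` is a hypothetical dict from each gene to the set of proteins in its BLAST hits, and the fit is ordinary least squares on log10 values. The paper's exact regression procedure is not given in this excerpt, and the sketch does not reproduce its reported numbers.

```python
import math
from collections import Counter


def enclosing_genes(hits, pairs):
    """Genes whose BLAST hit set contains both members of at least one pair.

    Assumption: `hits` is a hypothetical {gene: set_of_hit_proteins} dict.
    """
    return {g for g, h in hits.items() if any(a in h and b in h for a, b in pairs)}


def duplicability_distribution(k_values):
    """P(K): fraction of genes having each duplicate count K."""
    counts = Counter(k_values)
    n = len(k_values)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law(dist):
    """Least-squares fit of log10 P(K) on log10 K.

    Returns (alpha, r_squared); alpha is the negated slope. r_squared is
    NaN when all P(K) values are equal (zero total variance).
    """
    if len(dist) < 2:
        raise ValueError("need at least two distinct K values")
    xs = [math.log10(k) for k in dist]
    ys = [math.log10(p) for p in dist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = math.nan if ss_tot == 0 else 1 - ss_res / ss_tot
    return -slope, r2
```

A typical call chain would be `ks = [len(hits[g]) for g in enclosing_genes(hits, pairs)]` followed by `fit_power_law(duplicability_distribution(ks))`. Plotting percentages instead of fractions only shifts every log value by a constant, so α and R² are unchanged; the same holds for the choice of logarithm base.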

