Multivariate analysis of flow cytometric data using decision trees.

Simon S, Guthke R, Kamradt T, Frey O - Front Microbiol (2012)

Bottom Line: For research on the host site, flow cytometry has become one of the major tools in immunology. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression.


Affiliation: Research Group Systems Biology/Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans Knöll Institute Jena, Germany.

ABSTRACT
Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers at the single-cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. To examine whether the production of a certain cytokine depends on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross-validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines depended not only on the co-expression of others per se, but also on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high-dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
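The weighting and cross-validation steps named in the abstract can be sketched in plain Python. This is only an illustration under assumptions: the abstract does not state the weighting formula or the software used, so the inverse-class-frequency weights and the helper names `class_weights` and `stratified_folds` are hypothetical, and the toy labels are invented for the demonstration.

```python
import random
from collections import Counter, defaultdict


def class_weights(labels):
    """Weight each class inversely to its frequency (an assumed scheme)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * cnt) for cls, cnt in counts.items()}


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, cls in enumerate(labels):
        by_class[cls].append(idx)
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        indices = by_class[cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy data (not from the study): 20 cytokine-positive and 80 negative cells.
labels = ["pos"] * 20 + ["neg"] * 80
weights = class_weights(labels)
folds = stratified_folds(labels, k=5)
```

With these toy labels, the rare positive class receives the larger weight, and each of the five folds holds the same 1:4 ratio of positive to negative cells as the full data set.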



Figure 6: Best decision tree for the classification of cells as positive or negative for IFN-γ expression. Cells are classified based on the MFI values of first TNF-α and second GM-CSF. The blue-colored TNF-α (red-colored GM-CSF) cut-off value indicates that the split value is well above (below) the cut-off value. Therefore, these nodes do not simply divide the cells into TNF-α (respectively GM-CSF) negative and positive cells. Only a proportion of the cells that express TNF-α are routed to the right leaf. Because of this high split value relative to the cut-off value, this leaf contains only 37.43% of all TNF-α positive cells. It captures 62.8% of the IFN-γ positive cells, which is 233 cells (TP, true positive). The leaf also shows that it captures 81.47% of all cells positive for both IFN-γ and TNF-α. The leaf in the middle classifies cells as IFN-γ positive and captures 10.78% of all IFN-γ positive cells. The left leaf classifies cells as IFN-γ negative; it therefore wrongly classifies 26.42% of the IFN-γ positive cells but captures 92.11% of all IFN-γ negative cells.
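The percentages quoted in the caption can be cross-checked with a few lines of arithmetic. The sketch below uses only the caption's figures; the overall sensitivity and the implied number of IFN-γ positive cells are derived here and are not stated in the source, and because the quoted 62.8% is rounded, the implied count is approximate.

```python
# Share of all IFN-γ positive cells that ends up in each leaf (percent),
# as quoted in the Figure 6 caption.
right_leaf_tp = 62.8    # classified positive (TNF-α above the split)
middle_leaf_tp = 10.78  # classified positive (GM-CSF above the split)
left_leaf_fn = 26.42    # wrongly classified negative

# The three leaves should account for every IFN-γ positive cell.
total_share = round(right_leaf_tp + middle_leaf_tp + left_leaf_fn, 2)

# Derived, not quoted: fraction of IFN-γ positive cells the tree recovers.
sensitivity = round(right_leaf_tp + middle_leaf_tp, 2)

# Derived, not quoted: 233 cells are said to be 62.8% of the positives.
implied_positive_cells = round(233 / (right_leaf_tp / 100))

print(total_share, sensitivity, implied_positive_cells)
```

The three leaf shares sum to 100%, so the caption's figures are internally consistent.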

Mentions: All resulting decision trees (except the tree for IL-17) were of sufficient quality to reveal meaningful structural patterns. This implies that there are associations between the expression of different cytokines. An interesting common finding for the decision trees for TNF-α (Figure 5) and IL-2 (Figure 4) was that the chosen split thresholds of all attributes used (RANKL and IL-2 for TNF-α; TNF-α and RANKL for IL-2) were close to the experimentally determined cut-off values of these cytokines. This finding suggests that the expression (or non-expression) of TNF-α and IL-2 depends on whether the other cytokines are expressed. Interestingly, there was an inverse relationship between the cytokines: no expression of RANKL and IL-2 classified cells as positive for TNF-α (Figure 5). Similarly, no expression of TNF-α and RANKL classified cells as positive for IL-2 (Figure 4). One obvious reason for this classification is that a high proportion of TNF-α- and IL-2-expressing cells produce only a single cytokine (see Figure 3B). While the IL-2-expressing Th cells contain 44.1% single producers (Figure 3B) by bivariate analysis, our multidimensional analysis classified 46.31% of all IL-2 positive cells into the left leaf of the decision tree (Figure 4). These cells produce neither TNF-α nor RANKL and can therefore be considered IL-2 single producers. We can therefore conclude that only the IL-2 single producers are classified correctly. However, the decision tree cannot reveal patterns among the IL-2 positive cells that co-express other cytokines. The TNF-α tree (Figure 5) has a structure similar to that of the IL-2 tree (Figure 4). Cells are classified as TNF-α positive if they produce neither RANKL nor IL-2. Unlike in the IL-2 tree, the TNF-α positive leaf does not contain only TNF-α single producers (74.87% TNF-α positive cells, Figure 5 vs. 58.65% TNF-α single producers in Figure 3B).
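The "neither/nor" structure of the TNF-α and IL-2 trees amounts to a simple rule: a cell is called positive for the target cytokine when every other attribute in the tree stays below its cut-off. A minimal sketch follows; the function name `classify_by_absence` is hypothetical, and the cut-off numbers are placeholders chosen for illustration, since this excerpt does not give the RANKL or IL-2 cut-offs.

```python
def classify_by_absence(other_mfis, cutoffs):
    """Return True (target positive) if no other cytokine exceeds its cut-off.

    other_mfis: MFI per cytokine for one cell, e.g. {"RANKL": 210.0, ...}
    cutoffs:    positivity cut-off per cytokine used as a split attribute
    """
    return all(other_mfis[name] <= cutoffs[name] for name in cutoffs)


# PLACEHOLDER cut-offs, not values from the study.
example_cutoffs = {"RANKL": 1000.0, "IL-2": 1000.0}

# A cell expressing neither RANKL nor IL-2 is classified TNF-α positive.
single_producer = classify_by_absence({"RANKL": 200.0, "IL-2": 300.0}, example_cutoffs)

# A cell expressing IL-2 is classified TNF-α negative.
co_producer = classify_by_absence({"RANKL": 200.0, "IL-2": 5000.0}, example_cutoffs)
```

Such a rule can only recover single producers of the target cytokine, which matches the limitation noted above for the IL-2 tree.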
We therefore conclude from the two trees for cytokines with a high percentage of single producers that the decision trees could reveal this pattern. Furthermore, other subsets with a high percentage of single producers were used to filter out cells negative for the cytokine of interest. Therefore, the decision trees detect almost exactly the experimentally determined cut-off values of these cytokines. RANKL (tree not shown) also had a high percentage of single producers. We thus expected a tree with the same structure as for TNF-α and IL-2. Compared to these simple and compact trees, the RANKL decision tree was quite complex; however, it could be pruned to the same structure as the IL-2 and TNF-α trees (not shown). This pruning only slightly impaired the classification and resulted in a tree with TNF-α as root and IL-2 as the next split attribute. As for RANKL and TNF-α, the split values were very close to the experimentally determined cut-off values. Cells were classified as RANKL positive if TNF-α and IL-2 were not expressed and as RANKL negative if one of them was expressed. Other decision trees (Figures 6 and 8) had split values far above the experimentally determined cut-off values. These high split values also revealed some biologically relevant information. As an example, the tree for IFN-γ (Figure 6) split the cells into IFN-γ positive and negative cells by the expression of TNF-α at an MFI of about 6621. Due to this high split value, the node to the right (MFI for TNF-α > 6621) contained only 37.43% of all TNF-α positive cells. However, this node contained 81.47% of all TNF-α and IFN-γ positive cells. Given that the expression of TNF-α started above an MFI of 2368 (as measured by controls), it can be concluded that especially a high expression of TNF-α is associated with the expression of IFN-γ.
Routing further down the IFN-γ tree, the next node used GM-CSF expression as the split attribute for cells with a TNF-α expression below 6621 (Figure 6). However, the split value of 863 for GM-CSF expression was far below the threshold for GM-CSF positive cells as estimated by biological controls (MFI > 1192). This led to the classification as IFN-γ negative for cells below this split value (TN rate 92.11%, FN rate 26.42%) and as IFN-γ positive for cells above it (TP rate 10.78%). Since the TP rate was only around 11% and the split value did not correspond to the true cut-off of GM-CSF, it can be concluded that IFN-γ expression is probably only loosely associated with the expression of GM-CSF. Most of the IFN-γ negative cells do not express GM-CSF, since the true negative (TN) rate is high.
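The IFN-γ tree described above can be written out as a two-level rule. The sketch below uses the split values and control cut-offs quoted in the text; the function name `classify_ifng` is hypothetical, and sending cells that sit exactly on a split value to the left branch is an assumption, since the text gives the TNF-α split only as "about 6621".

```python
TNF_SPLIT = 6621     # tree split on TNF-α MFI (from the text, approximate)
GMCSF_SPLIT = 863    # tree split on GM-CSF MFI (from the text)
TNF_CUTOFF = 2368    # TNF-α positivity cut-off measured by controls
GMCSF_CUTOFF = 1192  # GM-CSF positivity cut-off measured by controls


def classify_ifng(tnf_mfi, gmcsf_mfi):
    """Classify one cell as IFN-γ 'positive' or 'negative' (Figure 6 tree)."""
    if tnf_mfi > TNF_SPLIT:
        return "positive"   # right leaf: high TNF-α expression
    if gmcsf_mfi > GMCSF_SPLIT:
        return "positive"   # middle leaf
    return "negative"       # left leaf


# TNF-α positive by the control cut-off, but below the tree's split value.
dim_tnf = classify_ifng(tnf_mfi=3000, gmcsf_mfi=500)
# Strong TNF-α expression alone is enough for a positive call.
bright_tnf = classify_ifng(tnf_mfi=7000, gmcsf_mfi=0)
# GM-CSF above the split value but still below its control cut-off.
dim_gmcsf = classify_ifng(tnf_mfi=1000, gmcsf_mfi=900)
```

The first example shows the point made in the text: a cell can be TNF-α positive by the control cut-off (MFI above 2368) and still be routed to the IFN-γ negative leaf, because the tree only separates cells with high TNF-α expression.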

