Deep Neural Networks with Multistate Activation Functions.

Cai C, Xu Y, Ke D, Su K - Comput Intell Neurosci (2015)

Bottom Line: Experimental results on the TIMIT corpus reveal that, on speech recognition tasks, DNNs with MSAFs perform better than conventional DNNs, achieving a relative improvement of 5.60% on phoneme error rates. Further experiments also reveal that mean-normalised SGD facilitates the training of DNNs with MSAFs, especially with large training sets. The models can also be trained directly without pretraining when the training set is sufficiently large, which results in a considerable relative improvement of 5.82% on word error rates.


Affiliation: School of Technology, Beijing Forestry University, No. 35 Qinghuadong Road, Haidian District, Beijing 100083, China.

ABSTRACT
We propose multistate activation functions (MSAFs) for deep neural networks (DNNs). These MSAFs are new kinds of activation functions which are capable of representing more than two states, including the N-order MSAFs and the symmetrical MSAF. DNNs with these MSAFs can be trained via conventional Stochastic Gradient Descent (SGD) as well as mean-normalised SGD. We also discuss how these MSAFs perform when used to resolve classification problems. Experimental results on the TIMIT corpus reveal that, on speech recognition tasks, DNNs with MSAFs perform better than conventional DNNs, achieving a relative improvement of 5.60% on phoneme error rates. Further experiments also reveal that mean-normalised SGD facilitates the training of DNNs with MSAFs, especially with large training sets. The models can also be trained directly without pretraining when the training set is sufficiently large, which results in a considerable relative improvement of 5.82% on word error rates.
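The abstract does not spell out the functional form of an N-order MSAF; the only concrete instance in this excerpt is the 3-order MSAF quoted later, a sum of three logistic sigmoids with shifted thresholds. The sketch below generalises that instance to N terms; the spacing of 20 between thresholds is taken from that example and is an assumption here, not a general definition from the paper.

```python
import numpy as np

def msaf(x, order=3, spacing=20.0):
    # Hypothetical N-order MSAF: a sum of `order` logistic sigmoids whose
    # thresholds are spaced `spacing` apart, so the output can settle near
    # any of the levels 0, 1, ..., order (i.e. more than two states).
    # The spacing of 20 is copied from the 3-order example quoted later in
    # this excerpt and is an assumption, not the paper's general definition.
    x = np.asarray(x, dtype=float)
    return sum(1.0 / (1.0 + np.exp(-x + k * spacing)) for k in range(order))

# For order=3 this reproduces the quoted 3-order MSAF:
# y = 1/(1 + e^(-x)) + 1/(1 + e^(-x+20)) + 1/(1 + e^(-x+40))
print(msaf(np.array([-10.0, 10.0, 30.0, 50.0])))  # roughly 0, 1, 2, 3
```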

No MeSH data available.


Figure 6: A neural network. In this model, i1 and i2 are input units; h1 and h2 are hidden units; o is the output unit; and wi and bi are the weights and biases, respectively.

Mentions: However, if MSAFs are used, the number of units can be reduced because MSAFs have more than two different states. Figure 6 shows a neural network that will be used to resolve the problem. In this neural network, the hidden units and the output unit use MSAFs as their activation functions. The input units correspond to the horizontal and longitudinal coordinates, respectively, and the output unit provides the classification results. There are a variety of combinations of weights, biases, and activation functions that are able to classify the patterns. For instance, let (w1, w2, w3, w4, w5, w6, b1, b2, b3) = (−16, −32, 48, 16, 40, −48, −40, 40, 32) and let the activation functions of the hidden units and the output unit be the 3-order MSAF y = 1/(1 + e^(−x)) + 1/(1 + e^(−x+20)) + 1/(1 + e^(−x+40)). Then, the output unit will label the patterns as
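The exact wiring of w1 through w6 and b1 through b3 to the connections in Figure 6 is not recoverable from this excerpt, so the minimal sketch below assumes a standard 2-2-1 feed-forward layout: w1..w4 feed the hidden units, w5 and w6 feed the output unit, b1 and b2 are the hidden biases, and b3 is the output bias. Under that assumption it evaluates the quoted weights and 3-order MSAF over a few input coordinates; the point is that the single output unit settles near one of the discrete levels 0, 1, 2, or 3, so it can express a multi-class label that a binary sigmoid unit could not.

```python
import numpy as np

def msaf3(x):
    # The 3-order MSAF quoted in the text: a sum of three shifted sigmoids.
    return (1.0 / (1.0 + np.exp(-x))
            + 1.0 / (1.0 + np.exp(-x + 20.0))
            + 1.0 / (1.0 + np.exp(-x + 40.0)))

# Weights and biases quoted in the text. Their assignment to the connections
# of Figure 6 is an assumption here (the figure itself is not reproduced).
w1, w2, w3, w4, w5, w6 = -16.0, -32.0, 48.0, 16.0, 40.0, -48.0
b1, b2, b3 = -40.0, 40.0, 32.0

def network_output(i1, i2):
    h1 = msaf3(w1 * i1 + w3 * i2 + b1)    # hidden unit h1
    h2 = msaf3(w2 * i1 + w4 * i2 + b2)    # hidden unit h2
    return msaf3(w5 * h1 + w6 * h2 + b3)  # output unit o

# Sweep a few input coordinates; each point ends up near one of the MSAF's
# discrete output levels, i.e. a multi-class label from a single output unit.
for i1 in (0.0, 0.5, 1.0):
    for i2 in (0.0, 0.5, 1.0):
        print(f"({i1}, {i2}) -> {network_output(i1, i2):.2f}")
```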

