Deep Neural Networks with Multistate Activation Functions.

Cai C, Xu Y, Ke D, Su K - Comput Intell Neurosci (2015)

Bottom Line: Experimental results on the TIMIT corpus reveal that, on speech recognition tasks, DNNs with MSAFs perform better than conventional DNNs, achieving a relative improvement of 5.60% on phoneme error rates. Further experiments also reveal that mean-normalised SGD facilitates the training of DNNs with MSAFs, especially with large training sets. The models can also be trained directly without pretraining when the training set is sufficiently large, which results in a considerable relative improvement of 5.82% on word error rates.


Affiliation: School of Technology, Beijing Forestry University, No. 35 Qinghuadong Road, Haidian District, Beijing 100083, China.

ABSTRACT
We propose multistate activation functions (MSAFs) for deep neural networks (DNNs). These MSAFs are new kinds of activation functions capable of representing more than two states, including the N-order MSAFs and the symmetrical MSAF. DNNs with these MSAFs can be trained via conventional Stochastic Gradient Descent (SGD) as well as mean-normalised SGD. We also discuss how these MSAFs perform when used to resolve classification problems. Experimental results on the TIMIT corpus reveal that, on speech recognition tasks, DNNs with MSAFs perform better than conventional DNNs, achieving a relative improvement of 5.60% on phoneme error rates. Further experiments also reveal that mean-normalised SGD facilitates the training of DNNs with MSAFs, especially with large training sets. The models can also be trained directly without pretraining when the training set is sufficiently large, which results in a considerable relative improvement of 5.82% on word error rates.
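The abstract does not spell out the functional form of the MSAFs, but its description (more than two output states, an N-order family, and a symmetrical variant whose middle state is "unknown") suggests activations built from shifted logistic sigmoids. The following is a minimal sketch under that assumption; the function names, the spacing parameter `gap`, and the default offsets are illustrative choices, not values taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def n_order_msaf(x, n=2, gap=4.0):
    """Sketch of an N-order MSAF: a sum of N shifted logistic sigmoids,
    yielding roughly N + 1 plateau-like output states (0, 1, ..., N).
    The spacing `gap` between the shifts is an illustrative assumption."""
    return sum(sigmoid(x - k * gap) for k in range(n))

def symmetrical_msaf(x, gap=4.0):
    """Sketch of a symmetrical MSAF: the output saturates near -1 and 1,
    with a plateau near 0 that can be read as an 'unknown' state."""
    return sigmoid(x - gap) + sigmoid(x + gap) - 1.0
```

As a quick check, `symmetrical_msaf` maps large negative inputs to about -1, inputs near 0 to about 0, and large positive inputs to about 1, which matches the three-state behaviour described for the symmetrical MSAF.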

No MeSH data available.


fig4: Restricting running averages to somewhere near 0. The hollow points stand for running averages of the conventional SGD, while the solid points stand for running averages of mean-normalised SGD.

Mentions: Figure 4 shows the effectiveness of mean-normalisation on the symmetrical MSAF. If the running averages are not mean-normalised, they are distributed over a relatively large region. On the other hand, if mean-normalised SGD is applied, the running averages are restricted to approximately 0. It is worth noting that 0 corresponds to the "unknown" state of the symmetrical MSAF. As mentioned before, this state means that a unit temporarily hides itself. Therefore, mean-normalisation allows more units to take the "unknown" state and consequently prevents the model from being affected by disturbances. For the other, non-symmetrical MSAFs, mean-normalisation is also useful, as it provides relatively stable operating points. Concretely, a relatively stable point is one where a small disturbance does not noticeably affect the output, whereas at a relatively unstable point a small disturbance causes an evident change in the output. The points near 0 are relatively stable because the gradients at these points are relatively small.
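The exact mean-normalised SGD update is not reproduced on this page, so the snippet below is only a minimal sketch of the idea described above: maintain a running average of each hidden unit's activation and centre the activations around 0, the "unknown" state of the symmetrical MSAF. The decay constant, the array shapes, and the point at which the mean is subtracted are assumptions made for illustration.

```python
import numpy as np

def update_running_mean(running_mean, activations, decay=0.99):
    """Exponential moving average of per-unit activations across minibatches."""
    batch_mean = activations.mean(axis=0)  # one mean per hidden unit
    return decay * running_mean + (1.0 - decay) * batch_mean

def mean_normalise(activations, running_mean):
    """Shift activations so their running averages sit near 0, where the
    symmetrical MSAF outputs its 'unknown' state and gradients are small."""
    return activations - running_mean

# Usage sketch on random data standing in for one minibatch of hidden activations.
rng = np.random.default_rng(0)
running_mean = np.zeros(512)
acts = rng.normal(loc=0.3, scale=1.0, size=(128, 512))
running_mean = update_running_mean(running_mean, acts)
centred = mean_normalise(acts, running_mean)
```

Under this sketch, the centred activations have per-unit means close to 0, mirroring the behaviour shown in Figure 4, where mean-normalised SGD keeps the running averages clustered near the origin while conventional SGD lets them spread out.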
