Path Models of Vocal Emotion Communication.

Bänziger T, Hosoya G, Scherer KR - PLoS ONE (2015)

Bottom Line: The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).

Affiliation: Department of Psychology, Mid Sweden University, Östersund, Sweden.

ABSTRACT
We propose to use a comprehensive path model of vocal emotion communication, encompassing encoding, transmission, and decoding processes, to empirically model data sets on emotion expression and recognition. The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. Lens model equations, hierarchical regression, and multivariate path analysis are used to compare the relative contributions of objectively measured acoustic cues in the enacted expressions and subjective voice cues as perceived by listeners to the variance in emotion inference from vocal expressions for four emotion families (fear, anger, happiness, and sadness). While the results confirm the central role of arousal in vocal emotion communication, the utility of applying an extended path modeling framework is demonstrated by the identification of unique combinations of distal cues and proximal percepts carrying information about specific emotion families, independent of arousal. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).
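To make the lens model component of this framework concrete, the following is a minimal sketch of how the lens model equation can be computed from paired expression and judgment data. The simulated cues, weights, and sample size are hypothetical illustrations, not the study's data or code; in the paper, the criterion is the enacted emotion and the cues are measured acoustic parameters.

```python
# Minimal lens model sketch (hypothetical simulated data, not the study's).
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n, k = 200, 4                      # 200 portrayals, 4 distal acoustic cues

cues = rng.normal(size=(n, k))     # e.g. intensity, F0 floor, F0 range, duration
criterion = cues @ np.array([0.6, 0.3, 0.0, 0.1]) + rng.normal(scale=0.5, size=n)  # expressed emotion
judgment  = cues @ np.array([0.5, 0.1, 0.2, 0.0]) + rng.normal(scale=0.7, size=n)  # listener inference

# Linear models on the environment (criterion) side and the judge side.
X = np.column_stack([np.ones(n), cues])
crit_hat = X @ lstsq(X, criterion, rcond=None)[0]
judg_hat = X @ lstsq(X, judgment,  rcond=None)[0]

r = lambda a, b: np.corrcoef(a, b)[0, 1]
r_a = r(criterion, judgment)    # achievement: expression-inference agreement
R_e = r(criterion, crit_hat)    # ecological predictability of the cue set
R_s = r(judgment,  judg_hat)    # consistency of the listener's cue utilization
G   = r(crit_hat,  judg_hat)    # matching of the two cue-weighting policies

# Lens model equation: r_a = G*R_e*R_s + C*sqrt(1-R_e^2)*sqrt(1-R_s^2)
C = (r_a - G * R_e * R_s) / np.sqrt((1 - R_e**2) * (1 - R_s**2))
print(f"r_a={r_a:.2f}  G={G:.2f}  R_e={R_e:.2f}  R_s={R_s:.2f}  C={C:.2f}")
```

The matching index G captures how well the listeners' cue-weighting policy mirrors the cue-criterion structure of the expressions, which is the quantity the lens model analyses in the paper decompose.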


pone.0136675.g005: Standardized path coefficients of the estimated model for anger (data merged for MUC and GVA). Only significant path coefficients are shown (p < .02). Significant paths with an absolute value > .2 are depicted in black; significant paths with an absolute value < .2 are depicted in gray. int. = intensity; r. energy = relative energy.

Mentions: Figs 5 and 6 show the specific models for anger and arousal. From the graph for anger, it is evident that the dominant path chain from expressed anger to perceived anger runs from high acoustic intensity to high perceived loudness and from there to the inference of anger. However, the direct path from expressed anger to perceived anger is also relatively strong, indicating that the acoustic measures and the proximal percepts do not carry all of the information used to infer the emotion. Fig 6 shows that high arousal is reflected in specific changes in almost all acoustic measures except relative energy and duration. On the proximal side of the model, it is mostly loudness that is used to infer arousal. Figs 7 and 8 show the results for fear and sadness. Fig 7 shows that fear portrayals differ from happiness portrayals in a narrower F0 range, a higher F0 floor, and shorter duration. Expressed fear is negatively associated with duration, suggesting a faster speaking rate. Correspondingly, duration is negatively associated with perceived speech rate and positively with perceived instability. Finally, high perceived instability and high perceived speech rate are associated with perceived fear. Fig 8 shows that the acoustic measures included in the model are only weakly associated with expressed sadness. The strongest paths between the acoustic measures and the proximal percepts run from mean intensity to intonation and loudness. Perceived sadness is negatively associated with intonation modulation and speech rate, and positively associated with perceived instability.
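As a sketch of how such standardized path coefficients can be obtained, a recursive path model like the anger chain described above can be estimated equation by equation with OLS on z-scored variables. The data, effect sizes, and column names below are hypothetical stand-ins; the study itself fitted multivariate path models to the merged MUC and GVA data.

```python
# Hypothetical sketch: standardized path coefficients for the chain
# expressed anger -> mean intensity -> perceived loudness -> perceived anger,
# plus a direct expressed -> perceived path. Simulated data, not the study's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
expressed = rng.normal(size=n)                               # expressed anger (criterion)
intensity = 0.7 * expressed + rng.normal(scale=0.7, size=n)  # distal acoustic cue
loudness  = 0.8 * intensity + rng.normal(scale=0.6, size=n)  # proximal percept
perceived = (0.5 * loudness + 0.3 * expressed
             + rng.normal(scale=0.6, size=n))                # listener inference
df = pd.DataFrame({"expressed_anger": expressed, "mean_intensity": intensity,
                   "perceived_loudness": loudness, "perceived_anger": perceived})

def std_paths(outcome, predictors):
    """OLS on z-scored variables; slopes are standardized path coefficients."""
    z = (df - df.mean()) / df.std(ddof=0)
    fit = sm.OLS(z[outcome], sm.add_constant(z[predictors])).fit()
    return fit.params.drop("const")

print(std_paths("mean_intensity", ["expressed_anger"]))                         # encoding path
print(std_paths("perceived_loudness", ["mean_intensity"]))                      # percept path
print(std_paths("perceived_anger", ["perceived_loudness", "expressed_anger"]))  # inference + direct path
```

Equation-by-equation OLS recovers the standardized coefficients of a recursive path model; a full SEM package would additionally provide the joint fit statistics that a multivariate path analysis reports.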

