Path Models of Vocal Emotion Communication.

Bänziger T, Hosoya G, Scherer KR - PLoS ONE (2015)

Bottom Line: The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).


Affiliation: Department of Psychology, Mid Sweden University, Östersund, Sweden.

ABSTRACT
We propose to use a comprehensive path model of vocal emotion communication, encompassing encoding, transmission, and decoding processes, to empirically model data sets on emotion expression and recognition. The utility of the approach is demonstrated for two data sets from two different cultures and languages, based on corpora of vocal emotion enactment by professional actors and emotion inference by naïve listeners. Lens model equations, hierarchical regression, and multivariate path analysis are used to compare the relative contributions of objectively measured acoustic cues in the enacted expressions and subjective voice cues as perceived by listeners to the variance in emotion inference from vocal expressions for four emotion families (fear, anger, happiness, and sadness). While the results confirm the central role of arousal in vocal emotion communication, the utility of applying an extended path modeling framework is demonstrated by the identification of unique combinations of distal cues and proximal percepts carrying information about specific emotion families, independent of arousal. The statistical models generated show that more sophisticated acoustic parameters need to be developed to explain the distal underpinnings of subjective voice quality percepts that account for much of the variance in emotion inference, in particular voice instability and roughness. The general approach advocated here, as well as the specific results, open up new research strategies for work in psychology (specifically emotion and social perception research) and engineering and computer science (specifically research and development in the domain of affective computing, particularly on automatic emotion detection and synthetic emotion expression in avatars).
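The analytic core of this approach is the classical lens model equation, which decomposes the "achievement" correlation between the expressed state and the listener's attribution into a modeled component (how well the linear cue models on the encoding and decoding sides match) and an unmodeled residual component. The following minimal sketch illustrates that computation on simulated data; the number of cues, the variable names, and the random data are hypothetical illustrations, not the authors' corpus or parameters.

# Minimal sketch of the Brunswik lens model equation on simulated data.
# All variable names and data here are hypothetical, not the study's corpus.
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n, k = 120, 5                      # hypothetical: 120 utterances, 5 acoustic cues
cues = rng.normal(size=(n, k))     # distal cues (e.g., F0 mean, intensity, ...)
criterion = cues @ rng.normal(size=k) + rng.normal(scale=1.0, size=n)  # expressed state
judgment  = cues @ rng.normal(size=k) + rng.normal(scale=1.0, size=n)  # listener attribution

X = np.column_stack([np.ones(n), cues])

def fit_predict(y):
    """Ordinary least squares prediction of y from the cue set."""
    beta, *_ = lstsq(X, y, rcond=None)
    return X @ beta

crit_hat = fit_predict(criterion)   # environmental (encoding) model
judg_hat = fit_predict(judgment)    # judgment (decoding) model

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

R_e = corr(criterion, crit_hat)     # ecological validity of the cue set
R_s = corr(judgment, judg_hat)      # consistency of the listener's cue utilization
G   = corr(crit_hat, judg_hat)      # matching of encoding and decoding models
C   = corr(criterion - crit_hat, judgment - judg_hat)  # unmodeled agreement

# Lens model equation: achievement = modeled + unmodeled component
r_a = G * R_e * R_s + C * np.sqrt(1 - R_e**2) * np.sqrt(1 - R_s**2)
print(f"achievement r_a = {r_a:.3f}  (direct correlation: {corr(criterion, judgment):.3f})")

The hierarchical regression and multivariate path analyses reported in the article extend this decomposition by inserting the proximal percepts between the distal cues and the attribution, as described for the TEEP model below.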


Fig 1. The tripartite emotion expression and perception (TEEP) model (based on Brunswik's lens model). The terms “push” and “pull” refer to the internal and the external determinants of the emotional expression, respectively, distinguished in the lower and upper parts of the figure. D = distal cues; P = percepts. Adapted from p. 120 in Scherer [36].

Mentions: More recently, Scherer [36] has formalized the earlier suggestion for an extension of the lens model as a tripartite emotion expression and perception (TEEP) model (see Fig 1). The communication process is represented by four elements (emoter/sender, distal cues, proximal percepts, and observer) and three phases (externalization driven by external models and internal changes, transmission, and cue utilization driven by inference rules and schematic recognition). Applying this model to our specific research questions, the internal state of the speaker (e.g., the emotion process) is encoded via distal vocal cues (measured by acoustic analysis); the listener perceives the vocal utterance and extracts a number of proximal cues (measured by subjective voice quality ratings obtained from naïve observers); and, finally, some of these proximal cues are used by the listener to infer the internal state of the speaker based on schematic recognition or explicit inference rules (measured by asking naïve observers to recognize the underlying emotion). The first step in this process is called the externalization of the internal emotional state, the second step the transmission of the acoustic information and the formation of a perceptual representation of the physical speech/voice signal, and the third and last step the inferential utilization and the emergence of an emotional attribution.
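As a concrete illustration of these three steps, the chain distal cue → proximal percept → emotion attribution can be sketched as two successive regressions, with the product of the standardized path coefficients giving the mediated (indirect) effect. The variables, effect sizes, and simulated data below are invented for illustration and are not taken from the study.

# Hypothetical sketch of the three TEEP phases as a chain of regressions:
# distal acoustic cue -> proximal voice percept -> emotion attribution.
import numpy as np

rng = np.random.default_rng(1)
n = 200
f0_mean = rng.normal(size=n)                                             # distal cue (acoustic measurement)
perceived_pitch = 0.8 * f0_mean + rng.normal(scale=0.6, size=n)          # proximal percept (listener rating)
fear_rating = 0.7 * perceived_pitch + rng.normal(scale=0.7, size=n)      # emotion inference

def std_beta(x, y):
    """Standardized slope of y on x (equals the correlation in simple regression)."""
    return np.corrcoef(x, y)[0, 1]

a = std_beta(f0_mean, perceived_pitch)       # externalization/transmission path
b = std_beta(perceived_pitch, fear_rating)   # inferential utilization path
indirect = a * b                             # cue -> percept -> attribution
total = std_beta(f0_mean, fear_rating)

print(f"distal->proximal: {a:.2f}, proximal->attribution: {b:.2f}")
print(f"indirect path: {indirect:.2f}  vs. total distal->attribution: {total:.2f}")

In the article the corresponding analyses use multiple acoustic cues and several perceived voice qualities simultaneously, but the logic of chaining an externalization path to a utilization path is the same.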

