Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces


ABSTRACT

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on the different speech articulators into acoustic parameters, which are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between the new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results pave the way for future speech BCI applications using such an articulatory-based speech synthesizer.
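
The calibration step mentioned above is not detailed in this excerpt. The sketch below only illustrates one plausible form it could take, namely fitting an affine transform by least squares that maps a new speaker's EMA coordinates into the reference speaker's articulatory space during the calibration period; the function names, shapes, and the choice of a linear mapping are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def fit_affine_calibration(ema_new, ema_ref):
        # Fit W, b by least squares so that ema_new @ W + b approximates ema_ref.
        # ema_new: (T, D) EMA features from the new speaker during calibration
        # ema_ref: (T, D) time-aligned EMA features from the reference speaker
        T, _ = ema_new.shape
        X = np.hstack([ema_new, np.ones((T, 1))])        # append a bias column
        coefs, *_ = np.linalg.lstsq(X, ema_ref, rcond=None)
        return coefs[:-1], coefs[-1]                      # W (D x D), b (D,)

    def apply_calibration(ema_frame, W, b):
        # Map one incoming EMA frame into the reference speaker's articulatory space.
        return ema_frame @ W + b

Once fitted on the short calibration recording, such a transform could be applied frame by frame to the new speaker's EMA data before feeding it to the DNN trained on the reference speaker.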




pcbi.1005119.g001: Articulatory-based speech synthesizer. Using a DNN, articulatory features of the reference speaker are mapped to acoustic features, which are then converted into an audible signal using the MLSA filter and an excitation signal.

Mentions: As a first step, we designed an intelligible articulatory-based speech synthesizer converting the trajectories of the main speech articulators (tongue, lips, jaw, and velum) into speech (see Fig 1). For this purpose, we first built a large articulatory-acoustic database, in which articulatory data from a native French male speaker were recorded synchronously with the produced audio speech signal. Computational models based on DNNs were then trained on these data to transform articulatory signals into acoustic speech signals (i.e., articulatory-to-acoustic mapping). When considering articulatory synthesis using physical or geometrical models, the articulatory data obtained by EMA can be mapped to the geometrical parameters of the model [39]. Here we consider a machine-learning approach in which the articulatory data obtained by EMA are directly mapped to the acoustic parameters of a vocoder.
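
As an illustration of this mapping stage, the following minimal sketch (not the authors' implementation) shows a small feed-forward DNN that maps a window of calibrated EMA frames to per-frame spectral parameters, which a vocoder such as an MLSA synthesis filter could then turn into audio together with an excitation signal. Layer sizes, feature dimensions, and the context window are assumed for the example.

    import torch
    import torch.nn as nn

    class ArticulatoryToAcousticDNN(nn.Module):
        def __init__(self, n_ema=18, context=5, n_mgc=25, hidden=256):
            # n_ema: EMA coordinates per frame (e.g., x/y positions of several sensors)
            # context: frames of articulatory context on each side of the current frame
            # n_mgc: spectral (mel-cepstral) coefficients expected by the vocoder
            super().__init__()
            in_dim = n_ema * (2 * context + 1)
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, n_mgc),
            )

        def forward(self, ema_window):
            # ema_window: (batch, in_dim) flattened window of calibrated EMA frames
            return self.net(ema_window)

    # In a real-time loop, each predicted frame of spectral parameters would be passed,
    # together with an excitation signal, to an MLSA synthesis filter to produce audio.
    model = ArticulatoryToAcousticDNN()
    dummy = torch.randn(1, 18 * 11)      # one flattened 11-frame EMA window
    mgc_frame = model(dummy)             # (1, 25) acoustic parameters for the vocoder

A feed-forward network over a short context window keeps the per-frame latency low, which is what makes closed-loop, real-time control of the synthesizer feasible.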
