Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces


ABSTRACT

Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded from a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the positions of sensors glued on different speech articulators into acoustic parameters, which are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained, as assessed by a perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between the new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results pave the way for future speech BCI applications using such an articulatory-based speech synthesizer.
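For illustration, the mapping stage described above can be sketched as a small feed-forward network that turns a window of EMA sensor coordinates into one frame of vocoder parameters. This is a minimal sketch only: the sensor count, context length, layer sizes, and the `vocoder_synthesize` helper are assumptions made for this example, not the architecture or vocoder actually used in the paper.

```python
# Minimal sketch of an articulatory-to-acoustic mapping (assumed shapes and sizes).
import torch
import torch.nn as nn

N_SENSORS = 9    # e.g. tongue, jaw, velum and lip sensors (assumption)
N_COORDS = 2     # midsagittal x/y position per EMA sensor
CONTEXT = 5      # number of articulatory frames fed to the network (assumption)
N_ACOUSTIC = 25  # vocoder parameters per frame, e.g. mel-cepstral coefficients

# Feed-forward DNN: EMA trajectories -> acoustic (vocoder) parameters
dnn = nn.Sequential(
    nn.Linear(N_SENSORS * N_COORDS * CONTEXT, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, N_ACOUSTIC),
)

def articulatory_to_acoustic(ema_window: torch.Tensor) -> torch.Tensor:
    """Map one window of EMA sensor positions to one frame of vocoder parameters."""
    return dnn(ema_window.reshape(-1))

# One synthetic EMA window (in practice: streamed from the articulograph and
# normalized against the reference speaker after the calibration step).
ema_window = torch.randn(CONTEXT, N_SENSORS * N_COORDS)
acoustic_frame = articulatory_to_acoustic(ema_window)
print(acoustic_frame.shape)  # torch.Size([25])

# Successive parameter frames would then be passed to a vocoder
# (hypothetical helper) to produce the audio waveform:
# waveform = vocoder_synthesize(acoustic_frames, sample_rate=16000)
```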



Fig 5. Offline reference synthesis example. Comparison of the spectrograms of the original audio and of the corresponding audio signals produced by the 5 different offline articulatory syntheses for the sentence “Le fermier est parti pour la foire” (“The farmer went to the fair”). Dashed lines show the phonetic segmentation obtained by forced alignment.

First, we evaluated the proposed DNN-based articulatory synthesizer described in the Methods section. Fig 5 shows the spectrogram of the original sound for an example sentence (“Le fermier est parti pour la foire”, meaning “The farmer went to the fair”), together with the 5 different syntheses. Note that speech is present in the synthesized samples before the actual beginning of the reference sentence, since no assumption can be made about the presence of airflow when considering only articulatory data. The corresponding synthesized sounds are provided in S1–S6 Audio Files, further illustrating the good intelligibility of the synthesized sounds when using at least 10 articulatory parameters. Note, however, that in the following, the quality of the articulatory-to-acoustic mapping was evaluated subjectively by naive listeners mainly on isolated vowels and VCVs, in order to avoid the influence of the linguistic context, which tends to overestimate evaluation results.
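As an illustration of the kind of spectrogram comparison shown in Fig 5, the snippet below plots an original recording and one synthesized signal one above the other. The file names and analysis settings (window length, overlap) are placeholders for this sketch, not values taken from the paper.

```python
# Sketch of a spectrogram comparison in the spirit of Fig 5 (placeholder file names).
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

def plot_spectrogram(ax, path, title):
    fs, x = wavfile.read(path)
    if x.ndim > 1:          # keep only the first channel if the file is stereo
        x = x[:, 0]
    x = x.astype(np.float64)
    f, t, sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    ax.pcolormesh(t, f, 10 * np.log10(sxx + 1e-12), shading="gouraud")
    ax.set_title(title)
    ax.set_ylabel("Frequency (Hz)")

fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
plot_spectrogram(axes[0], "original_sentence.wav", "Original audio")
plot_spectrogram(axes[1], "synthesized_sentence.wav", "Offline articulatory synthesis")
axes[1].set_xlabel("Time (s)")
plt.tight_layout()
plt.show()
```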

