Information-Driven Active Audio-Visual Source Localization.

Schult N, Reineking T, Kluss T, Zetzsche C - PLoS ONE (2015)

Bottom Line: These actions by the robot successively reduce uncertainty about the source's position. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.


Affiliation: Cognitive Neuroinformatics, Bremen University, Bremen, Germany.

ABSTRACT
We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source's position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.



pone.0137057.g003: Interaural Temporal Differences. Depending on the position of the source, the emitted sound needs a different amount of time to travel to the individual ears. By estimating this interaural delay, the listener is able to identify the position of the source.

Mentions: We use a filterbank with 128 channels with center frequencies between 75 and 1800 Hz. After generating cochleagrams for the audio input of the left and right ears, these are used to estimate the position of the source. For auditory source localization we use a classic binaural approach based on interaural temporal differences (ITDs) [24]. (While a filterbank with 128 channels seems excessive for the calculation of ITDs, we choose to include a high number of channels because we are interested in investigating whether the transfer function induced by the artificial pinnae influences the performance of ITD-based source localization.) The basic idea of this approach is that sound takes time to travel from one ear to the other and that this temporal delay can be utilized to determine the azimuthal position of the source. This delay increases monotonically with the azimuth angle, though not linearly (at least in our case; it depends on the shape of the robot’s head). The basic principle is illustrated in Fig 3. In practice, we measure the difference in the time of arrival between the left and right channels. In our system, this is achieved by calculating the normalized cross-correlation between the cochleagram representations of the recorded audio signals of the left and the right ear:

$$R[t] = \frac{\sum_{k',t'} c_l[k',t'] \, c_r[k+k',t+t']}{\sqrt{\sum_{k',t'} c_l[k',t']^2 \, \sum_{k',t'} c_r[k+k',t+t']^2}} \qquad (2)$$

Here, $c_l$ and $c_r$ denote the left and right cochleagram, respectively, $k$ denotes the channel within the filterbank and $t$ denotes time. The correlation results can be used to estimate the temporal delay between the left and right ear by identifying the index $t$ for which the cross-correlation is maximal. The measured delays with maximum correlation can then be mapped to their corresponding angles because this mapping is monotonic and relatively unambiguous.
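As a rough illustration of the delay estimate described above, the following Python sketch evaluates a normalized cross-correlation for a range of candidate lags and picks the lag with the highest value. It is a minimal sketch under stated assumptions, not the authors' implementation: the cochleagrams are plain nested lists, the channel shift k is fixed at 0 (channels are compared one-to-one), only the overlapping samples enter the sums, and the function names `normalized_xcorr`, `estimate_delay` and `toy_channel` as well as the toy signals are hypothetical.

```python
import math

def normalized_xcorr(c_left, c_right, lag):
    """Normalized cross-correlation of two cochleagrams at one time lag.

    c_left, c_right: lists of channels, each a list of samples (same shape).
    lag: shift in samples applied to the right-ear signal.
    Assumption: channel shift k = 0, and only overlapping samples are summed.
    """
    n = len(c_left[0])
    start, stop = max(0, -lag), min(n, n - lag)
    num = sum_l = sum_r = 0.0
    for ch_l, ch_r in zip(c_left, c_right):
        for t in range(start, stop):
            l, r = ch_l[t], ch_r[t + lag]
            num += l * r
            sum_l += l * l
            sum_r += r * r
    denom = math.sqrt(sum_l * sum_r)
    return num / denom if denom > 0 else 0.0

def estimate_delay(c_left, c_right, max_lag):
    """Return the lag in [-max_lag, max_lag] with maximal correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: normalized_xcorr(c_left, c_right, lag))

def toy_channel(n, peak):
    """Synthetic channel envelope: a smooth bump centered on sample `peak`."""
    return [math.exp(-0.5 * ((t - peak) / 2.0) ** 2) for t in range(n)]

# Toy two-channel "cochleagrams": the right ear receives the same
# envelopes three samples later than the left ear.
left = [toy_channel(64, 20), toy_channel(64, 30)]
right = [toy_channel(64, 23), toy_channel(64, 33)]
delay = estimate_delay(left, right, max_lag=10)
print(delay)
```

In this sketch a positive lag means the sound reaches the right ear later than the left one; converting the lag to seconds would require dividing by the sampling rate of the cochleagram, which is not given here.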
To estimate a reasonably precise mapping, we measure ITDs for the robot’s head at approximately 6-degree spacing, using a speaker array arranged in a semi-circle in a semi-anechoic chamber, for 240 training samples. For each of the 31 positions we then calculate the mean ITD (averaged over all training samples for the respective position) and the corresponding standard deviation to define our auditory sensor model, which is used to find the angles corresponding to a particular delay and is described in further detail below. The decision to calculate ITDs from a cochleagram representation (rather than directly on the time-domain data) is mostly motivated by the observation that time-frequency representations seem to be less prone to noise. Furthermore, because we use bandpass filters with center frequencies between 75 and 1800 Hz, only a relatively limited set of frequency bands is considered, which reduces sensitivity to high-frequency noise.
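The calibration step above (a mean ITD and a standard deviation per speaker position) can be sketched as a lookup table that scores a measured delay against every calibrated angle. This is only an illustration under assumptions: the Gaussian form of the likelihood is assumed here and is not confirmed by this excerpt, the calibration values are synthetic (a sine-shaped ITD curve in milliseconds with three samples per angle), and `fit_sensor_model` and `angle_likelihoods` are hypothetical names.

```python
import math
import statistics

def fit_sensor_model(samples_by_angle):
    """Map each angle to (mean ITD, standard deviation) of its training samples."""
    return {angle: (statistics.mean(itds), statistics.stdev(itds))
            for angle, itds in samples_by_angle.items()}

def angle_likelihoods(model, measured_itd):
    """Score a measured ITD against every calibrated angle.

    Assumption: a Gaussian likelihood per angle, normalized to sum to 1.
    """
    raw = {}
    for angle, (mu, sigma) in model.items():
        z = (measured_itd - mu) / sigma
        raw[angle] = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
    total = sum(raw.values())
    return {angle: p / total for angle, p in raw.items()}

# Synthetic calibration data, NOT measurements from the paper:
# 31 angles from -90 to 90 degrees in 6-degree steps, ITD in milliseconds.
angles = range(-90, 91, 6)
training = {a: [0.7 * math.sin(math.radians(a)) + d for d in (-0.02, 0.0, 0.02)]
            for a in angles}

model = fit_sensor_model(training)
scores = angle_likelihoods(model, measured_itd=0.35)
best_angle = max(scores, key=scores.get)
print(len(model), best_angle)
```

Keeping a score for every angle, rather than only the best one, preserves the measurement uncertainty so that it can be passed on to a probabilistic estimator such as the particle filter mentioned in the abstract.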

