An automatic method to detect and track the glottal gap from high speed videoendoscopic images.

Andrade-Miranda G, Godino-Llorente JI, Moro-Velázquez L, Gómez-García JA - Biomed Eng Online (2015)

Bottom Line: Thanks to the ROI implementation, the technique is robust to camera shifting, and the objective test demonstrated the effectiveness and performance of the approach in the most challenging scenario, namely when there is an inappropriate closure of the vocal folds. The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours to delimit the glottis. Additionally, an automatic procedure is developed to synthesize multiline VKG by identifying the glottal main axis.


Affiliation: Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Campus de Montegancedo, Crta. M40 km, 38, Madrid, Spain. gxandrade@ics.upm.es.

ABSTRACT

Background: Image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis depends on a prior, accurate identification of the glottal gap, which is the most challenging step towards an automatic assessment of vocal fold vibration.

Methods: In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest that is adapted over time, and combines active contours and the watershed transform for the final delineation of the glottis. An automatic procedure to synthesize different videokymograms is also proposed.

Results: Thanks to the ROI implementation, the technique is robust to camera shifting, and the objective test demonstrated the effectiveness and performance of the approach in the most challenging scenario, namely when there is an inappropriate closure of the vocal folds.

Conclusions: The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI and in the use of watershed merging combined with active contours to delimit the glottis. Additionally, an automatic procedure is developed to synthesize multiline VKG by identifying the glottal main axis.

Fig. 10: Subjective evaluation of the pre-processing algorithms. A visual representation of the different enhancement methods for four different HSDI. a CLAHE; b non-linear transformation with ; c FFT enhancement

Mentions: Before the reliability of the algorithm is validated, some parameters have to be adjusted and some choices justified. First, the selection of the enhancement method must be justified using both subjective and objective criteria. The quality of image enhancement techniques is difficult to assess, since their evaluation is still an open problem. The goal of the enhancement is to improve the contrast and illumination of the image so that it can be analyzed by machine vision. Objective measures used to evaluate enhancement methods include the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). However, these are not suitable for many applications and fail to reflect the subtleties of human perception accurately. In [42], a framework is proposed that combines three measures, PSNR, Edge Overlapping Ratio (EOR) and Mean Segment Overlapping Ratio (MSOR), which correspond to three image features: intensity, edge and segment. To evaluate the performance of the enhancement methods on the problem under study, the objective measure proposed in [42] is applied to 110 HSDI extracted from the 22 videos of the database. Following the literature, three enhancement methods are used: FFT, CLAHE and a non-linear transformation. The non-linear transformation is tested with different values of its parameter, in incremental steps of 30 from 100 up to 300. The results are presented in Fig. 9 and summarized in Table 1 for the most relevant cases. The first graph describes the intensity changes before and after enhancement (PSNR); the second describes the similarity between edges (EOR); and the last describes the similarity between regions (MSOR). For laryngeal HSDI, well-defined edges and well-delimited regions (EOR, MSOR) should be prioritized to facilitate the subsequent segmentation step.
After analyzing the objective results and also considering the subjective evaluation based on visual inspection of the image contrast (Fig. 10), and especially the reduction of the flashing effect, the non-linear transformation with parameter is chosen because it keeps a good balance between the objective and subjective evaluations.
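
For readers unfamiliar with the intensity-based measures named above, the following is a minimal sketch of MSE and PSNR between an original frame and its enhanced version. It is an illustration only, not the authors' implementation: the function names `mse` and `psnr`, the list-of-rows image representation and the 8-bit peak value of 255 are all assumptions, and the EOR and MSOR measures of [42] are not reproduced here.

```python
import math


def mse(original, enhanced):
    """Mean squared error between two grayscale images given as lists of rows.

    Both images are assumed (for this sketch) to have identical dimensions.
    """
    if len(original) != len(enhanced) or any(
        len(row_o) != len(row_e) for row_o, row_e in zip(original, enhanced)
    ):
        raise ValueError("images must have the same shape")

    total = 0.0
    count = 0
    for row_o, row_e in zip(original, enhanced):
        for p, q in zip(row_o, row_e):
            total += (p - q) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(original, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit pixels."""
    err = mse(original, enhanced)
    if err == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Hypothetical 2x2 frames, used only to exercise the functions.
frame = [[0, 0], [0, 0]]
brighter = [[10, 10], [10, 10]]
print(mse(frame, brighter))
print(psnr(frame, brighter))
```

Because PSNR depends only on pixel-wise intensity differences, two enhancements with similar PSNR can differ considerably in edge and region quality, which is consistent with the text's preference for EOR and MSOR on laryngeal images.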

