Computer Science Faculty Publications

Automatic Video Self Modeling for Voice Disorder

Ju Shen, University of Dayton
Changpeng Ti, University of Kentucky
Anusha Raghunathan, Intel Corp.
Sen-ching S. Cheung, University of Kentucky
Rita Patel, Indiana University - Bloomington

Document Type

Article

Publication Date

7-2015

Publication Source

Multimedia Tools and Applications

Abstract

Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of him- or herself. In speech-language pathology, VSM has been used successfully to treat language impairment in children with autism and to treat stuttering, a fluency disorder. Technical challenges remain, however, in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient's original voice. The replacement speech is either synthesized by a text-to-speech engine or selected from a database of clean speech recordings using a voice similarity metric. To realign the replacement speech with the original video, we propose a novel audiovisual algorithm that combines audio segmentation with lip-state detection to identify corresponding time markers in the audio and video tracks. Lip synchronization is then achieved with an adaptive video re-sampling scheme that minimizes motion jitter and preserves spatial sharpness. Both objective measurements and subjective evaluations on a dataset of 31 subjects demonstrate the effectiveness of the proposed techniques.
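The final alignment step in the abstract retimes the original video so that each time marker found in the video lands on its matching marker in the replacement audio. This page does not describe the paper's adaptive re-sampling scheme. The following is only a minimal sketch of the simplest baseline, a piecewise-linear time map with nearest-frame selection. It assumes the markers are already available as matched pairs of times in seconds. The names `map_time` and `retime_frames` are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def map_time(t: float, audio_marks: Sequence[float], video_marks: Sequence[float]) -> float:
    """Piecewise-linear map from replacement-audio time t to original-video time.

    Hypothetical baseline, not the paper's adaptive scheme. audio_marks[i] and
    video_marks[i] are assumed to be matched markers in seconds, both strictly
    increasing and of equal length >= 2. Times outside the marked range are
    clamped to the first or last marker.
    """
    if len(audio_marks) != len(video_marks) or len(audio_marks) < 2:
        raise ValueError("need at least two matched marker pairs")
    if t <= audio_marks[0]:
        return video_marks[0]
    if t >= audio_marks[-1]:
        return video_marks[-1]
    # i is chosen so that audio_marks[i] <= t < audio_marks[i + 1].
    i = bisect_right(audio_marks, t) - 1
    a0, a1 = audio_marks[i], audio_marks[i + 1]
    v0, v1 = video_marks[i], video_marks[i + 1]
    # Linear interpolation inside the matched segment.
    return v0 + (t - a0) * (v1 - v0) / (a1 - a0)


def retime_frames(n_src_frames: int, fps: float, audio_duration: float,
                  audio_marks: Sequence[float], video_marks: Sequence[float]) -> List[int]:
    """For each output frame (same fps as the source), return the source frame index to show."""
    n_out = int(round(audio_duration * fps))
    indices = []
    for k in range(n_out):
        src_t = map_time(k / fps, audio_marks, video_marks)
        # Nearest source frame, rounding halves up, clamped to the valid range.
        idx = math.floor(src_t * fps + 0.5)
        indices.append(min(max(idx, 0), n_src_frames - 1))
    return indices
```

For example, with `audio_marks = [0.0, 1.0, 2.0]`, `video_marks = [0.0, 0.5, 2.0]`, 4 fps and 2 s of audio, `retime_frames(8, 4.0, 2.0, audio_marks, video_marks)` returns `[0, 1, 1, 2, 2, 4, 5, 7]`. Frames are repeated in the stretched first segment and skipped in the compressed second one. That repetition and skipping is the kind of motion jitter the abstract says the paper's adaptive scheme is designed to minimize. This baseline makes no attempt to reduce it.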

Inclusive pages

5329-5351

ISBN/ISSN

1380-7501

Document Version

Postprint

Comments

Document available for download is the authors' accepted manuscript, provided in compliance with publisher policy on self-archiving. Permission documentation is on file.

Publisher

Springer

Peer Reviewed

yes

eCommons Citation

Shen, Ju; Ti, Changpeng; Raghunathan, Anusha; Cheung, Sen-ching S.; and Patel, Rita, "Automatic Video Self Modeling for Voice Disorder" (2015). Computer Science Faculty Publications. 45.
https://ecommons.udayton.edu/cps_fac_pub/45

Included in

Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, Information Security Commons, Numerical Analysis and Scientific Computing Commons, OS and Networks Commons, Other Computer Sciences Commons, Programming Languages and Compilers Commons, Software Engineering Commons, Systems Architecture Commons, Theory and Algorithms Commons