Two-stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-sequence Voice Conversion

Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda
Comments: Accepted by IEEE SLT.

Dataset

Two datasets were recorded: RealEL, an electrolaryngeal (EL) speech corpus recorded by a laryngectomee; and SimuEL and NormSP, a simulated EL speech corpus and a normal speech corpus recorded by a healthy speaker.

Two experimental cases are used to evaluate our proposed method.

Speech Samples of Case 1

Transcription: 二百円の牛乳を何本か買ったから、六百円か、八百円だったわ。 (nihyakuen no gyuunyuu wo nanbon ka katta kara、roppyakuen ka、happyakuen dattawa。)


EL speech | Baseline system | Stage I of proposed method | Stage II of proposed method | Normal speech

Transcription: ンジャメナ、んちゃって、言いにくいな。 (njyamena、ncyatte、iinikuina。)


EL speech | Baseline system | Stage I of proposed method | Stage II of proposed method | Normal speech

Speech Samples of Case 2

Transcription: お待たせしました。 (o ma tase shi mashi ta。)


EL speech | Baseline system | Stage I of proposed method | Stage II of proposed method | Normal speech

Transcription: 空いているあちらの席に移りたいのですが。 (a i te iru achira no seki ni utsu ri tai no desu ga。)


EL speech | Baseline system | Stage I of proposed method | Stage II of proposed method | Normal speech