Two-stage Training Method for Japanese Electrolaryngeal Speech Enhancement Based on Sequence-to-sequence Voice Conversion

Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda

Comments: Accepted by IEEE SLT.

Dataset

Two datasets were recorded: RealEL, an electrolaryngeal (EL) speech corpus recorded by a laryngectomee; and SimuEL and NormSP, a simulated EL speech corpus and a normal speech corpus, both recorded by a healthy speaker.

We evaluate our proposed method in two experimental cases.

Case 1: SimuEL to NormSP.
Case 2: RealEL to NormSP.

Speech Samples of Case 1

Transcription: 二百円の牛乳を何本か買ったから、六百円か、八百円だったわ。 (nihyakuen no gyuunyuu wo nanbon ka katta kara、roppyakuen ka、happyakuen dattawa。)

EL speech	Baseline system	Stage I of proposed method	Stage II of proposed method	Normal speech

Transcription: ンジャメナ、んちゃって、言いにくいな。 (njyamena、ncyatte、iinikuina。)

EL speech	Baseline system	Stage I of proposed method	Stage II of proposed method	Normal speech

Speech Samples of Case 2

Transcription: お待たせしました。 (o ma tase shi mashi ta。)

EL speech	Baseline system	Stage I of proposed method	Stage II of proposed method	Normal speech

Transcription: 空いているあちらの席に移りたいのですが。 (a i te iru achira no seki ni utsu ri tai no desu ga。)

EL speech	Baseline system	Stage I of proposed method	Stage II of proposed method	Normal speech