A Curriculum Learning Paradigm for Speech Emotion Transfer: Overcoming Real-Data Training Challenges

Anonymous authors

SSST

| Source Text | Source Audio | Reference Audio | Output Audio |
| --- | --- | --- | --- |
| "Hypocritical is pot calling the kettle genocide" | Angry | Cheerful | Cheerful |
| "Hypocritical is pot calling the kettle genocide" | Sad | Angry | Angry |

SSDT

| Source Text | Source Audio | Reference Audio | Output Audio |
| --- | --- | --- | --- |
| "Tom beats that farmer" | Angry | Anxious | Anxious |
| "John laughs like your father." | Anxious | Sad | Sad |

DSST

| Source Text | Source Audio | Reference Audio | Output Audio |
| --- | --- | --- | --- |
| "Tom beats that farmer" | Angry | Apologetic | Apologetic |
| "Tom beats that farmer" | Happy | Angry | Angry |

DSDT

| Source Text | Source Audio | Reference Audio | Output Audio |
| --- | --- | --- | --- |
| "I did go and made many prisoners" | Happy | Sad | Sad |
| "A nauseous draught" | Neutral | Surprise | Surprise |