C.L.E.E.S.E. (Combinatorial Expressive Speech Engine) is a tool designed to generate an infinite number of natural-sounding, expressive variations around an original speech recording. More precisely, C.L.E.E.S.E. creates random fluctuations around the file’s original contour of pitch, loudness, timbre and speed (roughly speaking, its prosody). One of its applications is the generation of very many random voice stimuli for reverse correlation experiments, or whatever else you fancy, really.
C.L.E.E.S.E. was implemented by Juan José Burred and Emmanuel Ponsot (CREAM Lab, IRCAM, Paris), with generous funding from the European Research Council (CREAM #335536, 2014-2019, PI: JJ Aucouturier).
Random pitch variations around the same recording (French sentence: “Je suis en route pour la réunion” – I’m on my way to the meeting).
Same recording, with random speed variations around the original speed contour.
Same recording, with random timbre variations (i.e. frequency warping of the spectral envelope).
All this is obviously language-independent. “We’ll stop in a couple of minutes”, in Japanese, with random pitch:
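To give a rough idea of what a "random pitch variation around the original contour" can look like numerically, here is a minimal Python sketch. It is not CLEESE's own code or API. It assumes, purely for illustration, that random pitch offsets (in cents) are drawn from a Gaussian distribution at evenly spaced time points and linearly interpolated in between. All function names and parameters (`random_breakpoints`, `offset_at`, `apply_to_contour`, `sd_cents`, `n_windows`) are hypothetical. The sketch operates on a list of pitch values in Hz. The audio analysis and resynthesis steps are not shown.

```python
import random


def random_breakpoints(duration_s, n_windows, sd_cents, seed=None):
    """Draw one Gaussian pitch offset (in cents) at each window boundary.

    Hypothetical helper: returns a list of (time_s, offset_cents) pairs.
    """
    rng = random.Random(seed)
    step = duration_s / n_windows
    times = [i * step for i in range(n_windows + 1)]
    return [(t, rng.gauss(0.0, sd_cents)) for t in times]


def offset_at(breakpoints, t):
    """Linearly interpolate the pitch offset (cents) at time t.

    Times outside the breakpoint range take the nearest end value.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, c0), (t1, c1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]


def cents_to_ratio(cents):
    """Convert an offset in cents to a frequency ratio (1200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


def apply_to_contour(f0_hz, frame_times, breakpoints):
    """Shift each frame of an original pitch contour by the interpolated offset."""
    return [
        f0 * cents_to_ratio(offset_at(breakpoints, t))
        for f0, t in zip(f0_hz, frame_times)
    ]


if __name__ == "__main__":
    # Illustrative values only: a flat 120 Hz contour, 1.5 s long, 10 ms frames.
    frame_times = [i * 0.01 for i in range(150)]
    original_f0 = [120.0] * len(frame_times)
    # Each seed yields a different random variation around the same original.
    for seed in range(3):
        bpf = random_breakpoints(1.5, n_windows=6, sd_cents=200.0, seed=seed)
        varied = apply_to_contour(original_f0, frame_times, bpf)
        print(seed, round(min(varied), 1), round(max(varied), 1))
```

Changing the seed gives a new variation of the same recording, which is the property that makes this kind of manipulation useful for producing large stimulus sets. The same breakpoint idea could in principle be applied to loudness or speed, with a different unit for the offsets.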
CLEESE is implemented as a free, open-source Python toolbox. As of March 2018, it is available as a free download on the IRCAM Forum, the community for science and art users of audio software developed in the IRCAM community. Simply follow the download link at http://forumnet.ircam.fr/product/cleese/, create a (free) IRCAM Forum account (or log in with your account if you already have one), and download the .zip file to your computer. The file contains the Python package, PDF documentation, and a Jupyter notebook tutorial.
Ponsot, E., Arias, P. & Aucouturier, JJ. (2018). Uncovering mental representations of smiled speech using reverse correlation. J. Acoust. Soc. Am. 143 (1).
Ponsot, E., Burred, JJ., Belin, P. & Aucouturier, JJ. (2018). Cracking the social code of speech prosody using reverse correlation. Proceedings of the National Academy of Sciences.