Real-time emotional voices with D.A.V.I.D.

“Like an auto-tune, but for emotions” (Brian Resnick, for Vox.com)


DAVID (Da Amazing Voice Inflection Device) is a free, real-time voice transformation tool able to “colour” any voice recording with an emotion that wasn’t intended by its speaker. DAVID was designed with the affective psychology and neuroscience community in mind, and aims to provide researchers with new ways to produce and control affective stimuli, both for offline listening and for real-time paradigms. For instance, we have used it to create real-time emotional vocal feedback.

Technically, DAVID is implemented as an open-source patch for the closed-source audio processing platform Max (Cycling’74) and, like Max, runs on most Windows and MacOS configurations.

UPDATE: As of March 2017, DAVID is available as a free download on the IRCAM Forum, which provides the latest software releases as well as better user support.

It was developed by Dr Marco Liuni and members of the CREAM Neuroscience Lab at the computer music center IRCAM in Paris (France), with generous funding from the European Research Council (CREAM #335536, 2014-2019, PI: JJ Aucouturier) and in collaboration with Petter Johansson and Lars Hall (Lund University, Sweden), Rodrigo Segnini (Siemens, Japan), Katsumi Watanabe (Waseda University, Japan), and Daniel Richardson (University College London, UK). DAVID was so named after Talking Heads’ frontman David Byrne, whom we were privileged to count among our early users in March 2015.

What does it do?

DAVID is a software tool able to “add” emotion to a speech recording, i.e. it can make that American chap

or that French young lady

sound scared,

happy,

or sad

(and whatever else you’d like them to sound like).

It does so in a way that is properly validated for behavioural science: emotions are recognizable by listeners, they are fully controllable and reproducible in intensity, and they are mistaken for natural expressions rather than detected as synthetic (Rachman et al., 2017). In fact, even the speakers themselves mistake the manipulated speech for their own (Aucouturier et al., 2015).

In addition, it does so in real time, i.e. you can transform speech as it is spoken, e.g. over the phone. With modern audio interfaces, we have reached in/out latencies as small as 15 ms, which makes DAVID usable even as vocal feedback to a speaker without disrupting speech production (Aucouturier et al., 2015).
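As a rough illustration of where such latency figures come from (this is a back-of-the-envelope sketch, not DAVID’s actual internals, which live inside the Max patch), the in/out latency of any block-based audio pipeline is bounded below by the audio buffers queued on the input and output sides; the buffer size and buffer count below are illustrative assumptions:

```python
# Hypothetical latency estimate for a block-based audio pipeline
# (illustration only; DAVID itself runs inside Max/Cycling'74).

def io_latency_ms(block_size: int, sample_rate: int, n_buffers: int = 2) -> float:
    """Latency contributed by one side (in or out) of the audio chain:
    n_buffers blocks of audio are queued before a sample re-emerges."""
    return n_buffers * block_size / sample_rate * 1000.0

# At 44.1 kHz with 64-sample buffers, double-buffered on each side:
total = io_latency_ms(64, 44100) + io_latency_ms(64, 44100)
print(f"{total:.1f} ms")  # ~5.8 ms of pure buffering, before processing time
```

With larger, more conservative buffer settings (e.g. 256-sample blocks) the same arithmetic lands in the 10–25 ms range, which is consistent with the 15 ms figure quoted above.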

Installation

DAVID is implemented as an open-source patch for the closed-source audio processing platform Max (Cycling’74). To use DAVID, you first need to install Max.

1. First step: Installing Max

DAVID has been fully tested with Max7, and also supports Max6. Both are available in MacOS and Windows versions on the seller’s website: https://cycling74.com/downloads. Max is sold under a commercial license by Cycling’74, but free runtime versions suffice to run DAVID.

According to its seller Cycling’74, the system requirements for Max7 are an Intel® Mac with Mac OS X 10.7 (or later), or a PC with Windows 7 (or later); a multicore processor; 2 GB RAM; and a 1024×768 display. If your system departs widely from these specifications, consider installing Max6 (https://cycling74.com/downloads/older/).

2. Second step: Downloading DAVID

DAVID V1.2 was released on January 31st, 2017. As of March 2017, DAVID is available as a free download on the IRCAM Forum, the community for science and art users of audio software developed at IRCAM. Simply follow the download link at http://forumnet.ircam.fr/product/david/, create a (free) IRCAM Forum account (or log in if you already have one), and extract the .zip file on your computer. Finally, to open the patch in Max, double-click the DAVID.maxproj file.


Usage documentation

For documentation, tutorials and user support, please refer to the pages at http://forumnet.ircam.fr/product/david/

Experimental validation

DAVID was extensively validated for use in psychological and neuroscientific experiments. In listening experiments conducted at IRCAM (France), UCL (UK), Lund University (Sweden) and Waseda University (Japan), we found that emotional transformations made with DAVID were well recognized and sounded as natural as non-modified expressions by the same speaker; that the emotional intensity of the transformations could be controlled; and that the transformations were valid across several languages, namely French, English, Swedish and Japanese.

The following video reports on some of the experimental validation presented at the 4th International Conference on Music and Emotion, October 2015, in Geneva (Switzerland).

A complete report on the experimental validation is in press (Rachman et al., 2017) and available as a preprint from http://biorxiv.org/content/early/2016/01/28/038133

Associated Publications

Rachman, L., Liuni, M., Arias, P., Lind, A., Johansson, P., Hall, L., Richardson, D., Watanabe, K., Aucouturier, J.J. (2017) DAVID: An Open-source platform for real-time emotional speech transformation. Behavior Research Methods (in press). Preprint.

Aucouturier, J.J., Johansson, P., Hall, L., Segnini, R., Mercadié, L. & Watanabe, K. (2015) Covert Digital Manipulation of Vocal Emotion Alter Speakers’ Emotional State in a Congruent Direction. Proceedings of the National Academy of Sciences. Open-access

Technical support and questions

DAVID was developed by Dr Marco Liuni (Google Scholar), who may be able to help with technical issues (marco.liuni@ircam.fr). Its experimental validation (recognizability, naturalness, emotional intensity) was conducted by Ms. Laura Rachman (Google Scholar), whom you can ask about DAVID’s applicability to your own research needs (laura.rachman@ircam.fr). The project’s scientific lead was J.J. Aucouturier (Google Scholar), who’ll be happy to chat about anything, really (aucouturier@gmail.com).