The ACII Affective Vocal Bursts (A-VB) Workshop & Competition
Understanding a critically understudied modality of emotional expression
Alice Baird, Panagiotis Tzirakis, Jeffrey Brooks,
Björn Schuller, Anton Batliner, Dacher Keltner, Alan Cowen
The 2022 ACII Affective Vocal Burst Workshop & Challenge (A-VB) is a workshop-based challenge that introduces the problem of understanding emotion in vocal bursts – a wide range of non-verbal vocalizations that includes laughs, grunts, gasps, and much more. With affective states informing both mental and physical wellbeing, the core focus of the A-VB workshop is the broader discussion of current strategies in affective computing for modeling vocal emotional expression. Within this first iteration of the A-VB Challenge, the participants will be presented with four emotion-focused sub-challenges that utilize the large-scale and “in-the-wild” Hume-VB dataset. The dataset and the four sub-challenges draw attention to new innovations in emotion science as it pertains to vocal expression, addressing low- and high-dimensional theories of emotional expression, cultural variation, and “call types” (laugh, cry, sigh, etc.).
Baselines and Results
The A-VB white paper details the baseline results. Participant results for each task are as follows. * A member of the team is an A-VB organiser, therefore this result is excluded from the official rankings.
Team | Task | Test CCC |
---|---|---|
Organisers ComParE BL | A-VB High | 0.5214 |
Organisers End2You BL | A-VB High | 0.5686 |
TeamEP-ITS | A-VB High | 0.6554 |
SclabCNU | A-VB High | 0.6677 |
HCAI | A-VB High | 0.6846 |
HCCL | A-VB High | 0.7237 |
Anonymous (Winners!) | A-VB High | 0.7295 |
EIHW* | A-VB High | 0.7363 |
Team | Task | Test CCC |
---|---|---|
Organisers ComParE BL | A-VB Two | 0.4986 |
Organisers End2You BL | A-VB Two | 0.5084 |
SclabCNU | A-VB Two | 0.6202 |
TeamEP-ITS | A-VB Two | 0.6290 |
HCCL (Winners!) | A-VB Two | 0.6854 |
EIHW* | A-VB Two | 0.7066 |
Team | Task | Test CCC |
---|---|---|
Organisers ComParE BL | A-VB Culture | 0.3887 |
Organisers End2You BL | A-VB Culture | 0.4401 |
TeamEP-ITS | A-VB Culture | 0.5199 |
HCAI | A-VB Culture | 0.5258 |
SclabCNU | A-VB Culture | 0.5495 |
HCCL (Winners!) | A-VB Culture | 0.6017 |
EIHW* | A-VB Culture | 0.6195 |
Team | Task | Test UAR |
---|---|---|
Organisers ComParE BL | A-VB Type | 0.3839 |
Organisers End2You BL | A-VB Type | 0.4172 |
TeamEP-ITS | A-VB Type | 0.4902 |
SclabCNU | A-VB Type | 0.4970 |
AVB | A-VB Type | 0.5190 |
EIHW* | A-VB Type | 0.5618 |
HCAI (Winners!) | A-VB Type | 0.5856 |
Workshop Schedule
Monday 17th October, ACII (Virtual).
NYC GMT-4 | TOKYO GMT+9 | Type | Talk Title | Who |
---|---|---|---|---|
06:00-06:10 | 19:00-19:10 | Organisers | Workshop Welcome | Alice Baird |
06:10-06:25 | 19:10-19:25 | Paper | The ACII 2022 Affective Vocal Bursts Workshop & Competition: Understanding a critically understudied modality of emotional expression | Alice Baird |
06:30-06:45 | 19:30-19:45 | Paper | Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding | Bagus Tris Atmaja |
06:50-07:05 | 19:50-20:05 | Paper | Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0 | Bagus Tris Atmaja |
07:05-07:50 | 20:05-20:50 | Keynote | Nonverbals and what you might have wanted to know about them | Anton Batliner |
08:00-08:15 | 21:00-21:15 | Coffee Break | | |
08:15-08:30 | 21:15-21:30 | Paper | Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings | Zafi Sherhan Syed |
08:35-08:50 | 21:35-21:50 | Paper | Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst | Dang-Linh Trinh |
08:55-09:10 | 21:55-22:10 | Paper | Fine-tuning Wav2vec for Vocal-burst Emotion Recognition | Hyung-Jeong Yang |
09:15-10:00 | 22:15-23:00 | Keynote | New resources for measuring expressive communication | Alan Cowen |
10:15-10:30 | 23:15-23:30 | Coffee Break | | |
10:30-10:45 | 23:30-23:45 | Paper | An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis | Tobias Hallmen |
10:50-11:05 | 23:50-00:05 | Paper | Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts | Vincent Karas |
11:10-11:30 | 00:10-00:30 | Organisers | Winner Announcements and Closing Remarks | Alice Baird |
Keynote Speakers
Dr Anton Batliner. University of Augsburg, Germany. “Nonverbals and what you might have wanted to know about them”.
Dr. Anton Batliner received his doctoral degree in Phonetics in 1978 from LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics; h-index > 50, > 13000 citations.
Dr. Alan Cowen. Hume AI, New York, USA. “New resources for measuring expressive communication”.
Dr. Alan Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in leading journals such as Nature, PNAS, Science Advances, and Nature Human Behavior and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.
Important Dates
Challenge Opening (data available): May 27, 2022
Baselines information released: July 1, 2022
‘Other Topics’ deadline [CMT]: August 1, 2022 (extended from July 22, 2022)
Notification of Acceptance (included in ACII Proceedings): August 8, 2022 (extended from July 29, 2022)
Camera Ready: August 15, 2022
Competition deadline: September 14, 2022 (extended from September 2, 2022)
Competition Technical Report submission deadline [CMT]: September 16, 2022 (extended from September 6, 2022)
Notification of Acceptance (peer-reviewed by the A-VB technical committee, not included in ACII Proceedings): September 23, 2022 (extended from September 16, 2022)
Workshop: October 17, 2022
Paper Submission
In addition to submitting test set results, all participants in the A-VB competition should submit a technical report describing their approach and results. We suggest that this report be no more than 4 pages; it may be uploaded to arXiv as well as to CMT.
The baseline white paper provides a more extensive description of the data as well as the baseline results. Competition papers should include the following citations for the data repository and the baseline paper:
@article{Cowen2022HumeVB,
  title   = {The Hume Vocal Burst Competition Dataset {(H-VB)} | Raw Data [ExVo: updated 02.28.22] [Data set]},
  author  = {Cowen, Alan and Baird, Alice and Tzirakis, Panagiotis and Opara, Michael and Kim, Lauren and Brooks, Jeff and Metrick, Jacob},
  journal = {Zenodo},
  doi     = {https://doi.org/10.5281/zenodo.6308780},
  year    = {2022}
}

@misc{BairdA-VB2022,
  author    = {Baird, Alice and Tzirakis, Panagiotis and Batliner, Anton and Schuller, Björn and Keltner, Dacher and Cowen, Alan},
  title     = {The ACII 2022 Affective Vocal Bursts Workshop and Competition: Understanding a critically understudied modality of emotional expression},
  publisher = {arXiv},
  doi       = {[to appear]},
  year      = {2022}
}
Other Topics
For those interested in submitting research to the A-VB workshop outside of the competition, we encourage contributions covering the following topics:
Detecting and Understanding Nonverbal Vocalizations
Modeling Vocal Emotional Expression
Cross-Cultural Emotional Expression Modeling
Other topics related to Auditory Affective Computing
These submissions will be included in the IEEE ACII proceedings. Authors are asked to submit papers of up to 6 pages (including references), following the submission guidelines of the ACII 2022 conference. Directions for submitting papers will be announced soon; submissions will be handled via the conference submission system (EasyChair). All submissions will be reviewed single-blind.
Competition Tasks and Rules
The High-Dimensional Emotion Task (A-VB High).
The A-VB High track explores a high-dimensional emotion space for understanding vocal bursts. Participants will be challenged with predicting the intensity of 10 emotions (Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, and Surprise) associated with each vocal burst as a multi-output regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this challenge will be based on CCC.
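For reference, the CCC between a vector of predictions and a vector of labels can be computed as in the minimal NumPy sketch below; variable names such as `preds` and `labels` are placeholders for illustration, not part of the official baseline code.

```python
import numpy as np

def ccc(preds: np.ndarray, labels: np.ndarray) -> float:
    """Concordance Correlation Coefficient between two 1-D arrays."""
    mean_p, mean_l = preds.mean(), labels.mean()
    var_p, var_l = preds.var(), labels.var()
    cov = np.mean((preds - mean_p) * (labels - mean_l))
    return 2 * cov / (var_p + var_l + (mean_p - mean_l) ** 2)

def mean_ccc(pred_matrix: np.ndarray, label_matrix: np.ndarray) -> float:
    """Average CCC across output dimensions, e.g. the 10 emotions of A-VB High
    (pred_matrix and label_matrix are N x 10 arrays)."""
    return float(np.mean([ccc(pred_matrix[:, i], label_matrix[:, i])
                          for i in range(label_matrix.shape[1])]))
```

The same helper applies to A-VB Two (averaged over arousal and valence) and A-VB Culture (averaged over the culture-specific emotion outputs).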
The Two-Dimensional Emotion Task (A-VB Two).
In the A-VB Two track, we investigate a low-dimensional emotion space that is based on the circumplex model of affect. Participants will predict values of arousal and valence (on a scale from 1=unpleasant/subdued, 5=neutral, 9=pleasant/stimulated) as a regression task. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across the two dimensions. The baseline for this challenge will be based on CCC.
The Cross-Cultural Emotion Task (A-VB Culture).
In the A-VB Culture track, participants will be challenged with predicting the intensity of 10 emotions associated with each vocal burst as a multi-output regression task, using a model or multiple models that generate predictions specific to each of the four cultures (the U.S., China, Venezuela, or South Africa). Specifically, annotations of each vocal burst will consist of culture-specific ground truth, meaning that the ground truth for each sample will be the average of annotations solely from the country of origin of the sample. Participants will report the average Concordance Correlation Coefficient (CCC), as well as the Pearson correlation coefficient, across all 10 emotions. The baseline for this challenge will be based on CCC.
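As an illustration only, the sketch below shows one way of matching culture-specific model outputs to each sample's country of origin before scoring; the array layout, the `CULTURES` list order, and the `countries` field are hypothetical assumptions, not the official evaluation code.

```python
import numpy as np

# Hypothetical ordering of the four cultures in the model's output tensor.
CULTURES = ["United States", "China", "South Africa", "Venezuela"]

def select_culture_outputs(outputs: np.ndarray, countries: list) -> np.ndarray:
    """outputs: (N, 4, 10) array of per-culture emotion predictions.
    countries: length-N list with each sample's country of origin.
    Returns an (N, 10) array of predictions matched to each sample's own culture,
    mirroring the culture-specific ground truth described above."""
    idx = np.array([CULTURES.index(c) for c in countries])
    return outputs[np.arange(len(countries)), idx]
```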
The Expressive Burst-Type Task (A-VB Type).
In the A-VB Type task, participants will be challenged with classifying the type of expressive vocal burst from 8 classes (Gasp, Laugh, Cry, Scream, Grunt, Groan, Pant, Other). Participants will report the Unweighted Average Recall (UAR) as the measure of performance.
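UAR is simply the macro-averaged per-class recall; a minimal sketch using scikit-learn (assuming string class labels such as "Laugh"):

```python
from sklearn.metrics import recall_score

def uar(y_true, y_pred) -> float:
    """Unweighted Average Recall: the mean of per-class recalls,
    giving each of the 8 vocal burst types equal weight."""
    return recall_score(y_true, y_pred, average="macro")
```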
Data and Team Registration
The competition data package includes the raw data for a subset of the Hume Vocal Burst Database (H-VB), comprising all train, validation, and test recordings, with the corresponding emotion ratings for the train and validation recordings.
The dataset contains 59,201 audio recordings of vocal bursts from 1,702 speakers across 4 cultures—the U.S., South Africa, China, and Venezuela—ranging in age from 20 to 39.5 years. This version of H-VB totals 36 hours of audio (mean sample duration: 2.23 seconds). The emotion ratings correspond to the ten emotion concepts listed below, given as intensities averaged on a 0-100 scale, with each sample rated by an average of 85.2 raters.
Emotion Labels: Awe, Excitement, Amusement, Awkwardness, Fear, Horror, Distress, Triumph, Sadness, Surprise
 | Train | Validation | Test |
---|---|---|---|
Duration (HH:MM:SS) | 12:19:06 | 12:05:45 | 12:22:12 |
Samples | 19,990 | 19,396 | 19,815 |
Speakers | 571 | 568 | 563 |
F:M | 305:266 | 324:244 | -- |
USA | 206 | 206 | -- |
China | 79 | 76 | -- |
South Africa | 244 | 244 | -- |
Venezuela | 42 | 42 | -- |
An overview of the data can be found at this Zenodo repository. To gain access, register your team by emailing competitions@hume.ai with the following information:
Team Name, Researcher Name, Affiliation, and Research Goals
Restricted Access: After registering your team, you will receive an End User License Agreement (EULA) for signature. Please note that this dataset is provided only for competition use. Requests for use of the data beyond the competition should be directed to Hume AI (hello@hume.ai).
Results Submission
For all tasks, participants should submit their test set results as a zip file to competitions@hume.ai, following these guidelines:
Predictions should be submitted as a comma-delimited CSV file with the following naming convention: [taskname]_[team name]_[submission no].csv
The CSV should contain only one prediction per test set file (see the sketch below).
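A minimal sketch of assembling a submission file that follows the naming convention above, using pandas and the A-VB High task as the example; the `File_ID` column name and the exact column layout are illustrative assumptions, so please consult the label files released with the data for the definitive format.

```python
import pandas as pd

EMOTIONS = ["Awe", "Excitement", "Amusement", "Awkwardness", "Fear",
            "Horror", "Distress", "Triumph", "Sadness", "Surprise"]

def write_submission(file_ids, predictions, team_name, submission_no):
    """predictions: array-like of shape (N, 10), one row per test set file."""
    df = pd.DataFrame(predictions, columns=EMOTIONS)
    df.insert(0, "File_ID", file_ids)  # exactly one prediction row per test file
    out_name = f"A-VB-High_{team_name}_{submission_no}.csv"
    df.to_csv(out_name, index=False)
    return out_name
```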
Organizers
Alice Baird. Hume AI, New York, USA. Alice Baird is an audio researcher with interdisciplinary expertise in machine learning, computational paralinguistics, stress, and emotional well-being. She completed her PhD at the University of Augsburg’s Chair of Embedded Intelligence for Health Care and Wellbeing in 2021, where she was supervised by Dr Björn Schuller. Her work on emotion understanding from speech, physiological, and multimodal data has been published extensively in leading journals and conferences including INTERSPEECH, ICASSP, IEEE Intelligent Systems, and the IEEE Journal of Biomedical and Health Informatics (i10-index: 29). Alice has extensive experience with competition organization, having served as data chair for both the INTERSPEECH Computational Paralinguistics Challenge (ComParE) and the ACM MM Multimodal Sentiment Challenge (MuSe). She recently joined Hume AI as an AI research scientist.
Panagiotis Tzirakis. Hume AI, New York, USA. Dr Tzirakis is a computer scientist and AI researcher with expertise in deep learning and emotion recognition across modalities. He earned his Ph.D. with the Intelligent Behaviour Understanding Group (iBUG) at Imperial College London, where he advanced multimodal emotion recognition efforts. He has published in top outlets including Information Fusion, International Journal of Computer Vision, and several IEEE conference proceedings (e.g. ICASSP, INTERSPEECH) on topics including 3D facial motion synthesis, multi-channel speech enhancement, the detection of Gibbon calls, and emotion recognition from audio and video (i10-index: 16). He recently joined Hume AI as an AI research scientist.
Jeffrey Brooks. Hume AI, New York, USA. Dr. Brooks is a computational psychologist with expertise in emotion, face perception, and social neuroscience. He earned his Ph.D. from the Social Cognitive and Neural Sciences Lab at NYU, where he researched the computational and neural mechanisms of emotion perception and social evaluation. His research has been published in the Proceedings of the National Academy of Sciences, Nature Human Behavior, and other leading journals.
Anton Batliner. University of Augsburg, Germany. Anton Batliner received his doctoral degree in Phonetics in 1978 at LMU Munich. He is now with the Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany. His main research interests are all (cross-linguistic) aspects of prosody and (computational) paralinguistics. Amongst other events, he co-organized the previous INTERSPEECH challenges on Computational Paralinguistics. He is co-editor/author of two books and author/co-author of more than 300 technical articles, with an i10-index of 172 and over 12,000 citations.
Björn Schuller. Imperial College London, United Kingdom. Björn W. Schuller received his diploma, doctoral degree, habilitation, and Adjunct Teaching Professorship in Machine Intelligence and Signal Processing, all in EE/IT, from TUM in Munich, Germany. He is Full Professor of Artificial Intelligence and Head of GLAM – the Group on Language, Audio, & Music at Imperial College London, UK, and Full Professor and Chair of Embedded Intelligence for Health Care and Wellbeing at the University of Augsburg, Germany, as well as co-founding CEO and current CSO of audEERING – an Audio Intelligence company based near Munich and in Berlin, Germany – amongst other professorships and affiliations. He has (co-)authored 1000+ publications (43k+ citations, i10-index: 563), is Field Chief Editor of Frontiers in Digital Health, and was Editor in Chief of the IEEE Transactions on Affective Computing, amongst manifold further commitments and service to the community, including serving as Technical Chair of INTERSPEECH 2019 and organizing more than 25 research challenges.
Dacher Keltner. The University of California, Berkeley, California, U.S.A. Dr. Keltner is one of the world’s foremost emotion scientists. He is a professor of psychology at UC Berkeley and the director of the Greater Good Science Center. He has over 200 scientific publications (i10-index: 222) and six books, including Born to Be Good, The Compassionate Instinct, and The Power Paradox. He has written for many popular outlets, from The New York Times to Slate. He was also the scientific advisor behind Pixar’s Inside Out, is involved with the education of health care providers and judges, and has consulted extensively for Google, Facebook, Apple, and Pinterest, on issues related to emotion and well-being.
Alan Cowen. Hume AI, New York, U.S.A. Dr. Cowen is an applied mathematician and computational emotion scientist developing new data-driven methods to study human experience and expression. He was previously a researcher at the University of California and visiting scientist at Google, where he helped establish affective computing research efforts. His discoveries have been featured in top journals such as Nature, PNAS, Science Advances, and Nature Human Behavior (i10-index: 16) and covered in press outlets ranging from CNN to Scientific American. His research applies new computational tools to address how emotional behaviors can be evoked, conceptualized, predicted, and annotated, how they influence our social interactions, and how they bring meaning to our everyday lives.