Speakers & Courses

From user to embodied AI: Analysis and automatic generation of multimodal socio-affective behaviours for human-machine interaction
Prof. Magalie Ochs
Laboratoire d’Informatique et des Systèmes, Aix-Marseille University

Abstract: One of the major challenges in the field of human-machine interaction is to design models that enable the automatic generation of multimodal behaviours in interactive embodied AIs (virtual characters or humanoid robots) during conversations with one or more users. In Artificial Intelligence, one area of research focuses in particular on simulating socio-affective behaviours. Beyond simple facial expressions of emotions, the aim is to model and simulate social attitudes such as engagement, cold anger or appreciation. These behaviours result from the expression of multimodal signals, but they also depend on the signals expressed by the user during the interaction. Today, several approaches have been proposed to build models that can both automatically analyse users’ multimodal signals and generate appropriate behaviours for interactive systems, combining machine learning on data corpora with procedural approaches. The aim of this presentation is to introduce the various research projects we are conducting to design, implement and evaluate models for the automatic generation of socio-affective behaviours in human-machine interaction. The application framework of these research activities is social skills training through interaction with social virtual agents, for instance public speaking training in virtual reality. Currently, we are particularly interested in how embodied AI could be used to combat societal inequalities, reduce discrimination and improve inclusion. We are pursuing several research projects to this end, for instance computational models to detect and reduce bias in AI, the development of virtual female role models to reduce stereotypes, and a virtual theatre that trains witnesses of discrimination situations to react.

Bio: Magalie Ochs (https://pageperso.lis-lab.fr/magalie.ochs/) is Professor in Computer Science at Aix-Marseille University in the Laboratoire d’Informatique et des Systèmes (LIS) (https://www.lis-lab.fr/). Since her master’s in Artificial Intelligence at the University of Montreal, she has been carrying out research aimed at integrating social and emotional intelligence into social robots and virtual agents. She has conducted her research in several national and international laboratories (University Paris 8, Orange Lab, University Paris 6 (LIP6), National Institute of Informatics in Tokyo, Telecom Paris) and has explored different computational methods and models to endow social robots and virtual characters with socio-emotional capabilities (perception, reasoning and expression).

Multimodal Conversational Assistance of Complex Manual Tasks
Prof. João Magalhaes
Department of Computer Science, Universidade NOVA de Lisboa

Abstract: Conversational agents have become an integral part of our daily routines, aiding humans in various tasks. Helping users with real-world manual tasks is a complex and challenging paradigm: the assistant must leverage multiple information sources, provide several multimodal stimuli, and correctly ground the conversation in a helpful and robust manner. In this talk, I will describe TWIZ, a conversational AI assistant that is helpful, multimodal, knowledgeable, and engaging, designed to guide users towards the successful completion of complex manual tasks. To achieve this, we focused our efforts on three main research questions: (1) Humanly-Shaped Conversations, providing information in a knowledgeable way; (2) Multimodal Stimulus, making use of various modalities including voice, images, and videos; and (3) Zero-shot Conversational Flows, improving the robustness of the interaction in unseen scenarios. TWIZ is an assistant capable of supporting a wide range of unseen tasks: it leverages Generative AI methods to deliver several innovative features, such as creative cooking, video navigation through voice, and the robust PlanLLM, a Large Language Model trained for dialoguing about complex manual tasks.

Bio: João Magalhães is a Full Professor at the Computer Science Department of Universidade NOVA de Lisboa and national co-Director of the CMU Portugal partnership. He holds a Ph.D. degree (2008) from Imperial College London, UK. His research aims to move vision-and-language AI closer to the way humans understand and communicate. He has made scientific contributions to the fields of multimedia search and summarization, multimodal conversational AI, data mining, and multimodal information representation. He is currently coordinating the creation of the sovereign LLM AMALIA and has, in the past, coordinated and participated in several research projects (national, EU-FP7 and H2020) pursuing robust and generalizable methods in different domains. He is regularly involved in review panels, the organization of international conferences, and program committees. His work and that of his group have been awarded, or nominated for, several honours and distinctions, most notably the 1st prize in the Amazon Alexa Taskbot Challenge 2022. He was General Chair of ECIR 2020 and ACM Multimedia 2022, Honorary Chair of ACM Multimedia Asia 2021, and will be a PC Chair of ACM Multimedia 2026.

Understanding and Being Understood
Prof. Silvia Rossi
PRISCA Lab, University of Naples “Federico II”

Abstract: Successful and meaningful interaction requires perceiving and interpreting human states, intentions, and emotions in ways that are sensitive to individual differences and situational context. By adapting to who people are, where they are, and what their state is, social robots can personalize their responses, fostering trust, comfort, and engagement. Equally essential is the design of robot behaviors that communicate clearly: actions that are legible, predictable, and contextually appropriate, enabling people to easily interpret and anticipate the robot’s intentions. This talk will explore these two complementary dimensions, personalized context-aware perception and socially understandable action, as foundations for safe, effective, and widely accepted social robots.

Bio: Silvia Rossi is a full professor of Computer Science at the Department of Electrical Engineering and Information Technologies, University of Naples Federico II. She serves as the scientific director of the PRISCA Lab (Projects of Intelligent Robotics and Advanced Cognitive Systems, https://www.prisca.unina.it). Prof. Rossi holds an M.Sc. in Physics from the University of Naples Federico II (2001) and a Ph.D. in Information and Communication Technologies from the University of Trento (2006). She has played a key role in numerous EU and international research projects and is currently the principal investigator and coordinator of several major initiatives, including the HORIZON-MSCA-2023-DN SWEET (Social aWareness for sErvicE roboTs) and the HORIZON-TMA-MSCA-DN TRAIL (TRAnsparent, InterpretabLe Robots). Prof. Rossi chaired the RO-MAN conferences in 2020 and 2022 and is an active member of program committees for leading conferences in human-robot interaction and artificial intelligence. Her research focuses on Socially Assistive Robotics, Human-Robot Interaction, Cognitive Architectures, and User Profiling and Recommender Systems. Her work explores computational methods for designing autonomous agents that can adapt their behavior to effectively interact with and support users. Prof. Rossi has authored over 200 publications in international journals, books, and conference proceedings, advancing the fields of robotics and AI.

Spatial AI and emerging reasoning in end-to-end trained robotic navigation
Dr. Christian Wolf
Naver Labs Europe

Abstract: An important subgoal of AI is the creation of intelligent agents, which require high-level reasoning capabilities, situation awareness, awareness of the dynamics of the environment, and the capacity to robustly take the right decisions at the right moments. In this talk we will cover the automatic learning of reasoning capabilities through large-scale training of deep neural networks from data, targeting different tasks that involve fast, precise and smooth navigation of terrestrial robots. We will present solutions and describe their key features: reinforcement learning, the identification of accurate dynamical models for use in simulation, and the inclusion of geometric foundation models. We will also present an in-depth analysis of the type of reasoning that emerges in end-to-end trained agents. In particular, we study the presence of realistic dynamics which the agents learned for open-loop forecasting, and their interplay with sensing; the way the agents use latent memory to hold elements of the scene structure; and, finally, their planning capabilities. Put together, these experiments paint a new picture of how tools from computer vision and sequential decision making have led to new capabilities in robotics and control. We will also showcase the fleet of autonomous robots operated by Naver Labs Korea in Seoul, in the world’s first robot-friendly building.

Bio: Christian Wolf is Principal Scientist at Naver Labs Europe, where he leads the Spatial AI team. He is interested in AI for robotics, in particular machine learning and embodied computer vision; large-scale learning of the capacity to perform high-level reasoning from visual observations; and the connections between machine learning and control. He is a member of the directing committee of GDR ISIS and co-leader of its topic “Machine Learning”. He has supervised 18 defended PhD theses and is a regular area chair of NeurIPS, ICLR, ICML, CVPR, ICCV and ECCV. From 2005 to 2021 he was associate professor (Maître de Conférences, HDR) at INSA de Lyon and LIRIS, a CNRS laboratory, where he also headed the chair in Artificial Intelligence. He received his MSc in computer science from TU Vienna, Austria, in 2000, and a PhD in computer science from INSA de Lyon, France, in 2003. In 2012 he obtained the habilitation diploma, also from INSA de Lyon. In the past, he was a member of the scientific committee of GDR IA, a member of the board of AI experts at the French national supercomputing cluster GENCI, a member of the ANR evaluation committees “Artificial Intelligence” (2019-2021) and “Interaction and Robotics” (2016-2018), and an area editor of IEEE Transactions on PAMI (2019-2025).

From conversation to conversational: speech synthesis and the communicative power of the human voice
Prof. Éva Székely
Division of Speech, Music, and Hearing, KTH

Abstract: Deep-learning-based speech synthesis now allows us to generate voices that are not only natural-sounding but also highly realistic and expressive. This capability presents a paradox for conversational AI: it opens up new possibilities for more fluid, humanlike interaction, yet it also exposes a gap in our understanding of how such expressive features shape communication. Can synthetic speech, which poses these challenges, also help us solve them? In this talk, I explore the fundamental challenges in modelling the spontaneous phenomena that characterise spoken interaction: the timing of breaths, shifts in speech rate, laughter, hesitations, tongue clicks, creaky voice and breathy voice. In striving to make synthetic speech sound realistic, we inevitably generate communicative signals that convey stance, emotion, and identity. Modelling voice as a social signal raises important questions: How does gender presentation in synthetic speech influence perception? How do prosodic patterns affect trust, compliance, or perceived politeness? To address such questions, I will present a methodology that uses controllable conversational TTS not only as a target for optimisation but also as a research tool. By precisely manipulating prosody and vocal identity in synthetic voices, we can isolate their effects on listener judgments and experimentally test sociopragmatic hypotheses. This dual role of TTS – as both the object of improvement and the instrument of inquiry – requires us to rethink evaluation beyond mean opinion scores, towards context-driven and interaction-aware metrics. I will conclude by situating these ideas within the recent paradigm shift toward large-scale multilingual TTS models and Speech LLMs, outlining research directions that help us both understand and design for the communicative power of the human voice.

Bio: Dr. Éva Székely is an Assistant Professor in Speech Technology at KTH Royal Institute of Technology in Stockholm. She works at the intersection of speech technology and speech science, with a focus on developing conversational text-to-speech and studying the perception of synthetic voices. She leads several nationally and foundation-funded research projects on spontaneous and conversational speech modelling, and also pursues work on inclusive speech technologies, including gender-diverse voice design, synthetic voices for assistive communication, and methods for detecting and mitigating bias in speech foundation models. She has published extensively in leading speech technology venues, and her work includes open-sourced methods for prosody evaluation and bias detection. She holds an MSc in Speech and Language Technology from Utrecht University and a PhD from University College Dublin on expressive speech synthesis in human interaction.

Dr. Bot: Opportunities and Challenges of Social Robots in Healthcare
Exploring Ethics, Technology Acceptance, and Implementation in Real-Life Care Settings
Dr. Maribel Pino
Broca Living Lab, Assistance Publique – Hôpitaux de Paris

Abstract: This presentation introduces socially assistive robots as emerging partners in healthcare, with a focus on geriatric care and dementia support. It explores their potential to enhance patient engagement, ease caregiver burden, and improve quality of life, while addressing key challenges: defining the most effective, feasible, and acceptable clinical use cases; overcoming functional limitations in real-world environments, physical tasks, and dialogue capacities; and adapting to care setting constraints and staff readiness. Ethical considerations will be central, including privacy, informed consent, autonomy, and fairness—ensuring data and algorithms do not disadvantage vulnerable groups, as highlighted by the EU AI Act. The talk will also review evaluation approaches—usability, acceptability, effectiveness, and clinical impact—to show how interventions using these technologies are assessed. Drawing on research-based examples and real-world care protocols, the session will share implementation successes and lessons learned, offering insights into responsible, inclusive pathways for bringing social robots into future healthcare practice.

Bio: Maribel Pino, PhD, is a cognitive psychologist and Director of the Broca Living Lab (AP-HP/Paris Cité University, Paris)—a hospital-based hub for designing, testing, and implementing technology-based solutions in healthcare. Her research focuses on participatory approaches to designing and evaluating social robots and AI-driven technologies for healthcare. She conducts applied research in real-world care environments—including hospitals, nursing homes, and private homes—examining how these innovations can be effectively implemented and adopted. She also studies the legal, ethical, and policy implications of artificial intelligence to support its responsible integration in healthcare. Her work prioritizes vulnerable populations—including older adults and individuals living with dementia, disabilities, or mental health conditions—ensuring that innovations address clinical realities and promote equity. She collaborates with healthcare providers, industry partners, and patient groups to generate evidence guiding the development and deployment of robust, ethically sound technologies.