
Publications


 

Do Prosody and Embodiment Influence the Perceived Naturalness of Conversational Agents' Speech?


Jonathan Ehret, Andrea Bönsch, Lukas Aspöck, Christine T. Röhr, Stefan Baumann, Martine Grice, Janina Fels, Torsten Wolfgang Kuhlen
Transactions on Applied Perception (TAP)
presented at ACM Symposium on Applied Perception (SAP)

For conversational agents’ speech, either all possible sentences have to be prerecorded by voice actors or the required utterances can be synthesized. While synthesizing speech is more flexible and economical in production, it also potentially reduces the perceived naturalness of the agents, among other things due to mistakes at various linguistic levels. In our paper, we are interested in the impact of adequate and inadequate prosody, here particularly in terms of accent placement, on the perceived naturalness and aliveness of the agents. We compare (i) inadequate prosody, as generated by off-the-shelf text-to-speech (TTS) engines with synthetic output, (ii) the same inadequate prosody imitated by trained human speakers and (iii) adequate prosody produced by those speakers. The speech was presented either as audio-only or by embodied, anthropomorphic agents, to investigate the potential masking effect by a simultaneous visual representation of those virtual agents. To this end, we conducted an online study with 40 participants listening to four different dialogues each presented in the three Speech levels and the two Embodiment levels. Results confirmed that adequate prosody in human speech is perceived as more natural (and the agents are perceived as more alive) than inadequate prosody in both human (ii) and synthetic speech (i). Thus, it is not sufficient to just use a human voice for an agent’s speech to be perceived as natural; it is decisive whether the prosodic realisation is adequate or not. Furthermore, and surprisingly, we found no masking effect by speaker embodiment, since neither a human voice with inadequate prosody nor a synthetic voice was judged as more natural when a virtual agent was visible compared to the audio-only condition. On the contrary, the human voice was even judged as less “alive” when accompanied by a virtual agent.
In sum, our results emphasize on the one hand the importance of adequate prosody for perceived naturalness, especially in terms of accents being placed on important words in the phrase, while showing on the other hand that the embodiment of virtual agents plays a minor role in naturalness ratings of voices.


@article{Ehret2021a,
author = {Ehret, Jonathan and B\"{o}nsch, Andrea and Asp\"{o}ck, Lukas and R\"{o}hr, Christine T. and Baumann, Stefan and Grice, Martine and Fels, Janina and Kuhlen, Torsten W.},
title = {Do Prosody and Embodiment Influence the Perceived Naturalness of Conversational Agents' Speech?},
journal = {ACM Transactions on Applied Perception},
year = {2021},
issue_date = {October 2021},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {18},
number = {4},
articleno = {21},
issn = {1544-3558},
url = {https://doi.org/10.1145/3486580},
doi = {10.1145/3486580},
numpages = {15},
keywords = {speech, audio, accentuation, prosody, text-to-speech, Embodied conversational agents (ECAs), virtual acoustics, embodiment}
}





Being Guided or Having Exploratory Freedom: User Preferences of a Virtual Agent’s Behavior in a Museum


Andrea Bönsch, David Hashem, Jonathan Ehret, Torsten Wolfgang Kuhlen
21st ACM International Conference on Intelligent Virtual Agents 2021 (IVA'21)

A virtual guide in an immersive virtual environment allows users a structured experience without missing critical information. However, although being in an interactive medium, the user is only a passive listener, while the embodied conversational agent (ECA) fulfills the active roles of wayfinding and conveying knowledge. Thus, we investigated, for the use case of a virtual museum, whether users prefer a virtual guide or a free exploration accompanied by an ECA who imparts the same information as the guide. Results of a small within-subjects study with a head-mounted display are given and discussed, resulting in the idea of combining benefits of both conditions for a higher user acceptance. Furthermore, the study indicated the feasibility of the carefully designed scene and ECA’s appearance.

We also submitted a GALA video entitled "An Introduction to the World of Internet Memes by Curator Kate: Guiding or Accompanying Visitors?" by D. Hashem, A. Bönsch, J. Ehret, and T.W. Kuhlen, showcasing our application.
IVA 2021 GALA Audience Award!


@inproceedings{Boensch2021b,
author = {B\"{o}nsch, Andrea and Hashem, David and Ehret, Jonathan and Kuhlen, Torsten W.},
title = {{Being Guided or Having Exploratory Freedom: User Preferences of a Virtual Agent's Behavior in a Museum}},
year = {2021},
isbn = {9781450386197},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3472306.3478339},
doi = {10.1145/3472306.3478339},
booktitle = {{Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents}},
pages = {33--40},
numpages = {8},
keywords = {virtual agents, enjoyment, guiding, virtual reality, free exploration, museum, embodied conversational agents},
location = {Virtual Event, Japan},
series = {IVA '21}
}





Compression and Rendering of Textured Point Clouds via Sparse Coding


Kersten Schuster, Philip Trettner, Patric Schmitz, Julian Schakib, Leif Kobbelt
High-Performance Graphics 2021

Splat-based rendering techniques produce highly realistic renderings from 3D scan data without prior mesh generation. Mapping high-resolution photographs to the splat primitives enables detailed reproduction of surface appearance. However, in many cases these massive datasets do not fit into GPU memory. In this paper, we present a compression and rendering method that is designed for large textured point cloud datasets. Our goal is to achieve compression ratios that outperform generic texture compression algorithms, while still retaining the ability to efficiently render without prior decompression. To achieve this, we resample the input textures by projecting them onto the splats and create a fixed-size representation that can be approximated by a sparse dictionary coding scheme. Each splat has a variable number of codeword indices and associated weights, which define the final texture as a linear combination during rendering. For further reduction of the memory footprint, we compress geometric attributes by careful clustering and quantization of local neighborhoods. Our approach reduces the memory requirements of textured point clouds by one order of magnitude, while retaining the possibility to efficiently render the compressed data.
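
As a rough illustration of the decoding step described above, the following Python sketch reconstructs a splat's texture as a weighted sum of dictionary atoms. It is a minimal sketch under stated assumptions, not the paper's implementation: the tiny hand-written dictionary, the 2x2 patch size, and the names `DICTIONARY` and `decode_splat` are all hypothetical, and a real renderer would presumably evaluate the sum on the GPU with a learned dictionary.

```python
from typing import List, Sequence, Tuple

# Hypothetical dictionary: each atom is a flattened 2x2 grayscale patch
# (row-major). A real dictionary would be learned from the input textures.
DICTIONARY: List[List[float]] = [
    [1.0, 1.0, 1.0, 1.0],    # constant atom
    [1.0, -1.0, 1.0, -1.0],  # left/right contrast
    [1.0, 1.0, -1.0, -1.0],  # top/bottom contrast
]


def decode_splat(
    codes: Sequence[Tuple[int, float]],
    dictionary: Sequence[Sequence[float]] = DICTIONARY,
) -> List[float]:
    """Reconstruct one splat texture as a linear combination of atoms.

    `codes` is a variable-length list of (codeword index, weight) pairs.
    """
    patch = [0.0] * len(dictionary[0])
    for index, weight in codes:
        atom = dictionary[index]
        for i, value in enumerate(atom):
            patch[i] += weight * value
    return patch


# A smooth splat needs a single codeword; a detailed one uses more.
smooth = decode_splat([(0, 0.5)])
detailed = decode_splat([(0, 0.5), (1, 0.25), (2, -0.125)])
```

Because the number of (index, weight) pairs varies per splat, uniform regions can be stored with very few codewords while detailed regions spend more, which is one plausible reading of how a sparse code trades memory against fidelity.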




Design and Evaluation of a Free-Hand VR-based Authoring Environment for Automated Vehicle Testing


Sevinc Eroglu, Frederic Stefan, Alain Chevalier, Daniel Roettger, Daniel Zielasko, Torsten Wolfgang Kuhlen, Benjamin Weyers
IEEE Conference on Virtual Reality and 3D User Interfaces 2021

Virtual Reality is increasingly used for safe evaluation and validation of autonomous vehicles by automotive engineers. However, the design and creation of virtual testing environments is a cumbersome process. Engineers are bound to utilize desktop-based authoring tools, and a high level of expertise is necessary. By performing scene authoring entirely inside VR, faster design iterations become possible. To this end, we propose a VR authoring environment that enables engineers to design road networks and traffic scenarios for automated vehicle testing based on free-hand interaction. We present a 3D interaction technique for the efficient placement and selection of virtual objects that is employed on a 2D panel. We conducted a comparative user study in which our interaction technique outperformed existing approaches regarding precision and task completion time. Furthermore, we demonstrate the effectiveness of the system by a qualitative user study with domain experts.

Nominated for the Best Paper Award.




Poster: Indirect User Guidance by Pedestrians in Virtual Environments


Andrea Bönsch, Katharina Güths, Jonathan Ehret, Torsten Wolfgang Kuhlen
ICAT-EGVE 2021 - International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments

Scene exploration allows users to acquire scene knowledge on entering an unknown virtual environment. To support users in this endeavor, aided wayfinding strategies intentionally influence the user’s wayfinding decisions through, e.g., signs or virtual guides.

Our focus, however, is an unaided wayfinding strategy, in which we use virtual pedestrians as social cues to indirectly and subtly guide users through virtual environments during scene exploration. We briefly outline the required pedestrians’ behavior and results of a first feasibility study indicating the potential of the general approach.


@inproceedings{Boensch2021a,
booktitle = {ICAT-EGVE 2021 - International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments - Posters and Demos},
editor = {Maiero, Jens and Weier, Martin and Zielasko, Daniel},
title = {{Indirect User Guidance by Pedestrians in Virtual Environments}},
author = {B\"{o}nsch, Andrea and G\"{u}ths, Katharina and Ehret, Jonathan and Kuhlen, Torsten W.},
year = {2021},
publisher = {The Eurographics Association},
ISSN = {1727-530X},
ISBN = {978-3-03868-159-5},
DOI = {10.2312/egve.20211336}
}





Poster: Virtual Optical Bench: A VR Learning Tool For Optical Design


Sebastian Pape, Martin Bellgardt, David Gilbert, Georg König, Torsten Wolfgang Kuhlen
IEEE Conference on Virtual Reality and 3D User Interfaces 2021

The design of optical lens assemblies is a difficult process that requires lots of expertise. The teaching of this process today is done on physical optical benches, which are often too expensive for students to purchase. One way of circumventing these costs is to use software to simulate the optical bench. This work presents a virtual optical bench, which leverages real-time ray tracing in combination with VR rendering to provide a teaching tool that offers a repeatable, non-hazardous, and feature-rich learning environment. The resulting application was evaluated in an expert review with 6 optical engineers.


@INPROCEEDINGS{Pape2021,
author = {Pape, Sebastian and Bellgardt, Martin and Gilbert, David and K\"{o}nig, Georg and Kuhlen, Torsten W.},
booktitle = {2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)},
title = {Virtual Optical Bench: A VR learning tool for optical design},
year = {2021},
pages = {635--636},
doi = {10.1109/VRW52623.2021.00200}
}





Poster: Prosodic and Visual Naturalness of Dialogs Presented by Conversational Virtual Agents


Lukas Aspöck, Jonathan Ehret, Stefan Baumann, Andrea Bönsch, Christine T. Röhr, Martine Grice, Torsten Wolfgang Kuhlen, Janina Fels
DAGA 2021 - 47. Jahrestagung für Akustik

Conversational virtual agents, with and without visual representation, are becoming more present in our daily life, e.g. as intelligent virtual assistants on smart devices. To investigate the naturalness of both the speech and the nonverbal behavior of embodied conversational agents (ECAs), an interdisciplinary research group was initiated, consisting of phoneticians, computer scientists, and acoustic engineers. For a web-based pilot experiment, simple dialogs between a male and a female speaker were created, with three prosodic conditions. For condition 1, the dialog was created synthetically using a text-to-speech engine. In the other two prosodic conditions (2 and 3), human speakers were recorded with 2) the erroneous accentuation of the text-to-speech synthesis of condition 1 and 3) a natural accentuation. Face tracking data of the recorded speakers was additionally obtained and applied as input data for the facial animation of the ECAs. Based on the recorded data, auralizations in a virtual acoustic environment were generated and presented as binaural signals to the participants either in combination with the visual representation of the ECAs as short videos or without any visual feedback. A preliminary evaluation of the participants’ responses to questions related to naturalness, presence, and preference is presented in this work.


@inproceedings{Aspoeck2021,
author = {Asp\"{o}ck, Lukas and Ehret, Jonathan and Baumann, Stefan and B\"{o}nsch, Andrea and R\"{o}hr, Christine T. and Grice, Martine and Kuhlen, Torsten W. and Fels, Janina},
title = {Prosodic and Visual Naturalness of Dialogs Presented by Conversational Virtual Agents},
year = {2021},
note = {Hybrid conference},
month = {Aug},
date = {2021-08-15},
organization = {47. Jahrestagung für Akustik, Wien (Austria), 15 Aug 2021 - 18 Aug 2021},
url = {https://vr.rwth-aachen.de/publication/02207/}
}





Virtual Reality and Mixed Reality


Patrick Bourdot, Mariano Raya, Pablo Figueroa, Victoria Interrante, Torsten Wolfgang Kuhlen, Dirk Reiners
18th EuroXR International Conference, EuroXR 2021, Milan, Italy, November 24–26, 2021, Proceedings

We are pleased to present in this LNCS volume the scientific proceedings of EuroXR 2021, the 18th EuroXR International Conference, organized by CNR-STIIMA, Italy, which took place during November 24–26, 2021. Due to the COVID-19 pandemic, EuroXR 2021 was held as a virtual conference to guarantee the best audience while maintaining the safest conditions for the attendees. This conference follows a series of successful international conferences initiated in 2004 by the INTUITION Network of Excellence in Virtual and Augmented Reality, supported by the European Commission until 2008. Embedded within the Joint Virtual Reality Conference (JVRC) from 2009 to 2013, it was known as the EuroVR International Conference from 2014 and until last year. The focus of these conferences is to present, each year, novel Virtual Reality (VR) through to Mixed Reality (MR) technologies, also named eXtended Reality (XR), including software systems, immersive rendering technologies, 3D user interfaces, and applications. These conferences aim to foster European engagement between industry, academia, and the public sector, to promote the development and deployment of XR in new and emerging, but also existing, fields. Since 2017, EuroXR (https://www.euroxr-association.org/) has collaborated with Springer to publish the papers of the scientific track of our annual conference. To increase the excellence of this applied research conference, which is basically oriented toward new uses of XR technologies, we established a set of committees including Scientific Program chairs leading an International Program Committee (IPC) made up of international experts in the field. Eight scientific full papers have been selected to be published in the proceedings of EuroXR 2021, presenting original and unpublished papers documenting new XR research contributions, practice and experience, or novel applications. 
Five long papers and three medium papers were selected from 22 submissions, resulting in an acceptance rate of 36%. Within a double-blind peer reviewing process, three members of the IPC with the help of some external expert reviewers evaluated each submission. From the review reports of the IPC, the Scientific Program chairs took the final decisions. The selected scientific papers are organized in this LNCS volume according to four topical parts: Perception and Cognition, Interactive Techniques, Tracking and Rendering, and Use Case and User Study. Moreover, with the agreement of Springer and for the third year, the last part of this LNCS volume gathers scientific poster/short papers, presenting work in progress or other scientific contributions, such as ideas for unimplemented and/or unusual systems. Within another double-blind peer reviewing process based on two review reports from IPC members for each submission, the Scientific Program chairs selected four scientific poster/short papers from nine submissions (an acceptance rate of 44%). Along with the scientific track, presenting advanced research works (scientific full papers) or research works in progress (scientific poster/short papers) in this LNCS volume, several keynote speakers were invited to EuroXR 2021. Additionally, an application track, subdivided into talk, poster, and demo sessions, was organized for participants to report on the current use of XR technologies in multiple fields. We would like to thank the IPC members and external reviewers for their insightful reviews, which ensured the high quality of the papers selected for the scientific track of EuroXR 2021. Furthermore, we would like to thank the Application chairs, the Demo and Exhibition chairs, and the local organizers of EuroXR 2021. 
We are also especially grateful to Anna Kramer (Assistant Editor, Computer Science Editorial, Springer) and Volha Shaparava (Springer OCS Support) for their support and advice during the preparation of this LNCS volume.

September 2021

Patrick Bourdot, Mariano Alcañiz Raya, Pablo Figueroa, Victoria Interrante, Torsten W. Kuhlen, Dirk Reiners




Talk: Numerical Analysis of Keratin Networks in Selected Cell Types


Reinhard Windoffer, Nicole Schwarz, Sungjun Yoon, Teodora Piskova, Michael Scholkemper, Michael Thomas Schaub, Michael Anhuth, Andrea Bönsch, Till Petersen-Krauß, Johannes Stegmaier, Jacopo Di Russo, Rudolf E. Leube
Kármán Conference: European Meeting on Intermediate Filaments

Keratin intermediate filaments make up the main intracellular cytoskeletal network of epithelia and provide, together with their associated desmosomal cell-cell adhesions, mechanical resilience. Remarkable differences in keratin network topology have been noted in different epithelial cell types ranging from a well-defined subapical network in enterocytes to pancytoplasmic networks in keratinocytes. In addition, functional states and biophysical, biochemical, and microbial stress have been shown to affect network organization. To gain insight into the importance of network topology for cellular function and resilience, quantification of 3D keratin network topology is needed.

We used Airyscan superresolution microscopy to record image stacks with an x/y resolution of 120 nm and axial resolution of 350 nm in canine kidney-derived MDCK cells, human epidermal keratinocytes, and murine retinal pigment epithelium (RPE) cells. Established segmentation algorithms (TSOAX) were implemented in combination with additional analysis tools to create a numerical representation of the keratin network topology in the different cell types. The resulting representation contains the XYZ position of all filament segment vertices together with data on filament thickness and information on the connecting nodes. This allows the statistical analysis of network parameters such as length, density, orientation, and mesh size. Furthermore, the network can be rendered in standard 3D software, which makes it accessible at hitherto unattained quality in 3D. Comparison of the three analyzed cell types reveals significant numerical differences in various parameters.



Listening to, and remembering conversations between two talkers: Cognitive research using embodied conversational agents in audiovisual virtual environments


Janina Fels, Cosima A. Ermert, Jonathan Ehret, Chinthusa Mohanathasan, Andrea Bönsch, Torsten Wolfgang Kuhlen, Sabine Janina Schlittmeier
DAGA 2021 - 47. Jahrestagung für Akustik
Fortschritte der Akustik - DAGA 2021
Publisher: Deutsche Gesellschaft für Akustik e.V. (DEGA), Berlin, 2021
Scientific editors: Holger Waubke and Peter Balazs
ISBN: 978-3-939296-18-8
Online publication; access credentials available on request from tagungen@dega-akustik.de

In the AUDICTIVE project about listening to, and remembering the content of conversations between two talkers, we aim to investigate the combined effects of potentially performance-relevant but scarcely addressed audiovisual cues on memory and comprehension for running speech. Our overarching methodological approach is to develop an audiovisual Virtual Reality testing environment that includes embodied Virtual Agents (VAs). This testing environment will be used in a series of experiments to research the basic aspects of audiovisual cognitive performance in a close(r)-to-real-life setting. We aim to provide insights into the contribution of acoustical and visual cues to the cognitive performance, user experience, and presence as well as quality and vibrancy of VR applications, especially those with a social interaction focus. We will study the effects of variations in the audiovisual ’realism’ of virtual environments on memory and comprehension of multi-talker conversations and investigate how fidelity characteristics in audiovisual virtual environments contribute to the realism and liveliness of social VR scenarios with embodied VAs. Additionally, we will study the suitability of text memory, comprehension measures, and subjective judgments to assess the quality of experience of a VR environment. First steps of the project with respect to the general idea of AUDICTIVE are presented.


@inproceedings{Fels2021,
author = {Fels, Janina and Ermert, Cosima A. and Ehret, Jonathan and Mohanathasan, Chinthusa and B\"{o}nsch, Andrea and Kuhlen, Torsten W. and Schlittmeier, Sabine J.},
title = {Listening to, and Remembering Conversations between Two Talkers: Cognitive Research using Embodied Conversational Agents in Audiovisual Virtual Environments},
address = {Berlin},
publisher = {Deutsche Gesellschaft für Akustik e.V. (DEGA)},
pages = {1328--1331},
year = {2021},
booktitle = {[Fortschritte der Akustik - DAGA 2021, DAGA 2021, 2021-08-15 - 2021-08-18, Wien, Austria]},
month = {Aug},
date = {2021-08-15},
organization = {47. Jahrestagung für Akustik, Wien (Austria), 15 Aug 2021 - 18 Aug 2021},
url = {https://vr.rwth-aachen.de/publication/02206/}
}





Talk: Speech Source Directivity for Embodied Conversational Agents


Jonathan Ehret, Lukas Aspöck, Andrea Bönsch, Janina Fels, Torsten Wolfgang Kuhlen
DAGA 2021 - 47. Jahrestagung für Akustik

Embodied conversational agents (ECAs) are computer-controlled characters who communicate with a human using natural language. Being represented as virtual humans, ECAs are often utilized in domains such as training, therapy, or guided tours while being embedded in an immersive virtual environment. Having plausible speech sound is thereby desirable to improve the overall plausibility of these virtual-reality-based simulations. In an audiovisual VR experiment, we investigated the impact of directional radiation for the produced speech on the perceived naturalism. Furthermore, we examined how directivity filters influence the perceived social presence of participants in interactions with an ECA. Therefore, we varied the source directivity between 1) being omnidirectional, 2) featuring the average directionality of a human speaker, and 3) dynamically adapting to the currently produced phonemes. Our results indicate that directionality of speech is noticed and rated as more natural. However, no significant change of perceived naturalness could be found when adding dynamic, phoneme-dependent directivity. Furthermore, no significant differences in social presence were measurable between any of the three conditions.

@misc{Ehret2021b,
author = {Ehret, Jonathan and Asp\"{o}ck, Lukas and B\"{o}nsch, Andrea and Fels, Janina and Kuhlen, Torsten W.},
title = {Speech Source Directivity for Embodied Conversational Agents},
publisher = {IHTA, Institute for Hearing Technology and Acoustics},
year = {2021},
note = {Hybrid conference},
month = {Aug},
date = {2021-08-15},
organization = {47. Jahrestagung für Akustik, Wien (Austria), 15 Aug 2021 - 18 Aug 2021},
subtyp = {Video},
url = {https://vr.rwth-aachen.de/publication/02205/}
}





