• Works
  • Researches
  • About
  • CV

Suibi Weng


Figure 2: The bar chart on the left shows the difference in subjective delay score between the mimic and improvise tasks. The bar chart on the right shows the difference in reported delay tolerance between the mimic and improvise tasks.

Abstract

Networked musical collaboration requires near-instantaneous network transmission for successful real-time collaboration. We studied how changes in network latency affect participants’ auditory and visual perception in latency detection, as well as their latency tolerance, in augmented reality (AR). Twenty-four participants were asked to play a hand drum with a prerecorded remote musician rendered as an avatar in AR at different levels of audio-visual latency. We analyzed the subjective responses of the participants from each session. Results suggest a minimum noticeable delay value between 160 milliseconds (ms) and 320 ms, as well as no upper limit to audio-visual delay tolerance.

Research Questions

1. When collaborating with a remote partner via AR, what are the minimum noticeable delay and the threshold of network latency (in the transmission of visuals relative to audio) up to which musical interaction remains possible and tolerable?

2. How does a musician’s focus shift between auditory and visual information as latency increases?

Figure 2: (A) Response time (B) Co-presence Score (C) Task enjoyment

Method

To answer our research questions, we conducted a 2x8 within-subjects experiment that simulated remote musical collaboration in AR. We manipulated two factors: the type of remote drumming collaboration (Mimic and Improvise) and the level of network latency (0 ms, 20 ms, 40 ms, 80 ms, 160 ms, 320 ms, 640 ms, and 1200 ms). We used Balanced Latin Squares across all conditions to reduce potential ordering effects. We first discuss the tasks and apparatus, followed by details about the participants involved and the experimental procedure.

In both tasks, the remote partner played at 90 beats per minute. During the Mimic task, the remote partner played a strict 4/4 rhythm (4 beats per bar), with a gap of 750 ms between subsequent hits of the drum. In the Improvise task, the remote player continued playing at 90 beats per minute, but used a more natural, free-form rhythm.

Twenty-four participants used the Nreal Light headset to view their remote partner in AR. Participants were given an electronic drum pad to play along with the avatar. An 18-camera Qualisys motion capture system was used to record player movement as animation clips. The drum audio was recorded into the Ableton digital audio workstation software via an external sound card. Once recorded, these motion and audio files were applied to a humanoid avatar (sourced from the Mixamo character and animation platform) and rendered via an application for the Nreal Light AR headset built with the Unity engine (version 2020.2.2f1). This application was presented to participants with differing offsets between the music and animation (Figure 1).
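The counterbalancing step can be illustrated with a short sketch. The study states only that Balanced Latin Squares were used; it does not describe how the two tasks and eight latency levels were combined, so the code below orders the eight latency levels alone, using the standard construction for an even number of conditions. The function names and the rule of assigning a participant to a row by participant index are hypothetical.

```python
# Latency levels from the study, in milliseconds.
LATENCIES_MS = [0, 20, 40, 80, 160, 320, 640, 1200]


def balanced_latin_square(n):
    """Return an n x n balanced Latin square of condition indices (n must be even).

    Each condition appears once per row and once per column, and every
    ordered pair of distinct conditions appears in adjacent positions
    exactly once across the rows.
    """
    if n % 2 != 0:
        raise ValueError("this construction requires an even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        if position % 2 == 1:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
    # Every later row shifts the first row by one condition (mod n).
    return [[(index + shift) % n for index in first_row] for shift in range(n)]


def order_for_participant(participant_index, levels=LATENCIES_MS):
    """Hypothetical assignment: participant i gets row i mod n of the square."""
    square = balanced_latin_square(len(levels))
    row = square[participant_index % len(levels)]
    return [levels[index] for index in row]


if __name__ == "__main__":
    for participant in range(3):
        print(participant, order_for_participant(participant))
```

With 24 participants and 8 rows, this assignment rule would use each row three times.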


Results

Our results were derived from scores between 1 and 7 given in response to two questions: “How much delay did you experience between animation and sound?” and “How would you rate the tolerability of the delay experienced?” To analyze the data, we conducted a repeated measures Analysis of Variance (RM-ANOVA). Mauchly’s test did not reveal a violation of sphericity for any of the RM-ANOVA tests. Post-hoc Least Significant Difference (LSD) tests identified which conditions differed significantly.

In this study we aimed to define two thresholds: the minimum noticeable delay, and the point at which players can no longer tolerate the delay between a perceived sound and the visual animation.

We addressed the first research question, defining the minimum noticeable delay, by assessing the delay amount scores that participants reported after each task. Players began noticing delay significantly between 160 ms and 320 ms. An RM-ANOVA revealed a significant difference among the delay score means, F(7, 161) = 9.74, p < 0.001, partial η² = 0.30, where scores in the 0 ms, 20 ms, 40 ms, 80 ms, and 160 ms conditions were significantly smaller than those in the 320 ms, 640 ms, and 1200 ms conditions (see Figure 2, left). Therefore, we hypothesize that the minimum noticeable delay value is higher than 160 ms and possibly less than 320 ms. In future studies we will investigate perceived delay between 160 ms and 320 ms to test for a more precise value of the minimum noticeable delay, and we will control for musical experience.

We next addressed the maximum amount of delay a participant can tolerate while playing drums. Tolerance began declining significantly at 320 ms, the same delay amount at which the reported delay score rose. This suggests that as soon as players begin noticing delay, the experience begins to degrade.
Interestingly, the reported delay tolerance levels did not fall below a neutral score of 4. Participants began to score the tolerance lower at 320 ms of delay, F(7, 161) = 9.94, p < 0.001, partial η² = 0.30, where scores in the 0 ms, 20 ms, 40 ms, 80 ms, and 160 ms conditions differed significantly from those in the 320 ms, 640 ms, and 1200 ms conditions (see Figure 2, right). We interpret this to mean that a maximum threshold for delay was not found during our test sequence. Because players were mimicking a virtual drummer playing at a regular tempo of 90 bpm (750 ms between beats), 1200 ms of delay causes animations to fall out of sync by almost a full cycle of sequential beats. This limits what regularly spaced drum beats can reveal about an upper threshold for delay tolerance, because a player cannot tell which cycle a delayed animation belongs to.

In summary, when designing collaborative music applications for augmented reality, visual adjustments should be made within the minimum noticeable delay range between 160 ms and 320 ms. Further studies, including those that investigate irregular drum beats, are needed to understand the upper limits of audio-visual delay tolerance.
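As a consistency check on the statistics reported above, the degrees of freedom and effect sizes can be recomputed from the design. The sketch below assumes the standard formulas for a within-subjects main effect: the effect has k - 1 degrees of freedom, the error term has (k - 1)(n - 1), and partial η² equals F·df_effect / (F·df_effect + df_error). The function names are illustrative and are not taken from the study's analysis code.

```python
def rm_anova_dfs(n_participants, n_levels):
    """Degrees of freedom for a within-subjects main effect with n_levels levels."""
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_participants - 1)
    return df_effect, df_error


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # 24 participants, 8 latency levels, as described in the Method section.
    df_effect, df_error = rm_anova_dfs(n_participants=24, n_levels=8)
    print(f"df = ({df_effect}, {df_error})")
    # F values as reported for the delay score and the delay tolerance.
    for label, f_value in [("delay score", 9.74), ("delay tolerance", 9.94)]:
        eta = partial_eta_squared(f_value, df_effect, df_error)
        print(f"{label}: partial eta squared = {eta:.2f}")
```

Under these assumptions, both reported F values correspond to a partial η² of about 0.30, and the design yields the reported degrees of freedom of (7, 161).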

 
 

Publications

T. Hopkins, S. C.-C. Weng, R. Vanukuru, E. Wenzel, A. Banic and E. Y.-L. Do, "How Late is Too Late? Effects of Network Latency on Audio-Visual Perception During AR Remote Musical Collaboration," 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 2022, pp. 686-687, doi: 10.1109/VRW55335.2022.00194. (Christchurch, New Zealand, March 12-16, 2022).

T. Hopkins, S. C.-C. Weng, R. Vanukuru, E. Wenzel, A. Banic, M. D. Gross and E. Y.-L. Do, "Studying the Effects of Network Latency on Audio-Visual Perception During an AR Musical Task," 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2022, doi: 10.1109/ISMAR55827.2022.00016. (Singapore, Singapore, October 17-21, 2022).
