Toward an Enhanced Mutual Awareness in Asymmetric CVE

Abstract: Collaborative Virtual Environments (CVEs) aim at providing several users with a consistent shared virtual world. In this work, we focus on the lack of mutual awareness that may appear in many situations, and we evaluate different ways to present the distant user and his actions in the Virtual Environment (VE) in order to understand his perception and cognitive process. Indeed, efficient collaboration involves not only the correct perception of objects but also of their meaning. This second criterion introduces the concept of distant analysis, which could greatly help in understanding distant activities. We focus on a common case that consists in accurately estimating the time at which a distant user has analyzed the meaning of a remotely pointed object. We conduct experiments to evaluate the concept and compare different techniques for implementing this new awareness feature in a CVE. Among other findings, results show that the users' expertise influences how they estimate the distant activity and which strategies they apply.


I. INTRODUCTION
Virtual reality (VR) provides the user with an interactive and immersive virtual world in order to ease interactions and improve performance. To this end, the immersed user needs to understand accurately the consequences of his actions in the Virtual Environment (VE). This is the goal of awareness techniques, which provide feedback from the VE to the user.
The extension of a VR system to multiple (potentially distant) users is handled through an immersive 3D Collaborative Virtual Environment (CVE). CVEs thus derive from the convergence of VR and Computer-Supported Cooperative Work (CSCW) [15]. The literature leverages different sensory channels and suggests metaphors for presenting the awareness of others' activities as well as their perceptive abilities.
Collaborative awareness techniques, from simple avatars to more sophisticated techniques that can take into account the limitations of a specific user caused, for instance, by physical constraints [7], can be used to enhance the awareness of others' activities. These techniques greatly improve interaction performance, but many situations still suffer from a lack of mutual awareness. Indeed, a crucial need when interacting with distant people consists in understanding accurately their own perception and comprehension of the interactive VE. This awareness of others' activities as well as their perceptive abilities can be achieved through different awareness techniques. All these methods are largely symmetric in that they synchronously trigger the same feedback for each user. Thus, they take place in a continuous awareness loop. We postulate that this loop is not enough, and that an additional asymmetric loop could greatly improve the awareness of others, in particular in a two-user CVE with asymmetric roles and viewpoints. This could be used in a guiding scenario [12] or any application that involves a user with a global viewpoint collaborating with another user immersed in the CVE.
First, section II presents the state of the art of awareness in CVE and highlights some limitations. Section III introduces our concept of asymmetric loop to improve collaborator's awareness. Then, the experiment is presented in section IV and results in section V. Last, we discuss the results in section VI and propose perspectives in section VII.

II. AWARENESS IN CVE
Different situations require collaborative activities. First, users may collaborate symmetrically in order to manipulate an object or to achieve a succession of tasks on an industrial machine. Second, they may have different roles and viewpoints, such as in asymmetric guiding for the exploration of a VE or to take advantage of a multi-scale approach to a VE involving scientific visualization. In any case, collaborators may try to show or explain something to each other. To deal with this challenge, the literature already provides solutions to ease distant communication and understanding between users. These features are part of the workspace awareness as defined by Gutwin et al. [10].
In the real world, social interactions are interpreted through many crossed modalities. For instance, gaze direction is a substantial cue in terms of understanding what others are currently seeing, and can even help to interpret the current cognitive process of people: Are they thinking? Have they analyzed what I have shown them? In CVE, behavior such as gaze direction [16] and facial expression [13] can be used to simulate these helpful features, but they are rarely available. Thus, metaphorical representations have been proposed for handling some of the missing natural features in collaborative interactions.
Sometimes, collaborators can be only represented by a viewing frustum that informs others about their field of view and viewable objects in a coarser way [8]. Moreover, it is important to be aware of collaborators' interaction abilities in order to not misunderstand a shared situation [7].
Lastly, some additional communication features are necessary to improve collaboration and to allow direct communication between users (who can be remote users). These can be based on verbal, video [14], haptic [4] or visual communication [1].
We note that none of the existing solutions takes into account the awareness of what others have currently analyzed regarding a remote informative interaction, such as pointing at an object. We think this would be an interesting feature to provide in order to improve collaboration, especially in a nonverbal situation [6]. Indeed, in some cases, verbal communication cannot be used, such as when users speak different languages, when environments are too noisy or when a user is deaf and mute. Thus, in this study, we focus on this particular context without any available audio channel. In a collocated setup (where users share the same physical and virtual environment) or a remote setup, users can only interact in an asynchronous way since they are independent [3]. Even if they co-manipulate a shared object and try to synchronize their motion, they are still two individuals. For example, the main difficulty in extending a bimanual manipulation toward a cooperative manipulation is the lack of proprioceptive sense of the other user [11]. This limitation explains the asynchronous aspect that we introduce, which differs from the classical definition of asynchronous interaction as one that takes place at very different moments, as in [9]. Indeed, in this sample case, the lack of proprioception generates a desynchronization between collaborators, and thus an asynchronous collaborative interaction (with a very high time-step frequency). We propose to manage this feature with a new asymmetric awareness loop for collaboration. Its interest is easiest to explain in an asymmetric context, as illustrated in Figure 1. This asymmetry can be useful in many scenarios, such as a guiding task, in order to benefit from different capabilities. In this setting, the guide has a global viewpoint of the scene, and the visitor is immersed in the VE.
Here, it can be tricky for the guide to be sure that the visitor saw and analyzed an object he pointed to because:
• Their viewpoints are different;
• The duration of their analysis is user-dependent.
Thus, we extend the classification of collaborative awareness with a new item that we call the awareness of collaborators' analysis. Unlike the other ones, this part takes place in an asymmetric interactive loop that can provide awareness features in an asynchronous way, as explained in Figure 2. Using this new loop, we aim to reduce the lack of mutual knowledge, especially concerning what others have currently analyzed.
Thanks to the implicit feedback of the classical symmetric loop, which implements some workspace awareness features, some users can estimate the distant activity. However, some misunderstandings are still possible. Thus, depending on the users' expertise (novices vs. experts), our new asymmetric loop could bring redundant awareness information. Novice users who are not able to interpret implicit feedback could nevertheless use it to improve their estimation.

A. Task description
The visitor used an HMD to be immersed in a room containing six boards from among 66 possibilities. These boards presented pictures. Each picture was composed of two figures: The left one was always the same reference image, whereas the right one could be either the rotated reference image or a different one. The task of the visitor consisted in finding the board pointed to by the guide, and then answering the question: Are these two figures identical modulo a 2D rotation? This task is called "2D mental rotation" in the psychology literature [5]. This literature indicates that everyone can achieve it, with few performance differences between people. Thus, this 2D mental rotation task was a reasonably simple task that nevertheless required an analysis process simulating the one described in our new asymmetric loop. The guide's task (i.e. the subject's task) consisted in estimating the moment when the visitor had analyzed the board and answered. With this task, we wanted to evaluate the guide's performance in estimating the visitor's activity and analysis process.

B. Independent variables
To evaluate the impact of the analysis feedback compared with classical awareness techniques, we compared different conditions leveraging different feedback types of the visitor's activity for the guide's awareness.
Estimation features. They are usually used in the symmetric awareness loop. In our experiment, the visitor was always presented as an avatar with a frustum representing his exact field of view (¬⊗). Feature ⊗ added a squared spotlight matching the frustum dimensions and highlighting the objects in his field of view (see Figure 1).
Safe-explicit feature. It implements an analysis feedback from the visitor to the guide. In our experiment, an arrow ( ) was presented to the guide as soon as the visitor had completed his task (cf. Figure 3). We assumed that, for the visitor, the time between the decision following the analysis process and the answer recorded by the system was insignificant (less than 100ms, as observed in [3]).
Single and successive tasks. A single task did not cause a perceptible behavior of the distant user when he completed his analysis, while a successive task brought about implicit feedback due to a specific behavior of the distant user when he finished his analysis. In our experimental process, the most obvious implicit indication was the motion of the visitor, which informed the guide about his analysis process. For example, if the visitor looked at the board indicated, then looked elsewhere, it could suggest that he had completed the analysis. To simulate this behavior, we gave instructions to the recorded visitor. In one case, called single task (†), we asked him to keep looking at the board after he completed the task until the next iteration. In another case, called successive task (‡), we asked him to return directly to a phase of VE exploration after solving the task.
Combining the three variables of the experiment led to eight conditions. Every subject met all the conditions.
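As a minimal sketch of this factorial design, the eight conditions can be enumerated as the Cartesian product of the three binary variables. The level names used here (`frustum_only`, `arrow`, and so on) are illustrative labels, not identifiers taken from the experiment software.

```python
from itertools import product

# Illustrative labels for the two levels of each independent variable.
ESTIMATION_FEATURE = ("frustum_only", "frustum_plus_spotlight")
SAFE_EXPLICIT_FEATURE = ("no_arrow", "arrow")
TASK_TYPE = ("single", "successive")


def all_conditions():
    """Return every combination of the three binary variables."""
    return list(product(ESTIMATION_FEATURE, SAFE_EXPLICIT_FEATURE, TASK_TYPE))


if __name__ == "__main__":
    for condition in all_conditions():
        print(condition)
```

Since each variable has two levels, the product contains 2 × 2 × 2 = 8 distinct conditions.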

C. Experimental protocol
We designed the experiment in order to evaluate the guide's perception of the distant visitor's activity. Using a real human visitor for each trial could have skewed our experimental results because of heterogeneous behaviors. To solve this issue, we recorded one real human visitor achieving the tasks in a pre-process phase. We were therefore able to replay human behaviors in a counterbalanced design for each subject of the experiment without any bias regarding the visitor's behavior.
Each participant was first informed about the complete study proceedings. Then they completed an identification questionnaire allowing us to collect general information about their experience with VR. Next, a demonstration introduced the experiment and gave instructions about the objective for 10 min. The experimental manipulation was composed of 56 iterations (8 conditions * 7 iterations) and took 10 min.
Each condition iterated on seven boards from among the 66 available ones (using a counterbalanced order). We ensured each board was pointed to at least once for each condition. One iteration was decomposed as follows:
1) The system simulated a distant pointing of the guide: The symmetric awareness technique made the pointed board flicker;
2) The system started to replay the recorded visitor's behavior sequence;
3) The visitor searched for the board indicated;
4) The visitor found it and watched it;
5) The visitor solved the task by analyzing the mental rotation;
6) Only in safe-explicit conditions, an analysis feedback was sent to the guide, meaning that the visitor had completed the analysis of the board pointed to.
Moreover, we ensured that the visitor always completed his task before a 10s timeout (actually he usually completed it in under 5s). During an iteration, the subject's task consisted in clicking when he estimated that the visitor had completed the board analysis. After each validation, we asked the subject which strategy he had chosen for estimating when the visitor had completed the analysis of the indicated board. He could choose between five options: No specific strategy, self mental rotation, time count, visitor motion, other (type any strategy; see footnote a).
We briefly explained the meaning of each option in the trial phase. At each condition change, a black screen displayed the state of the three independent variables for the next condition. Three icons always remained visible to the subjects to remind them of the current independent variables.
At the end, participants were asked to fill in a questionnaire to provide their impressions, comments and subjective judgments about the estimation features, the safe-explicit feature and the tasks with:
• A dichotomous preference test between the estimation features (¬⊗ /⊗), the safe-explicit feature ( /¬ ) and the tasks ( † / ‡);
• A Likert scale (1: Not certain at all, 4: Totally certain) with the following questions about the estimation features, the safe-explicit feature and the tasks:
-"Are you confident that the visitor has seen the board indicated?"
-"Are you confident that the visitor has completed the analysis of the board?"

D. Measurements and hypotheses
1) Measurements: We measured the delta time between the answer of the visitor and the estimation of the guide. This allowed us to compare, for each condition, how accurately the guide estimated the visitor's analysis process. The goal was to improve this estimation: the narrower the gap between the collaborators' validations, the better the guide's estimation.
It should be noted that, in order to remove possible bias, we analyzed only data that respected the following conditions: The guide validated before the 10s timeout and the response to the task was 'yes' (see footnote b).
2) Hypotheses: We posit two hypotheses:
• H1: In successive task ( ‡), the additional estimation feature (⊗) improves the estimation accuracy.
• H2: In single task ( †), the safe-explicit feedback ( ) improves the estimation accuracy.
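The data-filtering rule stated under Measurements (guide validated before the 10s timeout, task response 'yes') can be sketched as follows. This is only an illustration: the `Trial` record and its field names are hypothetical, and representing a missing validation as `None` is an assumption, not something described in the protocol.

```python
from dataclasses import dataclass
from typing import List, Optional

TIMEOUT_S = 10.0  # timeout stated in the protocol


@dataclass
class Trial:
    """Hypothetical record of one iteration."""
    guide_click_s: Optional[float]  # None if the guide never validated (assumption)
    task_response: str              # 'yes' when the figures matched modulo a rotation


def keep_trial(trial: Trial) -> bool:
    """True when the trial satisfies both inclusion conditions."""
    return (
        trial.guide_click_s is not None
        and trial.guide_click_s < TIMEOUT_S
        and trial.task_response == "yes"
    )


def filter_trials(trials: List[Trial]) -> List[Trial]:
    """Keep only the trials that enter the analysis."""
    return [t for t in trials if keep_trial(t)]
```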

E. Participants
Our panel was composed of 20 subjects aged from 23 to 54 (M = 32, SD = 8). There were 12 males and 8 females. Eight of them were considered experts and 12 novices. This classification was based on their personal experience in VR and, more broadly, in 3D and/or collaborative video games. We considered a subject an expert if he had spent more than an hour per week using these kinds of applications during the last weeks, and had already spent more than 14 hours per week using them during his lifetime. Our subjects had various backgrounds: PhD students, R&D engineers, communication and human resources staff, managers and assistants.
Footnote a: With this text field, subjects could specify any strategy they used. We did not explicitly propose the appearance of the arrow as a strategy, in order not to influence the subjects' answers.
Footnote b: Some boards (∼ 10%) still presented different figures in order to keep the recorded visitor focused on his task.
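The expertise criterion described for the participants can be written as a simple predicate. This is a sketch under assumptions: the parameter names are hypothetical, and how the two weekly durations were actually collected (presumably through the identification questionnaire) is not detailed in the text.

```python
RECENT_THRESHOLD_H = 1.0   # more than an hour per week during the last weeks
PEAK_THRESHOLD_H = 14.0    # more than 14 hours per week during their lifetime


def is_expert(recent_hours_per_week: float, peak_hours_per_week: float) -> bool:
    """Classify a subject as expert when both usage thresholds are exceeded."""
    return (
        recent_hours_per_week > RECENT_THRESHOLD_H
        and peak_hours_per_week > PEAK_THRESHOLD_H
    )
```

For instance, a subject reporting 3 hours per week recently and a past peak of 20 hours per week would be classified as an expert, whereas a subject who only meets one of the two thresholds would be classified as a novice.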

V. RESULTS
We present the results of our analysis below. Note that a comparison between answers to a short initial and a final fatigue questionnaire revealed no significant difference in the subjects' level of fatigue before and after the experiment.

A. Time to validate
We analyzed the delta time between both validations (visitor and guide). If this difference is positive, that means that the guide validated after the visitor. If the difference is negative, that means that the guide validated before the visitor. This was a main result: Some experts validated before the visitor. Subjects had to validate when they were convinced that the visitor had analyzed the board. When the participants validated before the visitor, we considered that they failed.
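The sign convention above can be illustrated with a short sketch. The function names are hypothetical, and treating a delta of exactly zero as a success is an assumption, since the text only defines validations before the visitor as failures.

```python
def delta_time(guide_validation_s: float, visitor_validation_s: float) -> float:
    """Positive when the guide validated after the visitor, negative when before."""
    return guide_validation_s - visitor_validation_s


def guide_failed(guide_validation_s: float, visitor_validation_s: float) -> bool:
    """A validation made before the visitor's is counted as a failure."""
    return delta_time(guide_validation_s, visitor_validation_s) < 0
```

With times in seconds since the start of the iteration, a guide clicking at 3.5 while the visitor answered at 3.0 gives a positive delta (success), whereas a guide clicking at 2.0 gives a negative delta (failure).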
In the case of safe-explicit feature ( ), an arrow appeared when the visitor had just completed his current task. The mode ( † or ‡, or ¬ , ⊗ or ¬⊗) was presented on the screen, so the subject knew whether an arrow would appear or not. Rationally, in the case of safe-explicit feature ( ), subjects only had to wait for the arrow to appear before validating. In fact, some of them validated before it appeared. We describe this in subsection V-F.

B. Time to validate after the visitor
We analyzed the cases of subjects' success, i.e. when the guide took the mental decision after the visitor. As the duration of the click action was assumed to be less than 100ms, as observed in [3], we took the time of the mental decision to be the click time minus 100ms. Results show that experts are better than novices at evaluating the moment when the visitor completed the mental rotation analysis.
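The click-latency correction described above can be sketched as follows. The constant and function names are hypothetical, and counting a decision that coincides exactly with the visitor's answer as a success is an assumption.

```python
CLICK_LATENCY_S = 0.100  # motor delay between decision and click, as assumed in the text


def mental_decision_time(click_time_s: float) -> float:
    """Estimate when the guide decided, by removing the assumed click latency."""
    return click_time_s - CLICK_LATENCY_S


def decided_after_visitor(guide_click_s: float, visitor_answer_s: float) -> bool:
    """Success case: the guide's mental decision did not precede the visitor's answer."""
    return mental_decision_time(guide_click_s) >= visitor_answer_s
```

Note that a click recorded only 50ms after the visitor's answer corresponds, under this assumption, to a mental decision taken before the visitor answered.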

C. Estimation & safe-explicit features when success
The analysis of the additional estimation feature results (⊗ vs ¬⊗) gave no significant differences. The spotlight was not a useful additional estimation feature in this study. As the results were not significant, we do not analyze the effect of this feature (⊗) in the following. Moreover, H1 is not validated and we will discuss this result in section VI.
In contrast, the safe-explicit feature results ( vs ¬ ) gave significant differences for experts (F(1, 148) = 5.34, p < .05) and for novices (F(1, 411) = 4.304, p < .05). The safe-explicit feature, which implemented an analysis feedback from the visitor to the guide, improved the estimation accuracy, as stated in H2 for single task ( †).
In our experiment, the safe-explicit feature was represented by an arrow above the board for the guide indicating that the visitor had just completed his task (cf. Figure 3). Even when the participants had to validate as soon as they saw the arrow, the experts were better. This result could be explained by the skills of experts in VE.

D. Single versus successive task
In the single task ( †), the analysis of the time to validate revealed no significant difference between experts and novices, even if experts (M = 0.7, SD = 0.09) were quicker than novices (M = 0.8, SD = 0.07). When the visitor was motionless after the completion of his task ( †), experts and novices met the same difficulties in evaluating the right time to validate. Obviously, in this case of uncertainty, the safe-explicit feature helped the subjects to decide (F(1, 250) = 4.6, p < .05). We could appreciate how the safe-explicit feature guided the decision-making process in two ways:
In the successive task ( ‡), the analysis of the time to validate gave a significant difference between experts and novices (F(1, 327)  The visitor's behavior was an implicit feedback; as the experts validated earlier, experts were better than novices at understanding this feedback.

E. Strategies applied
The strategies applied depended on the tasks ( † / ‡), the situations ( / ¬ ) and the groups (novices/experts) (cf. Figure 5). In the single task, for each situation ( / ¬ ), novices applied the visitor motion strategy more often than experts. For each situation, experts applied time count and mental rotation strategies more often than novices. In the successive task, the most common strategy applied was the visitor motion strategy. Without safe-explicit feature, experts applied the visitor motion strategy more often than experts. Experts applied the time count strategy more often than novices, and novices applied the mental rotation strategy more often than experts.
Figure 6 illustrates the success of the strategies, i.e. validation after the visitor's validation. For each strategy, we calculated the ratio of times the strategy was applied with success. With a ratio below 50%, the guide validated more often before than after the visitor. When the subjects validated before the visitor, we considered that they applied an unsuccessful strategy to estimate the right time to validate. Figure 6 shows that the least successful strategies were the mental rotation (47%) and time count (53%) strategies, and the most successful strategies were the visitor motion (62%) and safe-explicit feature (100%) strategies.
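The per-strategy success ratio described above can be sketched as a simple count. The input format (pairs of strategy name and delta time) is an assumption made for illustration, and a delta of exactly zero is treated as a success.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_ratio_by_strategy(
    trials: Iterable[Tuple[str, float]]
) -> Dict[str, float]:
    """Map each strategy to the share of its trials validated after the visitor.

    Each trial is (strategy_name, delta_s), where delta_s is the guide's
    validation time minus the visitor's validation time, in seconds.
    """
    successes: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for strategy, delta_s in trials:
        totals[strategy] += 1
        if delta_s >= 0:  # validated after (or with) the visitor
            successes[strategy] += 1
    return {name: successes[name] / totals[name] for name in totals}
```

A ratio below 0.5 for a strategy means that, when applying it, the guide validated before the visitor more often than after.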

F. Successful strategies
In the mental rotation strategy, the guide performed a mental rotation himself, with the implicit assumption that the visitor took the same time to analyze the board. In the time count strategy, the evaluation was more approximate: The guide did not truly analyze the board but evaluated the complexity of the rotation and adjusted his estimate of the time the visitor would take to complete it. These two strategies were based on how the subject himself analyzed the board, not on the visitor's behavior or on the safe-explicit feature. They illustrated an egocentric perspective, which was frequently unsuccessful in this study. One possible reason is the difference of viewpoint between the visitor and the guide (cf. Figure 1). The guide had a global viewpoint that allowed him to watch the 7 boards. The visitor had an immersive viewpoint with a narrow visual angle that could contain only 3 boards at a time. The guide could see the pointed board earlier than the visitor, and thus could validate before the visitor by applying egocentric strategies.
Visitor motion and safe-explicit feature strategies illustrated an allocentric perspective. The guide was waiting for information from the visitor: a punctual successive behavior or a safe-explicit feedback. The visitor motion strategy was often successful in the successive task, less so in the single task. In the successive task, the visitor looked at the board to analyze it, answered and then moved. Thus, his behavior was useful to detect when the analysis was completed. One would have expected the success score to be higher. The participants reported difficulties in detecting the fixation when the analysis process was too short. In this condition, the visitor explored the VE to look for the pointed board, found the board, analyzed it, answered and returned to exploring the VE. When the analysis process was short, the guide could perceive the visitor's behavior as continuous. In the single task, the visitor watched the board to analyze it and stayed on it after answering. So, his behavior was not useful to detect when the analysis was completed; it was only useful to detect when the fixation began. The guide then had to evaluate the time to validate after this fixation. Often they validated too early. Figure 6 illustrates this difficulty. The safe-explicit feature was a very successful strategy. Obviously, a validation after the arrow indication was always successful.

G. ROI of strategies applied
We calculated an ROI (Return On Investment) by multiplying the frequency $F_i$ with which strategy $i$ was applied ($0 \le F_i \le 1$) by the success score $S_i$ of this strategy ($0 \le S_i \le 1$), and by summing these products over the four main strategies ($0 < i \le 4$): $\mathrm{ROI} = \sum_i (F_i \cdot S_i)$, with $\sum_i F_i \le 1$ and $0 \le S_i \le 1$. Thus, $0 \le \mathrm{ROI} \le 1$. The higher the ROI, the better the global success of the strategies chosen by each group of participants. Table 1 summarizes the most applied and successful strategies, and Table 2 gives the associated ROI scores.
The analysis of the ROIs indicated that experts applied better-fitting strategies than novices. It also indicated that the highest ROIs in the successive task were due to the visitor motion and the safe-explicit feature strategies. The visitor's behavior was an implicit feedback that worked. As the experts validated earlier, they were better than novices at using this feedback. The safe-explicit feature was truly useful when the visitor's behavior did not clarify when the analysis had been achieved. Experts used this feature to validate more often than novices.

H. Questionnaire results
a) Question: "Are you confident that the visitor has seen the board indicated?": Globally, all the participants were very confident that the visitor had seen the board indicated (> 82%) (a little less for novices in the successive task: 67%). The estimation feature with the spotlight was more appreciated, particularly by novices. The participants preferred the single task to be firmly convinced the visitor had seen the board. To optimize confidence that the visitor has seen the board, the best configuration was: Spotlight (⊗) and single task ( †).
b) Question: "Are you confident that the visitor has completed the analysis of the board?": Obviously, with the safe-explicit feature, the participants thought with a higher degree of confidence that the visitor had completed the analysis of the board. The participants preferred the successive task to be firmly convinced the visitor had completed his analysis. "If the visitor moves after steadying, that means he completed the task, so I validate" said numerous participants. To optimize confidence that the visitor had achieved his analysis, the best configuration was: Safe-explicit feature ( ) and successive task ( ‡).
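As a minimal sketch of the ROI computation, the function below sums frequency times success score over the strategies and checks the stated bounds. The function name and input format are hypothetical. In the usage example, the success scores are the ones reported for Figure 6, whereas the equal frequencies of 0.25 are made up purely for illustration and are not results of the study.

```python
from typing import Dict, Tuple


def strategy_roi(strategies: Dict[str, Tuple[float, float]]) -> float:
    """Compute the sum of F_i * S_i over strategies.

    `strategies` maps a strategy name to (F_i, S_i): its application
    frequency and its success score, both expected in [0, 1].
    """
    tolerance = 1e-9
    total_frequency = 0.0
    roi = 0.0
    for name, (frequency, success) in strategies.items():
        if not (0.0 <= frequency <= 1.0 and 0.0 <= success <= 1.0):
            raise ValueError(f"values out of [0, 1] for strategy {name!r}")
        total_frequency += frequency
        roi += frequency * success
    if total_frequency > 1.0 + tolerance:
        raise ValueError("frequencies must sum to at most 1")
    return roi


if __name__ == "__main__":
    # Success scores from Figure 6; frequencies are illustrative only.
    example = {
        "mental rotation": (0.25, 0.47),
        "time count": (0.25, 0.53),
        "visitor motion": (0.25, 0.62),
        "safe-explicit feature": (0.25, 1.00),
    }
    print(round(strategy_roi(example), 3))
```

Because each product is at most $F_i$ and the frequencies sum to at most 1, the result always stays between 0 and 1, in line with the bound given in the text.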

VI. DISCUSSION
The proposed additional estimation feature, implemented as a spotlight matching the frustum dimensions, did not prove its usefulness (i.e. we did not validate H1). An explanation could be that the boards were distributed on a horizontal half-circle around the visitor. Thus, the rotation of his head was almost entirely around a single axis. This partially reduces the 3D aspect of the task, and could be resolved by arranging the boards on a half-sphere. The head would then rotate around the three axes, and it would be harder to estimate his field of view without the spotlight. This would be a more realistic use case for CVEs, in addition to providing more interactivity, and would also be interesting for generalizing the new asynchronous loop we propose to enhance awareness in CVEs. Thus, we plan to extend this work to more complex scenarios while addressing the limitations of this experiment.
Moreover, experts and novices did not apply the same strategies to interpret awareness features of the distant activity. Thus, future work should take into account the users' expertise to enhance awareness and improve performance by adapting the interface.

VII. CONCLUSION AND PERSPECTIVES
In this paper, we focus on the awareness of collaborators' activities. More specifically, we propose to add an asynchronous awareness loop in an asymmetric CVE (in terms of roles and viewpoints) in order to offer a new way to be aware of what the others have currently analyzed in the CVE.
We ran a user study in a CVE involving a simple cognitive task called 2D mental rotation [5]. We analyzed the strategies that the participants reported using to estimate the time when a prerecorded collaborator, called the visitor, completed his analysis of a pointed board. Mental rotation and time count strategies illustrated an egocentric perspective. They were frequently unsuccessful in this study, i.e. by applying these strategies, the participants validated before the visitor. One possible reason is the asymmetric setup in terms of viewpoint. When the guide (i.e. the subject), with a global viewpoint, applied egocentric strategies to analyze the pointed board, he validated earlier than the visitor, whose immersive viewpoint was narrower.
Conversely, visitor motion and safe-explicit feature strategies illustrated an allocentric perspective. The guide took better account of the visitor's activity, which led to an enhanced estimation accuracy. In this study, the guide could use implicit feedback such as the visitor motion to estimate the completion of the visitor's analysis process. Moreover, in some conditions, we provided a safe-explicit feature that enabled the guide to know the exact moment the visitor completed his analysis. The latter proved to be very useful for the experts and novices who used it, successfully in most cases.
We also differentiated two groups of users. Experts and novices applied the following strategies to estimate the time to validate:
• Taking into account the visitor motion;
• Waiting for a safe-explicit feedback;
• Achieving the same analysis as the visitor had to achieve (2D mental rotation);
• Mentally counting a time (a few seconds) depending on the analysis complexity perceived by the guide.
However, experts and novices did not succeed in the same manner, because experts were better than novices at selecting their strategies according to the condition. For example, in single task and with an available safe-explicit feature, novices applied the visitor motion strategy whereas experts applied the most successful strategy: the safe-explicit feature. An interpretation can be found in the following definition: "It is human perception and experience of events that generate awareness" [2]. Thus, even if we provide adequate awareness features, their interpretation remains user-dependent (especially according to the expertise).
This study has focused on the analysis process that is a first step in the human cognitive system. It is followed by the understanding that can be right or wrong. This aspect seems to be an interesting feature to investigate and could pave the way toward the awareness of distant collaborators' understanding in order to further improve remote collaboration in CVE.