Introduction
Virtual Reality is not a new invention of the digital age. Stereoscopic techniques have been used to create the illusion of depth and space for roughly two hundred years\cite{Crary_2002}. Throughout the development of virtual reality, VR pioneers have never stopped trying to deliver ever more realistic experiences. Bringing people to places they would never otherwise get the chance to go has become the goal of nearly every VR company, as one can tell simply from how often the phrase "being there" appears in press coverage of VR. Such mainstream visions of VR bring a new question into play: is re-creating reality the ultimate capability of digital media? To borrow McLuhan's hackneyed aphorism, "the medium is the message"\cite{Levine_1964}: art movements that go beyond realism, liberating people from the restrictions of their senses and offering new perceptions of reality, can be found in nearly every medium (e.g., Impressionism following Realism in painting), but not yet in VR. Because stereoscopy remains the fundamental basis of current VR technology (and will remain so for the foreseeable future), one approach to such non-realist VR could be related to binocular rivalry, the unique epistemic phenomenon that occurs in the dichoptic presentation of stereoscopy (one image is presented to one eye and a very different image to the other).
Because our eyes are horizontally separated, each eye has its own perspective of the world, and thus the two eyes receive slightly different images. Stereopsis is the perception of depth constructed from the difference between these two retinal images. The brain fuses the left and right images and, from retinal disparity, i.e., the distance between corresponding points in the two images, extracts relative depth information. In stereoscopic three-dimensional (3D) displays, a pair of goggles is usually used to present one image to one eye and a very slightly different image to the other; the brain can fuse these two images into a single percept and yield stereopsis. However, to create the 3D aspects of a virtual scene convincingly, the two images must be precisely adjusted to match human binocular vision; otherwise, the brain will extract wrong depth information that contradicts our epistemology\cite{Crick_1992}. When the pair of images is not set up correctly (asymmetrical stereoscopy), dichoptically viewing the image pair may produce competition in the form of binocular rivalry: perception alternates between the different images presented to each eye\cite{Hohwy_2008}. Although the mechanisms behind binocular rivalry are still debated\cite{Brascamp_2015}, previous research\cite{Blake_1991,Kwanghyun_Lee_2015,GREGORY_1965} has shown that asymmetrical stereoscopy leads to three different perceptual outcomes, depending on the level of visual stimulation in each image: (1) stereopsis, when the images are still similar enough for the brain to extract depth information; (2) binocular rivalry, when the two images are too different to be fused by the brain and they have similar sensory eye dominance; and (3) binocular suppression (only one of the images is seen while the other is hidden), when one image elicits significantly stronger eye dominance\cite{1962} than the other\cite{Blake_1980}.
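For intuition, the relation between disparity and depth can be summarized by the standard triangulation formula of a simplified pinhole-camera model; this particular formulation is an illustration of the geometry, not a result drawn from the works cited above:
\begin{equation}
Z \approx \frac{f\,b}{d},
\end{equation}
where $Z$ is the distance to a point in the scene, $b$ is the interocular baseline, $f$ is the focal length (the distance to the image plane), and $d$ is the horizontal disparity between the point's projections in the two images; nearer points therefore produce larger disparities.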
Stereopsis & Binocular Suppression in Asymmetrical Stereoscopy
Beyond the traditional symmetrical stereoscopy widely adopted in the education and entertainment industries (3D TV, 3D movies, VR, etc.), the stereopsis yielded by asymmetrical viewing has been researched in many respects. This perceptual phenomenon has been used in 3D content streaming, where one image is transmitted at full resolution while the other is compressed. Stereopsis is still achieved, and when the pair is viewed stereoscopically, the brain fills in the missing detail of the compressed image with the corresponding content from the full-resolution image\cite{Wang_2014}. This technique preserves 3D viewing quality while preventing the bandwidth of 3D content from doubling relative to 2D streaming (a rough illustration follows below). Another application of asymmetrical stereopsis is the monocular, transparent (a.k.a. "see-through") head-mounted display, which projects 2D information onto only one transparent eyepiece of the headset\cite{Laramee_2002}. Binocular opaque HMDs are useful for immersive virtual reality applications, while monocular transparent displays are preferred when interacting with the world while looking at the display\cite{Feiner_1997}. Potential applications are found mostly in the Augmented Reality field, providing users with additional visual information or visual navigation aids\cite{Ockerman}. However, although the two images are shown to the eyes asymmetrically, as long as the brain fuses them into a single percept, there is no difference between asymmetrical and symmetrical stereoscopy at the cognitive level. Therefore, the storytelling potential remains the same.
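As a rough illustration of the bandwidth saving mentioned above (the specific downsampling factor is our assumption, not taken from the cited work): if the secondary view is downsampled by a factor of two in each dimension, the stereo pair carries about $1 + \tfrac{1}{4} = 1.25$ times the pixels of a single 2D stream, instead of the $2\times$ required by a fully symmetric pair.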
Binocular suppression happens when one image is far more "interesting" to the brain than the other and captures all of its attention during dichoptic viewing. As a result, the visual stimulus from the other image is completely blocked\cite{Wilson_2017}. The stereoscopic viewing experience in this situation is no different from watching a flat image with one eye closed.
Storytelling with Binocular Rivalry
Binocular rivalry is a stage between stereopsis and binocular suppression in dichoptic presentation. When the images are too different to be fused and, at the same time, are both similarly "interesting", the brain has difficulty choosing which image to see and constantly switches focus between them. For example, when an image of a house is presented to one eye and an image of a face to the other, subjective experience alternates between the house and the face. There have been many empirical studies of binocular rivalry, but the data they produce are conflicting and very difficult to interpret unequivocally. A number of proposals have been made, but the neurocognitive mechanism behind this visual effect remains unresolved\cite{Tong_2006,N_K_2008,Sengpiel_2013}. Some researchers have approached the phenomenon from a cognitive perspective, holding that binocular rivalry is an epistemic response of the brain to a seemingly incompatible stimulus condition in which two distinct objects occupy the same spatiotemporal location.
In his famous paper\cite{eisenstein1949dialectic}, Eisenstein argues that the nature of art is the conflict between natural existence and creative tendency, and that montage creates such conflict by pictorially placing two immobile images (or, by extension, two shots) next to each other. Eisenstein's montage theory is arguably an extended explanation and application of the Kuleshov Effect\cite{Barratt_2016}, by which viewers derive more meaning from the interaction of two sequential shots than from a single shot in isolation. Eisenstein argues that when two pictures (or two shots) in a motion picture are sequentially aligned, they, like the stereoscopic effect created by human perception, are actually superimposed on top of each other in the audience's brain: the conflict of thoughts is formulated by the conflict of shots. In other words, the conflicting thoughts in viewers' minds are conditioned and derived from the combination of shots created by filmmakers. By deliberately choosing the shots and the cutting pace, filmmakers can direct how viewers' ideas conflict and are subsequently formed into a new meaning. In this light, although binocular rivalry is still about conflicts of consciousness, the phenomenon is controlled autonomously by the viewer's brain and is free from the control of others. Contrary to montage in motion pictures, although the images under dichoptic presentation are still selected by others, how the conflict unfolds is determined entirely by the viewer's own brain. However, there is not enough research on whether and how this neurological autonomy affects how the combination of two images is perceived. Would dichoptically viewing two images yield more meaning than watching the two images in isolation? If so, would the meaning perceived during binocular rivalry differ from the interpretation formed when watching the same pair of images sequentially in a motion picture?
Experiment
Questions & Hypothesis
To study the storytelling potential of binocular rivalry, we first need to validate that, like the Kuleshov Effect in motion pictures, dichoptically viewing two images can yield more meaning than watching the two images in isolation. Given this hypothesis, an additional question can be formed: is the new meaning created in binocular rivalry different from that created in montage? To test the hypothesis, a between-subjects experiment was conducted.
Apparatus
A state-of-the-art commercial VR headset (Oculus Rift), with 1080×1200 resolution per eye, a 90 Hz refresh rate, and a 110° field of view, was used in the user testing. Each participant was asked to adjust the lens spacing and headset height to the most comfortable level, so that they could see arguably the clearest stereoscopic VR content on the current consumer market.
Participants
43 college students from New York University, USA (26 female; 17 male; ages 21–31) participated in the experiment. All participants had used VR headsets before. None of them was familiar with the concept of binocular rivalry.
Materials
The testing application was made in Unity. In the application, users start in a completely dark VR environment in which a 1.2 × 1.5 m image is placed 1 meter in front of them. Locomotion in this VR environment matches users' physical movement in the real world: they can freely move and rotate both their heads and bodies to see the image from different angles and positions. The Unity project was built into an Oculus Runtime executable via Oculus Utilities for Unity (Version 1.25), and the final visuals were automatically rendered as split-screen stereo with per-eye distortion correction by the Oculus Compositor (https://developer.oculus.com/documentation/pcsdk/latest/concepts/dg-render/).
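The implementation code is not listed here; the following is a minimal sketch, in Unity C#, of one common way to present a different image to each eye for dichoptic viewing: two textured quads on separate layers, with one camera per eye that culls the other eye's layer. The layer names, component fields, and the use of multi-pass stereo rendering are assumptions for illustration, not details of the original apparatus.
\begin{verbatim}
using UnityEngine;

// Minimal sketch (not the authors' published code): show a different quad to
// each eye by pairing per-eye cameras with per-eye layers. Assumes two layers
// named "LeftEyeOnly" and "RightEyeOnly" exist and that multi-pass stereo
// rendering is enabled.
public class DichopticPresenter : MonoBehaviour
{
    public Camera leftEyeCamera;     // renders only to the left eye
    public Camera rightEyeCamera;    // renders only to the right eye
    public Renderer leftImageQuad;   // e.g., a 1.2 x 1.5 m quad, 1 m ahead
    public Renderer rightImageQuad;  // same placement, different texture

    void Start()
    {
        int leftLayer  = LayerMask.NameToLayer("LeftEyeOnly");
        int rightLayer = LayerMask.NameToLayer("RightEyeOnly");

        // Put each quad on its own layer.
        leftImageQuad.gameObject.layer  = leftLayer;
        rightImageQuad.gameObject.layer = rightLayer;

        // Each camera draws to exactly one eye ...
        leftEyeCamera.stereoTargetEye  = StereoTargetEyeMask.Left;
        rightEyeCamera.stereoTargetEye = StereoTargetEyeMask.Right;

        // ... and never renders the layer intended for the other eye.
        leftEyeCamera.cullingMask  &= ~(1 << rightLayer);
        rightEyeCamera.cullingMask &= ~(1 << leftLayer);
    }
}
\end{verbatim}
With a setup of this kind, swapping the textures assigned to the two quads is enough to switch between identical images (symmetric viewing), slightly different images (asymmetric stereopsis), and strongly conflicting images (rivalry-inducing pairs) without changing the scene logic.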