As shown in Figure 1, four images were used in the experiment. Image (a) is from the emotion perception task developed by Penton-Voak et al.\ \cite{Penton_Voak_2013}. The researchers composited 20 individual male faces showing a happy facial expression and the same 20 faces showing an angry expression, and from these prototypes created the image of a statistically averaged neutral face. The original images came from the Karolinska Directed Emotional Faces \cite{Lundqvist_1998}. Image (b) is a royalty-free photo of a puppy, used with the intention of eliciting happiness in line with Barratt's retest of the Kuleshov effect \cite{Barratt_2016}. Image (c) was produced by multiple blending \cite{2011a} of images (a) and (b), and image (d) is composed of random noise only. All images were converted to grayscale by reducing their saturation, to ensure that images (a) and (b) share a similar eye significance factor \cite{Leopold_1996}.
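For illustration, the following Python sketch (using Pillow and NumPy) reproduces the preparation steps described above. The file names, the stimulus resolution, and the reading of multiple blending as a pixel-wise multiply blend are illustrative assumptions, not the exact parameters used in the experiment.

\begin{verbatim}
# Illustrative sketch of the stimulus preparation. File names, the
# 512 x 512 resolution, and the multiply-blend reading of "multiple
# blending" are assumptions, not the exact parameters used.
import numpy as np
from PIL import Image, ImageEnhance

SIZE = (512, 512)  # assumed stimulus resolution (width, height)

face = Image.open("face_neutral.png").convert("RGB").resize(SIZE)  # (a)
puppy = Image.open("puppy.png").convert("RGB").resize(SIZE)        # (b)

# Image (c): pixel-wise multiply blend of (a) and (b).
a = np.asarray(face, dtype=np.float32) / 255.0
b = np.asarray(puppy, dtype=np.float32) / 255.0
blend = Image.fromarray((a * b * 255.0).astype(np.uint8))          # (c)

# Image (d): uniform random noise.
noise = Image.fromarray(
    np.random.randint(0, 256, (SIZE[1], SIZE[0], 3), dtype=np.uint8))

# Grayscale conversion by removing saturation: enhance(0.0) drops
# the colour saturation to zero.
for name, img in zip("abcd", (face, puppy, blend, noise)):
    ImageEnhance.Color(img).enhance(0.0).save(f"stimulus_{name}.png")
\end{verbatim}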

Method

As shown in Figure 2, three types of visual stimuli were presented to the participants in each individual trial in the VR environment: (1) a montage, (2) a dichoptic presentation, and (3) a single image. A figure of a man with a neutral facial expression could be seen in each visual stimulus, and the noise image, (d) in Figure 1, was shown to the participants during the 3-second interval between stimuli.
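A minimal sketch of the resulting trial structure is given below. The show() and wait() hooks are hypothetical placeholders for the unspecified VR engine API, and the per-stimulus duration is an assumption, as it is not stated here.

\begin{verbatim}
# Minimal sketch of one trial block. The show()/wait() hooks are
# hypothetical placeholders for the VR engine API, and the
# per-stimulus duration is an assumption.
NOISE_INTERVAL = 3.0  # seconds of noise image (d) between stimuli

def run_block(stimuli, noise, show, wait, stimulus_duration=6.0):
    """Present each stimulus type in turn, separated by the noise."""
    for stim in stimuli:  # (1) montage, (2) dichoptic, (3) single image
        show(stim)
        wait(stimulus_duration)
        show(noise)       # image (d) fills the inter-stimulus gap
        wait(NOISE_INTERVAL)
\end{verbatim}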
The montage was a looping sequence composed of images (a) and (b) from Figure 1, each lasting 3 seconds. Image (a) was always shown first, and the sequence cut to image (b) after 3 seconds. The duration of each image was set according to the estimated Average Shot Length (ASL) of between 3 and 4 s in mainstream Hollywood films \cite{Cutting_2011}.
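The alternation itself can be expressed as a simple frame-time computation. The sketch below assumes a generic frame-based render loop rather than the engine code actually used.

\begin{verbatim}
# Sketch of the montage timing, assuming a frame-based render loop.
import time

SHOT_LENGTH = 3.0  # seconds per image, following the 3-4 s ASL estimate

def montage_frame(start_time, image_a, image_b):
    """Return the image to draw this frame: (a) first, then (b),
    alternating every SHOT_LENGTH seconds in a loop."""
    elapsed = time.monotonic() - start_time
    return image_a if int(elapsed // SHOT_LENGTH) % 2 == 0 else image_b
\end{verbatim}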