Hypotheses
The literature \cite{Feinberg2010, Wechsung2014} suggests that social
presence during a complex cognitive task like this one should lead to worse
performance. Accordingly, our hypotheses were the following:
-
H1: In the presence of a social agent, participants will be more
honest (i.e., they will look at the answer on the dialogue pop-up less often).
-
H2: In the presence of a social agent, participants will answer
fewer questions correctly.
Protocol & Data Collection
As outlined previously, although we planned to run four conditions (alone, human
presence, NAO presence, Pepper presence), we first ran the two baseline
conditions: alone and human observer. We recruited 15 participants for the
alone condition and 16 participants for the human condition.
The experimental setup was similar to that of Figure \ref{fig:setup}, with two
differences: when present, the human observer sat at the table, facing
the participant, and the tablets were replaced with laptops, whose keyboards
made it easier to enter answers. For each participant, we recorded how many
additions were attempted, the total gain (i.e., the number of correct answers),
and the time taken to calculate each addition.
Results
\label{sec:study2-results}
Across the data (31 participants, 633 additions in total), the average
time to dismiss the debug dialogue was 1185 ms and the average time to provide an
answer was 9980 ms. Based on these values, we conservatively classify a round as
cheating when the participant took more than 0.8 seconds to dismiss the spurious
debug dialogue, took less than 5 seconds to calculate the sum, and provided a
correct answer. This criterion yields 147 cheating rounds (23.2% of all rounds).
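The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the `Round` record, its field names, and the example values are hypothetical, and only the two thresholds (0.8 s and 5 s) and the correctness requirement come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the constant names are hypothetical.
DISMISS_THRESHOLD_MS = 800    # more than 0.8 s spent on the debug dialogue
ANSWER_THRESHOLD_MS = 5000    # less than 5 s spent calculating the sum


@dataclass
class Round:
    """One addition round (hypothetical record layout)."""
    dismiss_ms: float   # time taken to dismiss the debug dialogue
    answer_ms: float    # time taken to provide an answer
    correct: bool       # whether the answer was correct


def is_cheating(r: Round) -> bool:
    """A round counts as cheating only if all three criteria hold."""
    return (
        r.dismiss_ms > DISMISS_THRESHOLD_MS
        and r.answer_ms < ANSWER_THRESHOLD_MS
        and r.correct
    )


def cheating_rate(rounds: list[Round]) -> float:
    """Fraction of rounds classified as cheating (0.0 for an empty list)."""
    if not rounds:
        return 0.0
    return sum(is_cheating(r) for r in rounds) / len(rounds)


# Made-up rounds for illustration only; these are not study data.
example_rounds = [
    Round(dismiss_ms=1500, answer_ms=3200, correct=True),   # cheating
    Round(dismiss_ms=400, answer_ms=3200, correct=True),    # dialogue dismissed quickly
    Round(dismiss_ms=1500, answer_ms=9800, correct=True),   # slow answer
    Round(dismiss_ms=1500, answer_ms=3200, correct=False),  # wrong answer
]
print(cheating_rate(example_rounds))
```

Requiring all three criteria at once is what makes the rule conservative: a slow dismissal on its own, or a fast correct answer on its own, is not counted as cheating.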
Looking at these results per condition, we find 77 cheating rounds out of
316 rounds in the human condition (24.4%) and 70 cheating rounds out of
317 rounds in the alone condition (22.1%). TBD: t-test.
These results show that (1) participants do cheat relatively often, but (2) the
presence of a human observer does not significantly affect the cheating
behaviour of the participants, providing no support for H1.
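As a rough check on the per-condition cheating counts reported above, the two proportions can be compared with a pooled two-proportion z-test. This is a sketch, not the planned t-test: it treats every round as an independent observation, which is an assumption that does not strictly hold here because rounds are nested within participants. The function name is hypothetical, and only the four counts come from the text.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    x1 of n1 and x2 of n2 are the event counts and totals in each group.
    Assumes independent observations and a normal approximation.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Cheating rounds per condition, as reported in the text.
z, p = two_proportion_z_test(77, 316, 70, 317)  # human vs. alone
print(f"z = {z:.2f}, p = {p:.2f}")
```

A per-participant analysis (for example, a t-test on each participant's cheating rate, or a mixed-effects model) would respect the nesting of rounds within participants, but it requires the per-participant data, which are not reproduced here.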
In terms of performance, participants in the human presence condition gave 28
wrong answers out of 239 rounds with no cheating (11.7%),
while participants in the alone condition gave 25 wrong answers out of 247
such rounds (10.1%). TBD: t-test. Again, there is no significant
performance difference between the two conditions, providing no support for H2.
Therefore, neither of our hypotheses is supported. Because we found no
difference between the human and alone conditions, we did not pursue the study
with robots.