\section{Discussion}

\label{sec:disc}
Several factors might explain our failure to observe any effect of social facilitation or social inhibition: the small effect sizes of social facilitation, certain methodological weaknesses in the psychology literature, and a broader publishing bias toward positive results that might mask how brittle some effects are.
\subsection{The difficulty of observing social facilitation}
What does this failed attempt at reproducing a ‘classic’ result of social psychology tell us? Beyond possible experimental confounds, our failure to reproduce these results is likely due to the small effect size of social facilitation. In their meta-analysis of studies on social facilitation, Bond and Titus \cite{Bond1983} showed that the overall mean effect sizes are low, ranging from 0.03 to 0.36. According to Cohen \cite{Cohen1977}, an effect size of 0.2 should be regarded as small, an effect size of 0.5 as medium, and an effect size of 0.8 as large.
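Cohen's conventional thresholds can be made concrete with a minimal sketch. The function names, the pooled-variance formulation, and the ‘negligible’ label for values below 0.2 are our own illustrative conventions, not taken from Cohen \cite{Cohen1977}:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference between two groups,
    using a pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def cohen_label(d):
    """Cohen's conventional labels for |d| (0.2 small, 0.5 medium,
    0.8 large); 'negligible' below 0.2 is our own addition."""
    d = abs(d)
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"
```

Under these conventions, the whole 0.03--0.36 range reported by Bond and Titus falls in the ‘negligible’ to ‘small’ bands, which helps explain why a single experiment with a modest sample may fail to detect the effect.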
Social facilitation or inhibition is also affected by a combination of several other psychological effects: the observer effect (also known as the Hawthorne effect), demand characteristics, cultural conventions, and personality orientation, among others. These effects are potential confounds, and adequately accounting for each of them in the experimental design proved problematic.
For example, Liad Uziel \cite{Uziel2007} compared the effect sizes of studies in the literature that examined personality effects for subjects with negative orientation (trait anxiety and low self-esteem) and positive orientation (extraversion and high self-esteem). The results suggest that orientation matters more than task complexity in determining performance: subjects with a positive orientation showed improved performance on both simple and complex tasks in the mere presence of a person, while those with a negative orientation showed impaired performance on both types of tasks. However, these results were based on only a few studies that examined personality effects; hence, the author states that they cannot be generalised.
We believe that these observations, regarding on one hand the small effect sizes and on the other the complexity of multi-faceted psychological effects, apply beyond the specific case of social facilitation.
\subsection{Weak methods in older psychology literature}
Beyond the caution that must be observed when studying one specific psychological effect, a broader range of methodological issues with older research in psychology might explain why some results in psychology are incorrectly believed to be reliable.
For instance, Bond and Titus' meta-analysis of research on social facilitation \cite{Bond1983} claims to have exhaustively examined every publication prior to the publication of the meta-analysis itself (in 1983). Indeed, the oldest study they cite dates from 1898, and 35 of the 241 studies were published prior to 1965. As such, social facilitation is a good example of an old, classical psychological effect. This also hints, however, that its characterisation might have relied on research methodologies that are weak by today's standards. Bond and Titus raise interesting points in that regard: only 100 of the 241 studies state that the experimenter was in a different room in the \alonecondition{} (and in 96 studies, we know the experimenter was in the room). This would be seen today as a serious confound. Similarly, Bond and Titus report that 72.3\% of the total participants were undergraduate students, pointing to a possible serious demographic bias.
\subsection{Biases in scientific publishing: the ‘file drawer’ problem}
Coined in 1979 by Robert Rosenthal \cite{Rosenthal1979}, the file drawer problem refers to the bias introduced into the scientific literature by mainly publishing positive results, and rarely negative or non-confirmatory ones. As a consequence, an effect could be reported and believed reliable simply for lack of literature showing the contrary. Rosenthal proposes to account for this problem by reporting in meta-analyses the ‘fail-safe N’ measure: N is the number of null effects that would be required to make the original result non-significant. Rosenthal proposes to consider an effect resistant to the ‘file drawer problem’ of unreported null effects if and only if the fail-safe N is above \(5k+10\), with \(k\) the number of reported effects.
Bond and Titus report the fail-safe N for some of the effects of social facilitation. For instance, their meta-analysis shows that the performance quantity of participants on complex tasks reliably decreases in the presence of an observer (even though the effect size is small). 54 effects are reported, and they note that the fail-safe N value is 160: 160 is clearly smaller than \(5\times 54+10=280\), and as such this result could well be subject to the problem of unreported null effects. The finding that social presence inhibits performance on complex tasks is not a robust result in the face of the bias towards publishing only positive results.
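Rosenthal's \(5k+10\) tolerance criterion amounts to a simple arithmetic check, sketched below with the numbers from Bond and Titus's meta-analysis. The function names are illustrative, not taken from the cited works:

```python
def rosenthal_tolerance(k):
    """Rosenthal's tolerance level for k reported effects:
    an effect is considered resistant to the file drawer
    problem iff the fail-safe N exceeds 5k + 10."""
    return 5 * k + 10

def resists_file_drawer(fail_safe_n, k):
    """True if the fail-safe N clears Rosenthal's threshold."""
    return fail_safe_n > rosenthal_tolerance(k)

# Bond and Titus: quantity on complex tasks, k = 54 effects,
# fail-safe N = 160, threshold 5*54 + 10 = 280 -> not robust.
robust = resists_file_drawer(160, 54)
```

For the complex-task quantity result, 160 falls well below the threshold of 280, which is why we treat that finding as vulnerable to unreported null effects.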
By contrast, the fail-safe N for performance quantity on simple tasks is 6,183, and for performance quality on complex tasks it is 5,697, both far exceeding the corresponding thresholds. The facilitation of output quantity on simple tasks and the impairment of output quality on complex tasks can therefore be regarded as robust to the file drawer problem.
A weighted calculation of the fail-safe number has been proposed \cite{rosenberg2005file} that addresses some of the concerns with Rosenthal's proposal; while not systematically reported in the literature, this metric is a valuable tool for HRI researchers when assessing how robust a result in psychology is.