Method

Subjects were given a show-up fee of about 30 Euro and asked to shop for two days' worth of groceries for their household, following their usual shopping habits. Subjects are given a paper catalog of 290 products, covering 39 grocery categories and displaying for each product a full-color picture, weight or volume, and price. Prices match those found in shops in Grenoble, as surveyed before the sessions were run. Subjects then shop in a custom online shopping environment, through which they can also access, for each product, all the mandatory information that customers usually find in back-of-pack tables: ingredients and a nutritional table. Subjects shop by means of a barcode reader, scanning the items they need from the paper catalog. Pictures of the lab and of the catalog are given in Figure 2 below.
Subjects were asked to shop twice: once from a neutral catalog, displaying only products and prices, and a second, previously unannounced time, with additional information added to the catalog and to the online shopping environment. This additional information varies across the 7 between-subjects treatments: one treatment for each of 5 nutritional labels, in which all products are labeled; a further treatment in which the label is applied to only part of the products; and a final, benchmark neutral treatment in which no label is added and subjects simply shop twice from the same catalog.
The experiment is incentive compatible: subjects know that they are submitting binding choices, meaning that they will have to buy an ex-ante unknown part of their cart from the experimenters. One of the two carts is randomly selected at the end of the experiment to be binding. In a separate room, we stock about one quarter of all the catalog products. The intersection of the items selected by the subjects and what we have in store is then sold, at the catalog prices, to the subjects at the end of the session. A visual representation of our method is given in Figure 3, below.
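The incentive mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `binding_purchase`, the representation of carts as lists of product identifiers, and the explicit random generator are all hypothetical and not taken from the experimental software.

```python
import random


def binding_purchase(cart_1, cart_2, stocked, rng):
    """Return the items a subject actually buys at the end of a session.

    cart_1, cart_2: lists of product identifiers (hypothetical representation).
    stocked: identifiers of the products held in the separate stock room.
    rng: a random.Random instance used for the draw.
    """
    # One of the two carts is drawn at random to be binding.
    binding_cart = rng.choice([cart_1, cart_2])
    # Only items that are both in the binding cart and in stock are sold.
    return sorted(set(binding_cart) & set(stocked))


# Toy usage with made-up product identifiers.
rng = random.Random(0)
purchased = binding_purchase(["milk", "bread"], ["bread", "apples"], {"bread", "rice"}, rng)
```

Because subjects know neither which cart will be drawn nor which products are in stock, every item in both carts may turn out to be a real purchase.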

Measure

Our experimental design (already put to use in its general form in \citet{Muller_2017,Muller_2012}) allows us to measure behavior twice. The first, unlabeled cart sets a benchmark for the shopping behavior of each subject. The second cart allows us to assess, within subjects, changes with respect to that baseline. Comparing the within-subject changes across treatments allows us to cleanly assess, with a diff-in-diff approach, the effect of labels net of individual preference heterogeneity, which is controlled for by the first cart.
The main measure of interest is the treatment-level aggregation of the individual change in the nutritional score between cart 2 (labeled) and cart 1 (unlabeled). We adopt as a nutritional measure the Nutrient Profiling Model developed by the UK Food Standards Agency [5]. This score is computed for each product by assigning unfavorable points according to the levels of salt, saturated fatty acids, calories, and sugar, and favorable points for fiber, fruit and vegetable content, and protein. The score spans from -15 to 25 in its original form, with lower numbers indicating better overall nutritional quality. For the sake of clarity and as suggested in [5], we linearly transform the score, reversing its direction, to yield values from 0 to 100, with higher scores indicating better nutritional value.
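The rescaling step can be illustrated with a minimal sketch. It assumes the simplest linear map consistent with the description above (original range -15 to 25, direction reversed); the exact transformation suggested in [5] may differ, and the names `FSA_MIN`, `FSA_MAX`, and `rescale_fsa` are hypothetical.

```python
# Range of the original score as stated in the text (assumption: endpoints inclusive).
FSA_MIN, FSA_MAX = -15, 25


def rescale_fsa(score):
    """Map an original score (lower is better) onto 0-100 (higher is better)."""
    if not FSA_MIN <= score <= FSA_MAX:
        raise ValueError(f"score {score} outside [{FSA_MIN}, {FSA_MAX}]")
    # Reverse the direction and stretch the 40-point range to 100 points.
    return 100 * (FSA_MAX - score) / (FSA_MAX - FSA_MIN)


best = rescale_fsa(-15)    # best original score maps to the top of the new scale
worst = rescale_fsa(25)    # worst original score maps to the bottom
```

Under this map, each point of the original score corresponds to 2.5 points on the rescaled measure.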
DIFF IN DIFF WITH EQUATION
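One way to write down the diff-in-diff comparison described above is sketched here. The notation is an assumption introduced for illustration: $S_i^{(c)}$ is the rescaled nutritional score of cart $c \in \{1, 2\}$ for subject $i$, $N_t$ is the number of subjects in treatment $t$, and $t = 0$ denotes the benchmark neutral treatment. The `align` environment requires the amsmath package.

```latex
\begin{align}
\Delta_i &= S_i^{(2)} - S_i^{(1)}, \\
\bar{\Delta}_t &= \frac{1}{N_t} \sum_{i \in t} \Delta_i, \\
\mathrm{DiD}_t &= \bar{\Delta}_t - \bar{\Delta}_0 .
\end{align}
```

Here $\Delta_i$ is the within-subject change, $\bar{\Delta}_t$ its average in treatment $t$, and $\mathrm{DiD}_t$ the effect of label $t$ net of the change observed when subjects simply shop twice without any label.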
FSA explanation

Subjects

The study involved 832 subjects over 51 sessions, each lasting about an hour and a half. Since 23 subjects submitted empty carts in at least one of the two shopping exercises, only 809 were kept for analysis. Subjects were recruited from the general population of the Grenoble metro area, a mid-size agglomeration of about four hundred thousand people in the Alps in south-eastern France. Recruitment was restricted to people in charge of grocery shopping for their household who declared themselves regular supermarket shoppers. Moreover, the sample was stratified by household disposable income: a third of subjects with less than two thousand Euro per month, a third between two and three thousand, and a further third with more than three thousand. Summary statistics of our sample are provided in Table 1, below. Overall, one subject in five is a man; the randomization worked insofar as there are no significant differences across treatments; and the age and income structure loosely reflect those of the Grenoble metro area.

Results: FSA score

Results, detailed in French in [4], show that the NutriScore label outperforms all other labels. An overview of the results is given in Table 2; for the sake of comparability, it excludes the treatment applying the NutriScore label to only a subset of products. We compute both aggregate averages, merging all carts of all subjects by treatment and phase, and the mean of individual changes.
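The two summary measures can differ, as the following sketch shows for a single treatment. It rests on assumptions not stated in the text: each cart is represented as a list of rescaled item scores, and a cart's score is the unweighted mean of its items (the actual computation may weight items, for example by quantity). The function names and the toy numbers are hypothetical and are not results from the experiment.

```python
from statistics import mean


def aggregate_change(carts_1, carts_2):
    """Pool the items of all subjects by phase, then difference the pooled means."""
    pooled_1 = [score for cart in carts_1 for score in cart]
    pooled_2 = [score for cart in carts_2 for score in cart]
    return mean(pooled_2) - mean(pooled_1)


def mean_individual_change(carts_1, carts_2):
    """Average, over subjects, of each subject's own change in cart score."""
    return mean(mean(c2) - mean(c1) for c1, c2 in zip(carts_1, carts_2))


# Toy data for two subjects (made-up scores on the 0-100 scale).
carts_1 = [[40, 60], [50]]
carts_2 = [[60, 80], [50, 50, 50, 50]]

pooled = aggregate_change(carts_1, carts_2)
individual = mean_individual_change(carts_1, carts_2)
```

In the pooled measure, subjects with larger carts weigh more, whereas the mean of individual changes gives each subject equal weight, which is why reporting both is informative.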