To characterize how well the different embedded sequence representations describe the effects of consecutive mutations, we trained our top model on the MS2 single-mutant fitness landscape and compared its predictions of double-mutant fitness to experimentally determined values (Fig 1c). When the model was trained on the full single-mutant fitness landscape and tested on the double-mutant dataset, the One-Hot encoding achieved the best predictive accuracy (MSE = 0.743), followed by eUniRep and Global UniRep (MSE = 0.987 and 1.07, respectively). As an additional baseline for comparison with the One-Hot representation, we calculated additive predictions of double-mutant fitness from the single-mutant fitness data using Equation 2. These additive predictions had a mean squared error of 0.841 against the same experimental double-mutant values, slightly worse than the One-Hot representation.   <-- ANDREW: i'm confused about this entire italicized section, what is it actually talking about ? what are the MSE values from? trained on single tested on double or tested on single? why do the one hots perform the best if it is on double? i thought they were supposed to suck?
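The additive baseline described above can be illustrated with a minimal sketch. It assumes that Equation 2 (not reproduced in this section) predicts double-mutant fitness as the sum of the two single-mutant fitness scores; if Equation 2 takes a different form, only `additive_prediction` would change. The mutation labels and fitness values are hypothetical toy numbers, not data from the MS2 landscape, and the function names are illustrative.

```python
from statistics import fmean


def additive_prediction(single_fitness, mutation_a, mutation_b):
    """Assumed additive model: sum of the two single-mutant fitness scores.

    This is a stand-in for Equation 2, whose exact form is an assumption here.
    """
    return single_fitness[mutation_a] + single_fitness[mutation_b]


def mean_squared_error(predicted, observed):
    """Mean of squared differences between predictions and measurements."""
    return fmean([(p - o) ** 2 for p, o in zip(predicted, observed)])


# Hypothetical single-mutant fitness scores (toy values for illustration only).
single_fitness = {"A1G": -0.5, "T2S": 0.25, "K3R": -1.0}

# Hypothetical measured double-mutant fitness scores, keyed by mutation pair.
double_observed = {
    ("A1G", "T2S"): -0.5,
    ("A1G", "K3R"): -2.0,
    ("T2S", "K3R"): -0.25,
}

pairs = list(double_observed)
predicted = [additive_prediction(single_fitness, a, b) for a, b in pairs]
observed = [double_observed[pair] for pair in pairs]

# MSE of the additive baseline against the measured double-mutant values.
additive_mse = mean_squared_error(predicted, observed)
print(f"additive baseline MSE: {additive_mse}")
```

The same `mean_squared_error` function would apply to the model predictions for each representation, so the additive baseline and the learned models are scored against the same double-mutant measurements.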