• To quantify the performance of a predictive model in this application, it is important to consider the key factor in its ultimate use case: the Metropolis-Hastings acceptance criterion. In the MCMC simulations that emulate directed evolution, the probability that a proposed mutant sequence is accepted as the new sequence at each iteration depends on whether the proposed sequence has a greater fitness score than the current sequence. Thus, an appropriate top model need not minimize the average error relative to the experimental fitness scores; rather, it should accurately predict the magnitude of each fitness score relative to those of the other mutant sequences under consideration. In Biswas et al., the only top model evaluation performed is on characterized