POINT 6

Compare the 3 proteins and discuss how important the span of the input data is: if the training data do not cover a wide enough range, performance is poor (e.g., for TEM-1, fitting was poor until we sampled randomly from the entire dataset instead of from only part of it).
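The sketch below illustrates the TEM-1 sampling issue. It assumes a hypothetical dataset whose rows happen to be stored in order of their target value; this is not the actual TEM-1 file. It compares the range of target values covered by taking the first 20% of rows against drawing the same number of rows at random from the whole dataset. The helper names (`span`, `contiguous_split`, `random_split`) are illustrative only.

```python
import random


def span(values):
    """Range (max - min) of target values covered by a training subset."""
    return max(values) - min(values)


def contiguous_split(data, frac):
    """Take the first `frac` of the rows in their stored order."""
    k = int(len(data) * frac)
    return data[:k]


def random_split(data, frac, seed=0):
    """Draw the same number of rows uniformly at random from the whole dataset."""
    k = int(len(data) * frac)
    return random.Random(seed).sample(data, k)


# Hypothetical dataset: 1000 variants whose target values (scaled to 0..1)
# happen to be stored in sorted order, e.g. because the file is grouped by batch.
data = [i / 999 for i in range(1000)]

part = contiguous_split(data, 0.2)
rand = random_split(data, 0.2)
print(f"contiguous subset span: {span(part):.2f}")
print(f"random subset span:     {span(rand):.2f}")
```

A model trained on the contiguous slice sees only a narrow band of target values and must extrapolate for the rest. That is one plausible reading of why TEM-1 fitting improved once we sampled from the entire dataset.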
Also discuss the length of the 2MS2 sequences compared with PETase and TEM-1, as well as the inherent function we are trying to predict: catalytic activity is a target only for TEM-1, whereas stability is not unique to 2MS2 (the PETase variants were also evaluated on Tm).
While one of our main goals in this project was to use the mutant data from Cui et al. to predict PETase variants with improved Tm, all of our predictive models trained on this data set performed poorly (FIG..). We suspect two primary causes. First, the data set was small, with only 85 mutant samples. Second, many of the mutations occur toward the middle of the amino acid sequence, so they likely do not adequately span the space of possible mutations that the models would be asked to predict.
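One way to make the positional-coverage argument concrete is a quick check on the mutation list. The check parses each mutation code, counts how many distinct residues are mutated, and tallies mutations across the N-terminal, middle, and C-terminal thirds of the sequence. This sketch assumes mutation codes of the form `<wild-type><position><mutant>` (e.g. `A100G`). The mutation list and the 300-residue length in the usage lines are hypothetical placeholders, not the Cui et al. data.

```python
import re
from collections import Counter

# Single-substitution code: wild-type residue, 1-based position, mutant residue.
MUT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def mutation_positions(mutations):
    """Extract 1-based residue positions from codes such as 'A100G'."""
    positions = []
    for m in mutations:
        match = MUT_RE.match(m)
        if match is None:
            raise ValueError(f"unrecognized mutation code: {m!r}")
        positions.append(int(match.group(2)))
    return positions


def positional_coverage(mutations, seq_len, n_bins=3):
    """Return (fraction of positions mutated, mutation counts per sequence region)."""
    positions = mutation_positions(mutations)
    distinct = set(positions)
    # Assign each position to one of n_bins equal-width regions (bin 0 = N-terminal).
    bins = Counter(min((p - 1) * n_bins // seq_len, n_bins - 1) for p in positions)
    return len(distinct) / seq_len, [bins.get(i, 0) for i in range(n_bins)]


# Hypothetical example: five mutations on a hypothetical 300-residue protein.
example = ["A100G", "L120V", "T140S", "K150R", "E210D"]
frac, counts = positional_coverage(example, seq_len=300)
print(f"fraction of positions mutated: {frac:.3f}")
print(f"mutations per third (N, middle, C): {counts}")
```

A real data set concentrated in the middle bin would show the same skew, which would support the concern that the training mutations do not span the positions we would later predict for.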
Could all of our evotuning be slightly off? eUniRep performs worse than UniRep for both PETase and 2MS2. We also see that the weights evotuned for TEM-1 actually perform better than those evotuned for PETase, and it is hard to say what that tells us.