To address the three key issues and to add quality-of-life improvements for end users, we made the following modifications to the pipeline:
  1. Drafted a formalized procedure for curating a local dataset for evolutionary fine-tuning (evotuning).
  2. Collaborated to complete an open-source JAX implementation of the mLSTM, used both to obtain UniRep representations of sequences and to run evotuning to obtain eUniRep sequence representations. JAX was chosen to achieve a 100x speedup when passing new protein sequences through the trained mLSTM to obtain the 1900-dimensional UniRep / eUniRep representations; technical details can be read in INSERT MANUAL CITATION HERE.
  3. Gained improvements in memory usage as well as speed.
  4. Implemented a thorough top model evaluation script and settled on the best top model as...
  5. Added a computational evaluation of thermostability as the fitness measure. Note that fitness can refer to a wide variety of properties; thermostability is just one of them, and one for which computational verification happens to be possible.
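
To make the idea of a top model evaluation more concrete, the sketch below runs k-fold cross-validation over a few candidate top models and scores each by Spearman rank correlation between predicted and held-out fitness. Everything in it is an illustrative assumption rather than the contents of our evaluation script: the candidate models (k-nearest-neighbour regressors), the scoring metric, the function names, and the randomly generated toy data, which stands in for 1900-dimensional eUniRep representations with much smaller vectors.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def knn_predict(train_x, train_y, x, k):
    """Hypothetical top model: mean fitness of the k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cross_validate(xs, ys, k_neighbors, n_folds=5, seed=0):
    """Mean Spearman correlation of the k-NN model over held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    fold_scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        preds = [knn_predict(train_x, train_y, xs[i], k_neighbors)
                 for i in held_out]
        fold_scores.append(spearman(preds, [ys[i] for i in held_out]))
    return sum(fold_scores) / len(fold_scores)


# Toy data (made up): 60 "sequences" with 8-dimensional representations,
# where fitness depends on the first three dimensions plus noise.
rng = random.Random(1)
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
fitness = [sum(x[:3]) + rng.gauss(0, 0.1) for x in features]

# Compare candidate top models and keep the best-scoring one.
scores = {k: cross_validate(features, fitness, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Rank correlation is used here because a low-N engineering workflow mostly needs the top model to order candidates correctly; a different metric or model family could be swapped in without changing the cross-validation loop.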

Additional Analysis

Biswas et al. verify the ... by generating these plots ... they show success on 2 datasets, avGFP and TEM-1 beta-lactamase...
We wanted to verify more thoroughly both the viability and the necessity of eUniRep for in-silico low-N protein engineering. We did this by replicating and analyzing in greater depth the TEM-1 beta-lactamase study, and by applying the pipeline to 2 new target proteins, in both cases using our improved pipeline. We also performed an epistasis analysis by comparing training on single and double mutants of the MS2 capsid protein.
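
For readers unfamiliar with epistasis, one common way to quantify it for a pair of mutations is the deviation of the double mutant's fitness from the additive expectation built from the two single mutants. The sketch below shows that calculation only; it is not the analysis performed here, the function name is hypothetical, and the mutation labels and fitness values are made-up toy numbers, not MS2 measurements.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    All inputs are fitness values on an additive scale (e.g. log fitness).
    A result of 0 means the two mutations combine additively.
    """
    expected_additive = f_a + f_b - f_wt
    return f_ab - expected_additive


# Made-up toy values on a log-fitness scale (wild type set to 0).
wild_type = 0.0
singles = {"mutA": -0.5, "mutB": 0.25}
doubles = {("mutA", "mutB"): 0.5}

for (a, b), f_ab in doubles.items():
    eps = pairwise_epistasis(wild_type, singles[a], singles[b], f_ab)
    print(a, b, eps)
```

A model trained only on single mutants has no direct information about such deviations, which is why comparing training on single versus double mutants is informative.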

Results

Comparing our improved pipeline