Evaluating eUniRep and other protein feature representations for in silico directed evolution
Andrew Favor
University of California, Berkeley

Corresponding Author:[email protected]

Abstract

This study analyses and extends the work of Biswas et al. \cite{Biswas2020} on low-N protein engineering with data-efficient deep learning. We provide a complete, open-source, end-to-end re-implementation of the in silico protein engineering pipeline, with improved computational efficiency, more detailed documentation, a cleaner API, and additional features, to lower the barrier to entry for using this pipeline as an engineering tool. We also evaluate more thoroughly the success and necessity of each step in the pipeline for in silico directed evolution. To do so, we re-implement select portions of the study of TEM-1 β-lactamase and apply the full in silico pipeline to two novel protein engineering tasks: increasing the melting temperature of the plastic-degrading enzyme IsPETase and improving the thermostability of the viral capsid coat protein of bacteriophage MS2. Our findings corroborate some of the main benefits of the eUniRep protein representation highlighted in Biswas et al., but we also identify some key limitations not previously discussed. Finally, we provide a simple mathematical proof, supported by a case study, that linear kernels are equivalent to additive fitness landscapes and that they outperform more complex models on small or single-mutation prediction tasks. This equivalence is assumed in many previous works but has never been shown explicitly. We believe this result helps to further elucidate the main strength of the eUniRep representation: its ability to overcome epistatic effects when proposing extensively mutated candidate sequences with optimized functionality.
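
The link between linear kernels and additive fitness landscapes can be illustrated with a minimal sketch. It is not the pipeline's implementation. It assumes a one-hot sequence encoding, a kernel ridge regressor with no intercept, and randomly generated training sequences and fitness values. All names here (`one_hot`, `fit_linear_kernel_ridge`, `predict`, the toy sequences) are hypothetical and chosen for illustration. Under these assumptions, the predicted effect of a double mutant equals the sum of the predicted effects of its two single mutations, so the model cannot represent epistasis.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flattened one-hot encoding (assumed here; not the eUniRep representation)."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for pos, aa in enumerate(seq):
        x[pos, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def fit_linear_kernel_ridge(X, y, lam=1e-3):
    """Solve (K + lam * I) alpha = y for the linear kernel K = X X^T."""
    K = X @ X.T
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def predict(alpha, X_train, x_new):
    """f(x) = sum_i alpha_i * k(x_i, x), which equals w . x with w = X^T alpha."""
    return float(alpha @ (X_train @ x_new))


# Toy data: random sequences and random fitness values (purely illustrative).
rng = np.random.default_rng(0)
wild_type = "MKTAY"
train_seqs = [
    "".join(rng.choice(list(ALPHABET), size=len(wild_type))) for _ in range(30)
]
X_train = np.stack([one_hot(s) for s in train_seqs])
y_train = rng.normal(size=len(train_seqs))
alpha = fit_linear_kernel_ridge(X_train, y_train)


def f(seq):
    return predict(alpha, X_train, one_hot(seq))


# Two single mutants at different positions, and the corresponding double mutant.
effect_a = f("AKTAY") - f(wild_type)       # M1A
effect_b = f("MKTAW") - f(wild_type)       # Y5W
effect_double = f("AKTAW") - f(wild_type)  # M1A + Y5W

# Additivity: the double-mutant effect is the sum of the single-mutant effects.
assert np.isclose(effect_double, effect_a + effect_b)
```

The additivity holds because the prediction is linear in the one-hot vector, and mutations at distinct positions change disjoint coordinates of that vector. A linear kernel on a learned representation need not be additive in the individual mutations.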