One-shot Image Style Transfer via Pre-trained GAN Inversion
  • Zegang Wang

Corresponding Author: [email protected]

Style transfer is one of the most widely discussed topics in deep learning. There are several reasons for this: the results are easy to demonstrate, which lends itself well to publication, and the method is useful for making quick stylistic edits to photos. This combination of utility and ease of demonstration makes style transfer one of the most popular first computer vision projects for data scientists, ML engineers, and AI enthusiasts. A typical example is imparting the style of Vincent van Gogh's "Starry Night" to an otherwise mundane landscape photograph.

That said, it is an inexact science. As in many computer vision tasks, transferring style onto the larger, coarser regions of an image is far easier than transferring the same style to the finer features of a face. Regions such as the eyes and mouth are particularly difficult for a generative model to reproduce correctly.

In this tutorial, we will look at JoJoGAN, a novel approach to one-shot style transfer for facial images. Implemented in PyTorch, it was designed to capture stylistic details that have historically been difficult to account for, such as applying style effects while preserving facial details like the shape of the eyes or mouth. JoJoGAN aims to solve this problem by first approximating a paired training dataset and then finetuning a StyleGAN to perform one-shot face stylization. It takes a single image of a face (ideally a high-quality head shot), approximates the paired real data using GAN inversion, and uses that data to finetune a pre-trained StyleGAN2 model. The finetuned StyleGAN2 model generalizes, so the imparted style can subsequently be applied to new images. Previous one-shot and few-shot methods have approached this level of success, but JoJoGAN achieves notably high quality in the images it generates.
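The workflow described above (invert the reference image, approximate a paired set, finetune the generator) can be illustrated with a deliberately tiny stand-in. The following is a minimal sketch under loud assumptions, not the JoJoGAN implementation: the "generator" is a hypothetical linear map rather than StyleGAN2, images are short vectors, the loss is plain squared error, and the paired set is built by resampling a few latent coordinates, which is only one plausible reading of "approximating a paired training dataset". All names (`generate`, `invert`, `make_paired_codes`, `finetune`) are made up for illustration.

```python
import random

random.seed(0)

LATENT_DIM = 4  # toy latent size (hypothetical; real StyleGAN2 latents are far larger)
IMAGE_DIM = 6   # a toy "image" is just a vector of 6 values


def generate(weights, w):
    """Toy linear generator standing in for StyleGAN2: image = weights @ w."""
    return [sum(wij * wj for wij, wj in zip(row, w)) for row in weights]


def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def invert(weights, target, steps=500, lr=0.02):
    """Step 1, GAN inversion: optimise the latent code with the generator frozen."""
    w = [0.0] * LATENT_DIM
    for _ in range(steps):
        residual = [o - t for o, t in zip(generate(weights, w), target)]
        grad = [2 * sum(r * row[j] for r, row in zip(residual, weights))
                for j in range(LATENT_DIM)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def make_paired_codes(w_ref, n=8, mixed=(2, 3)):
    """Step 2 (assumed): build a paired set by resampling some latent coordinates."""
    codes = []
    for _ in range(n):
        w = list(w_ref)
        for j in mixed:
            w[j] = random.gauss(0.0, 1.0)
        codes.append(w)
    return codes


def paired_loss(weights, codes, target):
    """Mean squared error between each generated image and the style target."""
    return sum(sq_error(generate(weights, w), target) for w in codes) / len(codes)


def finetune(weights, codes, target, steps=200):
    """Step 3: update the generator so every code in the set maps to the target."""
    weights = [list(row) for row in weights]  # leave the pre-trained copy intact
    # Conservative step size, chosen so the quadratic loss cannot increase.
    lr = 0.5 / max(sum(x * x for x in w) for w in codes)
    for _ in range(steps):
        grads = [[0.0] * LATENT_DIM for _ in range(IMAGE_DIM)]
        for w in codes:
            residual = [o - t for o, t in zip(generate(weights, w), target)]
            for i, r in enumerate(residual):
                for j, wj in enumerate(w):
                    grads[i][j] += 2 * r * wj / len(codes)
        weights = [[wij - lr * g for wij, g in zip(row, grow)]
                   for row, grow in zip(weights, grads)]
    return weights


# Random stand-ins for a pre-trained generator and a single style reference image.
pretrained = [[random.uniform(-1, 1) for _ in range(LATENT_DIM)]
              for _ in range(IMAGE_DIM)]
style_image = [random.uniform(-1, 1) for _ in range(IMAGE_DIM)]

w_ref = invert(pretrained, style_image)
codes = make_paired_codes(w_ref)
loss_before = paired_loss(pretrained, codes, style_image)
finetuned = finetune(pretrained, codes, style_image)
loss_after = paired_loss(finetuned, codes, style_image)
print(loss_before, loss_after)
```

In this toy setting the finetuned generator maps a whole family of latent codes toward the single style reference, which is the intuition behind applying the learned style to new inputs. A real pipeline would replace the linear map with a pre-trained StyleGAN2 and the squared error with a loss better suited to images.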