Low diversity within generated sound effects is currently our model's biggest drawback: 92\% of sound effect sets generated from a given conditioning text were rated as having no diversity at all (Fig.~\ref{443553}). We find this a somewhat surprising outcome, as the WGAN-GP objective function we use is purportedly free of mode collapse \cite{gulrajani2017improved}.
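For reference, the critic objective proposed in \cite{gulrajani2017improved}, with gradient-penalty weight $\lambda$ and $\hat{x}$ sampled uniformly along straight lines between real and generated samples, is
\begin{equation}
L_D = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big] + \lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\end{equation}
where the penalty term enforces an approximately unit gradient norm on the critic in place of weight clipping.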
The StackGAN model, from which our conditioning implementation draws heavily, also does not suffer from a similar mode collapse
(Zhang 2017a). We find that, in our model, the generator learns to disregard the input noise vector, leaving the generated sound effects almost entirely determined by the conditioning input. A possible solution is to use heavy dropout to introduce noise directly into the conditioning vector; a sketch of this idea is given below. This is a rather heavy-handed solution, however, and better ways of increasing diversity are left as an open area for future research.
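As a rough illustration of that idea (not our current implementation), the following PyTorch-style sketch applies heavy dropout to the conditioning embedding before concatenating it with the noise vector; the layer sizes and names (\texttt{noise\_dim}, \texttt{cond\_dim}, \texttt{cond\_dropout}) are placeholders rather than values taken from our model.
\begin{verbatim}
import torch
import torch.nn as nn

class ConditionedGenerator(nn.Module):
    """Sketch: heavy dropout on the conditioning embedding so the
    output is not a deterministic function of the text alone."""

    def __init__(self, noise_dim=128, cond_dim=256, cond_dropout=0.5):
        super().__init__()
        self.cond_dropout = nn.Dropout(p=cond_dropout)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, 16384),  # placeholder output length
        )

    def forward(self, z, cond):
        # Randomly zero units of the conditioning vector; dropout must
        # stay active at sampling time (module kept in train mode) for
        # this to add diversity during generation.
        cond = self.cond_dropout(cond)
        x = torch.cat([z, cond], dim=1)
        return self.net(x)
\end{verbatim}
Because the dropout mask is resampled on every forward pass, repeated generations from the same conditioning text would no longer be identical even if the generator continues to ignore the noise vector.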