Our training data comes from several commercial sound effect libraries purchased from asoundeffect.com, which we unfortunately cannot make publicly available. We opted for commercial sound effects over freely available ones because of their easy access, consistent audio quality, and generally high-quality metadata. We trained two main models for this paper. For the main evaluation of our model we train on the smaller Articulated Magic Elements dataset. This dataset provides several distinct modalities while constraining them to fit within the category of magic. This allows our generator to produce higher-quality output, since fewer modalities need to be covered, while the distinct sub-categories of ice, wind, earth, fire, air, and black still provide enough variation to make conditional generation of sound effects interesting. We also trained a model on all of our available sound effect libraries. This model was used mostly during development, and we did not evaluate it with our panel of human judges. However, it did provide interesting case studies of both success and failure, so we list the sound effect libraries we used here. The seven libraries used for this model were: Animal HyperRealism, Articulated Magic Elements, Eclectic Whooshes, Gamemaster Audio - Pro Sound Collection, Lethal Energies, Polarity and Swordfighter. All libraries, including the evaluated Articulated Magic Elements, can be found at asoundeffect.com, where sample audio is available.

Because our network can only produce just over one second of audio, we first preprocess the training audio by splitting clips on silent intervals and then discarding any clips longer than our network can produce. Our final datasets are relatively small, so to augment them we apply a small pitch shift at training time, similar to what games do to disguise repeated sound effects and make each playback sound unique. Sketches of both steps follow below.
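A minimal sketch of the preprocessing step is shown below, using librosa's silence splitting. The sample rate, silence threshold (top_db), and maximum clip length are placeholder values for illustration, not the exact settings used for our datasets.

import librosa
import soundfile as sf

SR = 16000                   # placeholder sample rate
MAX_SAMPLES = int(1.0 * SR)  # placeholder for "just over one second"

def split_and_filter(path, out_prefix, top_db=40):
    y, _ = librosa.load(path, sr=SR, mono=True)
    # Split the clip on silent intervals: anything more than top_db
    # below the clip's peak level is treated as silence.
    intervals = librosa.effects.split(y, top_db=top_db)
    kept = 0
    for start, end in intervals:
        segment = y[start:end]
        # Discard segments longer than the network can produce.
        if len(segment) > MAX_SAMPLES:
            continue
        sf.write(f"{out_prefix}_{kept:03d}.wav", segment, SR)
        kept += 1
    return kept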
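The training-time augmentation can be sketched in the same way, assuming the shift is drawn uniformly from a small symmetric range of semitones; the range itself is again a placeholder value.

import numpy as np
import librosa

def random_pitch_shift(y, sr, max_semitones=0.5):
    # Draw a small random pitch shift (here up to half a semitone either
    # way) so that repeated samples of the same clip differ slightly.
    n_steps = np.random.uniform(-max_semitones, max_semitones)
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)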