A Cohesive Distillation Architecture for Neural Language Models