This paper studies neural network imitation learning pipelines that take both demonstrations and natural language descriptions of the task as input. We introduce a set of architectures and evaluate them in a self-driving simulator. Our proposed architecture distinguishes between seen and unseen behaviors better than the other architectures we evaluate.