Robust Probabilistic Imitation Learning (RPIL) is a method I conceived in which a set of expert demonstrations is modeled as having two sources: a true expert and an adversary. Using logistic regression, the problem can be posed as a mixture of multinomial logistic regression models, and the resulting non-convex optimization can be solved with an Expectation-Maximization (EM) style algorithm. Experimentally, I show that this algorithm can detect and remove adversarial demonstrations from the training set and thus perform much better than if all demonstrations are assumed to be correct.
Imitation learning (IL) attempts to teach an autonomous agent a task from demonstrations. Many conventional IL frameworks assume these demonstrations are expert (correct) and homogeneous (optimizing the same reward function). This assumption, however, often does not hold in real-world applications. It is therefore of interest to create an IL method that is robust to adversarial demonstrations, i.e., one that identifies and removes incorrect demonstrations from the dataset. This project shows that it is possible to simultaneously and autonomously identify and remove adversarial demonstrations from the training set while learning the task. It is shown that this can be done through a probabilistic re-weighting of the demonstrations using an Expectation-Maximization-like algorithm. The results of this method are demonstrated on several common IL baseline problems. A more in-depth discussion of the theory and results can be found in the paper and presentation. The paper for this project, along with the code, can be found on GitHub.
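To make the EM-style re-weighting concrete, here is a minimal sketch of a two-component mixture of logistic regression policies on synthetic one-dimensional data. The data-generating process, the 25% corruption rate, the component names, and the use of scikit-learn are all illustrative assumptions for this sketch, not the project's actual implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for demonstrations (illustrative only): the expert
# picks action 1 when x > 0; the adversary flips that action.
X = rng.normal(size=(200, 1))
y = (X[:, 0] > 0).astype(int)
adv = rng.random(200) < 0.25          # 25% of samples are adversarial
y[adv] = 1 - y[adv]

# w[i] = posterior probability that sample i came from the expert.
# Random initialization breaks the symmetry between the two components.
w = rng.random(len(y))
pi = 0.5                               # prior weight of the expert component
expert = LogisticRegression()
adversary = LogisticRegression()

for _ in range(50):
    # M-step: fit each policy on responsibility-weighted samples.
    expert.fit(X, y, sample_weight=w + 1e-6)
    adversary.fit(X, y, sample_weight=(1 - w) + 1e-6)
    # E-step: recompute responsibilities from each policy's likelihood
    # of the observed action.
    p_e = expert.predict_proba(X)[np.arange(len(y)), y]
    p_a = adversary.predict_proba(X)[np.arange(len(y)), y]
    w = pi * p_e / (pi * p_e + (1 - pi) * p_a)
    pi = w.mean()

# Resolve label switching: here the expert is the majority component.
if w.mean() < 0.5:
    w = 1 - w

# Low-responsibility samples are flagged as adversarial and can be
# removed before fitting the final policy.
flagged = w < 0.5
```

After EM converges, training only on the unflagged samples gives the "robust" policy; samples near the decision boundary are ambiguous between the two components and tend to stay with the expert, which is the expected behavior of a soft mixture model.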