Towards Black-Box Explainability with Gaussian Discriminant Knowledge Distillation

In this paper, we propose a method for post-hoc explainability of black-box models. The key component of the semantic and quantitative local explanation is a knowledge distillation (KD) process in which an explainable generative model mimics the teacher’s behavior. To this end, we introduce a Concept Probability Density Encoder (CPDE) in conjunction with a Gaussian Discriminant Decoder (GDD) to describe the contribution of high-level concepts (e.g., object parts, color, shape). We argue that our objective function encourages both an explanation given by a set of likelihood ratios and a measure of how far the explainer deviates from the training data distribution of the concepts. The method can leverage any pre-trained concept classifier that provides concept scores (e.g., logits) or probabilities. We demonstrate the effectiveness of the proposed method in the context of object detection using the DensePose dataset.
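
As a rough illustration of the two quantities named above, the following Python sketch computes per-concept log-likelihood ratios and a squared Mahalanobis distance under class-conditional Gaussians over concept scores. It is not the CPDE/GDD implementation: the shared diagonal variance, the two-class setup, the function names, and all numeric values are assumptions made purely for illustration.

```python
import numpy as np


def gaussian_log_pdf(z, mean, var):
    """Per-dimension log density of independent (diagonal) Gaussians."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var)


def concept_log_likelihood_ratios(z, mean_pos, mean_neg, var):
    """Per-concept log p(z_i | pos) - log p(z_i | neg), assuming shared variance."""
    return gaussian_log_pdf(z, mean_pos, var) - gaussian_log_pdf(z, mean_neg, var)


def mahalanobis_sq(z, mean, var):
    """Squared Mahalanobis distance of z from a diagonal Gaussian."""
    return float(np.sum((z - mean) ** 2 / var))


# Hypothetical concept scores (e.g. logits) for three concepts.
z = np.array([2.0, 0.0, -1.0])

# Hypothetical class-conditional parameters; in practice these would be
# estimated from concept scores on training data.
mean_pos = np.array([2.0, 0.0, 0.0])
mean_neg = np.array([0.0, 0.0, 0.0])
var = np.array([1.0, 1.0, 1.0])

# Contribution of each concept to the decision "pos" versus "neg".
llr = concept_log_likelihood_ratios(z, mean_pos, mean_neg, var)

# How far the scores lie from the "pos" concept distribution.
deviation = mahalanobis_sq(z, mean_pos, var)

print(llr, deviation)
```

In this toy example only the first concept separates the two hypothetical classes, so it carries the entire log-likelihood ratio, while the distance term reflects how far the scores lie from the class mean.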