Interpreting deep neural networks with sample sets
Venter, Arthur Edgar William
Despite their impressive performance on a wide range of tasks, deep neural networks (DNNs) are generally considered 'black box' models due to the lack of transparency behind their decision-making processes. Researchers address this issue through interpretability techniques which, in the context of this study, use a set of rules to map the output of the network back onto its inputs. In recent works, sample set analysis has been proposed as a novel methodology to better study the generalisation capabilities of DNNs by analysing the natural sample clusters formed by the network itself. Because it can directly identify the nodes that process the largest number of class samples, this methodology offers some potential as a means of improving DNN interpretations. In this exploratory study, we investigate the applicability of sample set analysis as a tool for DNN interpretability. We do this by analysing the inner workings of networks trained on the MNIST data set, using sample set analysis in conjunction with the Layer-wise Relevance Propagation (LRP) interpretability technique, and we verify the results using a custom-generated synthetic data set. Our analysis led to the introduction of encoding sample sets, an additional sample set category that groups class samples according to their binary node activation patterns in a given layer. Through encoding sample sets, we further introduce the concepts of core and variation nodes, which refer to the nodes that activate for all encoding sample sets within a layer or for only a subset of them, respectively. When used in conjunction with LRP, encoding sample sets are capable of generating interpretations that represent groups of samples rather than individual samples. We coined this approach set interpretations and found that it provides interpretations highly similar to their individual counterparts while simplifying the interpretation process.
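The grouping described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy activation values, and the choice to treat a node as active when its activation exceeds a threshold of 0 (as with ReLU units) are all assumptions made for illustration. The input is assumed to hold one layer's activations for samples of a single class.

```python
from collections import defaultdict


def encoding_sample_sets(activations, threshold=0.0):
    """Group sample indices by their binary activation pattern in one layer.

    `activations` is a list of rows, one row of node activations per sample.
    Assumption: a node counts as active when its activation > threshold.
    """
    sets = defaultdict(list)
    for idx, row in enumerate(activations):
        pattern = tuple(int(a > threshold) for a in row)
        sets[pattern].append(idx)
    return dict(sets)


def core_and_variation_nodes(sets):
    """Split nodes into core (active in every set) and variation (some sets)."""
    patterns = list(sets)
    n_nodes = len(patterns[0])
    core, variation = [], []
    for node in range(n_nodes):
        hits = sum(p[node] for p in patterns)
        if hits == len(patterns):
            core.append(node)        # active in all encoding sample sets
        elif hits > 0:
            variation.append(node)   # active in only a subset of them
        # nodes active in no set are neither core nor variation
    return core, variation


# Hypothetical activations: 5 samples of one class, 4 nodes in the layer.
toy_activations = [
    [0.7, 0.0, 1.2, 0.0],
    [0.3, 0.0, 0.9, 0.0],
    [1.1, 0.5, 0.0, 0.0],
    [0.2, 0.4, 0.0, 0.0],
    [0.9, 0.0, 0.4, 0.0],
]

sets = encoding_sample_sets(toy_activations)
core, variation = core_and_variation_nodes(sets)
print(sets)             # {(1, 0, 1, 0): [0, 1, 4], (1, 1, 0, 0): [2, 3]}
print(core, variation)  # [0] [1, 2]
```

In this toy case the five samples fall into two encoding sample sets, node 0 is a core node, and nodes 1 and 2 are variation nodes. A set interpretation would then be computed once per set rather than once per sample; how the relevance is aggregated within a set is not specified here and is left out of the sketch.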
- Engineering