Complex computer simulations are commonly required for accurate data modelling in many scientific disciplines, which makes statistical inference challenging because the likelihood of the observed data is intractable. Furthermore, one is often interested in inference over a subset of the generative model parameters, while accounting for model uncertainty or misspecification in the remaining nuisance parameters. In this work, we show how non-linear summary statistics can be constructed by minimising inference-motivated losses via stochastic gradient descent, so that they provide the smallest uncertainty on the parameters of interest. As a use case, we consider confidence interval estimation for the mixture coefficient in a multi-dimensional two-component mixture model (i.e. signal vs background), where the proposed technique clearly outperforms summary statistics based on probabilistic classification, a commonly used alternative that does not account for the presence of nuisance parameters.
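The core idea above can be illustrated with a minimal, self-contained sketch. It is an assumption-laden toy, not the paper's implementation: the data are a synthetic 2-D signal/background mixture, the learnable summary is a linear projection followed by a soft (differentiable) histogram rather than a neural network, the loss is the Cramér-Rao bound on the variance of the mixture coefficient `mu` (without nuisance parameters), and finite-difference gradients stand in for automatic differentiation so the example stays dependency-free beyond NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(5000, 2))  # toy "signal"
bkg = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(5000, 2))  # toy "background"

N_TOT = 1000.0  # expected total event yield
MU = 0.5        # mixture coefficient at which the loss is evaluated

def soft_hist(x, w, n_bins=4, temp=10.0):
    """Project events to 1-D and assign soft bin memberships via a softmax."""
    z = x @ w[:2] + w[2]
    centers = np.linspace(-2.0, 2.0, n_bins)
    logits = -temp * (z[:, None] - centers[None, :]) ** 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def expected_variance(w):
    """Cramer-Rao bound on var(mu) from Poisson counts in the soft bins."""
    s = soft_hist(sig, w).mean(axis=0)      # per-bin signal fractions
    b = soft_hist(bkg, w).mean(axis=0)      # per-bin background fractions
    n = N_TOT * (MU * s + (1.0 - MU) * b)   # expected bin yields
    dn_dmu = N_TOT * (s - b)                # d(yield)/d(mu) per bin
    fisher = np.sum(dn_dmu ** 2 / np.maximum(n, 1e-6))
    return 1.0 / fisher

def num_grad(f, w, eps=1e-4):
    """Central finite-difference gradient (stand-in for autodiff)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (f(wp) - f(wm)) / (2.0 * eps)
    return g

w = np.array([0.1, 0.1, 0.0])  # projection weights + bias
v_init = expected_variance(w)
for _ in range(200):
    g = num_grad(expected_variance, w)
    w -= 0.05 * g / (np.linalg.norm(g) + 1e-12)  # normalised descent step
v_final = expected_variance(w)
```

Descending this loss pushes the summary toward bin contents that maximise the Fisher information on `mu`, so `v_final` ends up well below `v_init`; the paper does the analogous optimisation with a neural network, stochastic gradients, and nuisance parameters included in the Fisher matrix.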
The results of this work are published at https://arxiv.org/abs/1806.04743
This work is part of a more general effort to develop new statistical and machine learning techniques for use in High Energy Physics analyses within the AMVA4NewPhysics project, which is supported by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement number 675440.
Step-by-step guide
To reproduce the results of the article https://arxiv.org/abs/1806.04743:
- clone the repository: git clone https://github.com/pablodecm/paper-inferno.git
- four Jupyter notebooks are available under the notebooks directory