Model selection confidence sets for the number of mixture components

Alessandro Casa, Free University of Bozen-Bolzano

Co-author: Davide Ferrari, Free University of Bozen-Bolzano

Abstract: Selecting the number of components in a mixture model is a fundamental task in clustering and density estimation. Traditional methods use information criteria to choose a single best model, and hence a single mixture order. They therefore overlook selection uncertainty and plausible alternative models, especially when the noise is pronounced. We propose the Model Selection Confidence Set, a set-valued estimate that includes all models statistically indistinguishable from the best one, as judged by a penalized likelihood ratio test. We provide theoretical guarantees on the asymptotic coverage of the set: it contains the true number of components with high probability. The method offers a more robust alternative to traditional model selection techniques, since it reports the selection uncertainty rather than a single choice.
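The idea of a set-valued estimate can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the penalty or the critical value, so the BIC-style penalty, the function name `selection_confidence_set`, the `threshold` argument, and all numerical values below are assumptions made purely for illustration. The sketch takes already-fitted maximized log-likelihoods for each candidate number of components and keeps every candidate whose penalized likelihood ratio statistic against the best model stays below the threshold.

```python
import math


def selection_confidence_set(loglik, n_params, n_obs, threshold):
    """Return the sorted candidate orders k retained in the set.

    loglik    : dict mapping k -> maximized log-likelihood of the k-component fit
    n_params  : dict mapping k -> number of free parameters of that fit
    n_obs     : sample size
    threshold : hypothetical critical value (not taken from the abstract)
    """
    # Penalized log-likelihood; the BIC-style penalty is an assumption of this sketch.
    penalized = {
        k: loglik[k] - 0.5 * n_params[k] * math.log(n_obs) for k in loglik
    }
    best = max(penalized.values())
    # Keep k when twice the penalized gap to the best model is within the threshold.
    return sorted(k for k, v in penalized.items() if 2.0 * (best - v) <= threshold)


# Hypothetical log-likelihoods for univariate Gaussian mixtures with k = 1..4.
loglik = {1: -520.0, 2: -480.0, 3: -476.5, 4: -475.9}
# A univariate k-component Gaussian mixture has 3k - 1 free parameters.
n_params = {k: 3 * k - 1 for k in loglik}

kept = selection_confidence_set(loglik, n_params, n_obs=200, threshold=10.0)
print(kept)
```

With these made-up inputs, a single-model criterion would report only k = 2, whereas the set retains both k = 2 and k = 3, making the uncertainty between the two orders visible.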