Statistical Modelling Techniques for Default Rate Estimation in Credit Risk Analysis: An Ensemble Learning Approach for Large Financial Datasets

Fernando Luiz Pereira de Oliveira, Federal University of Ouro Preto

Co-authors: Carolina Soares Vieira, Federal University of Ouro Preto; Tiago Martins Pereira, Federal University of Ouro Preto; Gustavo de Souza, Federal University of Ouro Preto

Abstract: This article presents a case study of default rate estimation using tree-based and ensemble learning methods, such as decision trees, random forests, and boosting, applied to a large dataset from a Brazilian fintech company. The study addresses current challenges in financial data modeling, including class imbalance, high dimensionality, and data heterogeneity. It highlights the practical benefits of machine learning frameworks that are more interpretable than complex black-box alternatives for transforming large volumes of complex data into actionable insights for credit managers, thereby contributing to safer and more transparent risk assessment practices.