Boosting distributional regression for polygenic risk modelling on huge cohorts
Qiong Wu, University of Marburg
Co-authors: Hannah Klinkhammer, University of Marburg; Kiran Kunwar, University of Marburg; Christian Staerk, TU Dortmund University; Carlo Maj, University of Marburg; Andreas Mayr, University of Marburg
Abstract: Polygenic risk scores can be used to predict medical outcomes characterized by a complex genetic architecture. Current methods primarily focus on modelling the mean of a phenotype without explicitly considering effects on the phenotypic variance. We develop a distributional regression approach that derives sparse polygenic models for the mean and the variance of a phenotype simultaneously. Specifically, we introduce snpboostlss, an algorithm that applies cyclical gradient boosting for Gaussian location-scale models to genotype data. To improve computational efficiency on high-dimensional, large-scale genotype data (large n and large p), each boosting step considers only a batch of the variants most correlated with the current residuals as candidate base-learners. We illustrate our approach by analyzing BMI in the UK Biobank and find that the constructed polygenic model for the variance indicates gene-environment interaction effects. The predicted polygenic score for phenotypic variability derived by snpboostlss could therefore have potential clinical use in stratifying individuals by who may benefit more from environmental (e.g. lifestyle) changes.
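To make the idea of cyclical boosting with batch-restricted candidates concrete, the following is a minimal sketch in Python with NumPy. It is not the snpboostlss implementation. It assumes a Gaussian model with an identity link for the mean and a log link for the standard deviation, component-wise linear base-learners on centred variants, and a batch that is refreshed every few iterations. The function names (`top_batch`, `best_linear_learner`, `boost_gaussian_lss`), the parameters `nu`, `batch_size` and `refresh`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def top_batch(X, u, batch_size):
    """Indices of the batch_size columns of X most correlated (in absolute value) with u."""
    Xc = X - X.mean(axis=0)
    uc = u - u.mean()
    corr = (Xc.T @ uc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(uc))
    return np.argsort(-np.abs(corr))[:batch_size]

def best_linear_learner(X, u, candidates):
    """Least-squares fit of u on each candidate column (no intercept; columns assumed centred).
    Returns (column index, slope) of the fit with the smallest residual sum of squares."""
    Xb = X[:, candidates]
    num = Xb.T @ u
    den = (Xb ** 2).sum(axis=0)
    k = int(np.argmax(num ** 2 / den))  # RSS_j = ||u||^2 - num_j^2 / den_j
    return candidates[k], num[k] / den[k]

def boost_gaussian_lss(X, y, n_iter=300, nu=0.1, batch_size=20, refresh=10):
    """Cyclical gradient boosting for y ~ N(mu, sigma^2) with
    mu = mu0 + X @ beta_mu and log(sigma) = s0 + X @ beta_sigma (assumed links)."""
    n, p = X.shape
    beta_mu, beta_sigma = np.zeros(p), np.zeros(p)
    mu0, s0 = y.mean(), np.log(y.std())           # offsets, kept fixed in this sketch
    eta_mu, eta_sigma = np.full(n, mu0), np.full(n, s0)
    for m in range(n_iter):
        # Step for the mean: negative gradient of the Gaussian negative log-likelihood.
        u_mu = (y - eta_mu) / np.exp(2.0 * eta_sigma)
        if m % refresh == 0:                      # assumed batch refresh schedule
            batch_mu = top_batch(X, u_mu, batch_size)
        j, b = best_linear_learner(X, u_mu, batch_mu)
        beta_mu[j] += nu * b
        eta_mu += nu * b * X[:, j]
        # Step for log(sigma), using the mean predictor that was just updated.
        u_sigma = (y - eta_mu) ** 2 / np.exp(2.0 * eta_sigma) - 1.0
        if m % refresh == 0:
            batch_sigma = top_batch(X, u_sigma, batch_size)
        j, b = best_linear_learner(X, u_sigma, batch_sigma)
        beta_sigma[j] += nu * b
        eta_sigma += nu * b * X[:, j]
    return mu0, beta_mu, s0, beta_sigma

# Simulated example (not real genotype data): variant 0 shifts the mean,
# variant 1 changes the spread of the phenotype.
rng = np.random.default_rng(0)
n, p = 2000, 50
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # simulated allele counts
X = (G - G.mean(axis=0)) / G.std(axis=0)             # standardized variants
y = 0.5 * X[:, 0] + np.exp(0.3 * X[:, 1]) * rng.standard_normal(n)

mu0, beta_mu, s0, beta_sigma = boost_gaussian_lss(X, y)
print("strongest mean variant:", int(np.argmax(np.abs(beta_mu))))
print("strongest variance variant:", int(np.argmax(np.abs(beta_sigma))))
```

In this toy setting the two coefficient vectors play the role of separate polygenic models for the mean and for the variability. Restricting the search to a batch keeps the per-step cost proportional to the batch size rather than to p, at the price of recomputing the correlations whenever the batch is refreshed. The number of iterations would in practice be tuned, for example on a validation set, which this sketch leaves out.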