Regularized Multi-Omics Regression Modeling for Protein Data

Jonas Heiner, TU Dortmund University

Co-authors: Jan G. Hengstler, Leibniz Research Centre for Working Environment and Human Factors at the Technical University of Dortmund (IfADo); Andreas Groll, TU Dortmund University

Abstract: We analyze the relationship between protein-coding RNA and the resulting proteins by modeling protein levels in a regression framework that incorporates other omics variables as potential covariates. Because the aim is to investigate inter-omics effects by detecting covariates with high explanatory power, and because sample sizes in toxicological studies are typically small, regularized linear regression is well suited to this research. To compare various covariate combinations, we use a flexible Lasso model that allows for the incorporation of different omics types as well as covariate weights and data subsets. This includes consideration of co-regulatory clusters, which are already used in practice. In a real-world data application, incorporating all available omics data considerably improved the average accuracy of predicting a single protein’s abundance.
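To make the idea of a Lasso with covariate weights concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "covariate weights" act as per-covariate penalty factors w_j in the objective (1/2n)·||y − Xβ||² + λ·Σ_j w_j·|β_j|, so that a weight of 0 leaves a covariate unpenalized. The function names, the simulated data, and the choice to leave the protein's own RNA unpenalized are all hypothetical and serve only as illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    X is a list of rows; its columns and y are assumed to be centered.
    weights[j] is the (hypothetical) penalty factor of covariate j.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the partial residual (excluding j)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Simulated toy data (assumption): column 0 plays the role of the RNA that
# encodes the target protein, columns 1-5 stand for other omics covariates.
random.seed(0)
n, p = 20, 6
cols = [center([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = center([2.0 * X[i][0] + 1.5 * X[i][3] + random.gauss(0.0, 0.5) for i in range(n)])

# Weight 0 keeps the protein's own RNA in the model; the rest are penalized.
weights = [0.0] + [1.0] * (p - 1)
beta = weighted_lasso(X, y, lam=0.3, weights=weights)
print([round(b, 3) for b in beta])
```

In this sketch, varying the weights or restricting the columns of X corresponds to comparing different covariate combinations; in practice, the penalty parameter λ would typically be chosen by cross-validation rather than fixed by hand.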