Scoping Review of R Packages for Statistical Analysis and Machine Learning in the OMOP Common Data Model.

Arya Soman, University of Limerick

Co-authors: Aedin Culhane, University of Limerick; Amir Jalali, University of Limerick; Shirin Moghaddam, University of Limerick

Abstract: The OMOP Common Data Model (CDM) is central to observational health research, providing a consistent framework for data standardization and cross-institutional collaboration. Within this landscape, the R programming ecosystem, particularly through the OHDSI initiative, offers a suite of statistical modelling tools. However, the proliferation of these packages without a unified evaluation framework has introduced fragmentation, impeding reproducibility and interpretability. This study addresses that gap by developing a comprehensive functional taxonomy and benchmarking framework to evaluate R packages relevant to OMOP CDM analytics. We introduce a statistically motivated scoring methodology based on an ordinal evaluation across eight core modelling domains. Packages were empirically assessed using synthetic OMOP data (Eunomia) to measure execution speed, scalability, and output fidelity. Visual tools such as heatmaps and Sankey diagrams elucidate the functional landscape, while a decision-tree framework supports analytic decision-making. Our findings highlight key methodological strengths and identify deficits in deep learning and GUI integration. We propose directions for advancing statistical modelling within the OMOP ecosystem and offer a reproducible, decision-oriented framework for applied statisticians in real-world health data science.