Nima Hejazi & Jeremy Coyle -- Machine Learning Pipelines for R with sl3

April 18, 2018, 5:00-6:30 pm at BIDS, 190 Doe Library

About Nima and Jeremy

Nima is a PhD student in the Group in Biostatistics, jointly supervised by Mark van der Laan and Alan Hubbard. He is also affiliated with the UC Berkeley NIH Biomedical Big Data training program and the Center for Computational Biology. His research centers on nonparametric statistical and causal inference, machine learning, and statistical computing. He focuses on developing robust techniques for inference and estimation in a diverse range of problem settings, with applications that often arise in precision medicine, vaccine efficacy trials, computational biology, and public policy.

Jeremy is a recent PhD graduate in Biostatistics who continues to work with the department to translate statistical theory into software. During his PhD studies, Jeremy worked with Alan Hubbard and Mark van der Laan on a series of projects broadly related to computational statistics. These included more efficient cross-validation routines for ensemble machine learning and origami, a software framework for cross-validation. His current research interests include causal inference, model selection, resampling techniques, statistical software development, and statistical methods for assessing time series data from sensor systems.

Machine Learning Pipelines for R with sl3

We present sl3, a recently developed package for the R language and environment for statistical computing that provides utilities for a host of common machine learning tasks. Topics to be addressed include efficient data organization and access, the construction of pipelines for data munging and analysis (based on the idea popularized by Python's scikit-learn), and methods for ensemble machine learning (e.g., optimal stacked regressions). sl3 is a core part of the tlverse, a new ecosystem of software packages being developed by a team in the Group in Biostatistics here at Berkeley.
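To make the idea of stacking concrete, the base-R sketch below combines two candidate learners (a sample mean and a linear model). It uses out-of-fold predictions from 5-fold cross-validation and chooses the convex weight that minimizes cross-validated squared error. This is not sl3's interface. The learner functions, the simulated data, and the convex-combination metalearner are hypothetical choices made for illustration. The metalearner is a simple stand-in for the nonnegative least squares metalearner often used in stacking.

```r
# Illustrative base-R sketch of stacked regression (NOT sl3's API).
# Data, learners, and the convex-combination metalearner are hypothetical.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Each candidate learner takes training data and returns a prediction function
learner_mean <- function(train) {
  mu <- mean(train$y)
  function(newdata) rep(mu, nrow(newdata))
}
learner_lm <- function(train) {
  fit <- lm(y ~ x, data = train)
  function(newdata) unname(predict(fit, newdata = newdata))
}
learners <- list(mean = learner_mean, lm = learner_lm)

# V-fold cross-validation: collect out-of-fold predictions for every learner
V <- 5
folds <- sample(rep(seq_len(V), length.out = n))
cv_preds <- matrix(NA_real_, nrow = n, ncol = length(learners),
                   dimnames = list(NULL, names(learners)))
for (v in seq_len(V)) {
  train <- dat[folds != v, ]
  valid <- dat[folds == v, ]
  for (j in seq_along(learners)) {
    predict_fn <- learners[[j]](train)
    cv_preds[folds == v, j] <- predict_fn(valid)
  }
}

# Metalearner: pick alpha in [0, 1] for alpha * lm + (1 - alpha) * mean,
# minimizing the cross-validated mean squared error
cv_risk <- function(alpha) {
  combined <- alpha * cv_preds[, "lm"] + (1 - alpha) * cv_preds[, "mean"]
  mean((dat$y - combined)^2)
}
alpha <- optimize(cv_risk, interval = c(0, 1))$minimum
stack_weights <- c(mean = 1 - alpha, lm = alpha)

# Refit each learner on all the data and combine with the learned weights
full_fits <- lapply(learners, function(learn) learn(dat))
stacked_predict <- function(newdata) {
  stack_weights[["mean"]] * full_fits$mean(newdata) +
    stack_weights[["lm"]] * full_fits$lm(newdata)
}

print(round(stack_weights, 3))
```

Calling `stacked_predict(data.frame(x = c(0, 1)))` returns ensemble predictions for new points. The simulated outcome is linear in `x`, so nearly all of the learned weight should fall on the linear model.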

Selected materials for this presentation are available on GitHub.

Software Setup

R and RStudio Installation

Jupyter R Kernel Installation
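A minimal sketch of registering R as a Jupyter kernel is shown below. It uses the CRAN package IRkernel and assumes that Jupyter is already installed and available on the system PATH. The setup used in the session itself may have differed.

```r
# Install IRkernel from CRAN (assumes Jupyter itself is already installed)
install.packages("IRkernel")

# Register the R kernel with Jupyter; by default this installs it for the current user
IRkernel::installspec()
```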

sl3 Installation


devtools Installation (if needed)
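One way to install sl3 is from GitHub using devtools, installing devtools first only if it is missing. The sketch below assumes that the package lives in the `tlverse/sl3` GitHub repository. The repository location at the time of the talk may have been different.

```r
# Install devtools from CRAN only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Install the development version of sl3 from GitHub
# (assumes the tlverse/sl3 repository)
devtools::install_github("tlverse/sl3")
```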