Reproducible Workflows

EDS 214: Analytical Workflows and Scientific Reproducibility


Day 1 Morning | August 25th, 2025

Welcome to EDS 214!


This is a class about workflows and reproducibility


The workflow is the fundamental unit of data science


Reproducibility ensures workflows don’t become dead ends


The concepts and skills in this class will keep coming up in future classes and for the rest of your career

Hi, I’m Max 👋

What is a workflow?


THINK Take 1 minute to write down your own definition. Consider:

  • What are the components? How do they fit together?
  • Who is the audience?
  • What would the alternative be?

PAIR Turn to your neighbor and discuss your definitions

SHARE I will randomly call on a pair to share one element your definitions had in common and one where they differed

What is a workflow?


Workflows combine: data, code, modeling, and communication.

Components of a workflow

Wickham, Çetinkaya-Rundel, and Grolemund (2023)

The final workflow masks a trajectory of exploration and dead ends

The evolution of a workflow

Stoudt, Vásquez, and Martinez (2021)

What is reproducibility?


We define reproducibility to mean computational reproducibility — obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis

Reproducibility and Replicability in Science (2019)
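
A minimal sketch of what this definition asks for, written in Python. Everything in it is made up for illustration: the input data, the function names `run_analysis` and `fingerprint`, and the seed value. The idea is that when the input data, the code, and the conditions of analysis (here, the random seed) are all held fixed, two runs should give identical results.

```python
import hashlib
import random
import statistics


def run_analysis(data, seed):
    """Bootstrap the mean of `data`; `seed` pins down the conditions of analysis."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    boot_means = []
    for _ in range(1000):
        resample = rng.choices(data, k=len(data))
        boot_means.append(statistics.fmean(resample))
    boot_means.sort()
    # Approximate 95% percentile interval from the sorted bootstrap means
    return boot_means[25], boot_means[974]


def fingerprint(result):
    """Hash a result so two runs can be compared at a glance."""
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]  # hypothetical input data

first = run_analysis(data, seed=214)
second = run_analysis(data, seed=214)
print(first == second)  # prints True: same data, code, and seed
print(fingerprint(first) == fingerprint(second))  # prints True
```

Drop the fixed seed (for example, by calling `random.Random()` with no argument) and the two intervals will almost certainly differ, even though the data and code are unchanged. That is one small way a workflow can fail to be computationally reproducible.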

DEBATE With your neighbor, discuss: does reproducibility matter if you’re not doing academic research?

  • What if you work for a public utility, a non-governmental organization, or a private firm?
  • Articulate one argument for and one argument against the importance of reproducibility.
  • I will randomly call on a pair to share

Global and local goals


Reproducibility is

GLOBAL GOAL

… a means to verify results and ensure scientific integrity

LOCAL GOAL

… a mechanism for knowledge transfer and collaboration

Feinberg et al. (2020)

How does class work?


Reproduce a workflow (starting today!)

Lectures and interactive activities for learning goals

  • Automation
  • Organization
  • Documentation
  • Scale(ation)
  • Collaboration

Specification-based grading


  • No points out of 100
  • A rubric of specifications (specs) for your final project
  • Meet (or exceed) the specs and participate in class to earn an A


Feedback loops are critical (no one-and-done)


You’ll see more spec-based grading in other MEDS classes

“Reproducible workflows saved my life <3”


Bekah Lane

Research Associate, Center for Coastal Studies

Question: Where are humpbacks in SF Bay most susceptible to ship strikes?

Problem: A manual, point-and-click solution in ArcGIS is error-prone, slow, and a nightmare to revise after collaborator feedback.

Solution: Use automation, documentation, and modular design to create a well-organized, reproducible workflow in R.

“It takes more activation energy to ‘do it right’, but you save yourself loads of time relative to doing it by brute force.”

“In my current job I’m often asked to do similar analyses for multiple projects. I can save a lot of time by recycling code between projects.”

“Collaboration and communication become a lot easier when you have your workflow laid out and organized instead of all jumbled in your brain.”

Works cited

Feinberg, Melanie, Will Sutherland, Sarah Beth Nelson, Mohammad Hossein Jarrahi, and Arcot Rajasekar. 2020. “The New Reality of Reproducibility: The Role of Data Work in Scientific Research.” Proceedings of the ACM on Human-Computer Interaction 4 (CSCW1): 1–22. https://doi.org/10.1145/3392840.
Reproducibility and Replicability in Science. 2019. National Academies Press. https://doi.org/10.17226/25303.
Stoudt, Sara, Váleri N. Vásquez, and Ciera C. Martinez. 2021. “Principles for Data Analysis Workflows.” Edited by Patricia M. Palagi. PLOS Computational Biology 17 (3): e1008770. https://doi.org/10.1371/journal.pcbi.1008770.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. 2nd ed. Sebastopol, CA: O’Reilly Media. https://r4ds.hadley.nz/.