Reproducible Workflows
EDS 214: Analytical Workflows and Scientific Reproducibility
Day 1 Morning | August 25th, 2025
Welcome to EDS 214!
This is a class about workflows and reproducibility
The workflow is the fundamental unit of data science
Reproducibility ensures workflows don’t become dead ends
The concepts and skills in this class will keep coming up in future classes and for the rest of your career
Hi, I’m Max 👋
What is a workflow?
THINK Take 1 minute to write down your own definition. Consider:
PAIR Turn to your neighbor and discuss your definitions
SHARE I will randomly call on a pair to share one element your definitions had in common and one element where they differed
What is reproducibility?
We define reproducibility to mean computational reproducibility — obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis
Reproducibility and Replicability in Science (2019)
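To make the definition concrete, here is a minimal sketch in Python (chosen only for illustration; the same idea applies in R). It assumes a made-up dataset and a hypothetical `run_analysis` function, and checks that running the same code on the same input with the same random seed gives an identical result.

```python
import hashlib
import json
import random


def run_analysis(values, seed):
    """Bootstrap the mean of `values` using a fixed random seed."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    boot_means = []
    for _ in range(1000):
        sample = [rng.choice(values) for _ in values]
        boot_means.append(sum(sample) / len(sample))
    boot_means.sort()
    return {
        "mean": sum(values) / len(values),
        "ci_low": boot_means[24],    # approximate 2.5th percentile
        "ci_high": boot_means[974],  # approximate 97.5th percentile
    }


def fingerprint(result):
    """Hash a result dictionary so two runs can be compared exactly."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [2.1, 3.4, 1.9, 4.8, 3.3, 2.7]  # made-up input data
first = fingerprint(run_analysis(data, seed=214))
second = fingerprint(run_analysis(data, seed=214))
print(first == second)  # True: same data, code, and seed give the same result
```

Dropping the fixed seed, or changing the input data, is one way the "conditions of analysis" can drift so that a rerun no longer matches.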
DEBATE With your neighbor, discuss: does reproducibility matter if you’re not doing academic research?
Global and local goals
Reproducibility is
GLOBAL GOAL
… a means to verify results and ensure scientific integrity
LOCAL GOAL
… a mechanism for knowledge transfer and collaboration
Feinberg et al. (2020)
How does class work?
Reproduce a workflow (starting today!)
Lectures and interactive activities for learning goals
Specification-based grading
Feedback loops are critical (no one-and-done)
You’ll see more spec-based grading in other MEDS classes
“Reproducible workflows saved my life <3”
Bekah Lane
Research Associate, Center for Coastal Studies
Question: Where are humpbacks in SF Bay most susceptible to ship strikes?
Problem: A manual, point-and-click solution in ArcGIS is error-prone, slow, and a nightmare to revise in response to collaborator feedback.
Solution: Use automation, documentation, and modular design to create a well-organized, reproducible workflow in R.
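As a rough illustration of what automation, documentation, and modular design can look like in code, here is a minimal sketch. It is not the workflow from this case study (that one was written in R); it is written in Python for illustration only, and the function names, column names, and input values are all hypothetical.

```python
import csv
import io


def load_records(csv_text):
    """Step 1: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_records(rows):
    """Step 2: drop rows with a missing count and convert counts to integers."""
    return [
        {"zone": row["zone"], "count": int(row["count"])}
        for row in rows
        if row["count"].strip()
    ]


def summarize(rows):
    """Step 3: total the counts for each zone."""
    totals = {}
    for row in rows:
        totals[row["zone"]] = totals.get(row["zone"], 0) + row["count"]
    return totals


def run_workflow(csv_text):
    """Single entry point: rerun every step, in order, from the raw input."""
    return summarize(clean_records(load_records(csv_text)))


# Made-up example input; not data from the study described above.
RAW = "zone,count\nA,3\nB,\nA,2\nB,5\n"
print(run_workflow(RAW))  # {'A': 5, 'B': 5}
```

Because each step is a small documented function, a change requested by a collaborator touches one function, and rerunning `run_workflow` regenerates the result from the raw input instead of repeating manual clicks.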
“It takes more activation energy to ‘do it right’, but you save yourself loads of time relative to doing it by brute force.”
“In my current job I’m often asked to do similar analyses for multiple projects. I can save a lot of time by recycling code between projects.”
“Collaboration and communication become a lot easier when you have your workflow laid out and organized instead of all jumbled in your brain.”
Works cited