When data science goes with the flow: insitro introduces redun

Figure 1. Data science workflow’s missing half. In their work, data scientists produce two main deliverables: code and data. For code, they have tools such as git to record the history of changes (i.e., code provenance) and share that history with others. In contrast, the workflow for recording and sharing data provenance is less standardized, which causes significant friction in reproducing and sharing scientific results.
Figure 2. Completing the data science workflow. With redun, we are exploring how to provide tooling for recording and sharing data provenance that is just as powerful as code provenance tooling. Specifically, we have found that defining a portable data structure, the call graph, can enable the same local recording and syncing capabilities. As call graphs accumulate, one can trace the computational lineage of any file, or compare the differences between executions (“It worked before, what changed?”).
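
To make the call graph idea concrete, here is a minimal sketch in Python of how such a structure could be recorded, traced, and compared. It is not redun's actual API or data model: the names `CallNode`, `CallGraph`, `record`, `lineage`, and `diff` are hypothetical and chosen only for illustration, and the sketch assumes task arguments and results are JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_hash(value) -> str:
    """Hash a JSON-serializable value to a short, stable identifier."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def code_hash(func) -> str:
    """Crude stand-in for code provenance: hash bytecode and constants."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class CallNode:
    # One recorded call: which code ran, on which inputs, giving which output.
    task_name: str
    code_hash: str
    arg_hashes: list
    result_hash: str


@dataclass
class CallGraph:
    # Hypothetical container; nodes link to each other through value hashes.
    nodes: list = field(default_factory=list)

    def record(self, func, *args):
        """Run func(*args) and append a node describing the call."""
        result = func(*args)
        self.nodes.append(
            CallNode(
                task_name=func.__name__,
                code_hash=code_hash(func),
                arg_hashes=[content_hash(a) for a in args],
                result_hash=content_hash(result),
            )
        )
        return result

    def lineage(self, value) -> list:
        """Walk backwards from a value to the tasks that produced it."""
        trail, seen = [], set()
        frontier = [content_hash(value)]
        while frontier:
            h = frontier.pop()
            if h in seen:
                continue
            seen.add(h)
            for node in self.nodes:
                if node.result_hash == h:
                    trail.append(node.task_name)
                    frontier.extend(node.arg_hashes)
        return trail


def diff(old: CallGraph, new: CallGraph) -> list:
    """List tasks in `new` whose code, inputs, or outputs differ from `old`."""
    old_by_name = {n.task_name: n for n in old.nodes}
    return [n.task_name for n in new.nodes if old_by_name.get(n.task_name) != n]


# Example tasks (also hypothetical).
def clean(rows):
    return [r for r in rows if r is not None]


def mean(rows):
    return sum(rows) / len(rows)


graph = CallGraph()
avg = graph.record(mean, graph.record(clean, [1, None, 3]))
print(graph.lineage(avg))  # ['mean', 'clean']

# "It worked before, what changed?": rerun with different input and compare.
rerun = CallGraph()
rerun.record(mean, rerun.record(clean, [1, None, 5]))
print(diff(graph, rerun))  # ['clean', 'mean']
```

Because every node is identified by hashes rather than by local paths or process state, a structure like this can be serialized and merged across machines, which is the portability property the caption refers to. A real system would also need to handle values that cannot be serialized to JSON and distinct calls that happen to produce identical results.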


insitro is a data-driven drug discovery and development company using machine learning and data generation at scale to transform the way drugs are discovered.
