When data science goes with the flow: insitro introduces redun

Figure 1. Data science workflow’s missing half. In their work, data scientists produce two main deliverables: code and data. For code, data scientists have tools, such as git, to record the history of their changes (i.e., code provenance) and ultimately share that code with others. In contrast, the workflow for recording and sharing data provenance is less standardized, which causes significant friction in reproducing and sharing scientific results.
Figure 2. Completing the data science workflow. With redun, we are exploring how to provide tooling for recording and sharing data provenance that is just as powerful as code provenance tooling. Specifically, we have found that defining a portable data structure, the call graph, can enable the same local recording and syncing capabilities. As call graphs accumulate, one can trace the computational lineage of any file, or compare the differences between executions (“It worked before, what changed?”).
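
To make the call graph idea concrete, here is a minimal sketch in plain Python. It is not redun's API or its internal data model; the names `CallGraph`, `CallNode`, `execute`, `digest`, and `diff` are hypothetical, and a hand-written version string stands in for a hash of each task's source code. Under those assumptions, it shows how recording a hash of the code, arguments, and result of every call is enough to trace the lineage of a value and to answer "what changed?" between two executions.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(obj) -> str:
    """Short content hash of any JSON-serializable value."""
    payload = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class CallNode:
    task: str          # task name
    code_hash: str     # hash of the task's version string (stand-in for source code)
    arg_hashes: list   # one content hash per argument
    result_hash: str   # content hash of the returned value


class CallGraph:
    """Hypothetical call graph: one node per recorded task call."""

    def __init__(self):
        self.nodes = []
        self.by_result = {}  # result hash -> node that produced it (latest wins)

    def record(self, task, version, args, result):
        node = CallNode(task, digest(version), [digest(a) for a in args], digest(result))
        self.nodes.append(node)
        self.by_result[node.result_hash] = node
        return node

    def lineage(self, value):
        """Walk upstream from a value through the calls that produced it."""
        found, seen, stack = [], set(), [digest(value)]
        while stack:
            h = stack.pop()
            node = self.by_result.get(h)
            if node is None or h in seen:
                continue  # raw input (no producer) or already visited
            seen.add(h)
            found.append(node)
            stack.extend(node.arg_hashes)
        return found


def execute(graph, task, version, func, *args):
    """Run a function and record the call in the graph."""
    result = func(*args)
    graph.record(task, version, args, result)
    return result


def diff(old, new):
    """Per task, list which recorded hashes differ between two executions."""
    old_nodes = {n.task: n for n in old.nodes}
    changes = {}
    for n in new.nodes:
        o = old_nodes.get(n.task)
        if o is None:
            changes[n.task] = ["new task"]
            continue
        reasons = []
        if o.code_hash != n.code_hash:
            reasons.append("code")
        if o.arg_hashes != n.arg_hashes:
            reasons.append("args")
        if o.result_hash != n.result_hash:
            reasons.append("result")
        if reasons:
            changes[n.task] = reasons
    return changes


raw = [3, 1, 2]

# First execution: clean, then summarize.
g1 = CallGraph()
cleaned = execute(g1, "clean", "v1", sorted, raw)
total = execute(g1, "summarize", "v1", sum, cleaned)

# Second execution: the cleaning code changed, summarize did not.
g2 = CallGraph()
cleaned2 = execute(g2, "clean", "v2", lambda xs: sorted(x for x in xs if x > 1), raw)
total2 = execute(g2, "summarize", "v1", sum, cleaned2)

lineage_tasks = [n.task for n in g1.lineage(total)]
changes = diff(g1, g2)
```

In this sketch, `lineage_tasks` lists the calls upstream of the final value, and `changes` attributes the difference in `summarize` to its arguments rather than its code, which points back to the edited `clean` task. Because every node is just a set of hashes, graphs recorded on different machines can be merged or compared without rerunning anything, which is the property that makes the structure portable.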

insitro


insitro is a data-driven drug discovery and development company using machine learning and data generation at scale to transform the way drugs are discovered.