Make for Data Scientists

19 Nov 2023

💡 This is a Data Dispatch post.

Make for Data Scientists: https://paulbutler.org/2012/make-for-data-scientists/.

I recently had occasion to use make again (I hadn't really used it since I last wrote C), this time to orchestrate the components of an ML proof of concept (POC).

The POC has four pieces:

Preprocessing (ingestion and cleanup of source data).
Feature engineering.
Modelling (training and prediction).
Postprocessing.

The dataset we worked with was also small, around 300k records.

Even though the dependencies between the stages were plainly linear, a Makefile kept things nicely organised and efficient, since make only reruns a stage when its inputs are newer than its outputs. Definitely something I'll be using regularly going forward 👌.
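Most of that efficiency comes from one rule at the heart of make: a target is rebuilt only when it is missing or when one of its prerequisites has a newer modification time. Here is a minimal Python sketch of that logic applied to a four-stage chain like the one above. The file names (raw.csv, clean.csv, features.csv, model.txt, predictions.csv) and the stage recipes are hypothetical stand-ins, not the POC's actual code. Real make does much more (pattern rules, phony targets, parallel builds with -j).

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Rule:
    """One Makefile-style rule: a target, its prerequisites, and a recipe."""
    target: str
    prereqs: list[str]
    recipe: Callable[[str, list[str]], None]


def is_stale(target: str, prereqs: list[str]) -> bool:
    """Rebuild if the target is missing or any prerequisite is strictly newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prereqs)


def build(target: str, rules: dict[str, Rule], log: list[str]) -> None:
    """Bring `target` up to date, building its prerequisites first (depth-first)."""
    rule = rules.get(target)
    if rule is None:
        # Source files (e.g. the raw data) have no rule and must already exist.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make target {target!r}")
        return
    for prereq in rule.prereqs:
        build(prereq, rules, log)
    if is_stale(rule.target, rule.prereqs):
        rule.recipe(rule.target, rule.prereqs)
        log.append(rule.target)  # record which stages actually ran


def stage(name: str) -> Callable[[str, list[str]], None]:
    """Hypothetical stand-in recipe: tags and concatenates its inputs."""
    def recipe(target: str, prereqs: list[str]) -> None:
        body = "".join(Path(p).read_text() for p in prereqs)
        Path(target).write_text(f"# {name}\n{body}")
    return recipe


def pipeline_rules(base: Path) -> dict[str, Rule]:
    """The four stages as a linear chain of hypothetical files under `base`."""
    chain = [
        ("preprocess", "raw.csv", "clean.csv"),
        ("features", "clean.csv", "features.csv"),
        ("model", "features.csv", "model.txt"),
        ("postprocess", "model.txt", "predictions.csv"),
    ]
    rules = {}
    for name, src, dst in chain:
        target = str(base / dst)
        rules[target] = Rule(target, [str(base / src)], stage(name))
    return rules


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        base = Path(d)
        (base / "raw.csv").write_text("id,x\n1,2\n")
        rules = pipeline_rules(base)
        goal = str(base / "predictions.csv")
        for attempt in (1, 2):
            log: list[str] = []
            build(goal, rules, log)
            print(attempt, [Path(t).name for t in log])
```

The first pass runs all four stages; the second runs none, because every output is already at least as new as its input. If only the feature file changes, just the modelling and postprocessing stages rerun, which is exactly what a Makefile with rules like `clean.csv: raw.csv` buys you.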

(The link is a nice light intro to using make. For details, see the GNU Make documentation: https://www.gnu.org/software/make/manual/make.html.)

Dennis I. Barrett programming • math • data
