Metaflow does come bundled with a scheduler that can place jobs on a variety of compute platforms (current release supports local on-instance and AWS batch). In terms of dependencies, we went with conda because of its traction in the data science community as well as excellent support for system packages. Our execution model also supports arbitrary docker containers (on AWS batch) where you can theoretically bake in your own dependencies. In terms of language support, we have bindings for R internally, that we plan to open source as well.
I wouldn’t qualify metaflow as anti-UI. For model monitoring, we haven’t found a good enough UI that can handle the diversity of models and use cases we see internally, and believe that notebooks are an excellent visualisation medium that gives the power to the end user (data scientists) to craft dashboards as they see fit. For tracking the execution of production runs, we have historically relied on the UI of the scheduler itself (meson). We are exploring what a metaflow-specific UI might look like.
As for comparisons with Airflow, it is an excellent production grade scheduler. Metaflow intends to solve a different problem of providing an excellent development and deployment experience for ML pipelines.
Thanks for open sourcing this! This seems like precisely the kind of tool I‘ve been looking for. One thing that would be really great is support for HPC schedulers like SLURM. It seems relatively straightforward to add, so I might give it a shot myself.
I wouldn’t qualify metaflow as anti-UI. For model monitoring, we haven’t found a good enough UI that can handle the diversity of models and use cases we see internally, and believe that notebooks are an excellent visualisation medium that gives the power to the end user (data scientists) to craft dashboards as they see fit. For tracking the execution of production runs, we have historically relied on the UI of the scheduler itself (meson). We are exploring what a metaflow-specific UI might look like.
As for comparisons with Airflow, it is an excellent production grade scheduler. Metaflow intends to solve a different problem of providing an excellent development and deployment experience for ML pipelines.