# deployment-ecs

Rob Martin

05/29/2020, 6:58 PM
Hi Cat - we’ve just started looking into DAG managers (including Dagster). In our case, we’re looking to use ECS indirectly, by automating jobs using AWS Batch. We haven’t found any first-class support yet (except in Netflix’s Metaflow), so we’re still examining our options…
👍 2

cat

05/29/2020, 8:39 PM
gotcha. it’s interesting that metaflow’s compute layer integrates with aws batch instead of ecs directly, probably worth considering for dagster too. i was wondering how you made the decision that aws batch was a better fit than using aws ecs directly — is it mostly because the lifecycle is fully managed?

Rob Martin

05/29/2020, 9:51 PM
Yeah, generally ease of use and lifecycle management, though I suppose some of that could be provided by Dagster?
We want to be able to run (docker-based) tasks of varying sizes, up to the larger instance types (r4.8xl, etc.)
… so our assumption is any pull-based system (i.e. with workers awaiting tasks) wouldn’t work for us. We only want to keep instances alive when necessary.

cat

05/29/2020, 10:09 PM
So with respect to managing lifecycle, dagster provides a run master that kicks off and monitors jobs and also handles user-configured retries
A common strategy on the K8s side (which I think should be similar here) is to create an ephemeral k8s pod per step (i.e. per node in the dag) that shuts down once the step completes
in this case, only the dagit instance (and potentially celery / flower / broker) needs to be kept up all the time, but the actual compute pods don't
definitely see where you're coming from -- the architecture of our system is that run launchers and step executors are spun up per start of pipeline run, so that resources aren't wasted
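(For a concrete picture of the ephemeral-pod-per-step pattern cat describes, here is a rough sketch using the official `kubernetes` Python client; the namespace, image, and naming scheme are placeholders, not Dagster's actual k8s integration code:)

```python
from kubernetes import client, config

# Rough sketch of the ephemeral-pod-per-step pattern: one Kubernetes Job
# per DAG step, garbage-collected after it finishes. The namespace, image,
# and naming scheme here are placeholders.

def launch_step_job(step_name: str, image: str, args: list):
    config.load_kube_config()  # or load_incluster_config() when running inside the cluster
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"dagster-step-{step_name}"),
        spec=client.V1JobSpec(
            ttl_seconds_after_finished=60,  # clean up the pod shortly after the step completes
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(name="step", image=image, args=args)],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```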

Rob Martin

05/29/2020, 10:43 PM
Ah right - not very familiar with k8s, but I think I follow. So there’s no reason we wouldn’t be able to build a DAG of tasks where the resources allocated (CPU/memory) to each task is determined on the fly? Thanks for your help on this, btw!

nate

05/30/2020, 12:18 AM
hey Rob - by “on the fly” do you mean you’d like resource limits (and node sizes, etc.) to be defined per task at DAG definition time, or do you truly want to set these dynamically at execution time? If the former, it should be straightforward to define resource limits on solids (using “tags”), and then flow these through to AWS Batch using our “run launcher” abstraction
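(As an illustration of the tags approach nate mentions, a minimal sketch of per-solid resource hints; the tag keys `batch/vcpus` and `batch/memory` are invented for illustration, and a custom run launcher would need to interpret them:)

```python
from dagster import pipeline, solid

# Minimal sketch: attach resource hints as tags on each solid. The tag keys
# ("batch/vcpus", "batch/memory") are made up here; a custom AWS Batch run
# launcher would read them and size the job accordingly.

@solid(tags={"batch/vcpus": "32", "batch/memory": "249856"})  # r4.8xlarge-sized step
def heavy_transform(context):
    context.log.info("large step")

@solid(tags={"batch/vcpus": "1", "batch/memory": "2048"})  # small step
def light_transform(context):
    context.log.info("small step")

@pipeline
def my_pipeline():
    heavy_transform()
    light_transform()
```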
we haven’t built an out-of-the-box AWS Batch or ECS run launcher yet, but you can see an example of how we perform execution on Dask (with per-solid resource limits) here https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster-dask/dagster_dask/engine.py
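(And a hedged sketch of how such a launcher might flow those tags through to AWS Batch via boto3; `batch.submit_job` is the real Batch API, but the queue, job definition, and tag conventions are placeholders:)

```python
import boto3

# Sketch only: translate per-solid tags into an AWS Batch job submission.
# The job queue, job definition, and tag keys below are placeholders;
# error handling and result plumbing are omitted.

def submit_solid_to_batch(solid_name: str, tags: dict, command: list) -> str:
    batch = boto3.client("batch")
    response = batch.submit_job(
        jobName=f"dagster-{solid_name}",
        jobQueue="my-dagster-queue",         # placeholder queue name
        jobDefinition="my-dagster-job-def",  # placeholder job definition
        containerOverrides={
            "vcpus": int(tags.get("batch/vcpus", "1")),
            "memory": int(tags.get("batch/memory", "2048")),
            "command": command,
        },
    )
    return response["jobId"]
```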

Rob Martin

05/30/2020, 12:49 AM
Thanks, Nate. We don’t truly need execution-time resource allocation at this time, though that may change in the future. I’ll take a look at the Dask integration. Thanks again!