Hi all, is there any benefit to using Dask over ECS? I am evaluating Dagster, and I was planning to use Coiled. But seeing that ECS is available, I think it is the more economical option.
07/21/2022, 11:39 PM
I'm not intimately familiar with Dask, but the only benefit I could see is if you have heavy jobs that need more than the 4 vCPU/30 GB that ECS Fargate can provide, for instance if you're doing big parallel fan-outs within a job and need more horsepower in the job or in each op.
My experience with the ECS option so far is that it ends up being pretty dang cheap, but we offload our in-op heavy processing to PySpark via the DatabricksPySparkStepLauncher. So far, running jobs with 4 vCPU/8 GB ECS containers acting as orchestrators for hundreds of parallel ops that farm out to Databricks via the DatabricksPySparkStepLauncher has worked great (I'm guessing we could trim the ECS resource requirements down even further). You could probably write a CoiledDask step launcher to do the same if you needed to (although writing step launchers is non-trivial), or even just call out directly to Dask from within an op. This is where my lack of Dask knowledge fails me.
07/22/2022, 7:32 AM
Ohh, so Dagster does not distribute the ops across a cluster; it just spins up a machine to run jobs.