# ask-community
o
Hello Dagster community! I am excited to be here! I have a question about the design of my workflow. I am designing a workflow that includes some heavy compute tasks (mostly ffmpeg tasks). If I understand correctly, it is bad practice to run compute-heavy tasks in the context of the Dagster executors, so I thought about extracting these workloads to an external service and having Dagster just send requests to that service. My questions are whether this is a good approach or not, and if it is, who should handle the materialization of the files after processing? My external service can upload the processed files to object storage and return the path to Dagster, or it can return the whole huge byte array over the network and let Dagster persist that to object storage (that sounds bad to me). How would you handle this pipeline? Thanks!
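For illustration, here is a minimal sketch of the "service uploads, returns the path" option, using only the Python standard library. Everything in it is hypothetical: the `/transcode` endpoint, the JSON fields `job_id`, `input_uri`, and `output_uri`, the `s3://raw-bucket/...` input, and a local temp directory standing in for object storage. The actual ffmpeg work is replaced by a placeholder write. The point is only that the orchestrator side sends a small request and gets back a reference, never the file bytes.

```python
import json
import tempfile
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical stand-in for object storage: a local temp directory.
STORAGE_ROOT = Path(tempfile.mkdtemp(prefix="fake-object-store-"))

# Bypass any proxy env vars so the localhost demo is reachable.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class TranscodeHandler(BaseHTTPRequestHandler):
    """Hypothetical external media service.

    It does the heavy work itself, writes the result to storage, and
    responds with the output location only.
    """

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        job = json.loads(self.rfile.read(length))
        # Placeholder for the real ffmpeg work: write a marker file.
        out_path = STORAGE_ROOT / f"{job['job_id']}.mp4"
        out_path.write_bytes(b"processed:" + job["input_uri"].encode())
        body = json.dumps({"output_uri": out_path.as_uri()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def request_transcode(service_url: str, job_id: str, input_uri: str) -> str:
    """Orchestrator side: send a small JSON request, get back a URI (not bytes)."""
    payload = json.dumps({"job_id": job_id, "input_uri": input_uri}).encode()
    req = urllib.request.Request(
        service_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with _OPENER.open(req, timeout=30) as resp:
        return json.loads(resp.read())["output_uri"]


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), TranscodeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/transcode"
    print(request_transcode(url, "clip-001", "s3://raw-bucket/clip-001.mov"))
    server.shutdown()
    server.server_close()
```

In a Dagster asset or op, the step would call something like `request_transcode` and keep only the returned URI, so the large file moves directly from the service to storage instead of travelling back through the orchestrator.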
r
Is this prescription true? Or is it only true for `execute_in_process`? A long-running deep-learning model would execute in a separate process, right? And so it would not block anything else...
o
I can use `execute_in_process`. My question is whether this is the correct approach when doing CPU-intensive tasks. As far as I understand, Dagster should function only as an orchestrator and avoid doing any heavy lifting in the executors.
z
I think it depends on the context... If you're using Serverless, then your compute is limited, so you probably don't want to do CPU-intensive tasks there. If you're hosting it yourself or using a Hybrid deployment, you might have Dagster deployed on workers that have lots of CPU / disk (I have it deployed on ECS, where you can get up to 16 CPU / 120 GB; not a huge worker, but a decent amount of firepower). If you're using a run launcher that spins up new workers for runs, then I don't personally see a reason why you shouldn't execute CPU-intensive tasks within the executor. It just kinda depends on where Dagster is deployed and what run launcher you're using. This prescription is definitely something you hear strongly from Airflow, but I think it's quite a bit more nuanced in Dagster.
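As a rough illustration of running the heavy work inside a step: the step can launch ffmpeg as a child process on whatever worker the run landed on. This is a minimal stdlib sketch, assuming ffmpeg is on the worker's PATH. The function names and encoding settings (`libx264`, `-crf 23`, `aac`) are illustrative choices, not something taken from this thread; an op or asset body would simply call `transcode`.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: Path, dst: Path, crf: int = 23) -> list[str]:
    """Build an H.264/AAC transcode command (settings are illustrative)."""
    return [
        "ffmpeg",
        "-y",                # overwrite dst if it already exists
        "-i", str(src),      # input file
        "-c:v", "libx264",   # video codec
        "-preset", "medium",
        "-crf", str(crf),    # quality target: lower = higher quality, larger file
        "-c:a", "aac",       # audio codec
        str(dst),
    ]


def transcode(src: Path, dst: Path) -> Path:
    """Run ffmpeg as a child process on the current worker.

    The CPU-heavy work happens in the ffmpeg process, so what matters is how
    much CPU the worker running this step has (e.g. the task size your run
    launcher provisions).
    """
    cmd = build_ffmpeg_cmd(src, dst)
    print("running:", shlex.join(cmd))
    # check=True raises CalledProcessError if ffmpeg exits non-zero.
    subprocess.run(cmd, check=True, capture_output=True)
    return dst
```

Whether this is appropriate depends on the deployment, as discussed above: on a run launcher that gives each run its own sized worker, the ffmpeg process only competes with that run's own steps.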
r
For execute-in-process, you definitely don’t want long-running…
z
Well yeah, I guess I kinda assumed people weren't using execute-in-process for production jobs, but maybe that's not a good assumption to make.
I guess now I'm curious as to why someone would use execute-in-process outside of testing?
o
thanks everybody!