Jean-Pierre M

06/04/2021, 3:13 PM
I have been using the K8sRunLauncher but I'm now looking into switching to CeleryK8sRunLauncher so that I can use Celery queues to manage connections to a fragile external resource. From what I can see and understand, the CeleryK8sRunLauncher creates a K8s job/pod for each solid in the DAG. Is there a way to minimize that? My DAG currently contains a bunch of small solids that are grouped into composite solids, and the overhead of K8s launching a job/pod for each solid is very costly. Thanks!


06/04/2021, 6:27 PM
We do not yet offer this level of granular control during execution, though it is something we have our eye on.
> a bunch of small solids that are grouped into composite solids
One question is what value you get out of these being separate solids composed into a composite solid. One way to achieve what you are looking for currently is to change them to regular Python functions that are composed into a single solid. This does have trade-offs, so it's really a question of efficiency in computation versus event log information / retry granularity.
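As a minimal sketch of that refactor (the function names and the toy record-processing logic here are hypothetical, not from the original pipeline): each small step that was its own solid becomes a plain Python helper, and only the outer composing function remains a solid, so the launcher schedules one pod for the whole group instead of one per step.

```python
# Before the refactor, each of these helpers would have been its own
# solid (carrying Dagster's @solid decorator), so CeleryK8sRunLauncher
# would launch a separate K8s job/pod for each one.

def fetch_records(raw_lines):
    # Parse "name,value" lines into (name, int) pairs.
    return [(name, int(value))
            for name, value in (line.split(",") for line in raw_lines)]

def clean_records(records):
    # Drop records with negative values.
    return [(name, value) for name, value in records if value >= 0]

def summarize(records):
    # Total of the remaining values.
    return sum(value for _, value in records)

# After the refactor, only this composing function would be decorated
# as a solid; the helpers above are ordinary Python calls inside it,
# so the whole group runs in a single pod.
def process_records(raw_lines):
    records = fetch_records(raw_lines)
    records = clean_records(records)
    return summarize(records)

print(process_records(["a,3", "b,-1", "c,4"]))  # prints 7
```

The trade-off mentioned above shows up here directly: the event log records one step instead of three, and a retry re-runs the whole composed function rather than just the stage that failed.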

Jean-Pierre M

06/04/2021, 8:44 PM
Nice to hear that that kind of control might be coming in the future. As for my existing pipeline, there really isn't any reason why we have multiple small solids composed together as opposed to fewer larger solids; I think it was just a design decision we made early on. I'll look into refactoring our pipeline, and then the K8s pod spin-up won't be such a burden. Thanks again!