# deployment-kubernetes
Hey, I have a Dagster deployment which is running on version 0.14.14 and we are trying to migrate it to the newer version. As the codebase is still hooked to pipelines and solids, we have re-written much of it and used backfills wherever necessary to provide support for graphs and jobs. The newer runs that we do on jobs instead of pipelines are working well, but the problems we are facing are:
• Almost every job run fails at some solid (yes, we are using solids with jobs) with a `DagsterExecutionInterruptedError`. The frequency of this has increased manyfold from the moment we switched to jobs.
• The `RunRetry` exception, when thrown after this happens (we have written a wrapper function which throws this exception), is not honoured, and the run fails completely.
Is there any different logic for how retries are handled in job-solids vs pipeline-solids?
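For context, here is a minimal sketch of the kind of retry wrapper described above, assuming it ultimately raises Dagster's built-in `RetryRequested` exception (the `RunRetry` name appears to be a custom wrapper; `flaky_call` is a hypothetical stand-in for the real work):

```python
from dagster import RetryRequested, solid


def flaky_call():
    # Hypothetical stand-in for the actual work that intermittently fails.
    ...


@solid
def resilient_solid(context):
    try:
        return flaky_call()
    except Exception as exc:
        # Ask the executor to retry this step up to 3 times,
        # waiting 10 seconds between attempts.
        raise RetryRequested(max_retries=3, seconds_to_wait=10) from exc
```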
Hi Divyansh…. Are you executing both the old and new runs on 0.14.14? Or are you testing a newer version of Dagster? Do you have a run launcher configured, or are you using the default? Are the newer runs being executed with the multiprocess executor?

I believe that with jobs, the default executor switched to the multiprocess executor, rather than the in-process executor (the default with pipelines). When using this with the default run launcher, you might be aggressively spawning new processes, causing some sort of termination event to be sent. To test whether this is the issue, you can override the executor in your job to use the in_process_executor:
```python
from dagster import in_process_executor, job


@job(executor_def=in_process_executor)
def my_job():
    ...
```
To maintain the process isolation (and increase op parallelization), you can keep using the multiprocess executor, but configure your jobs to limit the number of concurrent ops executing, as sketched below.
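A minimal sketch of that configuration, assuming the stock `multiprocess_executor` and an arbitrary illustrative cap of 4 concurrent ops:

```python
from dagster import job, multiprocess_executor


@job(executor_def=multiprocess_executor.configured({"max_concurrent": 4}))
def my_job():
    ...
```

Capping `max_concurrent` keeps each op in its own subprocess while bounding how many subprocesses spawn at once, which should reduce the process pressure that can trigger termination events.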