Dagster takes a full minute to finalize a run. Can that be sped up? I have many small, short-running tasks.
when you say finalizing a run, do you mean it takes 1 min to finish the run after all steps have already completed?
also, do you mind sharing your setup? e.g. how did you deploy, were those runs in multiprocessing, etc?
yes, exactly that: it finishes all the steps and then takes a minute to stop https://dagster.slack.com/archives/C01U954MEER/p1662024073286669
I've tried both the in-process and multiprocess executors, but there was no difference
I'm using the k8s job executor (one k8s job per job), minio as storage, and a postgres db
it's a kubernetes deployment in a non-internet-connected system, so I cannot use the helm chart (unfortunately); I build the configmaps etc. myself, based on the helm charts
I have run monitoring enabled, with start_timeout_seconds = 360 and poll_interval_seconds = 60 (was 120, but this change did not fix my issue). I use the dagster_k8s.launcher as run_launcher
I use local_artifact_storage (for now) and write the compute logs to minio (s3) with the s3.compute_log_manager
and schedule storage is postgres schedule storage
event_log storage also postgres
run_storage also postgres
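In dagster.yaml terms, the run monitoring part looks roughly like this (a sketch reconstructed from the values above, not a copy of the actual configmap; the rest of the instance config is left out):

```yaml
# dagster.yaml (fragment), values as described in the messages above
run_monitoring:
  enabled: true
  start_timeout_seconds: 360
  poll_interval_seconds: 60   # was 120 before; changing it made no difference
```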
cc @johann potentially k8s perf related
I wish I could give you more specifics, but all I have is the timings from the logs.
Yeah, it’s odd. It certainly has to do a few db writes to finish up, but 60s does seem overly long. Does it have that delay regardless of how many other jobs are running?
yeah it is unrelated to the number of jobs.
I do run on spinning disks, not SSDs, so there might be some overhead in there
it seems there was a bug in minio that created a delay on empty file writing or something, I'll have more details tomorrow.
@johann possibly not a kubernetes-specific bug, but a minio-specific one
wow, that’s unexpected
I wouldn’t even have considered that, since the run worker shouldn’t be writing to minio. That should only be the step workers, and they write to the compute log storage every time they finish
but I guess the delay from the last steps writing is the most noticeable
I guess so. I'll update you when I find out more, because I want to know what exactly the bug was
@johann, okay, this is what we found: minio has a bug where it rejects the request when you write an empty file. After a minute of retries it fails (silently?). Dagster's S3 compute log manager has a setting for this:
skip_empty_files
It is not true by default. When we enable that setting, we get faster results. (We logged in almost all the steps, so it didn't cause a lot of problems.)
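For anyone hitting the same thing, here is a minimal sketch of the compute log section of dagster.yaml with the setting turned on. The bucket, prefix, and endpoint_url values are placeholders (assumptions for illustration, not our real ones), and the rest of the instance config is left out:

```yaml
# dagster.yaml (fragment); values marked "placeholder" are assumptions
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "dagster-compute-logs"                      # placeholder bucket name
    prefix: "compute-logs"                              # placeholder key prefix
    endpoint_url: "http://minio.example.internal:9000"  # placeholder minio endpoint
    skip_empty_files: true                              # skip uploading empty log files (off by default)
```

With this on, steps that produce no stdout/stderr no longer attempt the empty upload that minio was rejecting.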