# ask-community
r
Dagster takes a full minute to finalize a run; can that be sped up? I have many small, short-running tasks.
y
when you say finalizing a run, did you mean it takes 1 min to finish the run after all steps have already completed?
also, do you mind sharing your setup? e.g. how did you deploy, were those runs in multiprocessing, etc?
r
yes, exactly that: it finishes all the steps and then takes a minute to stop https://dagster.slack.com/archives/C01U954MEER/p1662024073286669
I've tried both the in-process and multiprocess executors, but there was no difference
I'm using the k8s job executor (one k8s job per job), MinIO as storage, and a Postgres DB
it's a Kubernetes deployment on a non-internet-connected system, so unfortunately I can't use the Helm chart; I build the ConfigMaps etc. myself based on the Helm charts
I have run monitoring enabled, with start_timeout_seconds = 360 and poll_interval_seconds = 60 (it was 120, but that change did not fix my issue). I use the dagster_k8s launcher as the run_launcher.
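roughly, the relevant parts of my dagster.yaml look like this (the values here are placeholders, not my real config):
```yaml
# dagster.yaml (sketch) -- run launcher and run monitoring
run_launcher:
  module: dagster_k8s.launcher
  class: K8sRunLauncher
  config:
    job_namespace: dagster            # placeholder namespace
    service_account_name: dagster     # placeholder service account
    instance_config_map: dagster-instance

run_monitoring:
  enabled: true
  start_timeout_seconds: 360
  poll_interval_seconds: 60
```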
I use local_artifact_storage (for now) and write the compute logs to MinIO (S3) with the S3 compute log manager
and schedule storage is postgres schedule storage
event_log storage also postgres
run_storage also postgres
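and the storage sections look roughly like this (bucket name, endpoint, and Postgres credentials are placeholders):
```yaml
# dagster.yaml (sketch) -- compute logs go to MinIO via its S3 API, metadata goes to Postgres
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: dagster-compute-logs        # placeholder bucket
    prefix: compute-logs
    endpoint_url: http://minio:9000     # placeholder MinIO endpoint

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db: &pg                    # shared Postgres connection details (placeholders)
      hostname: postgres
      username: dagster
      password:
        env: DAGSTER_PG_PASSWORD
      db_name: dagster
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db: *pg

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db: *pg
```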
y
cc @johann potentially k8s perf related
r
I wish I could give you more specifics, but all I have is the timings from the logs.
j
Yeah, it's odd; it certainly has to do a few DB writes to finish up, but 60s does seem overly long. Does it have that delay regardless of how many other jobs are running?
r
yeah it is unrelated to the number of jobs.
I do run on spinning disks, not SSDs, so there might be some overhead in there
it seems there was a bug in MinIO that created a delay when writing empty files, or something like that; I'll have more details tomorrow.
@johann possibly not a Kubernetes-specific bug, but a MinIO-specific one
j
wow, that’s unexpected
I wouldn't even have considered that, since the run worker shouldn't be writing to MinIO; only the step workers should, and they write to compute log storage every time they finish
but I guess the delay from the last steps writing is the most noticeable
r
I guess so. I'll update you when I find out more, because I want to know exactly what the bug was
@johann, okay, this is what we found: MinIO has a bug where it rejects the request when you write an empty file. After a minute of retries it fails (silently?). Dagster has a setting for S3, skip_empty_files, which does not default to true. When we enable that setting we get faster results. (We log something in almost all of our steps, so skipping empty files didn't cause a lot of problems.)
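in other words, something like this in the compute log manager section (the only change from before is skip_empty_files; bucket and endpoint are placeholders):
```yaml
# dagster.yaml (sketch) -- skip uploading zero-byte log files so MinIO never sees the empty PUT
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: dagster-compute-logs      # placeholder bucket
    prefix: compute-logs
    endpoint_url: http://minio:9000   # placeholder MinIO endpoint
    skip_empty_files: true            # don't upload empty stdout/stderr files
```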