# ask-community
r
Dagster takes a full minute to finalize a run; can that be sped up? I have many small, short-running tasks.
y
when you say finalizing a run, did you mean it takes 1 min to finish the run after all steps have already completed?
also, do you mind sharing your setup? e.g. how did you deploy, were those runs in multiprocessing, etc?
r
yes, exactly that: it finishes all the steps and then takes a minute to stop https://dagster.slack.com/archives/C01U954MEER/p1662024073286669
I've tried both the in-process and multiprocess executors, but there was no difference
I'm using the k8s job executor (one k8s job per job), MinIO as storage, and a Postgres DB
it's a Kubernetes deployment on a non-internet-connected system, so unfortunately I can't use the Helm chart; I build the ConfigMaps etc. myself based on the Helm charts
I have run monitoring enabled, with start_timeout_seconds = 360 and poll_interval_seconds = 60 (it was 120, but that change did not fix my issue). I use the dagster_k8s launcher as the run_launcher.
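roughly, the relevant parts of my dagster.yaml look like this (the values here are placeholders, not my real config):
```yaml
# dagster.yaml (sketch) -- run launcher and run monitoring
run_launcher:
  module: dagster_k8s.launcher
  class: K8sRunLauncher
  config:
    job_namespace: dagster            # placeholder namespace
    service_account_name: dagster     # placeholder service account
    instance_config_map: dagster-instance

run_monitoring:
  enabled: true
  start_timeout_seconds: 360
  poll_interval_seconds: 60
```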
I use local_artifact_storage (for now) and write the compute logs to MinIO (S3) with the S3 compute log manager
and schedule storage is postgres schedule storage
event_log storage also postgres
run_storage also postgres
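and the storage sections look roughly like this (bucket name, endpoint, and Postgres credentials are placeholders):
```yaml
# dagster.yaml (sketch) -- compute logs go to MinIO via its S3 API, metadata goes to Postgres
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: dagster-compute-logs        # placeholder bucket
    prefix: compute-logs
    endpoint_url: http://minio:9000     # placeholder MinIO endpoint

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db: &pg                    # shared Postgres connection details (placeholders)
      hostname: postgres
      username: dagster
      password:
        env: DAGSTER_PG_PASSWORD
      db_name: dagster
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db: *pg

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db: *pg
```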
y
cc @johann potentially k8s perf related
r
I wish I could give you more specifics, but all I have is the timings from the logs.
j
Yeah, it's odd; it certainly has to do a few DB writes to finish up, but 60s does seem overly long. Does it have that delay regardless of how many other jobs are running?
r
yeah it is unrelated to the number of jobs.
I do run on spinning disks, not SSDs, so there might be some overhead in there
it seems there was a bug in MinIO that created a delay when writing empty files, or something like that; I'll have more details tomorrow.
@johann possibly not a Kubernetes-specific bug, but a MinIO-specific one
j
wow, that’s unexpected
I wouldn't even have considered that, since the run worker shouldn't be writing to MinIO; only the step workers should, and they write to compute log storage every time they finish
but I guess the delay from the last steps writing is the most noticeable
r
I guess so. I'll update you when I find out more, because I want to know exactly what the bug was
@johann, okay, this is what we found: MinIO has a bug where it rejects the request when you write an empty file. After a minute of retries it fails (silently?). Dagster has a setting for S3, skip_empty_files, which does not default to true. When we enable that setting we get faster results. (We log something in almost all of our steps, so skipping empty files didn't cause a lot of problems.)
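in other words, something like this in the compute log manager section (the only change from before is skip_empty_files; bucket and endpoint are placeholders):
```yaml
# dagster.yaml (sketch) -- skip uploading zero-byte log files so MinIO never sees the empty PUT
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: dagster-compute-logs      # placeholder bucket
    prefix: compute-logs
    endpoint_url: http://minio:9000   # placeholder MinIO endpoint
    skip_empty_files: true            # don't upload empty stdout/stderr files
```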