# announcements
hi guys - new to Dagster and just running the tutorials right now. I'm still evaluating whether Dagster is the right fit for a project I'm working on. My main question: why does it take roughly 10 seconds between launching execution of a given pipeline and the pipeline actually starting to run? In other words, there are about 10 seconds where the state of the pipeline is "NOT STARTED".
What launcher and executor plugins are you using?
i launched execution via the Dagit UI. i haven't set up anything for executor
is there an easy way for me to see what executor i was using? is the default just a local executor?
You would have set it in the playground input settings if needed
I think the default is in-process though?
Try the multi-process one
looks like even with the multi-process one, there is a 10 second latency in starting the pipeline
in fact, with the multi-process executor (using filesystem for storage), the whole thing took much longer... strange. Here is what i'm talking about tho
the latency highlighted by this oval is what i'm curious about. What is happening in those 10 seconds?
we execute pipelines in a fresh process for isolation - so this time is the new process spin up, init, and loading the pipeline and its deps
There are alternate launchers but starting a Kubernetes pod is usually going to take more than 10 seconds 🙂
My weirdo setup can get it down to a few milliseconds but probably not worth optimizing for most folks
we haven’t spent time optimizing this recently so it could be something silly we are doing, but in past instances where this has come up it has been dominated by Python module loading time. Some libraries can be very expensive on import.
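One way to check whether import cost really dominates the startup delay is to time the imports directly. This is a minimal sketch that assumes nothing about Dagster internals: `time_import` is a hypothetical helper, and the module names are only stand-ins for whatever your pipeline imports.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds taken to import module_name in this process.

    Modules already present in sys.modules come back almost instantly,
    so run this in a fresh interpreter to get a cold-start figure.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in module names; replace them with the heavy libraries your solids import.
    for name in ("json", "decimal", "csv"):
        already_loaded = name in sys.modules
        elapsed = time_import(name)
        print(f"{name}: {elapsed:.4f}s (already loaded: {already_loaded})")
```

CPython can also print a per-module breakdown with `python -X importtime -c "import your_module"`, which may be quicker than writing a script.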
Yeah, you could probably cut it down by forking rather than launching completely from scratch
That would have most of dagster itself already in memory at least
But 10 seconds on a 3 hour job is not much of a worry 🙂
Got it. that makes sense. our jobs aren't likely to take 3 hours but more likely 5-10 minutes to get through a full pipeline.
I have one additional question for you guys (@Noah K @alex): is it inappropriate to have solids in my pipeline which may require a user to provide some input? Do you have any examples where this happens?
Not really sure how that would work.
Like you would have to make a solid that busy-waits for a web service somewhere else to confirm the input?
And it would be taking up an execution slot the whole time
yeah that's effectively what i'm talking about. I know that seems really weird but that is my use-case.
fairly certain that we do not have any examples of that type of pattern, but there's no reason I can think of why it would not work, given some external thing like a web service, as Noah described, that's capturing the input
@alex My worry would be overhead. Until async pipelines+solids happen, you would have an executor process sitting there waiting for input. I guess it depends on scale and scope 🙂
unlikely that i'd ever have more than 1 maybe a few pipelines running on one instance. I probably should have led with that.
@alex @Noah K for context, does Dagster have something similar to this: https://medium.com/the-prefect-blog/needs-approval-184f2512a3cf
or even better, can i provide "approval" to continue a task from outside of the dagster UI?
Nothing built in, you would have to build your own solids for it
Dagster execution is not (yet) async or resumable.
So you would make a solid that is more or less
while not is_approved(): time.sleep(10)
or something
probably makes an HTTP request to a little web service you write
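The polling idea above could be sketched roughly like this. It is plain Python meant to be called from inside a solid body, not a Dagster API: `wait_for_approval` and its parameters are hypothetical names, and `is_approved` stands in for whatever check you write (for example an HTTP request to your own small web service).

```python
import time


def wait_for_approval(is_approved, poll_seconds=10.0, timeout_seconds=None, sleep=time.sleep):
    """Block until is_approved() returns True.

    Returns True once approved, or False if timeout_seconds of sleeping
    has elapsed first. The sleep argument is injectable so the loop can
    be exercised without actually waiting.
    """
    waited = 0.0
    while not is_approved():
        if timeout_seconds is not None and waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True


if __name__ == "__main__":
    # Fake approval source: rejected twice, then approved.
    answers = iter([False, False, True])
    print(wait_for_approval(lambda: next(answers), poll_seconds=0.01))
```

As noted in the thread, the process running this loop holds its execution slot for the whole wait, so a timeout is probably worth setting.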
thank you!
Hi @aakash indurkhya, here’s a blog post from goodeggs about how they use dagster for human in the loop: https://dagster.io/blog/good-eggs-1
@Noah K’s suggestion of adding a time.sleep(…) within the solid should work, and we are currently working on async support. Another option is to have the solid fail and have the “approve” button trigger re-execution from failure
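The fail-then-re-execute option might look something like the sketch below. It only shows the check that would sit at the top of a solid body; `ApprovalPending`, `require_approval`, and the in-memory `approved` set are hypothetical, and the "approve" button plus the re-execution trigger are assumed to live elsewhere.

```python
class ApprovalPending(Exception):
    """Raised to fail the step on purpose while approval is outstanding."""


def require_approval(is_approved, item_id):
    """Return item_id if it has been approved, otherwise fail fast."""
    if not is_approved(item_id):
        raise ApprovalPending(f"{item_id} has not been approved yet")
    return item_id


if __name__ == "__main__":
    approved = set()  # stand-in for wherever approvals are actually recorded

    try:
        require_approval(approved.__contains__, "batch-1")
    except ApprovalPending as exc:
        print(f"first run fails: {exc}")

    approved.add("batch-1")  # what the "approve" button would record
    print(require_approval(approved.__contains__, "batch-1"))
```

Unlike the sleep loop, nothing sits waiting between the failure and the approval, at the cost of the run showing up as failed in the meantime.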
Hi @Cat Wu! I was wondering if there have been any new developments in this regard? I am also evaluating prefect and this would be something quite neat to have! 🙂
cc @alex?
no new developments in this area to report