# ask-community
d
Hi All, at AppliedAI we are building a machine learning framework based on Dagster, and we have a question: what is the best practice for developing and running inference pipelines in Dagster? Our requirements:
- Low latency, even for single-sample inference.
- The ability to perform inference in parallel.
- Ideally, the same model shared across multiple inference workflows.
- The inference pipeline should reuse solids from a separate ML training pipeline.

We have tried developing pipelines that are triggered by a sensor checking for new inputs (currently using a root solid with `DynamicOutput`). The behavior of `map()` on solids with dynamic outputs seems to be to collect all results first and only then apply any downstream solids (preventing infinite event streams). Is this the expected behavior?

We have a few concerns:
- The time to instantiate a workflow for single-sample inference could be quite slow (can we keep workflows alive across multiple runs?).
- The workflows are blocking, so we need multiple workflows (are Celery workers the solution to this?).

Do you have any suggestions or ideas for how we should be thinking about this?
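For reference, the pattern we are experimenting with looks roughly like this (a minimal sketch; the solid names and the sample source are placeholders, not our actual code):

```python
from dagster import DynamicOutput, DynamicOutputDefinition, pipeline, solid


@solid(output_defs=[DynamicOutputDefinition()])
def fan_out_samples(context):
    # Placeholder: in practice the samples would come from whatever the sensor picked up.
    for idx, sample in enumerate(["sample_a", "sample_b"]):
        yield DynamicOutput(sample, mapping_key=str(idx))


@solid
def run_inference(context, sample):
    # Placeholder for data cleaning, feature transformation, and the model-serving call.
    return {"sample": sample, "prediction": None}


@pipeline
def inference_pipeline():
    # map() fans the dynamic outputs out to one run_inference step per sample.
    fan_out_samples().map(run_inference)
```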
s
> The behavior of `map()` on solids with dynamic outputs seems to be to collect all results first and only then apply any downstream solids (preventing infinite event streams). Is this the expected behavior?
That is the current expected behavior. In the future, we may enable downstream steps to get kicked off before the step with the dynamic output completes, but infinite streaming is not a primary use case we have in mind.
What overall latency are you aiming for?
d
Hi Sandy, thank you for the quick reply. `map()` sounds good; looking forward to it. Regarding latency: ideally we want the pipeline to have as little overhead as possible outside of the actual solids being invoked. The pipeline would consist of data transformation/cleaning, feature transformation, and then model serving (done by another system but called from Dagster). Our concern is that each sensor evaluation would launch the pipeline all over again, when we are looking to reuse the same pipeline. The latency that concerns us is the sensor deploying a new workflow each time, when we are really looking to call an already-running workflow with new data from either RPC or a message queue. Are we misunderstanding something? Thank you again in advance for your help.
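To make the setup concrete, our sensor currently looks roughly like this (a minimal sketch; `new_samples()`, the sensor name, and the run-key scheme are placeholders), and every new input results in a fresh pipeline run:

```python
from dagster import RunRequest, sensor


def new_samples():
    # Placeholder: poll a queue, bucket, or API for samples that have not been processed yet.
    return []


@sensor(pipeline_name="inference_pipeline")
def new_sample_sensor(context):
    for sample_id in new_samples():
        # Each RunRequest launches a separate pipeline run, so every sample
        # pays the full run-launch overhead.
        yield RunRequest(run_key=sample_id, run_config={})
```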
s
The overhead of launching a run depends on which run launcher you're using - https://docs.dagster.io/deployment/run-launcher. If you're using the default run launcher, the overhead is basically writing a couple rows to a database and launching a subprocess, which can be pretty quick. If you're using the K8sRunLauncher, then it'll be more time consuming while you wait for K8s to provision resources and launch a pod.
d
Hi Sandy, that's what we thought. We're using K8s and wondered if we could reuse workflow pods, or set up some form of streaming, so we don't have to redeploy.
Thank you again for your response. 🙂
s
Ah, got it. Dagster currently doesn't have a run launcher that supports that.
d
Hi Sandy, Prefect does something like this: https://docs.prefect.io/core/pins/pin-08-listener-flows.html. Would it be possible to have something similar in Dagster?
s
I believe that sensors are the most similar concept in Dagster: https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors
d
Hi, yes, we're using them, but the key thing for us is the reuse of workflows/pods/processes, which is what we found interesting. Does Dagster have a plan to support this? We really like Dagster, BTW, so if we can help, or if you have a timeline, that would also be awesome.
s
Ah, got it. That's not in our current plan, but we'd be open to it as a longer-term direction. A system that manages a set of long-running processes ends up architecturally quite different from a system that spawns processes with bounded lifespans.
d
Hi Sandy, that would be awesome. Is there some way we can support this?