what is the best executor for a stream processing ...
# announcements
i
what is the best executor for a stream processing if there is any? So a pipeline workflow would calculate aggregates in a hierachy (small computes, but high concurency)
Here is a pipeline for one aggregate
could I use dagster like this? in a streaming fashon?
a
what is your target infrastructure for deployment?
multiprocess
is the place to start if you are on a single box
i
@alex thanks you read my question. so my target infra would be k8s, we have a cluster, so my plan is to start some dagster worker instance (lets say worker, because i still dont understand the runtime model for dagster) and I also hope, i can execute pipline hiting the dagster API (I did not check it, just saw on youtube and i have seen there is a graphql api) 🙂 so I have put my fate for this....do you think this is possible.....schedule multiple (small compute) pipelines from outside of Dagster over its API, and not waiting for independent python runtime provisioning even we talk about docker
these pipelines are going to be written in python, I come from Go, if u know Cadence Workflow, I would like to use Dagster like that 🙂 thats my plan
I try to evaluate this usecase
a
not waiting for independent python runtime provisioning even we talk about docker
I don’t quite follow this sentence, can you try to clarify?
i
ofc, sorry about my english
So, if Dagster wants to start idepedent docker container for every pipeline, i hope i can avoid this, that would be very slow for me
I would not be able to use it
is this helped?
If I am able to start 4-5 worker containers which are going to run my concurrent pipelines (maybe with semaphores, resource controlls) that would be awesome 😄
a
how tight are your latency requirements? We don’t require launching in new pods, though that is the set up our helm chart is designed for (using k8s Jobs).
i
i would like to use it for stream processing, because there are a lot of independent input of a pipeline, because of the eventually consistency so the pipelines needs to wait for that...even we talk about somwhere between 100-1500ms
a
we do isolate execution in to separate processes - so at minimum you are paying process init overhead
i
that is cool for me, I just dont want to wait for a container spin up
cool....i am still learning the documentation....I just wanted to ask some compass...so thanks...that will work for me (process execution)
thanks @alex
a
from the information i have so far - i think what you are looking at is possible, but would require some fiddling since our close partners so far have preferred isolation over latency
i
I see...if it is possible, then I am going to take my chances 😉 - maybe I need to write some kind of semaphor or something...I did not see any sensors, or semaphores, maybe they are there I just need more time to understand and familiar with the infra, and execution model.....if i have more concrete questions i will ask it...so thanks...I appreciate your help
a
two pluggable parts of the system that are relevant are the run launcher which is what decides how/where to execute a pipeline run, and the executor which decides how to execute the steps in a run https://docs.dagster.io/overview/run-launchers-executors/run-launcher you may end up needing to write a custom run launcher achieve certain end states
i
i am checking it
a
we don’t have any semaphores, but you could write a
resource
to model one
sensors
are coming in
0.10.0
tentatively releasing the beginning of January
i
well I am going to save your last two posts 🙂 I think these will be life saver next week :D
thanks
c
@Istvan Darvas your implementation needs are similar to my own. Were you able to make Dagster comply with what you wanted?