https://dagster.io/ logo
s

Sam Rausser

05/30/2020, 7:05 PM
would i run into any issues if i had a pipeline with one long running solid that did stuff and then spun up
execute_pipeline
in another thread or process and loop? like say a kafka consumer spilling to disk after N bytes and then running
execute_pipeline
on a pipeline that finds the file and processes it etc.
a

alex

06/01/2020, 1:33 PM
I dont think we’ve much experience internally or reported externally about super long running pipelines, so I’m not sure about the pipeline containing the kafka consumer. For picking up the results and running a pipeline - I would recommend using a fast ticking schedule definition with
should_execute
implemented to check for the data to be processed. You can see a conceptually similar approach used for backfilling here https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/schedules.py
that said
spun up execute_pipeline in another thread or process and loop
the only real issue i expect you to face here is managing the life cycle of the process/thread and making sure it doesn’t negatively impact the outer pipeline execution. Things will change a bit in a few weeks when we release
0.8.0
and there could be another option