# announcements
a
Hi all, I am very new to Dagster and exploring it for our use case. I'm currently going through the docs and got confused between composite solids and pipelines. We have different sets of computations, each independent, doing the same steps but on different assets (tables), and each with different upstream dependencies. I was tempted to create a configurable/reusable composite solid and programmatically create new invocations of it for each set of computations with different config params. But in that case we cannot execute each set independently, since solids are not executable, and we might need that capability because each set has different upstream dependencies. On the other hand, if I model each set as an independent pipeline, I am not sure how to create them programmatically, since pipelines do not support configs. Could someone help me design for our use case and correct me if I am missing something? I also see there is an ongoing effort to merge pipelines and composite solids into graphs and make solids executable: https://github.com/dagster-io/dagster/discussions/2902
n
That sounds like you would want one solid with some solid_config parameters to set which models it's working on?
Is it just one processing step or many?
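(As an illustrative sketch of that suggestion, assuming the legacy @solid/@pipeline API; the solid name and table value below are made up:)

from dagster import execute_pipeline, pipeline, solid

@solid(config_schema={"table": str})
def process_table(context):
    # which table this invocation works on comes from solid config
    table = context.solid_config["table"]
    context.log.info(f"computing metrics for {table}")

@pipeline
def metrics_pipeline():
    process_table()

# select the target table at launch time via run config
execute_pipeline(
    metrics_pipeline,
    run_config={"solids": {"process_table": {"config": {"table": "events"}}}},
)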
a
Multiple steps.
s
Hi @Arun Kumar - how would you like to execute your pipelines? I.e. do you want each one to be on a schedule? Do you want to execute them manually from the UI?
a
Hi @sandy, each pipeline can have some upstream dependencies. I was planning to use sensors to trigger them, but I would also like the capability to run them manually from the UI in case something goes wrong. Looking deeper into the docs, it looks like I can programmatically create pipelines using PipelineDefinition. I think this might solve my problem.
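(A rough sketch of the PipelineDefinition approach; the solid names and dependencies below are made up for illustration:)

from dagster import DependencyDefinition, PipelineDefinition, solid

@solid
def extract(context):
    return "raw rows"

@solid
def transform(context, rows):
    context.log.info(f"transforming {rows}")

def build_pipeline(name):
    # wire the same solids into a new pipeline object with the given name
    return PipelineDefinition(
        name=name,
        solid_defs=[extract, transform],
        dependencies={"transform": {"rows": DependencyDefinition("extract")}},
    )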
s
Got it - that makes sense. And yes, I was going to suggest something similar. You might also consider something like:
from dagster import pipeline

def make_pipeline(name, **pipeline_kwargs):
    @pipeline(name=name, **pipeline_kwargs)
    def _pipeline():
        ...  # compose the solids for this computation set here

    return _pipeline
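(One way to use such a factory, with a hypothetical table list and repository name, is to build one pipeline per computation set inside a repository:)

from dagster import repository

TABLES = ["ad_metrics", "user_metrics", "session_metrics"]  # hypothetical

@repository
def metrics_repository():
    # one independently runnable (and sensor-triggerable) pipeline per table
    return [make_pipeline(name=f"{table}_pipeline") for table in TABLES]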
a
Thanks @sandy. Another question: in our case, the upstream dependencies of the pipelines are actually maintained by our central data team, which uses Airflow. Have you come across any similar use cases before? I am currently thinking of defining each dependency as a sensor and, within the sensor, polling the Airflow REST APIs to check whether the upstream dependencies have finished.
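(A minimal sketch of such a sensor, assuming Airflow 2.x's stable REST API and made-up pipeline/DAG names and auth; Dagster de-duplicates RunRequests by run_key, so the same upstream run won't be re-triggered:)

import requests
from dagster import RunRequest, SkipReason, sensor

AIRFLOW_URL = "https://airflow.example.com/api/v1"  # hypothetical

@sensor(pipeline_name="ad_metrics_pipeline", minimum_interval_seconds=300)
def upstream_dag_sensor(context):
    # ask Airflow for the most recent run of the upstream DAG
    resp = requests.get(
        f"{AIRFLOW_URL}/dags/upstream_dag/dagRuns",
        params={"order_by": "-execution_date", "limit": 1},
        auth=("user", "password"),  # replace with real auth
    )
    runs = resp.json().get("dag_runs", [])
    if runs and runs[0]["state"] == "success":
        # run_key ties the Dagster run to the specific upstream Airflow run
        yield RunRequest(run_key=runs[0]["dag_run_id"], run_config={})
    else:
        yield SkipReason("upstream Airflow DAG has not completed a new run")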
s
That sounds like the right solution to me
a
Thanks a lot 🙂