# announcements
a
Hi all, I am very new to Dagster and exploring it for our use case. I'm currently going through the docs and got confused between composite solids and pipelines. We have different sets of computations, each independent, doing the same steps but on different assets (tables), and each with different upstream dependencies. I was tempted to create a configurable/reusable composite solid and programmatically create new invocations of it for each set of computations with different config params. But in that case we cannot execute each set independently, since solids are not executable, and we might need that capability because each set has different upstream dependencies. On the other hand, if I model each set as an independent pipeline, I am not sure how to create them programmatically, since pipelines do not support configs. Could someone help me design for our use case and correct me if I am missing something? I also see there is an ongoing effort to merge pipelines and composite solids into graphs and make solids executable: https://github.com/dagster-io/dagster/discussions/2902
n
That sounds like you would want one solid with some solid_config parameters to set which models it's working on?
Is it just one processing step or many?
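(As an illustrative sketch of that suggestion, assuming the legacy @solid/@pipeline API; the solid name and table value below are made up:)

from dagster import execute_pipeline, pipeline, solid

@solid(config_schema={"table": str})
def process_table(context):
    # which table this invocation works on comes from solid config
    table = context.solid_config["table"]
    context.log.info(f"computing metrics for {table}")

@pipeline
def metrics_pipeline():
    process_table()

# select the target table at launch time via run config
execute_pipeline(
    metrics_pipeline,
    run_config={"solids": {"process_table": {"config": {"table": "events"}}}},
)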
a
Multiple steps.
s
Hi @Arun Kumar - how would you like to execute your pipelines? I.e. do you want each one to be on a schedule? Do you want to execute them manually from the UI?
a
Hi @sandy, each pipeline can have some upstream dependencies. I was planning to use sensors to trigger them, but I would also like the capability to run them manually from the UI in case something goes wrong. Looking deeper into the docs, it looks like I can programmatically create pipelines using PipelineDefinition. I think this might solve my problem.
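(A rough sketch of the PipelineDefinition approach; the solid names and dependencies below are made up for illustration:)

from dagster import DependencyDefinition, PipelineDefinition, solid

@solid
def extract(context):
    return "raw rows"

@solid
def transform(context, rows):
    context.log.info(f"transforming {rows}")

def build_pipeline(name):
    # wire the same solids into a new pipeline object with the given name
    return PipelineDefinition(
        name=name,
        solid_defs=[extract, transform],
        dependencies={"transform": {"rows": DependencyDefinition("extract")}},
    )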
s
Got it - that makes sense. And yes, I was going to suggest something similar. You might also consider something like:
from dagster import pipeline

def make_pipeline(name, **pipeline_kwargs):
    @pipeline(name=name, **pipeline_kwargs)
    def _pipeline():
        ...  # compose the solids for this computation set here

    return _pipeline
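(One way to use such a factory, with a hypothetical table list and repository name, is to build one pipeline per computation set inside a repository:)

from dagster import repository

TABLES = ["ad_metrics", "user_metrics", "session_metrics"]  # hypothetical

@repository
def metrics_repository():
    # one independently runnable (and sensor-triggerable) pipeline per table
    return [make_pipeline(name=f"{table}_pipeline") for table in TABLES]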
a
Thanks @sandy. Another question: in our case, the upstream dependencies of the pipelines are actually maintained by our central data team, which uses Airflow. Have you come across any similar use cases before? I am currently thinking of defining each dependency as a sensor and, within the sensor, polling the Airflow REST APIs to check whether the upstream dependencies have finished.
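(A minimal sketch of such a sensor, assuming Airflow 2.x's stable REST API and made-up pipeline/DAG names and auth; Dagster de-duplicates RunRequests by run_key, so the same upstream run won't be re-triggered:)

import requests
from dagster import RunRequest, SkipReason, sensor

AIRFLOW_URL = "https://airflow.example.com/api/v1"  # hypothetical

@sensor(pipeline_name="ad_metrics_pipeline", minimum_interval_seconds=300)
def upstream_dag_sensor(context):
    # ask Airflow for the most recent run of the upstream DAG
    resp = requests.get(
        f"{AIRFLOW_URL}/dags/upstream_dag/dagRuns",
        params={"order_by": "-execution_date", "limit": 1},
        auth=("user", "password"),  # replace with real auth
    )
    runs = resp.json().get("dag_runs", [])
    if runs and runs[0]["state"] == "success":
        # run_key ties the Dagster run to the specific upstream Airflow run
        yield RunRequest(run_key=runs[0]["dag_run_id"], run_config={})
    else:
        yield SkipReason("upstream Airflow DAG has not completed a new run")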
s
That sounds like the right solution to me
a
Thanks a lot 🙂