# ask-community
p
Help: I'm trying to decide how to structure my Dagster project. The flow is as follows:
• run_id is specified, ideally via the launchpad (this changes regularly)
• The run_id passes to a function producing a config dictionary:
  ◦ reference date
  ◦ storage folder
  ◦ links to parameter tables
• The config is then available to several processes, all producing pandas dataframes
  ◦ Storage of the dataframes is dependent on the config
The typical workflow is to run, then debug, only re-running the downstream steps.
---
Would Ops & Jobs or Assets fit this best? How would a config generator be shared? Would this be an asset returning a dict that is a required resource of all downstream steps?
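As a point of reference, the config generator being described could just be a plain function of the run_id; a minimal sketch (all field names and paths here are placeholders):

```python
from pathlib import Path


def build_config(run_id: str) -> dict:
    """Hypothetical config generator: maps a run_id to the settings that
    every downstream step needs (names and paths are illustrative)."""
    return {
        "run_id": run_id,
        "reference_date": "2024-01-01",                       # however this is derived from the run_id
        "storage_folder": str(Path("/data/runs") / run_id),   # per-run workspace
        "parameter_tables": {                                 # links to parameter tables
            "rates": f"/data/params/rates_{run_id}.parquet",
        },
    }
```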
c
cc @sandy, but it sounds to me like ops/jobs might be a better fit for this use case. Generally speaking, we expect assets to represent a single entity, and it sounds like each run of this will be producing different entities, each entity tied to the parameters passed to this specific run.
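A minimal sketch of the ops/jobs version, assuming the run_id is supplied as op config from the launchpad and the config dict is just an op output consumed by the downstream ops (op names and derived fields are illustrative):

```python
import pandas as pd
from dagster import job, op


@op(config_schema={"run_id": str})
def make_config(context) -> dict:
    # run_id comes from the launchpad; everything else is derived from it
    run_id = context.op_config["run_id"]
    return {
        "run_id": run_id,
        "reference_date": "2024-01-01",            # placeholder derivation
        "storage_folder": f"/data/runs/{run_id}",  # per-run workspace
    }


@op
def load_inputs(config_dict: dict) -> pd.DataFrame:
    # one of the several processes that all consume the shared config
    return pd.DataFrame({"run_id": [config_dict["run_id"]]})


@op
def downstream_step(config_dict: dict, df: pd.DataFrame) -> pd.DataFrame:
    # where/how the dataframe is stored depends on the config
    return df.assign(storage_folder=config_dict["storage_folder"])


@job
def my_pipeline():
    config_dict = make_config()
    downstream_step(config_dict, load_inputs(config_dict))
```

In the launchpad this would be configured with something like `ops: {make_config: {config: {run_id: "..."}}}`, and each run just supplies a different run_id.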
p
Ok, this is good perspective. Yes, the output of each step would be specific to the set of inputs. They would all have the same shape -> I was thinking about the custom dataframe types for validation, love this feature of Dagster -> it will help our processes a lot
And every time you do a run with a different ID, a different workspace/folder structure is created.
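If the per-run folder structure is mostly about where the dataframes land, that part can also be pushed into an IO manager so the ops just return DataFrames. A rough sketch, assuming parquet files under a base directory (the class and config names are made up):

```python
import os

import pandas as pd
from dagster import IOManager, InputContext, OutputContext, io_manager


class PerRunParquetIOManager(IOManager):
    """Stores each op's output dataframe under a folder keyed by the run,
    so every run gets its own workspace."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def _path(self, context) -> str:
        # one folder per run, one parquet file per op output
        # (keyed by the Dagster run id here; a business run_id could be
        # passed in via resource config instead)
        return os.path.join(self.base_dir, context.run_id, f"{context.step_key}.parquet")

    def handle_output(self, context: OutputContext, obj: pd.DataFrame) -> None:
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        obj.to_parquet(path)

    def load_input(self, context: InputContext) -> pd.DataFrame:
        return pd.read_parquet(self._path(context.upstream_output))


@io_manager(config_schema={"base_dir": str})
def per_run_parquet_io_manager(init_context):
    return PerRunParquetIOManager(init_context.resource_config["base_dir"])
```

With outputs persisted like this (or with the default filesystem IO manager), re-executing only the downstream steps of a run can reload the upstream dataframes instead of recomputing them.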
c
Given that there is no set number of IDs to be run with, I think ops/jobs is probably the best fit for this arch, yeah.
n
I have also been thinking about doing something similar; would dynamic partitions make sense?
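For reference, dynamic partitions would look roughly like this: the partition set is the set of run_ids, and new run_ids get registered at runtime (asset and partition names here are illustrative). Whether that beats plain ops/jobs mostly depends on whether each run_id's output is something you want tracked as a partition of one long-lived asset.

```python
from dagster import DynamicPartitionsDefinition, asset

# one partition per run_id; new run_ids are added at runtime
run_ids_partitions = DynamicPartitionsDefinition(name="run_ids")


@asset(partitions_def=run_ids_partitions)
def run_report(context):
    # the partition key is the run_id, so the asset stays one named entity
    # while each partition holds the output for a single run_id
    run_id = context.partition_key
    ...


# Elsewhere (a sensor, a script, or the UI) the new run_id is registered
# before its partition is materialized, e.g.:
#   instance.add_dynamic_partitions("run_ids", ["2024_06_01"])
```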
p
I still don't quite know how to pass a config generated at runtime to subsequent assets/jobs.
I was thinking about having a job that fills a config similar to that and yields a `RunRequest` for a separate job (that has the partitioning, maybe?).
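In Dagster, `RunRequest`s are yielded from sensors (or schedules) rather than from a job itself, so one way to sketch this is a sensor that builds the run config and targets the second job. A hedged example, assuming the downstream job is the `my_pipeline` with a `make_config` op from above, and that the list of pending run_ids comes from somewhere you control:

```python
from dagster import RunRequest, sensor

from my_project.jobs import my_pipeline  # the downstream job, defined elsewhere


def pending_run_ids() -> list[str]:
    # hypothetical source of new run_ids: a table, a queue, a folder listing, ...
    return ["2024_06_01"]


@sensor(job=my_pipeline)
def new_run_id_sensor(context):
    for run_id in pending_run_ids():
        yield RunRequest(
            run_key=run_id,  # de-duplicates: at most one launched run per run_id
            run_config={"ops": {"make_config": {"config": {"run_id": run_id}}}},
        )
```

The same sensor could also register the run_id as a dynamic partition before launching, if the partitioned-asset route from above is taken.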