# ask-community
Help: I'm trying to decide how to structure my Dagster project. The flow is as follows:
• run_id is specified, ideally via the launchpad (this changes regularly)
• The run_id passes to a function producing a config dictionary:
  ◦ reference date
  ◦ storage folder
  ◦ links to parameter tables
• The config is then available to several processes, all producing pandas dataframes
  ◦ Storage of the dataframes depends on the config
The typical workflow is to run, then debug, re-running only the downstream steps.
---
Would Ops & Jobs or Assets fit this best? How would a config generator be shared? Would this be an asset returning a dict that is a required resource of all downstream steps?
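For concreteness, here's a rough sketch of the kind of config generator I mean (all paths, fields, and lookups are just placeholders):

```python
from datetime import date


def build_config(run_id: str) -> dict:
    """Hypothetical config generator: maps a run_id to the settings shared downstream."""
    return {
        "run_id": run_id,
        "reference_date": date.today().isoformat(),   # placeholder lookup
        "storage_folder": f"/data/runs/{run_id}",     # placeholder folder layout
        "parameter_tables": {                         # placeholder table links
            "params": f"/tables/{run_id}/params.parquet",
        },
    }
```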
cc @sandy, but it sounds to me like ops/jobs might be a better fit for this use case. Generally speaking, we expect assets to represent a single entity, and it sounds like each run of this will be producing different entities, each entity tied to the parameters passed to this specific run.
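As a minimal sketch of that shape (untested; op and field names are just illustrative): the config generator is simply an op whose output fans out to the downstream ops, and run_id arrives through the launchpad as op config.

```python
import pandas as pd
from dagster import job, op


@op(config_schema={"run_id": str})
def build_config(context) -> dict:
    """Turn the run_id supplied in the launchpad into the shared config dict."""
    run_id = context.op_config["run_id"]
    return {
        "run_id": run_id,
        "storage_folder": f"/data/runs/{run_id}",  # placeholder layout
    }


@op
def process_a(config: dict) -> pd.DataFrame:
    """One of several downstream processes; storage location comes from the config."""
    df = pd.DataFrame({"value": [1, 2, 3]})
    # e.g. df.to_parquet(f"{config['storage_folder']}/a.parquet")
    return df


@job
def my_pipeline():
    config = build_config()
    process_a(config)
```

Because the config is an op output rather than a resource, re-executing only downstream steps reuses the stored config value from the original run.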
Ok, this is good perspective. Yes, the output of each step would be specific to the set of inputs. They would all be the same shape -> I was thinking about the custom dataframe types for validation, love this feature of Dagster -> it will help our processes a lot
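e.g. a minimal sketch of the kind of dagster-pandas type I mean (the column set is made up):

```python
from dagster_pandas import PandasColumn, create_dagster_pandas_dataframe_type

# Hypothetical shared shape for the step outputs; each op output can be
# annotated with this type (e.g. out=Out(ResultDataFrame)) to get validation.
ResultDataFrame = create_dagster_pandas_dataframe_type(
    name="ResultDataFrame",
    columns=[
        PandasColumn.string_column("run_id"),
        PandasColumn.numeric_column("value"),
    ],
)
```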
And every time you do a run with a different ID, a different workspace/folder structure is created
Given that there is no set number of IDs to run with, I think ops/jobs is probably the best fit for this architecture, yeah
I have also been thinking about doing something similar. Would dynamic partitions make sense?
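For context, the dynamic-partitions shape I'm imagining looks roughly like this (partition-set name and asset are made up; the API is in recent Dagster releases):

```python
from dagster import DynamicPartitionsDefinition, asset

# Each run_id becomes its own partition, registered at runtime
run_ids_partitions = DynamicPartitionsDefinition(name="run_ids")


@asset(partitions_def=run_ids_partitions)
def run_output(context):
    run_id = context.partition_key  # the run_id for this materialization
    ...


# New run_ids would be registered at runtime, e.g. from a sensor:
# context.instance.add_dynamic_partitions("run_ids", ["run_2024_01"])
```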
I still don't quite know how to pass a config generated at runtime to subsequent assets/jobs.
I was thinking about having a job that fills in a config like that and yields a run request for a separate job (that has the partitioning, maybe?)
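Roughly this kind of hand-off, sketched with a sensor (assuming the generated config can be serialized into run_config; all names are illustrative):

```python
from dagster import RunRequest, job, op, sensor


@op(config_schema={"run_id": str, "storage_folder": str})
def downstream_step(context):
    context.log.info(f"processing {context.op_config['run_id']}")


@job
def downstream_job():
    downstream_step()


@sensor(job=downstream_job)
def config_sensor(context):
    # Wherever the generated config comes from; hardcoded here for the sketch
    config = {"run_id": "run_001", "storage_folder": "/data/runs/run_001"}
    yield RunRequest(
        run_key=config["run_id"],  # dedupes repeated requests for the same run_id
        run_config={"ops": {"downstream_step": {"config": config}}},
    )
```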