# ask-community
p
Help: I'm trying to decide how to structure my Dagster project. The flow is as follows:
• run_id is specified, ideally via the launchpad (this changes regularly)
• The run_id passes to a function producing a config dictionary:
  ◦ reference date
  ◦ storage folder
  ◦ links to parameter tables
• The config is then available to several processes, all producing pandas dataframes
  ◦ Storage of the dataframes is dependent on the config
The typical workflow is to run, then debug, only re-running the downstream steps.
---
Would Ops & Jobs or Assets fit this best? How would a config generator be shared? Would this be an asset returning a dict that is a required resource of all downstream steps?
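As a point of reference, the config generator being described could just be a plain function of the run_id; a minimal sketch (all field names and paths here are placeholders):

```python
from pathlib import Path


def build_config(run_id: str) -> dict:
    """Hypothetical config generator: maps a run_id to the settings that
    every downstream step needs (names and paths are illustrative)."""
    return {
        "run_id": run_id,
        "reference_date": "2024-01-01",                       # however this is derived from the run_id
        "storage_folder": str(Path("/data/runs") / run_id),   # per-run workspace
        "parameter_tables": {                                 # links to parameter tables
            "rates": f"/data/params/rates_{run_id}.parquet",
        },
    }
```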
c
cc @sandy, but it sounds to me like ops/jobs might be a better fit for this use case. Generally speaking, we expect assets to represent a single entity, and it sounds like each run of this will be producing different entities, each entity tied to the parameters passed to this specific run.
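A minimal sketch of the ops/jobs version, assuming the run_id is supplied as op config from the launchpad and the config dict is just an op output consumed by the downstream ops (op names and derived fields are illustrative):

```python
import pandas as pd
from dagster import job, op


@op(config_schema={"run_id": str})
def make_config(context) -> dict:
    # run_id comes from the launchpad; everything else is derived from it
    run_id = context.op_config["run_id"]
    return {
        "run_id": run_id,
        "reference_date": "2024-01-01",            # placeholder derivation
        "storage_folder": f"/data/runs/{run_id}",  # per-run workspace
    }


@op
def load_inputs(config_dict: dict) -> pd.DataFrame:
    # one of the several processes that all consume the shared config
    return pd.DataFrame({"run_id": [config_dict["run_id"]]})


@op
def downstream_step(config_dict: dict, df: pd.DataFrame) -> pd.DataFrame:
    # where/how the dataframe is stored depends on the config
    return df.assign(storage_folder=config_dict["storage_folder"])


@job
def my_pipeline():
    config_dict = make_config()
    downstream_step(config_dict, load_inputs(config_dict))
```

In the launchpad this would be configured with something like `ops: {make_config: {config: {run_id: "..."}}}`, and each run just supplies a different run_id.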
p
Ok, this is good perspective. Yes, the output of each step would be specific to the set of inputs. They would all have the same shape -> I was thinking about the custom dataframe types for validation, love this feature of Dagster -> it will help our processes a lot
And every time you do a run with a different ID, a different workspace/folder structure is created.
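If the per-run folder structure is mostly about where the dataframes land, that part can also be pushed into an IO manager so the ops just return DataFrames. A rough sketch, assuming parquet files under a base directory (the class and config names are made up):

```python
import os

import pandas as pd
from dagster import IOManager, InputContext, OutputContext, io_manager


class PerRunParquetIOManager(IOManager):
    """Stores each op's output dataframe under a folder keyed by the run,
    so every run gets its own workspace."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def _path(self, context) -> str:
        # one folder per run, one parquet file per op output
        # (keyed by the Dagster run id here; a business run_id could be
        # passed in via resource config instead)
        return os.path.join(self.base_dir, context.run_id, f"{context.step_key}.parquet")

    def handle_output(self, context: OutputContext, obj: pd.DataFrame) -> None:
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        obj.to_parquet(path)

    def load_input(self, context: InputContext) -> pd.DataFrame:
        return pd.read_parquet(self._path(context.upstream_output))


@io_manager(config_schema={"base_dir": str})
def per_run_parquet_io_manager(init_context):
    return PerRunParquetIOManager(init_context.resource_config["base_dir"])
```

With outputs persisted like this (or with the default filesystem IO manager), re-executing only the downstream steps of a run can reload the upstream dataframes instead of recomputing them.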
c
Given that there is no set number of IDs to be run with, I think ops/jobs is probably the best fit for this arch, yeah.
n
I have also been thinking about doing something similar; would dynamic partitions make sense?
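For reference, dynamic partitions would look roughly like this: the partition set is the set of run_ids, and new run_ids get registered at runtime (asset and partition names here are illustrative). Whether that beats plain ops/jobs mostly depends on whether each run_id's output is something you want tracked as a partition of one long-lived asset.

```python
from dagster import DynamicPartitionsDefinition, asset

# one partition per run_id; new run_ids are added at runtime
run_ids_partitions = DynamicPartitionsDefinition(name="run_ids")


@asset(partitions_def=run_ids_partitions)
def run_report(context):
    # the partition key is the run_id, so the asset stays one named entity
    # while each partition holds the output for a single run_id
    run_id = context.partition_key
    ...


# Elsewhere (a sensor, a script, or the UI) the new run_id is registered
# before its partition is materialized, e.g.:
#   instance.add_dynamic_partitions("run_ids", ["2024_06_01"])
```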
p
I still don't quite know how to pass a config generated at runtime to subsequent assets/jobs.
I was thinking about having a job that fills a config similar to that and yields a `RunRequest` for a separate job (that has the partitioning, maybe?).
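In Dagster, `RunRequest`s are yielded from sensors (or schedules) rather than from a job itself, so one way to sketch this is a sensor that builds the run config and targets the second job. A hedged example, assuming the downstream job is the `my_pipeline` with a `make_config` op from above, and that the list of pending run_ids comes from somewhere you control:

```python
from dagster import RunRequest, sensor

from my_project.jobs import my_pipeline  # the downstream job, defined elsewhere


def pending_run_ids() -> list[str]:
    # hypothetical source of new run_ids: a table, a queue, a folder listing, ...
    return ["2024_06_01"]


@sensor(job=my_pipeline)
def new_run_id_sensor(context):
    for run_id in pending_run_ids():
        yield RunRequest(
            run_key=run_id,  # de-duplicates: at most one launched run per run_id
            run_config={"ops": {"make_config": {"config": {"run_id": run_id}}}},
        )
```

The same sensor could also register the run_id as a dynamic partition before launching, if the partitioned-asset route from above is taken.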