
Jorge Sánchez (Jorjatorz)

12/21/2021, 4:42 PM
Dynamic config, i.e. modifying config from ops - Discussion/question

Hi everyone, I am new to this Slack but I've been working with Dagster for several months already. Here is my first question/discussion to hear your opinions.

Before migrating to the new framework (jobs, ops, graphs) I managed inter-pipeline communication through AssetMaterializations and sensors. Some pipelines created some data (let's say, a `model_id` UUID) and added this information to the AssetMaterialization metadata. Then an asset_sensor read this metadata and yielded a run for the target pipeline with a configuration filled from it. In other words, there was a "root" pipeline that received the initial configuration, and this config was cascaded down to the other pipelines through assets. If a pipeline had to add new config data, it added it to the AssetMaterialization metadata and the next pipeline would make use of it. Usually the config data configured a `ResourceDefinition.string_resource()`, so the `model_id` I mentioned before would become a string resource called `model_id`, making it easy to access from all solids that required it.

I am currently migrating to the new framework, and these interconnected pipelines have now become graphs. The job provides the general configuration to the main resources, but some resources (like the `model_id`) are created in a subgraph, and subsequent graphs/solids make use of them. The approach I am currently taking is that each graph returns a `config_dict` and the next graph receives this `config_dict` as input. So the graph that creates a model inserts the `model_id` into it, and subsequent graphs have access to it, but this time they get it through the `config_dict` instead of a resource.

I would like to know your thoughts on this approach. Ideally I would like to be able to insert resources (for example the `model_id` resource), or to modify the `ResourceDefinition.string_resource()` from a solid, so the `model_id` starts empty and is then populated with the id. Thank you 😄
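[For context, the `config_dict` threading described above can be sketched in plain Python. The function names (`create_model`, `evaluate_model`) and the config keys are hypothetical stand-ins; in an actual Dagster job these would be graphs/ops whose outputs and inputs are wired together.]

```python
import uuid

def create_model(config: dict) -> dict:
    # Stands in for the model-creating graph: it returns a copy of the
    # incoming config dict with the freshly generated model_id added.
    return {**config, "model_id": str(uuid.uuid4())}

def evaluate_model(config: dict) -> str:
    # Stands in for a downstream graph: it reads model_id from the dict
    # it received as an input, instead of from a string resource.
    return config["model_id"]

# The job wires the dict produced by one graph into the next, so
# values added upstream are visible downstream without a shared resource.
initial_config = {"dataset": "train.csv"}
enriched = create_model(initial_config)
model_id = evaluate_model(enriched)
```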

owen

12/21/2021, 5:51 PM
Hi @Jorge Sánchez (Jorjatorz)! This is an interesting use case. I think your approach makes a ton of sense. There's no way to make a "global variable" at runtime (or to modify a resource), so having a dictionary containing all of the necessary information and wiring that up seems more convenient than splitting that out into separate inputs/outputs. For the record, your original implementation also seemed quite reasonable, but this single-job approach is probably simpler to maintain and understand.

Jorge Sánchez (Jorjatorz)

12/21/2021, 5:55 PM
Thank you for the answer @owen, and happy to see that this makes sense to you. Indeed, a single-job approach is better, and that's the main reason I decided to migrate to the most recent version. The original implementation started to become cumbersome when more complex inter-pipeline dependencies were required (e.g. conditionals). Now we just create specific jobs.